* [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping
@ 2022-06-24 17:36 James Houghton
  2022-06-24 17:36 ` [RFC PATCH 01/26] hugetlb: make hstate accessor functions const James Houghton
                   ` (28 more replies)
  0 siblings, 29 replies; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

This RFC introduces the concept of HugeTLB high-granularity mapping
(HGM)[1].  In broad terms, this series teaches HugeTLB how to map HugeTLB
pages at different granularities, and, more importantly, to partially map
a HugeTLB page.  This cover letter will go over
 - the motivation for these changes
 - userspace API
 - some of the changes to HugeTLB to make this work
 - limitations & future enhancements

High-granularity mapping does *not* involve dissolving the hugepages
themselves; it only affects how they are mapped.

---- Motivation ----

Being able to map HugeTLB memory with PAGE_SIZE PTEs has important use
cases in post-copy live migration and memory failure handling.

- Live Migration (userfaultfd)
For post-copy live migration using userfaultfd, we currently have to
install an entire hugepage before we can allow a guest to access that page.
This is because, right now, either the WHOLE hugepage is mapped or NONE of
it is, so the guest can access either all of it or none of it.  This makes
post-copy live migration for 1G HugeTLB-backed VMs completely infeasible.

With high-granularity mapping, we can map PAGE_SIZE pieces of a hugepage,
allowing the guest to access only those PAGE_SIZE chunks while taking page
faults on the rest (each fault triggering another demand-fetch). This gives
userspace the flexibility to install PAGE_SIZE chunks of memory into a
hugepage, making migration of 1G-backed VMs perfectly feasible, and it
vastly reduces vCPU stall time during post-copy for 2M-backed VMs.

At Google, for a 48 vCPU VM in post-copy, we can expect these approximate
per-page median fetch latencies:
     4K: <100us
     2M: >10ms
Being able to unpause a vCPU 100x quicker is helpful for guest stability,
and being able to use 1G pages at all can significantly improve steady-state
guest performance.

After fully copying a hugepage over the network, we will want to collapse
the mapping down to what it would normally be (e.g., one PUD for a 1G
page). Rather than having the kernel do this automatically, we leave it up
to userspace to tell us to collapse a range (via MADV_COLLAPSE, co-opting
the API that is being introduced for THPs[2]).

- Memory Failure
When a memory error is found within a HugeTLB page, it would be ideal if we
could unmap only the PAGE_SIZE section that contained the error. This is
what THPs are able to do. Using high-granularity mapping, we could do this,
but this isn't tackled in this patch series.

---- Userspace API ----

This patch series introduces a single way to take advantage of
high-granularity mapping: via UFFDIO_CONTINUE. UFFDIO_CONTINUE allows
userspace to resolve MINOR page faults on shared VMAs.

To collapse a HugeTLB address range that has been mapped with several
UFFDIO_CONTINUE operations, userspace can issue MADV_COLLAPSE. We expect
userspace to know when all pages (that they care about) have been fetched.

---- HugeTLB Changes ----

- Mapcount
Mapcount is handled differently than it was before: if the PUD for a
hugepage is not none, the hugepage's mapcount is incremented. This scheme
means that, for hugepages that aren't mapped at high granularity, their
mapcounts will remain the same as what they would have been pre-HGM.

- Page table walking and manipulation
A new function, hugetlb_walk_to, handles walking HugeTLB page tables for
high-granularity mappings. Eventually, it may be possible to merge
hugetlb_walk_to with huge_pte_offset and huge_pte_alloc.

We keep track of HugeTLB page table entries with a new struct, hugetlb_pte.
This is because we generally need to know the "size" of a PTE (previously
always just huge_page_size(hstate)).

For every page table manipulation function that has a huge version (e.g.
huge_ptep_get and ptep_get), there is a wrapper for it (e.g.
hugetlb_ptep_get).  The correct version is used depending on whether a
HugeTLB PTE really is "huge".

- Synchronization
For existing bits of HugeTLB, synchronization is unchanged. For splitting
and collapsing HugeTLB PTEs, we require that the i_mmap_rw_sem is held for
writing, and for doing high-granularity page table walks, we require it to
be held for reading.

---- Limitations & Future Changes ----

This patch series only implements high-granularity mapping for VM_SHARED
VMAs.  I intend to implement enough HGM to support 4K unmapping for memory
failure recovery for both shared and private mappings.

The memory failure use case poses its own challenges that can be
addressed, but I will do so in a separate RFC.

Performance has not been heavily scrutinized with this patch series. There
are places where lock contention can significantly reduce performance. This
will be addressed later.

The patch series, as it stands right now, is compatible with the VMEMMAP
page struct optimization[3], as we do not need to modify data contained
in the subpage page structs.

Other omissions:
 - Compatibility with userfaultfd write-protect (will be included in v1).
 - Support for mremap() (will be included in v1). This looks a lot like
   the support we have for fork().
 - Documentation changes (will be included in v1).
 - Completely ignores PMD sharing and hugepage migration (will be included
   in v1).
 - Implementations for architectures that don't use GENERAL_HUGETLB other
   than arm64.

---- Patch Breakdown ----

Patch 1     - Preliminary changes
Patch 2-10  - HugeTLB HGM core changes
Patch 11-13 - HugeTLB HGM page table walking functionality
Patch 14-19 - HugeTLB HGM compatibility with other bits
Patch 20-23 - Userfaultfd and collapse changes
Patch 24-26 - arm64 support and selftests

[1] This used to be called HugeTLB double mapping, a bad and confusing
    name. "High-granularity mapping" is not a great name either. I am open
    to better names.
[2] https://lore.kernel.org/linux-mm/20220604004004.954674-10-zokeefe@google.com/
[3] commit f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")

James Houghton (26):
  hugetlb: make hstate accessor functions const
  hugetlb: sort hstates in hugetlb_init_hstates
  hugetlb: add make_huge_pte_with_shift
  hugetlb: make huge_pte_lockptr take an explicit shift argument.
  hugetlb: add CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
  mm: make free_p?d_range functions public
  hugetlb: add hugetlb_pte to track HugeTLB page table entries
  hugetlb: add hugetlb_free_range to free PT structures
  hugetlb: add hugetlb_hgm_enabled
  hugetlb: add for_each_hgm_shift
  hugetlb: add hugetlb_walk_to to do PT walks
  hugetlb: add HugeTLB splitting functionality
  hugetlb: add huge_pte_alloc_high_granularity
  hugetlb: add HGM support for hugetlb_fault and hugetlb_no_page
  hugetlb: make unmapping compatible with high-granularity mappings
  hugetlb: make hugetlb_change_protection compatible with HGM
  hugetlb: update follow_hugetlb_page to support HGM
  hugetlb: use struct hugetlb_pte for walk_hugetlb_range
  hugetlb: add HGM support for copy_hugetlb_page_range
  hugetlb: add support for high-granularity UFFDIO_CONTINUE
  hugetlb: add hugetlb_collapse
  madvise: add uapi for HugeTLB HGM collapse: MADV_COLLAPSE
  userfaultfd: add UFFD_FEATURE_MINOR_HUGETLBFS_HGM
  arm64/hugetlb: add support for high-granularity mappings
  selftests: add HugeTLB HGM to userfaultfd selftest
  selftests: add HugeTLB HGM to KVM demand paging selftest

 arch/arm64/Kconfig                            |   1 +
 arch/arm64/mm/hugetlbpage.c                   |  63 ++
 arch/powerpc/mm/pgtable.c                     |   3 +-
 arch/s390/mm/gmap.c                           |   8 +-
 fs/Kconfig                                    |   7 +
 fs/proc/task_mmu.c                            |  35 +-
 fs/userfaultfd.c                              |  10 +-
 include/asm-generic/tlb.h                     |   6 +-
 include/linux/hugetlb.h                       | 177 +++-
 include/linux/mm.h                            |   7 +
 include/linux/pagewalk.h                      |   3 +-
 include/uapi/asm-generic/mman-common.h        |   2 +
 include/uapi/linux/userfaultfd.h              |   2 +
 mm/damon/vaddr.c                              |  34 +-
 mm/hmm.c                                      |   7 +-
 mm/hugetlb.c                                  | 987 +++++++++++++++---
 mm/madvise.c                                  |  23 +
 mm/memory.c                                   |   8 +-
 mm/mempolicy.c                                |  11 +-
 mm/migrate.c                                  |   3 +-
 mm/mincore.c                                  |   4 +-
 mm/mprotect.c                                 |   6 +-
 mm/page_vma_mapped.c                          |   3 +-
 mm/pagewalk.c                                 |  18 +-
 mm/userfaultfd.c                              |  57 +-
 .../testing/selftests/kvm/include/test_util.h |   2 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |   2 +-
 tools/testing/selftests/kvm/lib/test_util.c   |  14 +
 tools/testing/selftests/vm/userfaultfd.c      |  61 +-
 29 files changed, 1314 insertions(+), 250 deletions(-)

-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [RFC PATCH 01/26] hugetlb: make hstate accessor functions const
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-24 18:43   ` Mina Almasry
                     ` (2 more replies)
  2022-06-24 17:36 ` [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates James Houghton
                   ` (27 subsequent siblings)
  28 siblings, 3 replies; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

This is just a const-correctness change so that the new hugetlb_pte
changes can be const-correct too.

Acked-by: David Rientjes <rientjes@google.com>

Signed-off-by: James Houghton <jthoughton@google.com>
---
 include/linux/hugetlb.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index e4cff27d1198..498a4ae3d462 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -715,7 +715,7 @@ static inline struct hstate *hstate_vma(struct vm_area_struct *vma)
 	return hstate_file(vma->vm_file);
 }
 
-static inline unsigned long huge_page_size(struct hstate *h)
+static inline unsigned long huge_page_size(const struct hstate *h)
 {
 	return (unsigned long)PAGE_SIZE << h->order;
 }
@@ -729,27 +729,27 @@ static inline unsigned long huge_page_mask(struct hstate *h)
 	return h->mask;
 }
 
-static inline unsigned int huge_page_order(struct hstate *h)
+static inline unsigned int huge_page_order(const struct hstate *h)
 {
 	return h->order;
 }
 
-static inline unsigned huge_page_shift(struct hstate *h)
+static inline unsigned huge_page_shift(const struct hstate *h)
 {
 	return h->order + PAGE_SHIFT;
 }
 
-static inline bool hstate_is_gigantic(struct hstate *h)
+static inline bool hstate_is_gigantic(const struct hstate *h)
 {
 	return huge_page_order(h) >= MAX_ORDER;
 }
 
-static inline unsigned int pages_per_huge_page(struct hstate *h)
+static inline unsigned int pages_per_huge_page(const struct hstate *h)
 {
 	return 1 << h->order;
 }
 
-static inline unsigned int blocks_per_huge_page(struct hstate *h)
+static inline unsigned int blocks_per_huge_page(const struct hstate *h)
 {
 	return huge_page_size(h) / 512;
 }
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
  2022-06-24 17:36 ` [RFC PATCH 01/26] hugetlb: make hstate accessor functions const James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-24 18:51   ` Mina Almasry
                     ` (2 more replies)
  2022-06-24 17:36 ` [RFC PATCH 03/26] hugetlb: add make_huge_pte_with_shift James Houghton
                   ` (26 subsequent siblings)
  28 siblings, 3 replies; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

When using HugeTLB high-granularity mapping, we need to go through the
supported hugepage sizes in decreasing order so that we pick the largest
size that works. Consider the case where we're faulting in a 1G hugepage
for the first time: we want hugetlb_fault/hugetlb_no_page to map it with
a PUD. By going through the sizes in decreasing order, we will find that
PUD_SIZE works before finding out that PMD_SIZE or PAGE_SIZE work too.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 mm/hugetlb.c | 40 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a57e1be41401..5df838d86f32 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -33,6 +33,7 @@
 #include <linux/migrate.h>
 #include <linux/nospec.h>
 #include <linux/delayacct.h>
+#include <linux/sort.h>
 
 #include <asm/page.h>
 #include <asm/pgalloc.h>
@@ -48,6 +49,10 @@
 
 int hugetlb_max_hstate __read_mostly;
 unsigned int default_hstate_idx;
+/*
+ * After hugetlb_init_hstates is called, hstates will be sorted from largest
+ * to smallest.
+ */
 struct hstate hstates[HUGE_MAX_HSTATE];
 
 #ifdef CONFIG_CMA
@@ -3144,14 +3149,43 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 	kfree(node_alloc_noretry);
 }
 
+static int compare_hstates_decreasing(const void *a, const void *b)
+{
+	const int shift_a = huge_page_shift((const struct hstate *)a);
+	const int shift_b = huge_page_shift((const struct hstate *)b);
+
+	if (shift_a < shift_b)
+		return 1;
+	if (shift_a > shift_b)
+		return -1;
+	return 0;
+}
+
+static void sort_hstates(void)
+{
+	unsigned long default_hstate_sz = huge_page_size(&default_hstate);
+
+	/* Sort from largest to smallest. */
+	sort(hstates, hugetlb_max_hstate, sizeof(*hstates),
+	     compare_hstates_decreasing, NULL);
+
+	/*
+	 * We may have changed the location of the default hstate, so we need to
+	 * update it.
+	 */
+	default_hstate_idx = hstate_index(size_to_hstate(default_hstate_sz));
+}
+
 static void __init hugetlb_init_hstates(void)
 {
 	struct hstate *h, *h2;
 
-	for_each_hstate(h) {
-		if (minimum_order > huge_page_order(h))
-			minimum_order = huge_page_order(h);
+	sort_hstates();
 
+	/* The last hstate is now the smallest. */
+	minimum_order = huge_page_order(&hstates[hugetlb_max_hstate - 1]);
+
+	for_each_hstate(h) {
 		/* oversize hugepages were init'ed in early boot */
 		if (!hstate_is_gigantic(h))
 			hugetlb_hstate_alloc_pages(h);
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [RFC PATCH 03/26] hugetlb: add make_huge_pte_with_shift
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
  2022-06-24 17:36 ` [RFC PATCH 01/26] hugetlb: make hstate accessor functions const James Houghton
  2022-06-24 17:36 ` [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-24 19:01   ` Mina Almasry
  2022-06-27 12:13   ` manish.mishra
  2022-06-24 17:36 ` [RFC PATCH 04/26] hugetlb: make huge_pte_lockptr take an explicit shift argument James Houghton
                   ` (25 subsequent siblings)
  28 siblings, 2 replies; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

This allows us to make huge PTEs at shifts other than the hstate shift,
which will be necessary for high-granularity mappings.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 mm/hugetlb.c | 33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5df838d86f32..0eec34edf3b2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4686,23 +4686,30 @@ const struct vm_operations_struct hugetlb_vm_ops = {
 	.pagesize = hugetlb_vm_op_pagesize,
 };
 
+static pte_t make_huge_pte_with_shift(struct vm_area_struct *vma,
+				      struct page *page, int writable,
+				      int shift)
+{
+	bool huge = shift > PAGE_SHIFT;
+	pte_t entry = huge ? mk_huge_pte(page, vma->vm_page_prot)
+			   : mk_pte(page, vma->vm_page_prot);
+
+	if (writable)
+		entry = huge ? huge_pte_mkwrite(entry) : pte_mkwrite(entry);
+	else
+		entry = huge ? huge_pte_wrprotect(entry) : pte_wrprotect(entry);
+	pte_mkyoung(entry);
+	if (huge)
+		entry = arch_make_huge_pte(entry, shift, vma->vm_flags);
+	return entry;
+}
+
 static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
-				int writable)
+			   int writable)
 {
-	pte_t entry;
 	unsigned int shift = huge_page_shift(hstate_vma(vma));
 
-	if (writable) {
-		entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_huge_pte(page,
-					 vma->vm_page_prot)));
-	} else {
-		entry = huge_pte_wrprotect(mk_huge_pte(page,
-					   vma->vm_page_prot));
-	}
-	entry = pte_mkyoung(entry);
-	entry = arch_make_huge_pte(entry, shift, vma->vm_flags);
-
-	return entry;
+	return make_huge_pte_with_shift(vma, page, writable, shift);
 }
 
 static void set_huge_ptep_writable(struct vm_area_struct *vma,
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [RFC PATCH 04/26] hugetlb: make huge_pte_lockptr take an explicit shift argument.
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (2 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 03/26] hugetlb: add make_huge_pte_with_shift James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-27 12:26   ` manish.mishra
  2022-06-27 20:51   ` Mike Kravetz
  2022-06-24 17:36 ` [RFC PATCH 05/26] hugetlb: add CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING James Houghton
                   ` (24 subsequent siblings)
  28 siblings, 2 replies; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

This is needed to handle PTL locking with high-granularity mapping. We
won't always be using the PMD-level PTL even if we're using the 2M
hugepage hstate. It's possible that we're dealing with 4K PTEs, in which
case, we need to lock the PTL for the 4K PTE.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 arch/powerpc/mm/pgtable.c |  3 ++-
 include/linux/hugetlb.h   | 19 ++++++++++++++-----
 mm/hugetlb.c              |  9 +++++----
 mm/migrate.c              |  3 ++-
 mm/page_vma_mapped.c      |  3 ++-
 5 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index e6166b71d36d..663d591a8f08 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -261,7 +261,8 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 
 		psize = hstate_get_psize(h);
 #ifdef CONFIG_DEBUG_VM
-		assert_spin_locked(huge_pte_lockptr(h, vma->vm_mm, ptep));
+		assert_spin_locked(huge_pte_lockptr(huge_page_shift(h),
+						    vma->vm_mm, ptep));
 #endif
 
 #else
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 498a4ae3d462..5fe1db46d8c9 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -868,12 +868,11 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
 	return modified_mask;
 }
 
-static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
+static inline spinlock_t *huge_pte_lockptr(unsigned int shift,
 					   struct mm_struct *mm, pte_t *pte)
 {
-	if (huge_page_size(h) == PMD_SIZE)
+	if (shift == PMD_SHIFT)
 		return pmd_lockptr(mm, (pmd_t *) pte);
-	VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
 	return &mm->page_table_lock;
 }
 
@@ -1076,7 +1075,7 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
 	return 0;
 }
 
-static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
+static inline spinlock_t *huge_pte_lockptr(unsigned int shift,
 					   struct mm_struct *mm, pte_t *pte)
 {
 	return &mm->page_table_lock;
@@ -1116,7 +1115,17 @@ static inline spinlock_t *huge_pte_lock(struct hstate *h,
 {
 	spinlock_t *ptl;
 
-	ptl = huge_pte_lockptr(h, mm, pte);
+	ptl = huge_pte_lockptr(huge_page_shift(h), mm, pte);
+	spin_lock(ptl);
+	return ptl;
+}
+
+static inline spinlock_t *huge_pte_lock_shift(unsigned int shift,
+					      struct mm_struct *mm, pte_t *pte)
+{
+	spinlock_t *ptl;
+
+	ptl = huge_pte_lockptr(shift, mm, pte);
 	spin_lock(ptl);
 	return ptl;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0eec34edf3b2..d6d0d4c03def 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4817,7 +4817,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			continue;
 
 		dst_ptl = huge_pte_lock(h, dst, dst_pte);
-		src_ptl = huge_pte_lockptr(h, src, src_pte);
+		src_ptl = huge_pte_lockptr(huge_page_shift(h), src, src_pte);
 		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 		entry = huge_ptep_get(src_pte);
 		dst_entry = huge_ptep_get(dst_pte);
@@ -4894,7 +4894,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 
 				/* Install the new huge page if src pte stable */
 				dst_ptl = huge_pte_lock(h, dst, dst_pte);
-				src_ptl = huge_pte_lockptr(h, src, src_pte);
+				src_ptl = huge_pte_lockptr(huge_page_shift(h),
+							   src, src_pte);
 				spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 				entry = huge_ptep_get(src_pte);
 				if (!pte_same(src_pte_old, entry)) {
@@ -4948,7 +4949,7 @@ static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
 	pte_t pte;
 
 	dst_ptl = huge_pte_lock(h, mm, dst_pte);
-	src_ptl = huge_pte_lockptr(h, mm, src_pte);
+	src_ptl = huge_pte_lockptr(huge_page_shift(h), mm, src_pte);
 
 	/*
 	 * We don't have to worry about the ordering of src and dst ptlocks
@@ -6024,7 +6025,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		page_in_pagecache = true;
 	}
 
-	ptl = huge_pte_lockptr(h, dst_mm, dst_pte);
+	ptl = huge_pte_lockptr(huge_page_shift(h), dst_mm, dst_pte);
 	spin_lock(ptl);
 
 	/*
diff --git a/mm/migrate.c b/mm/migrate.c
index e51588e95f57..a8a960992373 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -318,7 +318,8 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 void migration_entry_wait_huge(struct vm_area_struct *vma,
 		struct mm_struct *mm, pte_t *pte)
 {
-	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), mm, pte);
+	spinlock_t *ptl = huge_pte_lockptr(huge_page_shift(hstate_vma(vma)),
+					   mm, pte);
 	__migration_entry_wait(mm, pte, ptl);
 }
 
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index c10f839fc410..8921dd4e41b1 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -174,7 +174,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 		if (!pvmw->pte)
 			return false;
 
-		pvmw->ptl = huge_pte_lockptr(hstate, mm, pvmw->pte);
+		pvmw->ptl = huge_pte_lockptr(huge_page_shift(hstate),
+					     mm, pvmw->pte);
 		spin_lock(pvmw->ptl);
 		if (!check_pte(pvmw))
 			return not_found(pvmw);
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [RFC PATCH 05/26] hugetlb: add CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (3 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 04/26] hugetlb: make huge_pte_lockptr take an explicit shift argument James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-27 12:28   ` manish.mishra
  2022-06-24 17:36 ` [RFC PATCH 06/26] mm: make free_p?d_range functions public James Houghton
                   ` (23 subsequent siblings)
  28 siblings, 1 reply; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

This adds the Kconfig to enable or disable high-granularity mapping. It
is enabled by default for architectures that use
ARCH_WANT_GENERAL_HUGETLB.

There is also an arch-specific config ARCH_HAS_SPECIAL_HUGETLB_HGM which
controls whether or not the architecture has been updated to support
HGM if it doesn't use general HugeTLB.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 fs/Kconfig | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/Kconfig b/fs/Kconfig
index 5976eb33535f..d76c7d812656 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -268,6 +268,13 @@ config HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON
 	  to enable optimizing vmemmap pages of HugeTLB by default. It can then
 	  be disabled on the command line via hugetlb_free_vmemmap=off.
 
+config ARCH_HAS_SPECIAL_HUGETLB_HGM
+	bool
+
+config HUGETLB_HIGH_GRANULARITY_MAPPING
+	def_bool ARCH_WANT_GENERAL_HUGETLB || ARCH_HAS_SPECIAL_HUGETLB_HGM
+	depends on HUGETLB_PAGE
+
 config MEMFD_CREATE
 	def_bool TMPFS || HUGETLBFS
 
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [RFC PATCH 06/26] mm: make free_p?d_range functions public
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (4 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 05/26] hugetlb: add CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-27 12:31   ` manish.mishra
  2022-06-28 20:35   ` Mike Kravetz
  2022-06-24 17:36 ` [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries James Houghton
                   ` (22 subsequent siblings)
  28 siblings, 2 replies; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

This makes them usable for HugeTLB page table freeing operations.
With HugeTLB high-granularity mapping, the page tables for a HugeTLB VMA
can become more complex, and these functions handle freeing page tables
generally.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 include/linux/mm.h | 7 +++++++
 mm/memory.c        | 8 ++++----
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bc8f326be0ce..07f5da512147 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1847,6 +1847,13 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
 
 struct mmu_notifier_range;
 
+void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd, unsigned long addr);
+void free_pmd_range(struct mmu_gather *tlb, pud_t *pud, unsigned long addr,
+		unsigned long end, unsigned long floor, unsigned long ceiling);
+void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d, unsigned long addr,
+		unsigned long end, unsigned long floor, unsigned long ceiling);
+void free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd, unsigned long addr,
+		unsigned long end, unsigned long floor, unsigned long ceiling);
 void free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
 		unsigned long end, unsigned long floor, unsigned long ceiling);
 int
diff --git a/mm/memory.c b/mm/memory.c
index 7a089145cad4..bb3b9b5b94fb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -227,7 +227,7 @@ static void check_sync_rss_stat(struct task_struct *task)
  * Note: this doesn't free the actual pages themselves. That
  * has been handled earlier when unmapping all the memory regions.
  */
-static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
+void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
 			   unsigned long addr)
 {
 	pgtable_t token = pmd_pgtable(*pmd);
@@ -236,7 +236,7 @@ static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
 	mm_dec_nr_ptes(tlb->mm);
 }
 
-static inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
+inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
 				unsigned long addr, unsigned long end,
 				unsigned long floor, unsigned long ceiling)
 {
@@ -270,7 +270,7 @@ static inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
 	mm_dec_nr_pmds(tlb->mm);
 }
 
-static inline void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d,
+inline void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d,
 				unsigned long addr, unsigned long end,
 				unsigned long floor, unsigned long ceiling)
 {
@@ -304,7 +304,7 @@ static inline void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d,
 	mm_dec_nr_puds(tlb->mm);
 }
 
-static inline void free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd,
+inline void free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd,
 				unsigned long addr, unsigned long end,
 				unsigned long floor, unsigned long ceiling)
 {
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (5 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 06/26] mm: make free_p?d_range functions public James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-25  2:59   ` kernel test robot
                     ` (5 more replies)
  2022-06-24 17:36 ` [RFC PATCH 08/26] hugetlb: add hugetlb_free_range to free PT structures James Houghton
                   ` (21 subsequent siblings)
  28 siblings, 6 replies; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

After high-granularity mapping, page table entries for HugeTLB pages can
be of any size/type. (For example, we can have a 1G page mapped with a
mix of PMDs and PTEs.) This struct is to help keep track of a HugeTLB
PTE after we have done a page table walk.

Without this, we'd have to pass around the "size" of the PTE everywhere.
We effectively did this before; it could be fetched from the hstate,
which we pass around pretty much everywhere.

This commit includes definitions for some basic helper functions that
are used later. These helper functions wrap existing PTE
inspection/modification functions, where the correct version is picked
depending on whether the HugeTLB PTE is actually "huge". (Previously,
all HugeTLB PTEs were "huge".)

For example, hugetlb_ptep_get wraps huge_ptep_get and ptep_get, where
ptep_get is used when the HugeTLB PTE is PAGE_SIZE, and huge_ptep_get is
used in all other cases.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 include/linux/hugetlb.h | 84 +++++++++++++++++++++++++++++++++++++++++
 mm/hugetlb.c            | 57 ++++++++++++++++++++++++++++
 2 files changed, 141 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 5fe1db46d8c9..1d4ec9dfdebf 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -46,6 +46,68 @@ enum {
 	__NR_USED_SUBPAGE,
 };
 
+struct hugetlb_pte {
+	pte_t *ptep;
+	unsigned int shift;
+};
+
+static inline
+void hugetlb_pte_init(struct hugetlb_pte *hpte)
+{
+	hpte->ptep = NULL;
+}
+
+static inline
+void hugetlb_pte_populate(struct hugetlb_pte *hpte, pte_t *ptep,
+			  unsigned int shift)
+{
+	BUG_ON(!ptep);
+	hpte->ptep = ptep;
+	hpte->shift = shift;
+}
+
+static inline
+unsigned long hugetlb_pte_size(const struct hugetlb_pte *hpte)
+{
+	BUG_ON(!hpte->ptep);
+	return 1UL << hpte->shift;
+}
+
+static inline
+unsigned long hugetlb_pte_mask(const struct hugetlb_pte *hpte)
+{
+	BUG_ON(!hpte->ptep);
+	return ~(hugetlb_pte_size(hpte) - 1);
+}
+
+static inline
+unsigned int hugetlb_pte_shift(const struct hugetlb_pte *hpte)
+{
+	BUG_ON(!hpte->ptep);
+	return hpte->shift;
+}
+
+static inline
+bool hugetlb_pte_huge(const struct hugetlb_pte *hpte)
+{
+	return !IS_ENABLED(CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING) ||
+		hugetlb_pte_shift(hpte) > PAGE_SHIFT;
+}
+
+static inline
+void hugetlb_pte_copy(struct hugetlb_pte *dest, const struct hugetlb_pte *src)
+{
+	dest->ptep = src->ptep;
+	dest->shift = src->shift;
+}
+
+bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte);
+bool hugetlb_pte_none(const struct hugetlb_pte *hpte);
+bool hugetlb_pte_none_mostly(const struct hugetlb_pte *hpte);
+pte_t hugetlb_ptep_get(const struct hugetlb_pte *hpte);
+void hugetlb_pte_clear(struct mm_struct *mm, const struct hugetlb_pte *hpte,
+		       unsigned long address);
+
 struct hugepage_subpool {
 	spinlock_t lock;
 	long count;
@@ -1130,6 +1192,28 @@ static inline spinlock_t *huge_pte_lock_shift(unsigned int shift,
 	return ptl;
 }
 
+static inline
+spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
+{
+	BUG_ON(!hpte->ptep);
+	/* Only use huge_pte_lockptr if we are at leaf level. Otherwise use
+	 * the regular page table lock.
+	 */
+	if (hugetlb_pte_none(hpte) || hugetlb_pte_present_leaf(hpte))
+		return huge_pte_lockptr(hugetlb_pte_shift(hpte),
+				mm, hpte->ptep);
+	return &mm->page_table_lock;
+}
+
+static inline
+spinlock_t *hugetlb_pte_lock(struct mm_struct *mm, struct hugetlb_pte *hpte)
+{
+	spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte);
+
+	spin_lock(ptl);
+	return ptl;
+}
+
 #if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_CMA)
 extern void __init hugetlb_cma_reserve(int order);
 extern void __init hugetlb_cma_check(void);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d6d0d4c03def..1a1434e29740 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1120,6 +1120,63 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
 	return false;
 }
 
+bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte)
+{
+	pgd_t pgd;
+	p4d_t p4d;
+	pud_t pud;
+	pmd_t pmd;
+
+	BUG_ON(!hpte->ptep);
+	if (hugetlb_pte_size(hpte) >= PGDIR_SIZE) {
+		pgd = *(pgd_t *)hpte->ptep;
+		return pgd_present(pgd) && pgd_leaf(pgd);
+	} else if (hugetlb_pte_size(hpte) >= P4D_SIZE) {
+		p4d = *(p4d_t *)hpte->ptep;
+		return p4d_present(p4d) && p4d_leaf(p4d);
+	} else if (hugetlb_pte_size(hpte) >= PUD_SIZE) {
+		pud = *(pud_t *)hpte->ptep;
+		return pud_present(pud) && pud_leaf(pud);
+	} else if (hugetlb_pte_size(hpte) >= PMD_SIZE) {
+		pmd = *(pmd_t *)hpte->ptep;
+		return pmd_present(pmd) && pmd_leaf(pmd);
+	} else if (hugetlb_pte_size(hpte) >= PAGE_SIZE)
+		return pte_present(*hpte->ptep);
+	BUG();
+}
+
+bool hugetlb_pte_none(const struct hugetlb_pte *hpte)
+{
+	if (hugetlb_pte_huge(hpte))
+		return huge_pte_none(huge_ptep_get(hpte->ptep));
+	return pte_none(ptep_get(hpte->ptep));
+}
+
+bool hugetlb_pte_none_mostly(const struct hugetlb_pte *hpte)
+{
+	if (hugetlb_pte_huge(hpte))
+		return huge_pte_none_mostly(huge_ptep_get(hpte->ptep));
+	return pte_none_mostly(ptep_get(hpte->ptep));
+}
+
+pte_t hugetlb_ptep_get(const struct hugetlb_pte *hpte)
+{
+	if (hugetlb_pte_huge(hpte))
+		return huge_ptep_get(hpte->ptep);
+	return ptep_get(hpte->ptep);
+}
+
+void hugetlb_pte_clear(struct mm_struct *mm, const struct hugetlb_pte *hpte,
+		       unsigned long address)
+{
+	unsigned long sz = hugetlb_pte_size(hpte);
+
+	BUG_ON(!hpte->ptep);
+	if (sz > PAGE_SIZE)
+		return huge_pte_clear(mm, address, hpte->ptep, sz);
+	return pte_clear(mm, address, hpte->ptep);
+}
+
 static void enqueue_huge_page(struct hstate *h, struct page *page)
 {
 	int nid = page_to_nid(page);
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [RFC PATCH 08/26] hugetlb: add hugetlb_free_range to free PT structures
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (6 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-27 12:52   ` manish.mishra
  2022-06-28 20:27   ` Mina Almasry
  2022-06-24 17:36 ` [RFC PATCH 09/26] hugetlb: add hugetlb_hgm_enabled James Houghton
                   ` (20 subsequent siblings)
  28 siblings, 2 replies; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

This is a helper function for freeing the bits of the page table that
map a particular HugeTLB PTE.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 include/linux/hugetlb.h |  2 ++
 mm/hugetlb.c            | 17 +++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 1d4ec9dfdebf..33ba48fac551 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -107,6 +107,8 @@ bool hugetlb_pte_none_mostly(const struct hugetlb_pte *hpte);
 pte_t hugetlb_ptep_get(const struct hugetlb_pte *hpte);
 void hugetlb_pte_clear(struct mm_struct *mm, const struct hugetlb_pte *hpte,
 		       unsigned long address);
+void hugetlb_free_range(struct mmu_gather *tlb, const struct hugetlb_pte *hpte,
+			unsigned long start, unsigned long end);
 
 struct hugepage_subpool {
 	spinlock_t lock;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1a1434e29740..a2d2ffa76173 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1120,6 +1120,23 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
 	return false;
 }
 
+void hugetlb_free_range(struct mmu_gather *tlb, const struct hugetlb_pte *hpte,
+			unsigned long start, unsigned long end)
+{
+	unsigned long floor = start & hugetlb_pte_mask(hpte);
+	unsigned long ceiling = floor + hugetlb_pte_size(hpte);
+
+	if (hugetlb_pte_size(hpte) >= PGDIR_SIZE) {
+		free_p4d_range(tlb, (pgd_t *)hpte->ptep, start, end, floor, ceiling);
+	} else if (hugetlb_pte_size(hpte) >= P4D_SIZE) {
+		free_pud_range(tlb, (p4d_t *)hpte->ptep, start, end, floor, ceiling);
+	} else if (hugetlb_pte_size(hpte) >= PUD_SIZE) {
+		free_pmd_range(tlb, (pud_t *)hpte->ptep, start, end, floor, ceiling);
+	} else if (hugetlb_pte_size(hpte) >= PMD_SIZE) {
+		free_pte_range(tlb, (pmd_t *)hpte->ptep, start);
+	}
+}
+
 bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte)
 {
 	pgd_t pgd;
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [RFC PATCH 09/26] hugetlb: add hugetlb_hgm_enabled
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (7 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 08/26] hugetlb: add hugetlb_free_range to free PT structures James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-27 12:55   ` manish.mishra
                     ` (2 more replies)
  2022-06-24 17:36 ` [RFC PATCH 10/26] hugetlb: add for_each_hgm_shift James Houghton
                   ` (19 subsequent siblings)
  28 siblings, 3 replies; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

Currently, hugetlb_hgm_enabled is always true if the VMA is shared. In
the future, it's possible that private mappings will get some or all
HGM functionality.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 include/linux/hugetlb.h | 10 ++++++++++
 mm/hugetlb.c            |  8 ++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 33ba48fac551..e7a6b944d0cc 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1174,6 +1174,16 @@ static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 }
 #endif	/* CONFIG_HUGETLB_PAGE */
 
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+/* If HugeTLB high-granularity mappings are enabled for this VMA. */
+bool hugetlb_hgm_enabled(struct vm_area_struct *vma);
+#else
+static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
+{
+	return false;
+}
+#endif
+
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
 					struct mm_struct *mm, pte_t *pte)
 {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a2d2ffa76173..8b10b941458d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6983,6 +6983,14 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 
 #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
 
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
+{
+	/* All shared VMAs have HGM enabled. */
+	return vma->vm_flags & VM_SHARED;
+}
+#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
+
 /*
  * These functions are overwritable if your architecture needs its own
  * behavior.
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [RFC PATCH 10/26] hugetlb: add for_each_hgm_shift
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (8 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 09/26] hugetlb: add hugetlb_hgm_enabled James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-27 13:01   ` manish.mishra
  2022-06-28 21:58   ` Mina Almasry
  2022-06-24 17:36 ` [RFC PATCH 11/26] hugetlb: add hugetlb_walk_to to do PT walks James Houghton
                   ` (18 subsequent siblings)
  28 siblings, 2 replies; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

This is a helper macro to loop through all the usable page sizes for a
high-granularity-enabled HugeTLB VMA. Given the VMA's hstate, it will
loop, in descending order, through the page sizes that HugeTLB supports
for this architecture; it always includes PAGE_SIZE.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 mm/hugetlb.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8b10b941458d..557b0afdb503 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6989,6 +6989,16 @@ bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
 	/* All shared VMAs have HGM enabled. */
 	return vma->vm_flags & VM_SHARED;
 }
+static unsigned int __shift_for_hstate(struct hstate *h)
+{
+	if (h >= &hstates[hugetlb_max_hstate])
+		return PAGE_SHIFT;
+	return huge_page_shift(h);
+}
+#define for_each_hgm_shift(hstate, tmp_h, shift) \
+	for ((tmp_h) = hstate; (shift) = __shift_for_hstate(tmp_h), \
+			       (tmp_h) <= &hstates[hugetlb_max_hstate]; \
+			       (tmp_h)++)
 #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
 
 /*
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [RFC PATCH 11/26] hugetlb: add hugetlb_walk_to to do PT walks
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (9 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 10/26] hugetlb: add for_each_hgm_shift James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-27 13:07   ` manish.mishra
  2022-09-08 18:20   ` Peter Xu
  2022-06-24 17:36 ` [RFC PATCH 12/26] hugetlb: add HugeTLB splitting functionality James Houghton
                   ` (17 subsequent siblings)
  28 siblings, 2 replies; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

This adds hugetlb_walk_to for architectures that use GENERAL_HUGETLB,
including x86.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 include/linux/hugetlb.h |  2 ++
 mm/hugetlb.c            | 45 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index e7a6b944d0cc..605aa19d8572 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -258,6 +258,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz);
 pte_t *huge_pte_offset(struct mm_struct *mm,
 		       unsigned long addr, unsigned long sz);
+int hugetlb_walk_to(struct mm_struct *mm, struct hugetlb_pte *hpte,
+		    unsigned long addr, unsigned long sz, bool stop_at_none);
 int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
 				unsigned long *addr, pte_t *ptep);
 void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 557b0afdb503..3ec2a921ee6f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6981,6 +6981,51 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 	return (pte_t *)pmd;
 }
 
+int hugetlb_walk_to(struct mm_struct *mm, struct hugetlb_pte *hpte,
+		    unsigned long addr, unsigned long sz, bool stop_at_none)
+{
+	pte_t *ptep;
+
+	if (!hpte->ptep) {
+		pgd_t *pgd = pgd_offset(mm, addr);
+
+		if (!pgd)
+			return -ENOMEM;
+		ptep = (pte_t *)p4d_alloc(mm, pgd, addr);
+		if (!ptep)
+			return -ENOMEM;
+		hugetlb_pte_populate(hpte, ptep, P4D_SHIFT);
+	}
+
+	while (hugetlb_pte_size(hpte) > sz &&
+			!hugetlb_pte_present_leaf(hpte) &&
+			!(stop_at_none && hugetlb_pte_none(hpte))) {
+		if (hpte->shift == PMD_SHIFT) {
+			ptep = pte_alloc_map(mm, (pmd_t *)hpte->ptep, addr);
+			if (!ptep)
+				return -ENOMEM;
+			hpte->shift = PAGE_SHIFT;
+			hpte->ptep = ptep;
+		} else if (hpte->shift == PUD_SHIFT) {
+			ptep = (pte_t *)pmd_alloc(mm, (pud_t *)hpte->ptep,
+						  addr);
+			if (!ptep)
+				return -ENOMEM;
+			hpte->shift = PMD_SHIFT;
+			hpte->ptep = ptep;
+		} else if (hpte->shift == P4D_SHIFT) {
+			ptep = (pte_t *)pud_alloc(mm, (p4d_t *)hpte->ptep,
+						  addr);
+			if (!ptep)
+				return -ENOMEM;
+			hpte->shift = PUD_SHIFT;
+			hpte->ptep = ptep;
+		} else
+			BUG();
+	}
+	return 0;
+}
+
 #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
 
 #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [RFC PATCH 12/26] hugetlb: add HugeTLB splitting functionality
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (10 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 11/26] hugetlb: add hugetlb_walk_to to do PT walks James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-27 13:50   ` manish.mishra
  2022-06-29 14:33   ` manish.mishra
  2022-06-24 17:36 ` [RFC PATCH 13/26] hugetlb: add huge_pte_alloc_high_granularity James Houghton
                   ` (16 subsequent siblings)
  28 siblings, 2 replies; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

The new function, hugetlb_split_to_shift, will optimally split the page
table to map a particular address at a particular granularity.

This is useful for punching a hole in the mapping and for mapping small
sections of a HugeTLB page (via UFFDIO_CONTINUE, for example).

Signed-off-by: James Houghton <jthoughton@google.com>
---
 mm/hugetlb.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 122 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3ec2a921ee6f..eaffe7b4f67c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -102,6 +102,18 @@ struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
 /* Forward declaration */
 static int hugetlb_acct_memory(struct hstate *h, long delta);
 
+/*
+ * Find the subpage that corresponds to `addr` in `hpage`.
+ */
+static struct page *hugetlb_find_subpage(struct hstate *h, struct page *hpage,
+				 unsigned long addr)
+{
+	size_t idx = (addr & ~huge_page_mask(h))/PAGE_SIZE;
+
+	BUG_ON(idx >= pages_per_huge_page(h));
+	return &hpage[idx];
+}
+
 static inline bool subpool_is_free(struct hugepage_subpool *spool)
 {
 	if (spool->count)
@@ -7044,6 +7056,116 @@ static unsigned int __shift_for_hstate(struct hstate *h)
 	for ((tmp_h) = hstate; (shift) = __shift_for_hstate(tmp_h), \
 			       (tmp_h) <= &hstates[hugetlb_max_hstate]; \
 			       (tmp_h)++)
+
+/*
+ * Given a particular address, split the HugeTLB PTE that currently maps it
+ * so that the PTE mapping the given address has shift `desired_shift`.
+ * This function will always split the HugeTLB PTE optimally.
+ *
+ * For example, take a HugeTLB 1G page that is mapped from VA 0 to 1G. Calling
+ * this function with addr=0 and desired_shift=PAGE_SHIFT will result in these
+ * changes to the page table:
+ * 1. The PUD will be split into 2M PMDs.
+ * 2. The first PMD will be split again into 4K PTEs.
+ */
+static int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *vma,
+			   const struct hugetlb_pte *hpte,
+			   unsigned long addr, unsigned long desired_shift)
+{
+	unsigned long start, end, curr;
+	unsigned long desired_sz = 1UL << desired_shift;
+	struct hstate *h = hstate_vma(vma);
+	int ret;
+	struct hugetlb_pte new_hpte;
+	struct mmu_notifier_range range;
+	struct page *hpage = NULL;
+	struct page *subpage;
+	pte_t old_entry;
+	struct mmu_gather tlb;
+
+	BUG_ON(!hpte->ptep);
+	BUG_ON(hugetlb_pte_size(hpte) == desired_sz);
+
+	start = addr & hugetlb_pte_mask(hpte);
+	end = start + hugetlb_pte_size(hpte);
+
+	i_mmap_assert_write_locked(vma->vm_file->f_mapping);
+
+	BUG_ON(!hpte->ptep);
+	/* This function only works if we are looking at a leaf-level PTE. */
+	BUG_ON(!hugetlb_pte_none(hpte) && !hugetlb_pte_present_leaf(hpte));
+
+	/*
+	 * Clear the PTE so that we will allocate the PT structures when
+	 * walking the page table.
+	 */
+	old_entry = huge_ptep_get_and_clear(mm, start, hpte->ptep);
+
+	if (!huge_pte_none(old_entry))
+		hpage = pte_page(old_entry);
+
+	BUG_ON(!IS_ALIGNED(start, desired_sz));
+	BUG_ON(!IS_ALIGNED(end, desired_sz));
+
+	for (curr = start; curr < end;) {
+		struct hstate *tmp_h;
+		unsigned int shift;
+
+		for_each_hgm_shift(h, tmp_h, shift) {
+			unsigned long sz = 1UL << shift;
+
+			if (!IS_ALIGNED(curr, sz) || curr + sz > end)
+				continue;
+			/*
+			 * If this chunk includes `addr`, make sure we are
+			 * splitting down to the desired size. Go to a smaller
+			 * size if we have not reached it yet.
+			 */
+			if (curr <= addr && curr + sz > addr &&
+					shift > desired_shift)
+				continue;
+
+			/*
+			 * Continue the page table walk to the level we want,
+			 * allocate PT structures as we go.
+			 */
+			hugetlb_pte_copy(&new_hpte, hpte);
+			ret = hugetlb_walk_to(mm, &new_hpte, curr, sz,
+					      /*stop_at_none=*/false);
+			if (ret)
+				goto err;
+			BUG_ON(hugetlb_pte_size(&new_hpte) != sz);
+			if (hpage) {
+				pte_t new_entry;
+
+				subpage = hugetlb_find_subpage(h, hpage, curr);
+				new_entry = make_huge_pte_with_shift(vma, subpage,
+								     huge_pte_write(old_entry),
+								     shift);
+				set_huge_pte_at(mm, curr, new_hpte.ptep, new_entry);
+			}
+			curr += sz;
+			goto next;
+		}
+		/* We couldn't find a size that worked. */
+		BUG();
+next:
+		continue;
+	}
+
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
+				start, end);
+	mmu_notifier_invalidate_range_start(&range);
+	return 0;
+err:
+	tlb_gather_mmu(&tlb, mm);
+	/* Free any newly allocated page table entries. */
+	hugetlb_free_range(&tlb, hpte, start, curr);
+	/* Restore the old entry. */
+	set_huge_pte_at(mm, start, hpte->ptep, old_entry);
+	tlb_finish_mmu(&tlb);
+	return ret;
+}
 #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
 
 /*
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [RFC PATCH 13/26] hugetlb: add huge_pte_alloc_high_granularity
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (11 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 12/26] hugetlb: add HugeTLB splitting functionality James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-29 14:11   ` manish.mishra
  2022-06-24 17:36 ` [RFC PATCH 14/26] hugetlb: add HGM support for hugetlb_fault and hugetlb_no_page James Houghton
                   ` (15 subsequent siblings)
  28 siblings, 1 reply; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

This function is to be used to do a HugeTLB page table walk where we may
need to split a leaf-level huge PTE into a new page table level.

Consider the case where we want to install 4K inside an empty 1G page:
1. We walk to the PUD and notice that it is pte_none.
2. We split the PUD by calling `hugetlb_split_to_shift`, creating a
   standard PUD that points to PMDs that are all pte_none.
3. We continue the PT walk to find the PMD. We split it just like we
   split the PUD.
4. We find the PTE and give it back to the caller.

To avoid concurrent splitting operations on the same page table entry,
we require that the mapping rwsem is held for writing while splitting
and for reading when doing a high-granularity PT walk.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 include/linux/hugetlb.h | 23 ++++++++++++++
 mm/hugetlb.c            | 67 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 605aa19d8572..321f5745d87f 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1176,14 +1176,37 @@ static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 }
 #endif	/* CONFIG_HUGETLB_PAGE */
 
+enum split_mode {
+	HUGETLB_SPLIT_NEVER   = 0,
+	HUGETLB_SPLIT_NONE    = 1 << 0,
+	HUGETLB_SPLIT_PRESENT = 1 << 1,
+	HUGETLB_SPLIT_ALWAYS  = HUGETLB_SPLIT_NONE | HUGETLB_SPLIT_PRESENT,
+};
 #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
 /* If HugeTLB high-granularity mappings are enabled for this VMA. */
 bool hugetlb_hgm_enabled(struct vm_area_struct *vma);
+int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
+				    struct mm_struct *mm,
+				    struct vm_area_struct *vma,
+				    unsigned long addr,
+				    unsigned int desired_shift,
+				    enum split_mode mode,
+				    bool write_locked);
 #else
 static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
 {
 	return false;
 }
+static inline int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
+					   struct mm_struct *mm,
+					   struct vm_area_struct *vma,
+					   unsigned long addr,
+					   unsigned int desired_shift,
+					   enum split_mode mode,
+					   bool write_locked)
+{
+	return -EINVAL;
+}
 #endif
 
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index eaffe7b4f67c..6e0c5fbfe32c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7166,6 +7166,73 @@ static int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *v
 	tlb_finish_mmu(&tlb);
 	return ret;
 }
+
+/*
+ * Similar to huge_pte_alloc except that this can be used to create or walk
+ * high-granularity mappings. It will automatically split existing HugeTLB PTEs
+ * if required by @mode. The resulting HugeTLB PTE will be returned in @hpte.
+ *
+ * There are four options for @mode:
+ *  - HUGETLB_SPLIT_NEVER   - Never split.
+ *  - HUGETLB_SPLIT_NONE    - Split empty PTEs.
+ *  - HUGETLB_SPLIT_PRESENT - Split present PTEs.
+ *  - HUGETLB_SPLIT_ALWAYS  - Split both empty and present PTEs.
+ */
+int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
+				    struct mm_struct *mm,
+				    struct vm_area_struct *vma,
+				    unsigned long addr,
+				    unsigned int desired_shift,
+				    enum split_mode mode,
+				    bool write_locked)
+{
+	struct address_space *mapping = vma->vm_file->f_mapping;
+	bool has_write_lock = write_locked;
+	unsigned long desired_sz = 1UL << desired_shift;
+	int ret;
+
+	BUG_ON(!hpte);
+
+	if (has_write_lock)
+		i_mmap_assert_write_locked(mapping);
+	else
+		i_mmap_assert_locked(mapping);
+
+retry:
+	ret = 0;
+	hugetlb_pte_init(hpte);
+
+	ret = hugetlb_walk_to(mm, hpte, addr, desired_sz,
+			      !(mode & HUGETLB_SPLIT_NONE));
+	if (ret || hugetlb_pte_size(hpte) == desired_sz)
+		goto out;
+
+	if (((mode & HUGETLB_SPLIT_NONE) &&
+	     hugetlb_pte_none(hpte)) ||
+	    ((mode & HUGETLB_SPLIT_PRESENT) &&
+	     hugetlb_pte_present_leaf(hpte))) {
+		/* Upgrade to the write lock before splitting, if needed. */
+		if (!has_write_lock) {
+			i_mmap_unlock_read(mapping);
+			i_mmap_lock_write(mapping);
+			has_write_lock = true;
+			goto retry;
+		}
+		ret = hugetlb_split_to_shift(mm, vma, hpte, addr,
+					     desired_shift);
+	}
+
+out:
+	if (has_write_lock && !write_locked) {
+		/* Drop the write lock. */
+		i_mmap_unlock_write(mapping);
+		i_mmap_lock_read(mapping);
+		has_write_lock = false;
+		goto retry;
+	}
+
+	return ret;
+}
 #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
 
 /*
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [RFC PATCH 14/26] hugetlb: add HGM support for hugetlb_fault and hugetlb_no_page
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (12 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 13/26] hugetlb: add huge_pte_alloc_high_granularity James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-29 14:40   ` manish.mishra
  2022-06-24 17:36 ` [RFC PATCH 15/26] hugetlb: make unmapping compatible with high-granularity mappings James Houghton
                   ` (14 subsequent siblings)
  28 siblings, 1 reply; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

This commit is the first main functional HugeTLB change. Together, these
changes allow the HugeTLB fault path to handle faults on HGM-enabled
VMAs. Two main behaviors are now possible:
  1. Faults can be passed to handle_userfault. (Userspace will want to
     use UFFD_FEATURE_REAL_ADDRESS to get the real address to know which
     region it should call UFFDIO_CONTINUE on later.)
  2. Faults on pages that have been partially mapped (and userfaultfd is
     not being used) will get mapped at the largest possible size.
     For example, if a 1G page has been partially mapped at 2M, and we
     fault on an unmapped 2M section, hugetlb_no_page will create a 2M
     PMD to map the faulting address.

This commit does not handle hugetlb_wp right now, and it doesn't handle
HugeTLB page migration and swap entries.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 include/linux/hugetlb.h |  12 ++++
 mm/hugetlb.c            | 121 +++++++++++++++++++++++++++++++---------
 2 files changed, 106 insertions(+), 27 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 321f5745d87f..ac4ac8fbd901 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1185,6 +1185,9 @@ enum split_mode {
 #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
 /* If HugeTLB high-granularity mappings are enabled for this VMA. */
 bool hugetlb_hgm_enabled(struct vm_area_struct *vma);
+int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
+			      struct vm_area_struct *vma, unsigned long start,
+			      unsigned long end);
 int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
 				    struct mm_struct *mm,
 				    struct vm_area_struct *vma,
@@ -1197,6 +1200,15 @@ static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
 {
 	return false;
 }
+
+static inline
+int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
+			      struct vm_area_struct *vma, unsigned long start,
+			      unsigned long end)
+{
+	BUG();
+}
+
 static inline int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
 					   struct mm_struct *mm,
 					   struct vm_area_struct *vma,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6e0c5fbfe32c..da30621656b8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5605,18 +5605,24 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
 static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			struct vm_area_struct *vma,
 			struct address_space *mapping, pgoff_t idx,
-			unsigned long address, pte_t *ptep,
+			unsigned long address, struct hugetlb_pte *hpte,
 			pte_t old_pte, unsigned int flags)
 {
 	struct hstate *h = hstate_vma(vma);
 	vm_fault_t ret = VM_FAULT_SIGBUS;
 	int anon_rmap = 0;
 	unsigned long size;
-	struct page *page;
+	struct page *page, *subpage;
 	pte_t new_pte;
 	spinlock_t *ptl;
 	unsigned long haddr = address & huge_page_mask(h);
+	unsigned long haddr_hgm = address & hugetlb_pte_mask(hpte);
 	bool new_page, new_pagecache_page = false;
+	/*
+	 * This page is getting mapped for the first time, in which case we
+	 * want to increment its mapcount.
+	 */
+	bool new_mapping = hpte->shift == huge_page_shift(h);
 
 	/*
 	 * Currently, we are forced to kill the process in the event the
@@ -5665,9 +5671,9 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			 * here.  Before returning error, get ptl and make
 			 * sure there really is no pte entry.
 			 */
-			ptl = huge_pte_lock(h, mm, ptep);
+			ptl = hugetlb_pte_lock(mm, hpte);
 			ret = 0;
-			if (huge_pte_none(huge_ptep_get(ptep)))
+			if (hugetlb_pte_none(hpte))
 				ret = vmf_error(PTR_ERR(page));
 			spin_unlock(ptl);
 			goto out;
@@ -5731,18 +5737,25 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		vma_end_reservation(h, vma, haddr);
 	}
 
-	ptl = huge_pte_lock(h, mm, ptep);
+	ptl = hugetlb_pte_lock(mm, hpte);
 	ret = 0;
 	/* If pte changed from under us, retry */
-	if (!pte_same(huge_ptep_get(ptep), old_pte))
+	if (!pte_same(hugetlb_ptep_get(hpte), old_pte))
 		goto backout;
 
-	if (anon_rmap) {
-		ClearHPageRestoreReserve(page);
-		hugepage_add_new_anon_rmap(page, vma, haddr);
-	} else
-		page_dup_file_rmap(page, true);
-	new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
+	if (new_mapping) {
+		/* Only increment this page's mapcount if we are mapping it
+		 * for the first time.
+		 */
+		if (anon_rmap) {
+			ClearHPageRestoreReserve(page);
+			hugepage_add_new_anon_rmap(page, vma, haddr);
+		} else
+			page_dup_file_rmap(page, true);
+	}
+
+	subpage = hugetlb_find_subpage(h, page, haddr_hgm);
+	new_pte = make_huge_pte(vma, subpage, ((vma->vm_flags & VM_WRITE)
 				&& (vma->vm_flags & VM_SHARED)));
 	/*
 	 * If this pte was previously wr-protected, keep it wr-protected even
@@ -5750,12 +5763,13 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	 */
 	if (unlikely(pte_marker_uffd_wp(old_pte)))
 		new_pte = huge_pte_wrprotect(huge_pte_mkuffd_wp(new_pte));
-	set_huge_pte_at(mm, haddr, ptep, new_pte);
+	set_huge_pte_at(mm, haddr_hgm, hpte->ptep, new_pte);
 
-	hugetlb_count_add(pages_per_huge_page(h), mm);
+	hugetlb_count_add(hugetlb_pte_size(hpte) / PAGE_SIZE, mm);
 	if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
+		BUG_ON(hugetlb_pte_size(hpte) != huge_page_size(h));
 		/* Optimization, do the COW without a second fault */
-		ret = hugetlb_wp(mm, vma, address, ptep, flags, page, ptl);
+		ret = hugetlb_wp(mm, vma, address, hpte->ptep, flags, page, ptl);
 	}
 
 	spin_unlock(ptl);
@@ -5816,11 +5830,15 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	u32 hash;
 	pgoff_t idx;
 	struct page *page = NULL;
+	struct page *subpage = NULL;
 	struct page *pagecache_page = NULL;
 	struct hstate *h = hstate_vma(vma);
 	struct address_space *mapping;
 	int need_wait_lock = 0;
 	unsigned long haddr = address & huge_page_mask(h);
+	unsigned long haddr_hgm;
+	bool hgm_enabled = hugetlb_hgm_enabled(vma);
+	struct hugetlb_pte hpte;
 
 	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
 	if (ptep) {
@@ -5866,11 +5884,22 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	hash = hugetlb_fault_mutex_hash(mapping, idx);
 	mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
-	entry = huge_ptep_get(ptep);
+	hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h));
+
+	if (hgm_enabled) {
+		ret = hugetlb_walk_to(mm, &hpte, address,
+				      PAGE_SIZE, /*stop_at_none=*/true);
+		if (ret) {
+			ret = vmf_error(ret);
+			goto out_mutex;
+		}
+	}
+
+	entry = hugetlb_ptep_get(&hpte);
 	/* PTE markers should be handled the same way as none pte */
-	if (huge_pte_none_mostly(entry)) {
-		ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
-				      entry, flags);
+	if (hugetlb_pte_none_mostly(&hpte)) {
+		ret = hugetlb_no_page(mm, vma, mapping, idx, address, &hpte,
+				entry, flags);
 		goto out_mutex;
 	}
 
@@ -5908,14 +5937,17 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 								vma, haddr);
 	}
 
-	ptl = huge_pte_lock(h, mm, ptep);
+	ptl = hugetlb_pte_lock(mm, &hpte);
 
 	/* Check for a racing update before calling hugetlb_wp() */
-	if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
+	if (unlikely(!pte_same(entry, hugetlb_ptep_get(&hpte))))
 		goto out_ptl;
 
+	/* haddr_hgm is the base address of the region that hpte maps. */
+	haddr_hgm = address & hugetlb_pte_mask(&hpte);
+
 	/* Handle userfault-wp first, before trying to lock more pages */
-	if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(ptep)) &&
+	if (userfaultfd_wp(vma) && huge_pte_uffd_wp(hugetlb_ptep_get(&hpte)) &&
 	    (flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) {
 		struct vm_fault vmf = {
 			.vma = vma,
@@ -5939,7 +5971,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * pagecache_page, so here we need take the former one
 	 * when page != pagecache_page or !pagecache_page.
 	 */
-	page = pte_page(entry);
+	subpage = pte_page(entry);
+	page = compound_head(subpage);
 	if (page != pagecache_page)
 		if (!trylock_page(page)) {
 			need_wait_lock = 1;
@@ -5950,7 +5983,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
 		if (!huge_pte_write(entry)) {
-			ret = hugetlb_wp(mm, vma, address, ptep, flags,
+			BUG_ON(hugetlb_pte_size(&hpte) != huge_page_size(h));
+			ret = hugetlb_wp(mm, vma, address, hpte.ptep, flags,
 					 pagecache_page, ptl);
 			goto out_put_page;
 		} else if (likely(flags & FAULT_FLAG_WRITE)) {
@@ -5958,9 +5992,9 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		}
 	}
 	entry = pte_mkyoung(entry);
-	if (huge_ptep_set_access_flags(vma, haddr, ptep, entry,
+	if (huge_ptep_set_access_flags(vma, haddr_hgm, hpte.ptep, entry,
 						flags & FAULT_FLAG_WRITE))
-		update_mmu_cache(vma, haddr, ptep);
+		update_mmu_cache(vma, haddr_hgm, hpte.ptep);
 out_put_page:
 	if (page != pagecache_page)
 		unlock_page(page);
@@ -6951,7 +6985,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 				pte = (pte_t *)pmd_alloc(mm, pud, addr);
 		}
 	}
-	BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte));
+	if (!hugetlb_hgm_enabled(vma))
+		BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte));
 
 	return pte;
 }
@@ -7057,6 +7092,38 @@ static unsigned int __shift_for_hstate(struct hstate *h)
 			       (tmp_h) <= &hstates[hugetlb_max_hstate]; \
 			       (tmp_h)++)
 
+/*
+ * Allocate a HugeTLB PTE that maps as much of [start, end) as possible with a
+ * single page table entry. The allocated HugeTLB PTE is returned in hpte.
+ */
+int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
+			      struct vm_area_struct *vma, unsigned long start,
+			      unsigned long end)
+{
+	struct hstate *h = hstate_vma(vma), *tmp_h;
+	unsigned int shift;
+	int ret;
+
+	for_each_hgm_shift(h, tmp_h, shift) {
+		unsigned long sz = 1UL << shift;
+
+		if (!IS_ALIGNED(start, sz) || start + sz > end)
+			continue;
+		ret = huge_pte_alloc_high_granularity(hpte, mm, vma, start,
+						      shift, HUGETLB_SPLIT_NONE,
+						      /*write_locked=*/false);
+		if (ret)
+			return ret;
+
+		if (hpte->shift > shift)
+			return -EEXIST;
+
+		BUG_ON(hpte->shift != shift);
+		return 0;
+	}
+	return -EINVAL;
+}
+
 /*
  * Given a particular address, split the HugeTLB PTE that currently maps it
  * so that, for the given address, the PTE that maps it is `desired_shift`.
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC PATCH 15/26] hugetlb: make unmapping compatible with high-granularity mappings
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (13 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 14/26] hugetlb: add HGM support for hugetlb_fault and hugetlb_no_page James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-07-19 10:19   ` manish.mishra
  2022-06-24 17:36 ` [RFC PATCH 16/26] hugetlb: make hugetlb_change_protection compatible with HGM James Houghton
                   ` (13 subsequent siblings)
  28 siblings, 1 reply; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

This enlightens __unmap_hugepage_range to deal with high-granularity
mappings. Its API is unchanged: it must still be called with
hugepage-aligned start and end addresses, but it now correctly unmaps
hugepages that have been mapped at high granularity.

Analogous to the mapcount rules introduced by hugetlb_no_page, we only
drop mapcount in this case if we are unmapping an entire hugepage in one
operation. This is the case when a VMA is destroyed.

Eventually, functionality here can be expanded to allow users to call
MADV_DONTNEED on PAGE_SIZE-aligned sections of a hugepage, but that is
not done here.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 include/asm-generic/tlb.h |  6 +--
 mm/hugetlb.c              | 85 ++++++++++++++++++++++++++-------------
 2 files changed, 59 insertions(+), 32 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index ff3e82553a76..8daa3ae460d9 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -562,9 +562,9 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
 		__tlb_remove_tlb_entry(tlb, ptep, address);	\
 	} while (0)
 
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
+#define tlb_remove_huge_tlb_entry(tlb, hpte, address)	\
 	do {							\
-		unsigned long _sz = huge_page_size(h);		\
+		unsigned long _sz = hugetlb_pte_size(&hpte);	\
 		if (_sz >= P4D_SIZE)				\
 			tlb_flush_p4d_range(tlb, address, _sz);	\
 		else if (_sz >= PUD_SIZE)			\
@@ -573,7 +573,7 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
 			tlb_flush_pmd_range(tlb, address, _sz);	\
 		else						\
 			tlb_flush_pte_range(tlb, address, _sz);	\
-		__tlb_remove_tlb_entry(tlb, ptep, address);	\
+		__tlb_remove_tlb_entry(tlb, hpte.ptep, address);\
 	} while (0)
 
 /**
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index da30621656b8..51fc1d3f122f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5120,24 +5120,20 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
-	pte_t *ptep;
+	struct hugetlb_pte hpte;
 	pte_t pte;
 	spinlock_t *ptl;
-	struct page *page;
+	struct page *hpage, *subpage;
 	struct hstate *h = hstate_vma(vma);
 	unsigned long sz = huge_page_size(h);
 	struct mmu_notifier_range range;
 	bool force_flush = false;
+	bool hgm_enabled = hugetlb_hgm_enabled(vma);
 
 	WARN_ON(!is_vm_hugetlb_page(vma));
 	BUG_ON(start & ~huge_page_mask(h));
 	BUG_ON(end & ~huge_page_mask(h));
 
-	/*
-	 * This is a hugetlb vma, all the pte entries should point
-	 * to huge page.
-	 */
-	tlb_change_page_size(tlb, sz);
 	tlb_start_vma(tlb, vma);
 
 	/*
@@ -5148,25 +5144,43 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 	adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
 	mmu_notifier_invalidate_range_start(&range);
 	address = start;
-	for (; address < end; address += sz) {
-		ptep = huge_pte_offset(mm, address, sz);
-		if (!ptep)
+
+	while (address < end) {
+		pte_t *ptep = huge_pte_offset(mm, address, sz);
+
+		if (!ptep) {
+			address += sz;
 			continue;
+		}
+		hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h));
+		if (hgm_enabled) {
+			int ret = huge_pte_alloc_high_granularity(
+					&hpte, mm, vma, address, PAGE_SHIFT,
+					HUGETLB_SPLIT_NEVER,
+					/*write_locked=*/true);
+			/*
+			 * We will never split anything, so this should always
+			 * succeed.
+			 */
+			BUG_ON(ret);
+		}
 
-		ptl = huge_pte_lock(h, mm, ptep);
-		if (huge_pmd_unshare(mm, vma, &address, ptep)) {
+		ptl = hugetlb_pte_lock(mm, &hpte);
+		if (!hgm_enabled && huge_pmd_unshare(
+					mm, vma, &address, hpte.ptep)) {
 			spin_unlock(ptl);
 			tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
 			force_flush = true;
-			continue;
+			goto next_hpte;
 		}
 
-		pte = huge_ptep_get(ptep);
-		if (huge_pte_none(pte)) {
+		if (hugetlb_pte_none(&hpte)) {
 			spin_unlock(ptl);
-			continue;
+			goto next_hpte;
 		}
 
+		pte = hugetlb_ptep_get(&hpte);
+
 		/*
 		 * Migrating hugepage or HWPoisoned hugepage is already
 		 * unmapped and its refcount is dropped, so just clear pte here.
@@ -5180,24 +5194,27 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 			 */
 			if (pte_swp_uffd_wp_any(pte) &&
 			    !(zap_flags & ZAP_FLAG_DROP_MARKER))
-				set_huge_pte_at(mm, address, ptep,
+				set_huge_pte_at(mm, address, hpte.ptep,
 						make_pte_marker(PTE_MARKER_UFFD_WP));
 			else
-				huge_pte_clear(mm, address, ptep, sz);
+				huge_pte_clear(mm, address, hpte.ptep,
+						hugetlb_pte_size(&hpte));
 			spin_unlock(ptl);
-			continue;
+			goto next_hpte;
 		}
 
-		page = pte_page(pte);
+		subpage = pte_page(pte);
+		BUG_ON(!subpage);
+		hpage = compound_head(subpage);
 		/*
 		 * If a reference page is supplied, it is because a specific
 		 * page is being unmapped, not a range. Ensure the page we
 		 * are about to unmap is the actual page of interest.
 		 */
 		if (ref_page) {
-			if (page != ref_page) {
+			if (hpage != ref_page) {
 				spin_unlock(ptl);
-				continue;
+				goto next_hpte;
 			}
 			/*
 			 * Mark the VMA as having unmapped its page so that
@@ -5207,25 +5224,35 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 			set_vma_resv_flags(vma, HPAGE_RESV_UNMAPPED);
 		}
 
-		pte = huge_ptep_get_and_clear(mm, address, ptep);
-		tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
+		pte = huge_ptep_get_and_clear(mm, address, hpte.ptep);
+		tlb_change_page_size(tlb, hugetlb_pte_size(&hpte));
+		tlb_remove_huge_tlb_entry(tlb, hpte, address);
 		if (huge_pte_dirty(pte))
-			set_page_dirty(page);
+			set_page_dirty(hpage);
 		/* Leave a uffd-wp pte marker if needed */
 		if (huge_pte_uffd_wp(pte) &&
 		    !(zap_flags & ZAP_FLAG_DROP_MARKER))
-			set_huge_pte_at(mm, address, ptep,
+			set_huge_pte_at(mm, address, hpte.ptep,
 					make_pte_marker(PTE_MARKER_UFFD_WP));
-		hugetlb_count_sub(pages_per_huge_page(h), mm);
-		page_remove_rmap(page, vma, true);
+
+		hugetlb_count_sub(hugetlb_pte_size(&hpte) / PAGE_SIZE, mm);
+
+		/*
+		 * If we are unmapping the entire page, remove it from the
+		 * rmap.
+		 */
+		if (IS_ALIGNED(address, sz) && address + sz <= end)
+			page_remove_rmap(hpage, vma, true);
 
 		spin_unlock(ptl);
-		tlb_remove_page_size(tlb, page, huge_page_size(h));
+		tlb_remove_page_size(tlb, subpage, hugetlb_pte_size(&hpte));
 		/*
 		 * Bail out after unmapping reference page if supplied
 		 */
 		if (ref_page)
 			break;
+next_hpte:
+		address += hugetlb_pte_size(&hpte);
 	}
 	mmu_notifier_invalidate_range_end(&range);
 	tlb_end_vma(tlb, vma);
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [RFC PATCH 16/26] hugetlb: make hugetlb_change_protection compatible with HGM
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (14 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 15/26] hugetlb: make unmapping compatible with high-granularity mappings James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-24 17:36 ` [RFC PATCH 17/26] hugetlb: update follow_hugetlb_page to support HGM James Houghton
                   ` (12 subsequent siblings)
  28 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

HugeTLB is now able to change the protection of hugepages that are
mapped at high granularity.

I need to add more of the HugeTLB PTE wrapper functions to clean up this
patch. I'll do this in the next version.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 mm/hugetlb.c | 91 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 62 insertions(+), 29 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 51fc1d3f122f..f9c7daa6c090 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6476,14 +6476,15 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long start = address;
-	pte_t *ptep;
 	pte_t pte;
 	struct hstate *h = hstate_vma(vma);
-	unsigned long pages = 0, psize = huge_page_size(h);
+	unsigned long base_pages = 0, psize = huge_page_size(h);
 	bool shared_pmd = false;
 	struct mmu_notifier_range range;
 	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
 	bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
+	struct hugetlb_pte hpte;
+	bool hgm_enabled = hugetlb_hgm_enabled(vma);
 
 	/*
 	 * In the case of shared PMDs, the area to flush could be beyond
@@ -6499,28 +6500,38 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 
 	mmu_notifier_invalidate_range_start(&range);
 	i_mmap_lock_write(vma->vm_file->f_mapping);
-	for (; address < end; address += psize) {
+	while (address < end) {
 		spinlock_t *ptl;
-		ptep = huge_pte_offset(mm, address, psize);
-		if (!ptep)
+		pte_t *ptep = huge_pte_offset(mm, address, huge_page_size(h));
+
+		if (!ptep) {
+			address += huge_page_size(h);
 			continue;
-		ptl = huge_pte_lock(h, mm, ptep);
-		if (huge_pmd_unshare(mm, vma, &address, ptep)) {
+		}
+		hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h));
+		if (hgm_enabled) {
+			int ret = hugetlb_walk_to(mm, &hpte, address, PAGE_SIZE,
+						  /*stop_at_none=*/true);
+			BUG_ON(ret);
+		}
+
+		ptl = hugetlb_pte_lock(mm, &hpte);
+		if (huge_pmd_unshare(mm, vma, &address, hpte.ptep)) {
 			/*
 			 * When uffd-wp is enabled on the vma, unshare
 			 * shouldn't happen at all.  Warn about it if it
 			 * happened due to some reason.
 			 */
 			WARN_ON_ONCE(uffd_wp || uffd_wp_resolve);
-			pages++;
+			base_pages += hugetlb_pte_size(&hpte) / PAGE_SIZE;
 			spin_unlock(ptl);
 			shared_pmd = true;
-			continue;
+			goto next_hpte;
 		}
-		pte = huge_ptep_get(ptep);
+		pte = hugetlb_ptep_get(&hpte);
 		if (unlikely(is_hugetlb_entry_hwpoisoned(pte))) {
 			spin_unlock(ptl);
-			continue;
+			goto next_hpte;
 		}
 		if (unlikely(is_hugetlb_entry_migration(pte))) {
 			swp_entry_t entry = pte_to_swp_entry(pte);
@@ -6540,12 +6551,13 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 					newpte = pte_swp_mkuffd_wp(newpte);
 				else if (uffd_wp_resolve)
 					newpte = pte_swp_clear_uffd_wp(newpte);
-				set_huge_swap_pte_at(mm, address, ptep,
-						     newpte, psize);
-				pages++;
+				set_huge_swap_pte_at(mm, address, hpte.ptep,
+						     newpte,
+						     hugetlb_pte_size(&hpte));
+				base_pages += hugetlb_pte_size(&hpte) / PAGE_SIZE;
 			}
 			spin_unlock(ptl);
-			continue;
+			goto next_hpte;
 		}
 		if (unlikely(pte_marker_uffd_wp(pte))) {
 			/*
@@ -6553,21 +6565,40 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 			 * no need for huge_ptep_modify_prot_start/commit().
 			 */
 			if (uffd_wp_resolve)
-				huge_pte_clear(mm, address, ptep, psize);
+				huge_pte_clear(mm, address, hpte.ptep, hugetlb_pte_size(&hpte));
 		}
-		if (!huge_pte_none(pte)) {
+		if (!hugetlb_pte_none(&hpte)) {
 			pte_t old_pte;
-			unsigned int shift = huge_page_shift(hstate_vma(vma));
-
-			old_pte = huge_ptep_modify_prot_start(vma, address, ptep);
-			pte = huge_pte_modify(old_pte, newprot);
-			pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
-			if (uffd_wp)
-				pte = huge_pte_mkuffd_wp(huge_pte_wrprotect(pte));
-			else if (uffd_wp_resolve)
-				pte = huge_pte_clear_uffd_wp(pte);
-			huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
-			pages++;
+			unsigned int shift = hpte.shift;
+			/*
+			 * This is ugly. This will be cleaned up in a future
+			 * version of this series.
+			 */
+			if (shift > PAGE_SHIFT) {
+				old_pte = huge_ptep_modify_prot_start(
+						vma, address, hpte.ptep);
+				pte = huge_pte_modify(old_pte, newprot);
+				pte = arch_make_huge_pte(
+						pte, shift, vma->vm_flags);
+				if (uffd_wp)
+					pte = huge_pte_mkuffd_wp(huge_pte_wrprotect(pte));
+				else if (uffd_wp_resolve)
+					pte = huge_pte_clear_uffd_wp(pte);
+				huge_ptep_modify_prot_commit(
+						vma, address, hpte.ptep,
+						old_pte, pte);
+			} else {
+				old_pte = ptep_modify_prot_start(
+						vma, address, hpte.ptep);
+				pte = pte_modify(old_pte, newprot);
+				if (uffd_wp)
+					pte = pte_mkuffd_wp(pte_wrprotect(pte));
+				else if (uffd_wp_resolve)
+					pte = pte_clear_uffd_wp(pte);
+				ptep_modify_prot_commit(
+						vma, address, hpte.ptep, old_pte, pte);
+			}
+			base_pages += hugetlb_pte_size(&hpte) / PAGE_SIZE;
 		} else {
 			/* None pte */
 			if (unlikely(uffd_wp))
@@ -6576,6 +6607,8 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 						make_pte_marker(PTE_MARKER_UFFD_WP));
 		}
 		spin_unlock(ptl);
+next_hpte:
+		address += hugetlb_pte_size(&hpte);
 	}
 	/*
 	 * Must flush TLB before releasing i_mmap_rwsem: x86's huge_pmd_unshare
@@ -6597,7 +6630,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	i_mmap_unlock_write(vma->vm_file->f_mapping);
 	mmu_notifier_invalidate_range_end(&range);
 
-	return pages << h->order;
+	return base_pages;
 }
 
 /* Return true if reservation was successful, false otherwise.  */
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [RFC PATCH 17/26] hugetlb: update follow_hugetlb_page to support HGM
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (15 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 16/26] hugetlb: make hugetlb_change_protection compatible with HGM James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-07-19 10:48   ` manish.mishra
  2022-06-24 17:36 ` [RFC PATCH 18/26] hugetlb: use struct hugetlb_pte for walk_hugetlb_range James Houghton
                   ` (11 subsequent siblings)
  28 siblings, 1 reply; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

This enables support for GUP, and it is needed for the KVM demand paging
self-test to work.

One important change here is that, before, we never needed to grab the
i_mmap_rwsem, but now, to prevent someone from collapsing the page
tables out from under us, we grab it for reading when doing
high-granularity page table walks.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 mm/hugetlb.c | 70 ++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 57 insertions(+), 13 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f9c7daa6c090..aadfcee947cf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6298,14 +6298,18 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	unsigned long vaddr = *position;
 	unsigned long remainder = *nr_pages;
 	struct hstate *h = hstate_vma(vma);
+	struct address_space *mapping = vma->vm_file->f_mapping;
 	int err = -EFAULT, refs;
+	bool has_i_mmap_sem = false;
 
 	while (vaddr < vma->vm_end && remainder) {
 		pte_t *pte;
 		spinlock_t *ptl = NULL;
 		bool unshare = false;
 		int absent;
+		unsigned long pages_per_hpte;
 		struct page *page;
+		struct hugetlb_pte hpte;
 
 		/*
 		 * If we have a pending SIGKILL, don't keep faulting pages and
@@ -6325,9 +6329,23 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h),
 				      huge_page_size(h));
-		if (pte)
-			ptl = huge_pte_lock(h, mm, pte);
-		absent = !pte || huge_pte_none(huge_ptep_get(pte));
+		if (pte) {
+			hugetlb_pte_populate(&hpte, pte, huge_page_shift(h));
+			if (hugetlb_hgm_enabled(vma)) {
+				BUG_ON(has_i_mmap_sem);
+				i_mmap_lock_read(mapping);
+				/*
+				 * Need to hold the mapping semaphore for
+				 * reading to do a HGM walk.
+				 */
+				has_i_mmap_sem = true;
+				hugetlb_walk_to(mm, &hpte, vaddr, PAGE_SIZE,
+						/*stop_at_none=*/true);
+			}
+			ptl = hugetlb_pte_lock(mm, &hpte);
+		}
+
+		absent = !pte || hugetlb_pte_none(&hpte);
 
 		/*
 		 * When coredumping, it suits get_dump_page if we just return
@@ -6338,8 +6356,13 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		if (absent && (flags & FOLL_DUMP) &&
 		    !hugetlbfs_pagecache_present(h, vma, vaddr)) {
-			if (pte)
+			if (pte) {
+				if (has_i_mmap_sem) {
+					i_mmap_unlock_read(mapping);
+					has_i_mmap_sem = false;
+				}
 				spin_unlock(ptl);
+			}
 			remainder = 0;
 			break;
 		}
@@ -6359,8 +6382,13 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			vm_fault_t ret;
 			unsigned int fault_flags = 0;
 
-			if (pte)
+			if (pte) {
+				if (has_i_mmap_sem) {
+					i_mmap_unlock_read(mapping);
+					has_i_mmap_sem = false;
+				}
 				spin_unlock(ptl);
+			}
 			if (flags & FOLL_WRITE)
 				fault_flags |= FAULT_FLAG_WRITE;
 			else if (unshare)
@@ -6403,8 +6431,11 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			continue;
 		}
 
-		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
-		page = pte_page(huge_ptep_get(pte));
+		pfn_offset = (vaddr & ~hugetlb_pte_mask(&hpte)) >> PAGE_SHIFT;
+		page = pte_page(hugetlb_ptep_get(&hpte));
+		pages_per_hpte = hugetlb_pte_size(&hpte) / PAGE_SIZE;
+		if (hugetlb_hgm_enabled(vma))
+			page = compound_head(page);
 
 		VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
 			       !PageAnonExclusive(page), page);
@@ -6414,17 +6445,21 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * and skip the same_page loop below.
 		 */
 		if (!pages && !vmas && !pfn_offset &&
-		    (vaddr + huge_page_size(h) < vma->vm_end) &&
-		    (remainder >= pages_per_huge_page(h))) {
-			vaddr += huge_page_size(h);
-			remainder -= pages_per_huge_page(h);
-			i += pages_per_huge_page(h);
+		    (vaddr + hugetlb_pte_size(&hpte) < vma->vm_end) &&
+		    (remainder >= pages_per_hpte)) {
+			vaddr += hugetlb_pte_size(&hpte);
+			remainder -= pages_per_hpte;
+			i += pages_per_hpte;
 			spin_unlock(ptl);
+			if (has_i_mmap_sem) {
+				has_i_mmap_sem = false;
+				i_mmap_unlock_read(mapping);
+			}
 			continue;
 		}
 
 		/* vaddr may not be aligned to PAGE_SIZE */
-		refs = min3(pages_per_huge_page(h) - pfn_offset, remainder,
+		refs = min3(pages_per_hpte - pfn_offset, remainder,
 		    (vma->vm_end - ALIGN_DOWN(vaddr, PAGE_SIZE)) >> PAGE_SHIFT);
 
 		if (pages || vmas)
@@ -6447,6 +6482,10 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			if (WARN_ON_ONCE(!try_grab_folio(pages[i], refs,
 							 flags))) {
 				spin_unlock(ptl);
+				if (has_i_mmap_sem) {
+					has_i_mmap_sem = false;
+					i_mmap_unlock_read(mapping);
+				}
 				remainder = 0;
 				err = -ENOMEM;
 				break;
@@ -6458,8 +6497,13 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		i += refs;
 
 		spin_unlock(ptl);
+		if (has_i_mmap_sem) {
+			has_i_mmap_sem = false;
+			i_mmap_unlock_read(mapping);
+		}
 	}
 	*nr_pages = remainder;
+	BUG_ON(has_i_mmap_sem);
 	/*
 	 * setting position is actually required only if remainder is
 	 * not zero but it's faster not to add a "if (remainder)"
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [RFC PATCH 18/26] hugetlb: use struct hugetlb_pte for walk_hugetlb_range
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (16 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 17/26] hugetlb: update follow_hugetlb_page to support HGM James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-25  2:58   ` kernel test robot
  2022-06-25  2:59   ` kernel test robot
  2022-06-24 17:36 ` [RFC PATCH 19/26] hugetlb: add HGM support for copy_hugetlb_page_range James Houghton
                   ` (10 subsequent siblings)
  28 siblings, 2 replies; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

Although this change is large, it is somewhat straightforward. Before,
all users of walk_hugetlb_range could get the size of the PTE just by
checking the hmask or the mm_walk struct. With HGM, that information is
held in the hugetlb_pte struct, so we provide that instead of the raw
pte_t*.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 arch/s390/mm/gmap.c      |  8 ++++++--
 fs/proc/task_mmu.c       | 35 +++++++++++++++++++----------------
 include/linux/pagewalk.h |  3 ++-
 mm/damon/vaddr.c         | 34 ++++++++++++++++++----------------
 mm/hmm.c                 |  7 ++++---
 mm/mempolicy.c           | 11 ++++++++---
 mm/mincore.c             |  4 ++--
 mm/mprotect.c            |  6 +++---
 mm/pagewalk.c            | 18 ++++++++++++++++--
 9 files changed, 78 insertions(+), 48 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index b8ae4a4aa2ba..518cebfd72cd 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2620,10 +2620,14 @@ static int __s390_enable_skey_pmd(pmd_t *pmd, unsigned long addr,
 	return 0;
 }
 
-static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr,
-				      unsigned long hmask, unsigned long next,
+static int __s390_enable_skey_hugetlb(struct hugetlb_pte *hpte,
+				      unsigned long addr, unsigned long next,
 				      struct mm_walk *walk)
 {
-	pmd_t *pmd = (pmd_t *)pte;
+	pmd_t *pmd = (pmd_t *)hpte->ptep;
 	unsigned long start, end;
 	struct page *page = pmd_page(*pmd);
+
+	if (!hugetlb_pte_present_leaf(hpte) ||
+	    hugetlb_pte_size(hpte) != PMD_SIZE)
+		return -EINVAL;
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 2d04e3470d4c..b2d683f99fa9 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -714,18 +714,19 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
+static int smaps_hugetlb_range(struct hugetlb_pte *hpte,
 				 unsigned long addr, unsigned long end,
 				 struct mm_walk *walk)
 {
 	struct mem_size_stats *mss = walk->private;
 	struct vm_area_struct *vma = walk->vma;
 	struct page *page = NULL;
+	pte_t pte = hugetlb_ptep_get(hpte);
 
-	if (pte_present(*pte)) {
-		page = vm_normal_page(vma, addr, *pte);
-	} else if (is_swap_pte(*pte)) {
-		swp_entry_t swpent = pte_to_swp_entry(*pte);
+	if (hugetlb_pte_present_leaf(hpte)) {
+		page = vm_normal_page(vma, addr, pte);
+	} else if (is_swap_pte(pte)) {
+		swp_entry_t swpent = pte_to_swp_entry(pte);
 
 		if (is_pfn_swap_entry(swpent))
 			page = pfn_swap_entry_to_page(swpent);
@@ -734,9 +735,9 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
 		int mapcount = page_mapcount(page);
 
 		if (mapcount >= 2)
-			mss->shared_hugetlb += huge_page_size(hstate_vma(vma));
+			mss->shared_hugetlb += hugetlb_pte_size(hpte);
 		else
-			mss->private_hugetlb += huge_page_size(hstate_vma(vma));
+			mss->private_hugetlb += hugetlb_pte_size(hpte);
 	}
 	return 0;
 }
@@ -1535,7 +1536,7 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 
 #ifdef CONFIG_HUGETLB_PAGE
 /* This function walks within one hugetlb entry in the single call */
-static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
+static int pagemap_hugetlb_range(struct hugetlb_pte *hpte,
 				 unsigned long addr, unsigned long end,
 				 struct mm_walk *walk)
 {
@@ -1543,13 +1544,13 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
 	struct vm_area_struct *vma = walk->vma;
 	u64 flags = 0, frame = 0;
 	int err = 0;
-	pte_t pte;
+	unsigned long hmask = hugetlb_pte_mask(hpte);
 
 	if (vma->vm_flags & VM_SOFTDIRTY)
 		flags |= PM_SOFT_DIRTY;
 
-	pte = huge_ptep_get(ptep);
-	if (pte_present(pte)) {
+	if (hugetlb_pte_present_leaf(hpte)) {
+		pte_t pte = hugetlb_ptep_get(hpte);
 		struct page *page = pte_page(pte);
 
 		if (!PageAnon(page))
@@ -1565,7 +1566,7 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
 		if (pm->show_pfn)
 			frame = pte_pfn(pte) +
 				((addr & ~hmask) >> PAGE_SHIFT);
-	} else if (pte_swp_uffd_wp_any(pte)) {
+	} else if (pte_swp_uffd_wp_any(hugetlb_ptep_get(hpte))) {
 		flags |= PM_UFFD_WP;
 	}
 
@@ -1869,17 +1870,19 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
 	return 0;
 }
 #ifdef CONFIG_HUGETLB_PAGE
-static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
-		unsigned long addr, unsigned long end, struct mm_walk *walk)
+static int gather_hugetlb_stats(struct hugetlb_pte *hpte, unsigned long addr,
+		unsigned long end, struct mm_walk *walk)
 {
-	pte_t huge_pte = huge_ptep_get(pte);
+	pte_t huge_pte = hugetlb_ptep_get(hpte);
 	struct numa_maps *md;
 	struct page *page;
 
-	if (!pte_present(huge_pte))
+	if (!hugetlb_pte_present_leaf(hpte))
 		return 0;
 
 	page = pte_page(huge_pte);
+	if (page != compound_head(page))
+		return 0;
 
 	md = walk->private;
 	gather_stats(page, md, pte_dirty(huge_pte), 1);
diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index ac7b38ad5903..0d21e25df37f 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -3,6 +3,7 @@
 #define _LINUX_PAGEWALK_H
 
 #include <linux/mm.h>
+#include <linux/hugetlb.h>
 
 struct mm_walk;
 
@@ -47,7 +48,7 @@ struct mm_walk_ops {
 			 unsigned long next, struct mm_walk *walk);
 	int (*pte_hole)(unsigned long addr, unsigned long next,
 			int depth, struct mm_walk *walk);
-	int (*hugetlb_entry)(pte_t *pte, unsigned long hmask,
+	int (*hugetlb_entry)(struct hugetlb_pte *hpte,
 			     unsigned long addr, unsigned long next,
 			     struct mm_walk *walk);
 	int (*test_walk)(unsigned long addr, unsigned long next,
diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
index 59e1653799f8..ce50b937dcf2 100644
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -324,14 +324,15 @@ static int damon_mkold_pmd_entry(pmd_t *pmd, unsigned long addr,
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-static void damon_hugetlb_mkold(pte_t *pte, struct mm_struct *mm,
+static void damon_hugetlb_mkold(struct hugetlb_pte *hpte, struct mm_struct *mm,
 				struct vm_area_struct *vma, unsigned long addr)
 {
 	bool referenced = false;
-	pte_t entry = huge_ptep_get(pte);
+	pte_t entry = huge_ptep_get(hpte->ptep);
 	struct page *page = pte_page(entry);
+	struct page *hpage = compound_head(page);
 
-	get_page(page);
+	get_page(hpage);
 
 	if (pte_young(entry)) {
 		referenced = true;
@@ -342,18 +343,18 @@ static void damon_hugetlb_mkold(pte_t *pte, struct mm_struct *mm,
 
 #ifdef CONFIG_MMU_NOTIFIER
 	if (mmu_notifier_clear_young(mm, addr,
-				     addr + huge_page_size(hstate_vma(vma))))
+				     addr + hugetlb_pte_size(hpte)))
 		referenced = true;
 #endif /* CONFIG_MMU_NOTIFIER */
 
 	if (referenced)
-		set_page_young(page);
+		set_page_young(hpage);
 
-	set_page_idle(page);
-	put_page(page);
+	set_page_idle(hpage);
+	put_page(hpage);
 }
 
-static int damon_mkold_hugetlb_entry(pte_t *pte, unsigned long hmask,
+static int damon_mkold_hugetlb_entry(struct hugetlb_pte *hpte,
 				     unsigned long addr, unsigned long end,
 				     struct mm_walk *walk)
 {
@@ -361,12 +362,12 @@ static int damon_mkold_hugetlb_entry(pte_t *pte, unsigned long hmask,
 	spinlock_t *ptl;
 	pte_t entry;
 
-	ptl = huge_pte_lock(h, walk->mm, pte);
-	entry = huge_ptep_get(pte);
+	ptl = huge_pte_lock_shift(hpte->shift, walk->mm, hpte->ptep);
+	entry = huge_ptep_get(hpte->ptep);
 	if (!pte_present(entry))
 		goto out;
 
-	damon_hugetlb_mkold(pte, walk->mm, walk->vma, addr);
+	damon_hugetlb_mkold(hpte, walk->mm, walk->vma, addr);
 
 out:
 	spin_unlock(ptl);
@@ -474,31 +475,32 @@ static int damon_young_pmd_entry(pmd_t *pmd, unsigned long addr,
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-static int damon_young_hugetlb_entry(pte_t *pte, unsigned long hmask,
+static int damon_young_hugetlb_entry(struct hugetlb_pte *hpte,
 				     unsigned long addr, unsigned long end,
 				     struct mm_walk *walk)
 {
 	struct damon_young_walk_private *priv = walk->private;
 	struct hstate *h = hstate_vma(walk->vma);
-	struct page *page;
+	struct page *page, *hpage;
 	spinlock_t *ptl;
 	pte_t entry;
 
-	ptl = huge_pte_lock(h, walk->mm, pte);
+	ptl = huge_pte_lock_shift(hpte->shift, walk->mm, hpte->ptep);
-	entry = huge_ptep_get(pte);
+	entry = huge_ptep_get(hpte->ptep);
 	if (!pte_present(entry))
 		goto out;
 
 	page = pte_page(entry);
-	get_page(page);
+	hpage = compound_head(page);
+	get_page(hpage);
 
-	if (pte_young(entry) || !page_is_idle(page) ||
+	if (pte_young(entry) || !page_is_idle(hpage) ||
 	    mmu_notifier_test_young(walk->mm, addr)) {
 		*priv->page_sz = huge_page_size(h);
 		priv->young = true;
 	}
 
-	put_page(page);
+	put_page(hpage);
 
 out:
 	spin_unlock(ptl);
diff --git a/mm/hmm.c b/mm/hmm.c
index 3fd3242c5e50..1ad5d76fa8be 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -472,7 +472,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
 #endif
 
 #ifdef CONFIG_HUGETLB_PAGE
-static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
+static int hmm_vma_walk_hugetlb_entry(struct hugetlb_pte *hpte,
 				      unsigned long start, unsigned long end,
 				      struct mm_walk *walk)
 {
@@ -483,11 +483,12 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
 	unsigned int required_fault;
 	unsigned long pfn_req_flags;
 	unsigned long cpu_flags;
+	unsigned long hmask = hugetlb_pte_mask(hpte);
 	spinlock_t *ptl;
 	pte_t entry;
 
-	ptl = huge_pte_lock(hstate_vma(vma), walk->mm, pte);
-	entry = huge_ptep_get(pte);
+	ptl = huge_pte_lock_shift(hpte->shift, walk->mm, hpte->ptep);
+	entry = huge_ptep_get(hpte->ptep);
 
 	i = (start - range->start) >> PAGE_SHIFT;
 	pfn_req_flags = range->hmm_pfns[i];
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d39b01fd52fe..a1d82db7c19f 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -559,7 +559,7 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
 	return addr != end ? -EIO : 0;
 }
 
-static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
+static int queue_pages_hugetlb(struct hugetlb_pte *hpte,
 			       unsigned long addr, unsigned long end,
 			       struct mm_walk *walk)
 {
@@ -571,8 +571,13 @@ static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
 	spinlock_t *ptl;
 	pte_t entry;
 
-	ptl = huge_pte_lock(hstate_vma(walk->vma), walk->mm, pte);
-	entry = huge_ptep_get(pte);
+	/* We don't migrate high-granularity HugeTLB mappings for now. */
+	if (hugetlb_pte_size(hpte) !=
+			huge_page_size(hstate_vma(walk->vma)))
+		return -EINVAL;
+
+	ptl = hugetlb_pte_lock(walk->mm, hpte);
+	entry = hugetlb_ptep_get(hpte);
 	if (!pte_present(entry))
 		goto unlock;
 	page = pte_page(entry);
diff --git a/mm/mincore.c b/mm/mincore.c
index fa200c14185f..dc1717dc6a2c 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -22,7 +22,7 @@
 #include <linux/uaccess.h>
 #include "swap.h"
 
-static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
+static int mincore_hugetlb(struct hugetlb_pte *hpte, unsigned long addr,
 			unsigned long end, struct mm_walk *walk)
 {
 #ifdef CONFIG_HUGETLB_PAGE
@@ -33,7 +33,7 @@ static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
 	 * Hugepages under user process are always in RAM and never
 	 * swapped out, but theoretically it needs to be checked.
 	 */
-	present = pte && !huge_pte_none(huge_ptep_get(pte));
+	present = hpte->ptep && !hugetlb_pte_none(hpte);
 	for (; addr != end; vec++, addr += PAGE_SIZE)
 		*vec = present;
 	walk->private = vec;
diff --git a/mm/mprotect.c b/mm/mprotect.c
index ba5592655ee3..9c5a35a1c0eb 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -476,12 +476,12 @@ static int prot_none_pte_entry(pte_t *pte, unsigned long addr,
 		0 : -EACCES;
 }
 
-static int prot_none_hugetlb_entry(pte_t *pte, unsigned long hmask,
+static int prot_none_hugetlb_entry(struct hugetlb_pte *hpte,
 				   unsigned long addr, unsigned long next,
 				   struct mm_walk *walk)
 {
-	return pfn_modify_allowed(pte_pfn(*pte), *(pgprot_t *)(walk->private)) ?
-		0 : -EACCES;
+	return pfn_modify_allowed(pte_pfn(*hpte->ptep),
+			*(pgprot_t *)(walk->private)) ? 0 : -EACCES;
 }
 
 static int prot_none_test(unsigned long addr, unsigned long next,
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 9b3db11a4d1d..f8e24a0a0179 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -3,6 +3,7 @@
 #include <linux/highmem.h>
 #include <linux/sched.h>
 #include <linux/hugetlb.h>
+#include <linux/minmax.h>
 
 /*
  * We want to know the real level where a entry is located ignoring any
@@ -301,13 +302,26 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
 	pte_t *pte;
 	const struct mm_walk_ops *ops = walk->ops;
 	int err = 0;
+	struct hugetlb_pte hpte;
 
 	do {
-		next = hugetlb_entry_end(h, addr, end);
 		pte = huge_pte_offset(walk->mm, addr & hmask, sz);
+		if (!pte) {
+			next = hugetlb_entry_end(h, addr, end);
+		} else {
+			hugetlb_pte_populate(&hpte, pte, huge_page_shift(h));
+			if (hugetlb_hgm_enabled(vma)) {
+				err = hugetlb_walk_to(walk->mm, &hpte, addr,
+						      PAGE_SIZE,
+						      /*stop_at_none=*/true);
+				if (err)
+					break;
+			}
+			next = min(addr + hugetlb_pte_size(&hpte), end);
+		}
 
 		if (pte)
-			err = ops->hugetlb_entry(pte, hmask, addr, next, walk);
+			err = ops->hugetlb_entry(&hpte, addr, next, walk);
 		else if (ops->pte_hole)
 			err = ops->pte_hole(addr, next, -1, walk);
 
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC PATCH 19/26] hugetlb: add HGM support for copy_hugetlb_page_range
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (17 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 18/26] hugetlb: use struct hugetlb_pte for walk_hugetlb_range James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-07-11 23:41   ` Mike Kravetz
  2022-06-24 17:36 ` [RFC PATCH 20/26] hugetlb: add support for high-granularity UFFDIO_CONTINUE James Houghton
                   ` (9 subsequent siblings)
  28 siblings, 1 reply; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

This allows fork() to work with high-granularity mappings. The page
table structure is copied such that partially mapped regions will remain
partially mapped in the same way for the new process.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 mm/hugetlb.c | 74 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 59 insertions(+), 15 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index aadfcee947cf..0ec2f231524e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4851,7 +4851,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			    struct vm_area_struct *src_vma)
 {
 	pte_t *src_pte, *dst_pte, entry, dst_entry;
-	struct page *ptepage;
+	struct hugetlb_pte src_hpte, dst_hpte;
+	struct page *ptepage, *hpage;
 	unsigned long addr;
 	bool cow = is_cow_mapping(src_vma->vm_flags);
 	struct hstate *h = hstate_vma(src_vma);
@@ -4878,17 +4879,44 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		i_mmap_lock_read(mapping);
 	}
 
-	for (addr = src_vma->vm_start; addr < src_vma->vm_end; addr += sz) {
+	addr = src_vma->vm_start;
+	while (addr < src_vma->vm_end) {
 		spinlock_t *src_ptl, *dst_ptl;
+		unsigned long hpte_sz;
 		src_pte = huge_pte_offset(src, addr, sz);
-		if (!src_pte)
+		if (!src_pte) {
+			addr += sz;
 			continue;
+		}
 		dst_pte = huge_pte_alloc(dst, dst_vma, addr, sz);
 		if (!dst_pte) {
 			ret = -ENOMEM;
 			break;
 		}
 
+		hugetlb_pte_populate(&src_hpte, src_pte, huge_page_shift(h));
+		hugetlb_pte_populate(&dst_hpte, dst_pte, huge_page_shift(h));
+
+		if (hugetlb_hgm_enabled(src_vma)) {
+			BUG_ON(!hugetlb_hgm_enabled(dst_vma));
+			ret = hugetlb_walk_to(src, &src_hpte, addr,
+					      PAGE_SIZE, /*stop_at_none=*/true);
+			if (ret)
+				break;
+			ret = huge_pte_alloc_high_granularity(
+					&dst_hpte, dst, dst_vma, addr,
+					hugetlb_pte_shift(&src_hpte),
+					HUGETLB_SPLIT_NONE,
+					/*write_locked=*/false);
+			if (ret)
+				break;
+
+			src_pte = src_hpte.ptep;
+			dst_pte = dst_hpte.ptep;
+		}
+
+		hpte_sz = hugetlb_pte_size(&src_hpte);
+
 		/*
 		 * If the pagetables are shared don't copy or take references.
 		 * dst_pte == src_pte is the common case of src/dest sharing.
@@ -4899,16 +4927,19 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		 * after taking the lock below.
 		 */
 		dst_entry = huge_ptep_get(dst_pte);
-		if ((dst_pte == src_pte) || !huge_pte_none(dst_entry))
+		if ((dst_pte == src_pte) || !hugetlb_pte_none(&dst_hpte)) {
+			addr += hugetlb_pte_size(&src_hpte);
 			continue;
+		}
 
-		dst_ptl = huge_pte_lock(h, dst, dst_pte);
-		src_ptl = huge_pte_lockptr(huge_page_shift(h), src, src_pte);
+		dst_ptl = hugetlb_pte_lock(dst, &dst_hpte);
+		src_ptl = hugetlb_pte_lockptr(src, &src_hpte);
 		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 		entry = huge_ptep_get(src_pte);
 		dst_entry = huge_ptep_get(dst_pte);
 again:
-		if (huge_pte_none(entry) || !huge_pte_none(dst_entry)) {
+		if (hugetlb_pte_none(&src_hpte) ||
+		    !hugetlb_pte_none(&dst_hpte)) {
 			/*
 			 * Skip if src entry none.  Also, skip in the
 			 * unlikely case dst entry !none as this implies
@@ -4931,11 +4962,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				if (userfaultfd_wp(src_vma) && uffd_wp)
 					entry = huge_pte_mkuffd_wp(entry);
 				set_huge_swap_pte_at(src, addr, src_pte,
-						     entry, sz);
+						     entry, hpte_sz);
 			}
 			if (!userfaultfd_wp(dst_vma) && uffd_wp)
 				entry = huge_pte_clear_uffd_wp(entry);
-			set_huge_swap_pte_at(dst, addr, dst_pte, entry, sz);
+			set_huge_swap_pte_at(dst, addr, dst_pte, entry,
+					     hpte_sz);
 		} else if (unlikely(is_pte_marker(entry))) {
 			/*
 			 * We copy the pte marker only if the dst vma has
@@ -4946,7 +4978,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		} else {
 			entry = huge_ptep_get(src_pte);
 			ptepage = pte_page(entry);
-			get_page(ptepage);
+			hpage = compound_head(ptepage);
+			get_page(hpage);
 
 			/*
 			 * Failing to duplicate the anon rmap is a rare case
@@ -4959,9 +4992,19 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			 * sleep during the process.
 			 */
 			if (!PageAnon(ptepage)) {
-				page_dup_file_rmap(ptepage, true);
+				/* Only dup_rmap once for a page */
+				if (IS_ALIGNED(addr, sz))
+					page_dup_file_rmap(hpage, true);
 			} else if (page_try_dup_anon_rmap(ptepage, true,
 							  src_vma)) {
+				if (hugetlb_hgm_enabled(src_vma)) {
+					/* Anon HGM mappings are unsupported. */
+					ret = -EINVAL;
+					spin_unlock(src_ptl);
+					spin_unlock(dst_ptl);
+					break;
+				}
+				BUG_ON(!IS_ALIGNED(addr, hugetlb_pte_size(&src_hpte)));
 				pte_t src_pte_old = entry;
 				struct page *new;
 
@@ -4970,13 +5010,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				/* Do not use reserve as it's private owned */
 				new = alloc_huge_page(dst_vma, addr, 1);
 				if (IS_ERR(new)) {
-					put_page(ptepage);
+					put_page(hpage);
 					ret = PTR_ERR(new);
 					break;
 				}
-				copy_user_huge_page(new, ptepage, addr, dst_vma,
+				copy_user_huge_page(new, hpage, addr, dst_vma,
 						    npages);
-				put_page(ptepage);
+				put_page(hpage);
 
 				/* Install the new huge page if src pte stable */
 				dst_ptl = huge_pte_lock(h, dst, dst_pte);
@@ -4994,6 +5034,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				hugetlb_install_page(dst_vma, dst_pte, addr, new);
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
+				addr += hugetlb_pte_size(&src_hpte);
 				continue;
 			}
 
@@ -5010,10 +5051,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			}
 
 			set_huge_pte_at(dst, addr, dst_pte, entry);
-			hugetlb_count_add(npages, dst);
+			hugetlb_count_add(
+					hugetlb_pte_size(&dst_hpte) / PAGE_SIZE,
+					dst);
 		}
 		spin_unlock(src_ptl);
 		spin_unlock(dst_ptl);
+		addr += hugetlb_pte_size(&src_hpte);
 	}
 
 	if (cow) {
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC PATCH 20/26] hugetlb: add support for high-granularity UFFDIO_CONTINUE
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (18 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 19/26] hugetlb: add HGM support for copy_hugetlb_page_range James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-07-15 16:21   ` Peter Xu
  2022-06-24 17:36 ` [RFC PATCH 21/26] hugetlb: add hugetlb_collapse James Houghton
                   ` (8 subsequent siblings)
  28 siblings, 1 reply; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

The changes here are very similar to the changes made to
hugetlb_no_page: we do a high-granularity page table walk, and the
accounting is done slightly differently because we may be mapping
only a piece of a hugepage.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 fs/userfaultfd.c        |  3 +++
 include/linux/hugetlb.h |  6 +++--
 mm/hugetlb.c            | 54 +++++++++++++++++++++-----------------
 mm/userfaultfd.c        | 57 +++++++++++++++++++++++++++++++----------
 4 files changed, 82 insertions(+), 38 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index e943370107d0..77c1b8a7d0b9 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -245,6 +245,9 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 	if (!ptep)
 		goto out;
 
+	if (hugetlb_hgm_enabled(vma))
+		goto out;
+
 	ret = false;
 	pte = huge_ptep_get(ptep);
 
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ac4ac8fbd901..c207b1ac6195 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -221,13 +221,15 @@ unsigned long hugetlb_total_pages(void);
 vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long address, unsigned int flags);
 #ifdef CONFIG_USERFAULTFD
-int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
+int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
+				struct hugetlb_pte *dst_hpte,
 				struct vm_area_struct *dst_vma,
 				unsigned long dst_addr,
 				unsigned long src_addr,
 				enum mcopy_atomic_mode mode,
 				struct page **pagep,
-				bool wp_copy);
+				bool wp_copy,
+				bool new_mapping);
 #endif /* CONFIG_USERFAULTFD */
 bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
 						struct vm_area_struct *vma,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0ec2f231524e..09fa57599233 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5808,6 +5808,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		vma_end_reservation(h, vma, haddr);
 	}
 
+	/* This lock will get pretty expensive at 4K. */
 	ptl = hugetlb_pte_lock(mm, hpte);
 	ret = 0;
 	/* If pte changed from under us, retry */
@@ -6098,24 +6099,26 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
  * modifications for huge pages.
  */
 int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
-			    pte_t *dst_pte,
+			    struct hugetlb_pte *dst_hpte,
 			    struct vm_area_struct *dst_vma,
 			    unsigned long dst_addr,
 			    unsigned long src_addr,
 			    enum mcopy_atomic_mode mode,
 			    struct page **pagep,
-			    bool wp_copy)
+			    bool wp_copy,
+			    bool new_mapping)
 {
 	bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
 	struct hstate *h = hstate_vma(dst_vma);
 	struct address_space *mapping = dst_vma->vm_file->f_mapping;
+	unsigned long haddr = dst_addr & huge_page_mask(h);
 	pgoff_t idx = vma_hugecache_offset(h, dst_vma, dst_addr);
 	unsigned long size;
 	int vm_shared = dst_vma->vm_flags & VM_SHARED;
 	pte_t _dst_pte;
 	spinlock_t *ptl;
 	int ret = -ENOMEM;
-	struct page *page;
+	struct page *page, *subpage;
 	int writable;
 	bool page_in_pagecache = false;
 
@@ -6130,12 +6133,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		 * a non-missing case. Return -EEXIST.
 		 */
 		if (vm_shared &&
-		    hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) {
+		    hugetlbfs_pagecache_present(h, dst_vma, haddr)) {
 			ret = -EEXIST;
 			goto out;
 		}
 
-		page = alloc_huge_page(dst_vma, dst_addr, 0);
+		page = alloc_huge_page(dst_vma, haddr, 0);
 		if (IS_ERR(page)) {
 			ret = -ENOMEM;
 			goto out;
@@ -6151,13 +6154,13 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 			/* Free the allocated page which may have
 			 * consumed a reservation.
 			 */
-			restore_reserve_on_error(h, dst_vma, dst_addr, page);
+			restore_reserve_on_error(h, dst_vma, haddr, page);
 			put_page(page);
 
 			/* Allocate a temporary page to hold the copied
 			 * contents.
 			 */
-			page = alloc_huge_page_vma(h, dst_vma, dst_addr);
+			page = alloc_huge_page_vma(h, dst_vma, haddr);
 			if (!page) {
 				ret = -ENOMEM;
 				goto out;
@@ -6171,14 +6174,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		}
 	} else {
 		if (vm_shared &&
-		    hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) {
+		    hugetlbfs_pagecache_present(h, dst_vma, haddr)) {
 			put_page(*pagep);
 			ret = -EEXIST;
 			*pagep = NULL;
 			goto out;
 		}
 
-		page = alloc_huge_page(dst_vma, dst_addr, 0);
+		page = alloc_huge_page(dst_vma, haddr, 0);
 		if (IS_ERR(page)) {
 			ret = -ENOMEM;
 			*pagep = NULL;
@@ -6216,8 +6219,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		page_in_pagecache = true;
 	}
 
-	ptl = huge_pte_lockptr(huge_page_shift(h), dst_mm, dst_pte);
-	spin_lock(ptl);
+	ptl = hugetlb_pte_lock(dst_mm, dst_hpte);
 
 	/*
 	 * Recheck the i_size after holding PT lock to make sure not
@@ -6239,14 +6241,16 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	 * registered, we firstly wr-protect a none pte which has no page cache
 	 * page backing it, then access the page.
 	 */
-	if (!huge_pte_none_mostly(huge_ptep_get(dst_pte)))
+	if (!hugetlb_pte_none_mostly(dst_hpte))
 		goto out_release_unlock;
 
-	if (vm_shared) {
-		page_dup_file_rmap(page, true);
-	} else {
-		ClearHPageRestoreReserve(page);
-		hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
+	if (new_mapping) {
+		if (vm_shared) {
+			page_dup_file_rmap(page, true);
+		} else {
+			ClearHPageRestoreReserve(page);
+			hugepage_add_new_anon_rmap(page, dst_vma, haddr);
+		}
 	}
 
 	/*
@@ -6258,7 +6262,11 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	else
 		writable = dst_vma->vm_flags & VM_WRITE;
 
-	_dst_pte = make_huge_pte(dst_vma, page, writable);
+	subpage = hugetlb_find_subpage(h, page, dst_addr);
+	if (subpage != page)
+		BUG_ON(!hugetlb_hgm_enabled(dst_vma));
+
+	_dst_pte = make_huge_pte(dst_vma, subpage, writable);
 	/*
 	 * Always mark UFFDIO_COPY page dirty; note that this may not be
 	 * extremely important for hugetlbfs for now since swapping is not
@@ -6271,14 +6279,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	if (wp_copy)
 		_dst_pte = huge_pte_mkuffd_wp(_dst_pte);
 
-	set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
+	set_huge_pte_at(dst_mm, dst_addr, dst_hpte->ptep, _dst_pte);
 
-	(void)huge_ptep_set_access_flags(dst_vma, dst_addr, dst_pte, _dst_pte,
-					dst_vma->vm_flags & VM_WRITE);
-	hugetlb_count_add(pages_per_huge_page(h), dst_mm);
+	(void)huge_ptep_set_access_flags(dst_vma, dst_addr, dst_hpte->ptep,
+			_dst_pte, dst_vma->vm_flags & VM_WRITE);
+	hugetlb_count_add(hugetlb_pte_size(dst_hpte) / PAGE_SIZE, dst_mm);
 
 	/* No need to invalidate - it was non-present before */
-	update_mmu_cache(dst_vma, dst_addr, dst_pte);
+	update_mmu_cache(dst_vma, dst_addr, dst_hpte->ptep);
 
 	spin_unlock(ptl);
 	if (!is_continue)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 4f4892a5f767..ee40d98068bf 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -310,14 +310,16 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 {
 	int vm_shared = dst_vma->vm_flags & VM_SHARED;
 	ssize_t err;
-	pte_t *dst_pte;
 	unsigned long src_addr, dst_addr;
 	long copied;
 	struct page *page;
-	unsigned long vma_hpagesize;
+	unsigned long vma_hpagesize, vma_altpagesize;
 	pgoff_t idx;
 	u32 hash;
 	struct address_space *mapping;
+	bool use_hgm = hugetlb_hgm_enabled(dst_vma) &&
+		mode == MCOPY_ATOMIC_CONTINUE;
+	struct hstate *h = hstate_vma(dst_vma);
 
 	/*
 	 * There is no default zero huge page for all huge page sizes as
@@ -335,12 +337,16 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 	copied = 0;
 	page = NULL;
 	vma_hpagesize = vma_kernel_pagesize(dst_vma);
+	if (use_hgm)
+		vma_altpagesize = PAGE_SIZE;
+	else
+		vma_altpagesize = vma_hpagesize;
 
 	/*
 	 * Validate alignment based on huge page size
 	 */
 	err = -EINVAL;
-	if (dst_start & (vma_hpagesize - 1) || len & (vma_hpagesize - 1))
+	if (dst_start & (vma_altpagesize - 1) || len & (vma_altpagesize - 1))
 		goto out_unlock;
 
 retry:
@@ -361,6 +367,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		vm_shared = dst_vma->vm_flags & VM_SHARED;
 	}
 
+	BUG_ON(!vm_shared && use_hgm);
+
 	/*
 	 * If not shared, ensure the dst_vma has a anon_vma.
 	 */
@@ -371,11 +379,13 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 	}
 
 	while (src_addr < src_start + len) {
+		struct hugetlb_pte hpte;
+		bool new_mapping;
 		BUG_ON(dst_addr >= dst_start + len);
 
 		/*
 		 * Serialize via i_mmap_rwsem and hugetlb_fault_mutex.
-		 * i_mmap_rwsem ensures the dst_pte remains valid even
+		 * i_mmap_rwsem ensures the hpte.ptep remains valid even
 		 * in the case of shared pmds.  fault mutex prevents
 		 * races with other faulting threads.
 		 */
@@ -383,27 +393,47 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		i_mmap_lock_read(mapping);
 		idx = linear_page_index(dst_vma, dst_addr);
 		hash = hugetlb_fault_mutex_hash(mapping, idx);
+		/* This lock will get expensive at 4K. */
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
-		err = -ENOMEM;
-		dst_pte = huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize);
-		if (!dst_pte) {
+		err = 0;
+
+		pte_t *ptep = huge_pte_alloc(dst_mm, dst_vma, dst_addr,
+					     vma_hpagesize);
+		if (!ptep) {
+			err = -ENOMEM;
+		} else {
+			hugetlb_pte_populate(&hpte, ptep,
+					huge_page_shift(h));
+			/*
+			 * If the hstate-level PTE is not none, then a mapping
+			 * was previously established.
+			 * The per-hpage mutex prevents double-counting.
+			 */
+			new_mapping = hugetlb_pte_none(&hpte);
+			if (use_hgm)
+				err = hugetlb_alloc_largest_pte(&hpte, dst_mm, dst_vma,
+								dst_addr,
+								dst_start + len);
+		}
+
+		if (err) {
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			i_mmap_unlock_read(mapping);
 			goto out_unlock;
 		}
 
 		if (mode != MCOPY_ATOMIC_CONTINUE &&
-		    !huge_pte_none_mostly(huge_ptep_get(dst_pte))) {
+		    !hugetlb_pte_none_mostly(&hpte)) {
 			err = -EEXIST;
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			i_mmap_unlock_read(mapping);
 			goto out_unlock;
 		}
 
-		err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma,
+		err = hugetlb_mcopy_atomic_pte(dst_mm, &hpte, dst_vma,
 					       dst_addr, src_addr, mode, &page,
-					       wp_copy);
+					       wp_copy, new_mapping);
 
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		i_mmap_unlock_read(mapping);
@@ -413,6 +443,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		if (unlikely(err == -ENOENT)) {
 			mmap_read_unlock(dst_mm);
 			BUG_ON(!page);
+			BUG_ON(hpte.shift != huge_page_shift(h));
 
 			err = copy_huge_page_from_user(page,
 						(const void __user *)src_addr,
@@ -430,9 +461,9 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 			BUG_ON(page);
 
 		if (!err) {
-			dst_addr += vma_hpagesize;
-			src_addr += vma_hpagesize;
-			copied += vma_hpagesize;
+			dst_addr += hugetlb_pte_size(&hpte);
+			src_addr += hugetlb_pte_size(&hpte);
+			copied += hugetlb_pte_size(&hpte);
 
 			if (fatal_signal_pending(current))
 				err = -EINTR;
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC PATCH 21/26] hugetlb: add hugetlb_collapse
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (19 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 20/26] hugetlb: add support for high-granularity UFFDIO_CONTINUE James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-24 17:36 ` [RFC PATCH 22/26] madvise: add uapi for HugeTLB HGM collapse: MADV_COLLAPSE James Houghton
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

This implements MADV_COLLAPSE for HugeTLB pages, a necessary extension
to the UFFDIO_CONTINUE changes. When userspace finishes mapping an
entire hugepage with UFFDIO_CONTINUE, the kernel has no mechanism to
automatically collapse the page table to map the whole hugepage
normally, so we require userspace to tell us when to collapse the
hugepages; it does this with MADV_COLLAPSE.

If userspace has not mapped all of a hugepage with UFFDIO_CONTINUE, but
only some, hugetlb_collapse will cause the requested range to be mapped
as if it were UFFDIO_CONTINUE'd already.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 include/linux/hugetlb.h |  7 ++++
 mm/hugetlb.c            | 88 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 95 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index c207b1ac6195..438057dc3b75 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1197,6 +1197,8 @@ int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
 				    unsigned int desired_sz,
 				    enum split_mode mode,
 				    bool write_locked);
+int hugetlb_collapse(struct mm_struct *mm, struct vm_area_struct *vma,
+		     unsigned long start, unsigned long end);
 #else
 static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
 {
@@ -1221,6 +1223,11 @@ static inline int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
 {
 	return -EINVAL;
 }
+static inline int hugetlb_collapse(struct mm_struct *mm,
+		struct vm_area_struct *vma, unsigned long start, unsigned long end)
+{
+	return -EINVAL;
+}
 #endif
 
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 09fa57599233..70bb3a1342d9 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7280,6 +7280,94 @@ int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
 	return -EINVAL;
 }
 
+/*
+ * Collapse the address range from @start to @end to be mapped optimally.
+ *
+ * This is only valid for shared mappings. The main use case for this function
+ * is following UFFDIO_CONTINUE. If a user UFFDIO_CONTINUEs an entire hugepage
+ * by calling UFFDIO_CONTINUE once for each 4K region, the kernel doesn't know
+ * to collapse the mapping after the final UFFDIO_CONTINUE. Instead, we leave
+ * it up to userspace to tell us to do so, via MADV_COLLAPSE.
+ *
+ * Any holes in the mapping will be filled. If there is no page in the
+ * pagecache for a region we're collapsing, the PTEs will be cleared.
+ */
+int hugetlb_collapse(struct mm_struct *mm, struct vm_area_struct *vma,
+			    unsigned long start, unsigned long end)
+{
+	struct hstate *h = hstate_vma(vma);
+	struct address_space *mapping = vma->vm_file->f_mapping;
+	struct mmu_notifier_range range;
+	struct mmu_gather tlb;
+	struct hstate *tmp_h;
+	unsigned int shift;
+	unsigned long curr = start;
+	int ret = 0;
+	struct page *hpage, *subpage;
+	pgoff_t idx;
+	bool writable = vma->vm_flags & VM_WRITE;
+	bool shared = vma->vm_flags & VM_SHARED;
+	pte_t entry;
+
+	/*
+	 * This is only supported for shared VMAs, because we need to look up
+	 * the page to use for any PTEs we end up creating.
+	 */
+	if (!shared)
+		return -EINVAL;
+
+	i_mmap_assert_write_locked(mapping);
+
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
+				start, end);
+	mmu_notifier_invalidate_range_start(&range);
+	tlb_gather_mmu(&tlb, mm);
+
+	while (curr < end) {
+		for_each_hgm_shift(h, tmp_h, shift) {
+			unsigned long sz = 1UL << shift;
+			struct hugetlb_pte hpte;
+
+			if (!IS_ALIGNED(curr, sz) || curr + sz > end)
+				continue;
+
+			hugetlb_pte_init(&hpte);
+			ret = hugetlb_walk_to(mm, &hpte, curr, sz,
+					      /*stop_at_none=*/false);
+			if (ret)
+				goto out;
+			if (hugetlb_pte_size(&hpte) >= sz)
+				goto hpte_finished;
+
+			idx = vma_hugecache_offset(h, vma, curr);
+			hpage = find_lock_page(mapping, idx);
+			hugetlb_free_range(&tlb, &hpte, curr,
+					   curr + hugetlb_pte_size(&hpte));
+			if (!hpage) {
+				hugetlb_pte_clear(mm, &hpte, curr);
+				goto hpte_finished;
+			}
+
+			subpage = hugetlb_find_subpage(h, hpage, curr);
+			entry = make_huge_pte_with_shift(vma, subpage,
+							 writable, shift);
+			set_huge_pte_at(mm, curr, hpte.ptep, entry);
+			unlock_page(hpage);
+hpte_finished:
+			curr += hugetlb_pte_size(&hpte);
+			goto next;
+		}
+		ret = -EINVAL;
+		goto out;
+next:
+		continue;
+	}
+out:
+	tlb_finish_mmu(&tlb);
+	mmu_notifier_invalidate_range_end(&range);
+	return ret;
+}
+
 /*
  * Given a particular address, split the HugeTLB PTE that currently maps it
  * so that, for the given address, the PTE that maps it is `desired_shift`.
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC PATCH 22/26] madvise: add uapi for HugeTLB HGM collapse: MADV_COLLAPSE
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (20 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 21/26] hugetlb: add hugetlb_collapse James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-24 17:36 ` [RFC PATCH 23/26] userfaultfd: add UFFD_FEATURE_MINOR_HUGETLBFS_HGM James Houghton
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

This commit co-opts the madvise mode that is being introduced by
zokeefe@google.com to manually collapse THPs[1].

As with the rest of the high-granularity mapping support, MADV_COLLAPSE
is only supported for shared VMAs right now.

[1] https://lore.kernel.org/linux-mm/20220604004004.954674-10-zokeefe@google.com/

Signed-off-by: James Houghton <jthoughton@google.com>
---
 include/uapi/asm-generic/mman-common.h |  2 ++
 mm/madvise.c                           | 23 +++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 6c1aa92a92e4..b686920ca731 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -77,6 +77,8 @@
 
 #define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
 
+#define MADV_COLLAPSE	25		/* collapse an address range into hugepages */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/mm/madvise.c b/mm/madvise.c
index d7b4f2602949..c624c0f02276 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -59,6 +59,7 @@ static int madvise_need_mmap_write(int behavior)
 	case MADV_FREE:
 	case MADV_POPULATE_READ:
 	case MADV_POPULATE_WRITE:
+	case MADV_COLLAPSE:
 		return 0;
 	default:
 		/* be safe, default to 1. list exceptions explicitly */
@@ -981,6 +982,20 @@ static long madvise_remove(struct vm_area_struct *vma,
 	return error;
 }
 
+static int madvise_collapse(struct vm_area_struct *vma,
+			    struct vm_area_struct **prev,
+			    unsigned long start, unsigned long end)
+{
+	bool shared = vma->vm_flags & VM_SHARED;
+	*prev = vma;
+
+	/* Only allow collapsing for HGM-enabled, shared mappings. */
+	if (!is_vm_hugetlb_page(vma) || !hugetlb_hgm_enabled(vma) || !shared)
+		return -EINVAL;
+
+	return hugetlb_collapse(vma->vm_mm, vma, start, end);
+}
+
 /*
  * Apply an madvise behavior to a region of a vma.  madvise_update_vma
  * will handle splitting a vm area into separate areas, each area with its own
@@ -1011,6 +1026,8 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
 	case MADV_POPULATE_READ:
 	case MADV_POPULATE_WRITE:
 		return madvise_populate(vma, prev, start, end, behavior);
+	case MADV_COLLAPSE:
+		return madvise_collapse(vma, prev, start, end);
 	case MADV_NORMAL:
 		new_flags = new_flags & ~VM_RAND_READ & ~VM_SEQ_READ;
 		break;
@@ -1158,6 +1175,9 @@ madvise_behavior_valid(int behavior)
 #ifdef CONFIG_MEMORY_FAILURE
 	case MADV_SOFT_OFFLINE:
 	case MADV_HWPOISON:
+#endif
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+	case MADV_COLLAPSE:
 #endif
 		return true;
 
@@ -1351,6 +1371,9 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
  *		triggering read faults if required
  *  MADV_POPULATE_WRITE - populate (prefault) page tables writable by
  *		triggering write faults if required
+ *  MADV_COLLAPSE - collapse a high-granularity HugeTLB mapping into huge
+ *		mappings. This is useful after an entire hugepage has been
+ *		mapped with individual small UFFDIO_CONTINUE operations.
  *
  * return values:
  *  zero    - success
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [RFC PATCH 23/26] userfaultfd: add UFFD_FEATURE_MINOR_HUGETLBFS_HGM
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (21 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 22/26] madvise: add uapi for HugeTLB HGM collapse: MADV_COLLAPSE James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-24 17:36 ` [RFC PATCH 24/26] arm64/hugetlb: add support for high-granularity mappings James Houghton
                   ` (5 subsequent siblings)
  28 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

This is so that userspace is aware that the kernel was compiled with
HugeTLB high-granularity mapping and that UFFDIO_CONTINUE operations on
chunks as small as PAGE_SIZE are valid.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 fs/userfaultfd.c                 | 7 ++++++-
 include/uapi/linux/userfaultfd.h | 2 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 77c1b8a7d0b9..59bfdb7a67e0 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1935,10 +1935,15 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 		goto err_out;
 	/* report all available features and ioctls to userland */
 	uffdio_api.features = UFFD_API_FEATURES;
+
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
 	uffdio_api.features &=
 		~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM);
-#endif
+#endif  /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
+#ifndef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+	uffdio_api.features &= ~UFFD_FEATURE_MINOR_HUGETLBFS_HGM;
+#endif  /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
+
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 	uffdio_api.features &= ~UFFD_FEATURE_PAGEFAULT_FLAG_WP;
 #endif
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 7d32b1e797fb..50fbcb0bcba0 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -32,6 +32,7 @@
 			   UFFD_FEATURE_SIGBUS |		\
 			   UFFD_FEATURE_THREAD_ID |		\
 			   UFFD_FEATURE_MINOR_HUGETLBFS |	\
+			   UFFD_FEATURE_MINOR_HUGETLBFS_HGM |	\
 			   UFFD_FEATURE_MINOR_SHMEM |		\
 			   UFFD_FEATURE_EXACT_ADDRESS |		\
 			   UFFD_FEATURE_WP_HUGETLBFS_SHMEM)
@@ -213,6 +214,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_MINOR_SHMEM		(1<<10)
 #define UFFD_FEATURE_EXACT_ADDRESS		(1<<11)
 #define UFFD_FEATURE_WP_HUGETLBFS_SHMEM		(1<<12)
+#define UFFD_FEATURE_MINOR_HUGETLBFS_HGM	(1<<13)
 	__u64 features;
 
 	__u64 ioctls;
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [RFC PATCH 24/26] arm64/hugetlb: add support for high-granularity mappings
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (22 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 23/26] userfaultfd: add UFFD_FEATURE_MINOR_HUGETLBFS_HGM James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-24 17:36 ` [RFC PATCH 25/26] selftests: add HugeTLB HGM to userfaultfd selftest James Houghton
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

This is included in this RFC to demonstrate how an architecture that
doesn't use ARCH_WANT_GENERAL_HUGETLB can be updated to support HugeTLB
high-granularity mappings: an architecture just needs to implement
hugetlb_walk_to.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 arch/arm64/Kconfig          |  1 +
 arch/arm64/mm/hugetlbpage.c | 63 +++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1652a9800ebe..74108713a99a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -99,6 +99,7 @@ config ARM64
 	select ARCH_WANT_FRAME_POINTERS
 	select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
 	select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
+	select ARCH_HAS_SPECIAL_HUGETLB_HGM
 	select ARCH_WANT_LD_ORPHAN_WARN
 	select ARCH_WANTS_NO_INSTR
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index e2a5ec9fdc0d..1901818bed9d 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -281,6 +281,69 @@ void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
 		set_pte(ptep, pte);
 }
 
+int hugetlb_walk_to(struct mm_struct *mm, struct hugetlb_pte *hpte,
+		    unsigned long addr, unsigned long sz, bool stop_at_none)
+{
+	pgd_t *pgdp;
+	p4d_t *p4dp;
+	pte_t *ptep;
+
+	if (!hpte->ptep) {
+		pgdp = pgd_offset(mm, addr);
+		p4dp = p4d_offset(pgdp, addr);
+		if (!p4dp)
+			return -ENOMEM;
+		hugetlb_pte_populate(hpte, (pte_t *)p4dp, P4D_SHIFT);
+	}
+
+	while (hugetlb_pte_size(hpte) > sz &&
+			!hugetlb_pte_present_leaf(hpte) &&
+			!(stop_at_none && hugetlb_pte_none(hpte))) {
+		if (hpte->shift == PMD_SHIFT) {
+			unsigned long rounded_addr = sz == CONT_PTE_SIZE
+						     ? addr & CONT_PTE_MASK
+						     : addr;
+
+			ptep = pte_offset_kernel((pmd_t *)hpte->ptep,
+						 rounded_addr);
+			if (!ptep)
+				return -ENOMEM;
+			if (sz == CONT_PTE_SIZE)
+				hpte->shift = CONT_PTE_SHIFT;
+			else
+				hpte->shift = pte_cont(*ptep) ? CONT_PTE_SHIFT
+							      : PAGE_SHIFT;
+			hpte->ptep = ptep;
+		} else if (hpte->shift == PUD_SHIFT) {
+			pud_t *pudp = (pud_t *)hpte->ptep;
+
+			ptep = (pte_t *)pmd_alloc(mm, pudp, addr);
+
+			if (!ptep)
+				return -ENOMEM;
+			if (sz == CONT_PMD_SIZE)
+				hpte->shift = CONT_PMD_SHIFT;
+			else
+				hpte->shift = pte_cont(*ptep) ? CONT_PMD_SHIFT
+							      : PMD_SHIFT;
+			hpte->ptep = ptep;
+		} else if (hpte->shift == P4D_SHIFT) {
+			ptep = (pte_t *)pud_alloc(mm, (p4d_t *)hpte->ptep, addr);
+			if (!ptep)
+				return -ENOMEM;
+			hpte->shift = PUD_SHIFT;
+			hpte->ptep = ptep;
+		} else
+			/*
+			 * This also catches the cases of CONT_PMD_SHIFT and
+			 * CONT_PTE_SHIFT. Those PTEs should always be leaves.
+			 */
+			BUG();
+	}
+
+	return 0;
+}
+
 pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, unsigned long sz)
 {
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [RFC PATCH 25/26] selftests: add HugeTLB HGM to userfaultfd selftest
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (23 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 24/26] arm64/hugetlb: add support for high-granularity mappings James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-24 17:36 ` [RFC PATCH 26/26] selftests: add HugeTLB HGM to KVM demand paging selftest James Houghton
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

It behaves just like the regular shared HugeTLB configuration, except
that it uses 4K pages instead of hugepages.

This doesn't test collapsing yet. I'll add a test for that in v1.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 tools/testing/selftests/vm/userfaultfd.c | 61 ++++++++++++++++++++----
 1 file changed, 51 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 0bdfc1955229..9cbb959519a6 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -64,7 +64,7 @@
 
 #ifdef __NR_userfaultfd
 
-static unsigned long nr_cpus, nr_pages, nr_pages_per_cpu, page_size;
+static unsigned long nr_cpus, nr_pages, nr_pages_per_cpu, page_size, hpage_size;
 
 #define BOUNCE_RANDOM		(1<<0)
 #define BOUNCE_RACINGFAULTS	(1<<1)
@@ -72,9 +72,10 @@ static unsigned long nr_cpus, nr_pages, nr_pages_per_cpu, page_size;
 #define BOUNCE_POLL		(1<<3)
 static int bounces;
 
-#define TEST_ANON	1
-#define TEST_HUGETLB	2
-#define TEST_SHMEM	3
+#define TEST_ANON		1
+#define TEST_HUGETLB		2
+#define TEST_HUGETLB_HGM	3
+#define TEST_SHMEM		4
 static int test_type;
 
 /* exercise the test_uffdio_*_eexist every ALARM_INTERVAL_SECS */
@@ -85,6 +86,7 @@ static volatile bool test_uffdio_zeropage_eexist = true;
 static bool test_uffdio_wp = true;
 /* Whether to test uffd minor faults */
 static bool test_uffdio_minor = false;
+static bool test_uffdio_copy = true;
 
 static bool map_shared;
 static int shm_fd;
@@ -140,12 +142,17 @@ static void usage(void)
 	fprintf(stderr, "\nUsage: ./userfaultfd <test type> <MiB> <bounces> "
 		"[hugetlbfs_file]\n\n");
 	fprintf(stderr, "Supported <test type>: anon, hugetlb, "
-		"hugetlb_shared, shmem\n\n");
+		"hugetlb_shared, hugetlb_shared_hgm, shmem\n\n");
 	fprintf(stderr, "Examples:\n\n");
 	fprintf(stderr, "%s", examples);
 	exit(1);
 }
 
+static bool test_is_hugetlb(void)
+{
+	return test_type == TEST_HUGETLB || test_type == TEST_HUGETLB_HGM;
+}
+
 #define _err(fmt, ...)						\
 	do {							\
 		int ret = errno;				\
@@ -348,7 +355,7 @@ static struct uffd_test_ops *uffd_test_ops;
 
 static inline uint64_t uffd_minor_feature(void)
 {
-	if (test_type == TEST_HUGETLB && map_shared)
+	if (test_is_hugetlb() && map_shared)
 		return UFFD_FEATURE_MINOR_HUGETLBFS;
 	else if (test_type == TEST_SHMEM)
 		return UFFD_FEATURE_MINOR_SHMEM;
@@ -360,7 +367,7 @@ static uint64_t get_expected_ioctls(uint64_t mode)
 {
 	uint64_t ioctls = UFFD_API_RANGE_IOCTLS;
 
-	if (test_type == TEST_HUGETLB)
+	if (test_is_hugetlb())
 		ioctls &= ~(1 << _UFFDIO_ZEROPAGE);
 
 	if (!((mode & UFFDIO_REGISTER_MODE_WP) && test_uffdio_wp))
@@ -1116,6 +1123,12 @@ static int userfaultfd_events_test(void)
 	char c;
 	struct uffd_stats stats = { 0 };
 
+	if (!test_uffdio_copy) {
+		printf("Skipping userfaultfd events test "
+			"(test_uffdio_copy=false)\n");
+		return 0;
+	}
+
 	printf("testing events (fork, remap, remove): ");
 	fflush(stdout);
 
@@ -1169,6 +1182,12 @@ static int userfaultfd_sig_test(void)
 	char c;
 	struct uffd_stats stats = { 0 };
 
+	if (!test_uffdio_copy) {
+		printf("Skipping userfaultfd signal test "
+			"(test_uffdio_copy=false)\n");
+		return 0;
+	}
+
 	printf("testing signal delivery: ");
 	fflush(stdout);
 
@@ -1438,6 +1457,12 @@ static int userfaultfd_stress(void)
 	pthread_attr_init(&attr);
 	pthread_attr_setstacksize(&attr, 16*1024*1024);
 
+	if (!test_uffdio_copy) {
+		printf("Skipping userfaultfd stress test "
+			"(test_uffdio_copy=false)\n");
+		bounces = 0;
+	}
+
 	while (bounces--) {
 		printf("bounces: %d, mode:", bounces);
 		if (bounces & BOUNCE_RANDOM)
@@ -1598,6 +1623,13 @@ static void set_test_type(const char *type)
 		uffd_test_ops = &hugetlb_uffd_test_ops;
 		/* Minor faults require shared hugetlb; only enable here. */
 		test_uffdio_minor = true;
+	} else if (!strcmp(type, "hugetlb_shared_hgm")) {
+		map_shared = true;
+		test_type = TEST_HUGETLB_HGM;
+		uffd_test_ops = &hugetlb_uffd_test_ops;
+		/* Minor faults require shared hugetlb; only enable here. */
+		test_uffdio_minor = true;
+		test_uffdio_copy = false;
 	} else if (!strcmp(type, "shmem")) {
 		map_shared = true;
 		test_type = TEST_SHMEM;
@@ -1607,8 +1639,10 @@ static void set_test_type(const char *type)
 		err("Unknown test type: %s", type);
 	}
 
+	hpage_size = default_huge_page_size();
 	if (test_type == TEST_HUGETLB)
-		page_size = default_huge_page_size();
+		// TEST_HUGETLB_HGM gets small pages.
+		page_size = hpage_size;
 	else
 		page_size = sysconf(_SC_PAGE_SIZE);
 
@@ -1658,19 +1692,26 @@ int main(int argc, char **argv)
 	nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
 	nr_pages_per_cpu = atol(argv[2]) * 1024*1024 / page_size /
 		nr_cpus;
+	if (test_type == TEST_HUGETLB_HGM)
+		/*
+		 * `page_size` refers to the page_size we can use in
+		 * UFFDIO_CONTINUE. We still need nr_pages to be appropriately
+		 * aligned, so align it here.
+		 */
+		nr_pages_per_cpu -= nr_pages_per_cpu % (hpage_size / page_size);
 	if (!nr_pages_per_cpu) {
 		_err("invalid MiB");
 		usage();
 	}
+	nr_pages = nr_pages_per_cpu * nr_cpus;
 
 	bounces = atoi(argv[3]);
 	if (bounces <= 0) {
 		_err("invalid bounces");
 		usage();
 	}
-	nr_pages = nr_pages_per_cpu * nr_cpus;
 
-	if (test_type == TEST_HUGETLB && map_shared) {
+	if (test_is_hugetlb() && map_shared) {
 		if (argc < 5)
 			usage();
 		huge_fd = open(argv[4], O_CREAT | O_RDWR, 0755);
-- 
2.37.0.rc0.161.g10f37bed90-goog



* [RFC PATCH 26/26] selftests: add HugeTLB HGM to KVM demand paging selftest
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (24 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 25/26] selftests: add HugeTLB HGM to userfaultfd selftest James Houghton
@ 2022-06-24 17:36 ` James Houghton
  2022-06-24 18:29 ` [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping Matthew Wilcox
                   ` (2 subsequent siblings)
  28 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-06-24 17:36 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, Dr . David Alan Gilbert, linux-mm,
	linux-kernel, James Houghton

This doesn't address collapsing yet, and it only works with the MINOR
mode (UFFDIO_CONTINUE).

Signed-off-by: James Houghton <jthoughton@google.com>
---
 tools/testing/selftests/kvm/include/test_util.h |  2 ++
 tools/testing/selftests/kvm/lib/kvm_util.c      |  2 +-
 tools/testing/selftests/kvm/lib/test_util.c     | 14 ++++++++++++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index 99e0dcdc923f..6209e44981a7 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -87,6 +87,7 @@ enum vm_mem_backing_src_type {
 	VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB,
 	VM_MEM_SRC_SHMEM,
 	VM_MEM_SRC_SHARED_HUGETLB,
+	VM_MEM_SRC_SHARED_HUGETLB_HGM,
 	NUM_SRC_TYPES,
 };
 
@@ -105,6 +106,7 @@ size_t get_def_hugetlb_pagesz(void);
 const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i);
 size_t get_backing_src_pagesz(uint32_t i);
 bool is_backing_src_hugetlb(uint32_t i);
+bool is_backing_src_shared_hugetlb(enum vm_mem_backing_src_type src_type);
 void backing_src_help(const char *flag);
 enum vm_mem_backing_src_type parse_backing_src_type(const char *type_name);
 long get_run_delay(void);
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 1665a220abcb..382f8fb75b7f 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -993,7 +993,7 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
 	region->fd = -1;
 	if (backing_src_is_shared(src_type))
 		region->fd = kvm_memfd_alloc(region->mmap_size,
-					     src_type == VM_MEM_SRC_SHARED_HUGETLB);
+				is_backing_src_shared_hugetlb(src_type));
 
 	region->mmap_start = mmap(NULL, region->mmap_size,
 				  PROT_READ | PROT_WRITE,
diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c
index 6d23878bbfe1..710dc42077fe 100644
--- a/tools/testing/selftests/kvm/lib/test_util.c
+++ b/tools/testing/selftests/kvm/lib/test_util.c
@@ -254,6 +254,13 @@ const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i)
 			 */
 			.flag = MAP_SHARED,
 		},
+		[VM_MEM_SRC_SHARED_HUGETLB_HGM] = {
+			/*
+			 * Identical to shared_hugetlb except for the name.
+			 */
+			.name = "shared_hugetlb_hgm",
+			.flag = MAP_SHARED,
+		},
 	};
 	_Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES,
 		       "Missing new backing src types?");
@@ -272,6 +279,7 @@ size_t get_backing_src_pagesz(uint32_t i)
 	switch (i) {
 	case VM_MEM_SRC_ANONYMOUS:
 	case VM_MEM_SRC_SHMEM:
+	case VM_MEM_SRC_SHARED_HUGETLB_HGM:
 		return getpagesize();
 	case VM_MEM_SRC_ANONYMOUS_THP:
 		return get_trans_hugepagesz();
@@ -288,6 +296,12 @@ bool is_backing_src_hugetlb(uint32_t i)
 	return !!(vm_mem_backing_src_alias(i)->flag & MAP_HUGETLB);
 }
 
+bool is_backing_src_shared_hugetlb(enum vm_mem_backing_src_type src_type)
+{
+	return src_type == VM_MEM_SRC_SHARED_HUGETLB ||
+		src_type == VM_MEM_SRC_SHARED_HUGETLB_HGM;
+}
+
 static void print_available_backing_src_types(const char *prefix)
 {
 	int i;
-- 
2.37.0.rc0.161.g10f37bed90-goog



* Re: [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (25 preceding siblings ...)
  2022-06-24 17:36 ` [RFC PATCH 26/26] selftests: add HugeTLB HGM to KVM demand paging selftest James Houghton
@ 2022-06-24 18:29 ` Matthew Wilcox
  2022-06-27 16:36   ` James Houghton
  2022-06-24 18:41 ` Mina Almasry
  2022-06-24 18:47 ` Matthew Wilcox
  28 siblings, 1 reply; 128+ messages in thread
From: Matthew Wilcox @ 2022-06-24 18:29 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang,
	Manish Mishra, Dr . David Alan Gilbert, linux-mm, linux-kernel

On Fri, Jun 24, 2022 at 05:36:30PM +0000, James Houghton wrote:
> [1] This used to be called HugeTLB double mapping, a bad and confusing
>     name. "High-granularity mapping" is not a great name either. I am open
>     to better names.

Oh good, I was grinding my teeth every time I read it ;-)

How does "Fine granularity" work for you?
"sub-page mapping" might work too.


* Re: [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (26 preceding siblings ...)
  2022-06-24 18:29 ` [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping Matthew Wilcox
@ 2022-06-24 18:41 ` Mina Almasry
  2022-06-27 16:27   ` James Houghton
  2022-06-24 18:47 ` Matthew Wilcox
  28 siblings, 1 reply; 128+ messages in thread
From: Mina Almasry @ 2022-06-24 18:41 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Fri, Jun 24, 2022 at 10:37 AM James Houghton <jthoughton@google.com> wrote:
>
> This RFC introduces the concept of HugeTLB high-granularity mapping
> (HGM)[1].  In broad terms, this series teaches HugeTLB how to map HugeTLB
> pages at different granularities, and, more importantly, to partially map
> a HugeTLB page.  This cover letter will go over
>  - the motivation for these changes
>  - userspace API
>  - some of the changes to HugeTLB to make this work
>  - limitations & future enhancements
>
> High-granularity mapping does *not* involve dissolving the hugepages
> themselves; it only affects how they are mapped.
>
> ---- Motivation ----
>
> Being able to map HugeTLB memory with PAGE_SIZE PTEs has important use
> cases in post-copy live migration and memory failure handling.
>
> - Live Migration (userfaultfd)
> For post-copy live migration, using userfaultfd, currently we have to
> install an entire hugepage before we can allow a guest to access that page.
> This is because, right now, either the WHOLE hugepage is mapped or NONE of
> it is.  So either the guest can access the WHOLE hugepage or NONE of it.
> This makes post-copy live migration for 1G HugeTLB-backed VMs completely
> infeasible.
>
> With high-granularity mapping, we can map PAGE_SIZE pieces of a hugepage,
> thereby allowing the guest to access only PAGE_SIZE chunks, and getting
> page faults on the rest (and triggering another demand-fetch). This gives
> userspace the flexibility to install PAGE_SIZE chunks of memory into a
> hugepage, making migration of 1G-backed VMs perfectly feasible, and it
> vastly reduces the vCPU stall time during post-copy for 2M-backed VMs.
>
> At Google, for a 48 vCPU VM in post-copy, we can expect these approximate
> per-page median fetch latencies:
>      4K: <100us
>      2M: >10ms
> Being able to unpause a vCPU 100x quicker is helpful for guest stability,
> and being able to use 1G pages at all can significantly improve steady-state
> guest performance.
>
> After fully copying a hugepage over the network, we will want to collapse
> the mapping down to what it would normally be (e.g., one PUD for a 1G
> page). Rather than having the kernel do this automatically, we leave it up
> to userspace to tell us to collapse a range (via MADV_COLLAPSE, co-opting
> the API that is being introduced for THPs[2]).
>
> - Memory Failure
> When a memory error is found within a HugeTLB page, it would be ideal if we
> could unmap only the PAGE_SIZE section that contained the error. This is
> what THPs are able to do. Using high-granularity mapping, we could do this,
> but this isn't tackled in this patch series.
>
> ---- Userspace API ----
>
> This patch series introduces a single way to take advantage of
> high-granularity mapping: via UFFDIO_CONTINUE. UFFDIO_CONTINUE allows
> userspace to resolve MINOR page faults on shared VMAs.
>
> To collapse a HugeTLB address range that has been mapped with several
> UFFDIO_CONTINUE operations, userspace can issue MADV_COLLAPSE. We expect
> userspace to know when all pages (that they care about) have been fetched.
>

Thanks James! Cover letter looks good. A few questions:

Why not have the kernel collapse the hugepage once all the 4K pages
have been fetched automatically? It would remove the need for a new
userspace API, and AFAICT there aren't really any cases where it is
beneficial to have a hugepage sharded into 4K mappings when those
mappings can be collapsed.

> ---- HugeTLB Changes ----
>
> - Mapcount
> The way mapcount is handled is different from the way that it was handled
> before. If the PUD for a hugepage is not none, a hugepage's mapcount will
> be increased. This scheme means that, for hugepages that aren't mapped at
> high granularity, their mapcounts will remain the same as what they would
> have been pre-HGM.
>

Sorry, I didn't quite follow this. It says mapcount is handled
differently, but the same if the page is not mapped at high
granularity. Can you elaborate on how the mapcount handling will be
different when the page is mapped at high granularity?

> - Page table walking and manipulation
> A new function, hugetlb_walk_to, handles walking HugeTLB page tables for
> high-granularity mappings. Eventually, it's possible to merge
> hugetlb_walk_to with huge_pte_offset and huge_pte_alloc.
>
> We keep track of HugeTLB page table entries with a new struct, hugetlb_pte.
> This is because we generally need to know the "size" of a PTE (previously
> always just huge_page_size(hstate)).
>
> For every page table manipulation function that has a huge version (e.g.
> huge_ptep_get and ptep_get), there is a wrapper for it (e.g.
> hugetlb_ptep_get).  The correct version is used depending on if a HugeTLB
> PTE really is "huge".
>
> - Synchronization
> For existing bits of HugeTLB, synchronization is unchanged. For splitting
> and collapsing HugeTLB PTEs, we require that the i_mmap_rw_sem is held for
> writing, and for doing high-granularity page table walks, we require it to
> be held for reading.
>
> ---- Limitations & Future Changes ----
>
> This patch series only implements high-granularity mapping for VM_SHARED
> VMAs.  I intend to implement enough HGM to support 4K unmapping for memory
> failure recovery for both shared and private mappings.
>
> The memory failure use case poses its own challenges that can be
> addressed, but I will do so in a separate RFC.
>
> Performance has not been heavily scrutinized with this patch series. There
> are places where lock contention can significantly reduce performance. This
> will be addressed later.
>
> The patch series, as it stands right now, is compatible with the VMEMMAP
> page struct optimization[3], as we do not need to modify data contained
> in the subpage page structs.
>
> Other omissions:
>  - Compatibility with userfaultfd write-protect (will be included in v1).
>  - Support for mremap() (will be included in v1). This looks a lot like
>    the support we have for fork().
>  - Documentation changes (will be included in v1).
>  - Completely ignores PMD sharing and hugepage migration (will be included
>    in v1).
>  - Implementations for architectures that don't use GENERAL_HUGETLB other
>    than arm64.
>
> ---- Patch Breakdown ----
>
> Patch 1     - Preliminary changes
> Patch 2-10  - HugeTLB HGM core changes
> Patch 11-13 - HugeTLB HGM page table walking functionality
> Patch 14-19 - HugeTLB HGM compatibility with other bits
> Patch 20-23 - Userfaultfd and collapse changes
> Patch 24-26 - arm64 support and selftests
>
> [1] This used to be called HugeTLB double mapping, a bad and confusing
>     name. "High-granularity mapping" is not a great name either. I am open
>     to better names.

I would drop one extra word and just say "granular mapping", as in the
mapping is more granular than it normally is (2MB/1GB, etc.).

> [2] https://lore.kernel.org/linux-mm/20220604004004.954674-10-zokeefe@google.com/
> [3] commit f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")
>
> James Houghton (26):
>   hugetlb: make hstate accessor functions const
>   hugetlb: sort hstates in hugetlb_init_hstates
>   hugetlb: add make_huge_pte_with_shift
>   hugetlb: make huge_pte_lockptr take an explicit shift argument.
>   hugetlb: add CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
>   mm: make free_p?d_range functions public
>   hugetlb: add hugetlb_pte to track HugeTLB page table entries
>   hugetlb: add hugetlb_free_range to free PT structures
>   hugetlb: add hugetlb_hgm_enabled
>   hugetlb: add for_each_hgm_shift
>   hugetlb: add hugetlb_walk_to to do PT walks
>   hugetlb: add HugeTLB splitting functionality
>   hugetlb: add huge_pte_alloc_high_granularity
>   hugetlb: add HGM support for hugetlb_fault and hugetlb_no_page
>   hugetlb: make unmapping compatible with high-granularity mappings
>   hugetlb: make hugetlb_change_protection compatible with HGM
>   hugetlb: update follow_hugetlb_page to support HGM
>   hugetlb: use struct hugetlb_pte for walk_hugetlb_range
>   hugetlb: add HGM support for copy_hugetlb_page_range
>   hugetlb: add support for high-granularity UFFDIO_CONTINUE
>   hugetlb: add hugetlb_collapse
>   madvise: add uapi for HugeTLB HGM collapse: MADV_COLLAPSE
>   userfaultfd: add UFFD_FEATURE_MINOR_HUGETLBFS_HGM
>   arm64/hugetlb: add support for high-granularity mappings
>   selftests: add HugeTLB HGM to userfaultfd selftest
>   selftests: add HugeTLB HGM to KVM demand paging selftest
>
>  arch/arm64/Kconfig                            |   1 +
>  arch/arm64/mm/hugetlbpage.c                   |  63 ++
>  arch/powerpc/mm/pgtable.c                     |   3 +-
>  arch/s390/mm/gmap.c                           |   8 +-
>  fs/Kconfig                                    |   7 +
>  fs/proc/task_mmu.c                            |  35 +-
>  fs/userfaultfd.c                              |  10 +-
>  include/asm-generic/tlb.h                     |   6 +-
>  include/linux/hugetlb.h                       | 177 +++-
>  include/linux/mm.h                            |   7 +
>  include/linux/pagewalk.h                      |   3 +-
>  include/uapi/asm-generic/mman-common.h        |   2 +
>  include/uapi/linux/userfaultfd.h              |   2 +
>  mm/damon/vaddr.c                              |  34 +-
>  mm/hmm.c                                      |   7 +-
>  mm/hugetlb.c                                  | 987 +++++++++++++++---
>  mm/madvise.c                                  |  23 +
>  mm/memory.c                                   |   8 +-
>  mm/mempolicy.c                                |  11 +-
>  mm/migrate.c                                  |   3 +-
>  mm/mincore.c                                  |   4 +-
>  mm/mprotect.c                                 |   6 +-
>  mm/page_vma_mapped.c                          |   3 +-
>  mm/pagewalk.c                                 |  18 +-
>  mm/userfaultfd.c                              |  57 +-
>  .../testing/selftests/kvm/include/test_util.h |   2 +
>  tools/testing/selftests/kvm/lib/kvm_util.c    |   2 +-
>  tools/testing/selftests/kvm/lib/test_util.c   |  14 +
>  tools/testing/selftests/vm/userfaultfd.c      |  61 +-
>  29 files changed, 1314 insertions(+), 250 deletions(-)
>
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 01/26] hugetlb: make hstate accessor functions const
  2022-06-24 17:36 ` [RFC PATCH 01/26] hugetlb: make hstate accessor functions const James Houghton
@ 2022-06-24 18:43   ` Mina Almasry
       [not found]   ` <e55f90f5-ba14-5d6e-8f8f-abf731b9095e@nutanix.com>
  2022-06-29  6:18   ` Muchun Song
  2 siblings, 0 replies; 128+ messages in thread
From: Mina Almasry @ 2022-06-24 18:43 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Fri, Jun 24, 2022 at 10:37 AM James Houghton <jthoughton@google.com> wrote:
>
> This is just a const-correctness change so that the new hugetlb_pte
> changes can be const-correct too.
>
> Acked-by: David Rientjes <rientjes@google.com>
>

Reviewed-by: Mina Almasry <almasrymina@google.com>

> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>  include/linux/hugetlb.h | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index e4cff27d1198..498a4ae3d462 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -715,7 +715,7 @@ static inline struct hstate *hstate_vma(struct vm_area_struct *vma)
>         return hstate_file(vma->vm_file);
>  }
>
> -static inline unsigned long huge_page_size(struct hstate *h)
> +static inline unsigned long huge_page_size(const struct hstate *h)
>  {
>         return (unsigned long)PAGE_SIZE << h->order;
>  }
> @@ -729,27 +729,27 @@ static inline unsigned long huge_page_mask(struct hstate *h)
>         return h->mask;
>  }
>
> -static inline unsigned int huge_page_order(struct hstate *h)
> +static inline unsigned int huge_page_order(const struct hstate *h)
>  {
>         return h->order;
>  }
>
> -static inline unsigned huge_page_shift(struct hstate *h)
> +static inline unsigned huge_page_shift(const struct hstate *h)
>  {
>         return h->order + PAGE_SHIFT;
>  }
>
> -static inline bool hstate_is_gigantic(struct hstate *h)
> +static inline bool hstate_is_gigantic(const struct hstate *h)
>  {
>         return huge_page_order(h) >= MAX_ORDER;
>  }
>
> -static inline unsigned int pages_per_huge_page(struct hstate *h)
> +static inline unsigned int pages_per_huge_page(const struct hstate *h)
>  {
>         return 1 << h->order;
>  }
>
> -static inline unsigned int blocks_per_huge_page(struct hstate *h)
> +static inline unsigned int blocks_per_huge_page(const struct hstate *h)
>  {
>         return huge_page_size(h) / 512;
>  }
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>


* Re: [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping
  2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
                   ` (27 preceding siblings ...)
  2022-06-24 18:41 ` Mina Almasry
@ 2022-06-24 18:47 ` Matthew Wilcox
  2022-06-27 16:48   ` James Houghton
  28 siblings, 1 reply; 128+ messages in thread
From: Matthew Wilcox @ 2022-06-24 18:47 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang,
	Manish Mishra, Dr . David Alan Gilbert, linux-mm, linux-kernel

On Fri, Jun 24, 2022 at 05:36:30PM +0000, James Houghton wrote:
> - Page table walking and manipulation
> A new function, hugetlb_walk_to, handles walking HugeTLB page tables for
> high-granularity mappings. Eventually, it's possible to merge
> hugetlb_walk_to with huge_pte_offset and huge_pte_alloc.
> 
> We keep track of HugeTLB page table entries with a new struct, hugetlb_pte.
> This is because we generally need to know the "size" of a PTE (previously
> always just huge_page_size(hstate)).
> 
> For every page table manipulation function that has a huge version (e.g.
> huge_ptep_get and ptep_get), there is a wrapper for it (e.g.
> hugetlb_ptep_get).  The correct version is used depending on if a HugeTLB
> PTE really is "huge".

I'm disappointed to hear that page table walking is going to become even
more special.  I'd much prefer it if hugetlb walking were exactly the
same as THP walking.  This seems like a good time to do at least some
of that work.

Was there a reason you chose the "more complexity" direction?


* Re: [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates
  2022-06-24 17:36 ` [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates James Houghton
@ 2022-06-24 18:51   ` Mina Almasry
  2022-06-27 12:08   ` manish.mishra
  2022-06-27 18:42   ` Mike Kravetz
  2 siblings, 0 replies; 128+ messages in thread
From: Mina Almasry @ 2022-06-24 18:51 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Fri, Jun 24, 2022 at 10:37 AM James Houghton <jthoughton@google.com> wrote:
>
> When using HugeTLB high-granularity mapping, we need to go through the
> supported hugepage sizes in decreasing order so that we pick the largest
> size that works. Consider the case where we're faulting in a 1G hugepage
> for the first time: we want hugetlb_fault/hugetlb_no_page to map it with
> a PUD. By going through the sizes in decreasing order, we will find that
> PUD_SIZE works before finding out that PMD_SIZE or PAGE_SIZE work too.
>

Mostly nits:

Reviewed-by: Mina Almasry <almasrymina@google.com>

> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>  mm/hugetlb.c | 40 +++++++++++++++++++++++++++++++++++++---
>  1 file changed, 37 insertions(+), 3 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a57e1be41401..5df838d86f32 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -33,6 +33,7 @@
>  #include <linux/migrate.h>
>  #include <linux/nospec.h>
>  #include <linux/delayacct.h>
> +#include <linux/sort.h>
>
>  #include <asm/page.h>
>  #include <asm/pgalloc.h>
> @@ -48,6 +49,10 @@
>
>  int hugetlb_max_hstate __read_mostly;
>  unsigned int default_hstate_idx;
> +/*
> + * After hugetlb_init_hstates is called, hstates will be sorted from largest
> + * to smallest.
> + */
>  struct hstate hstates[HUGE_MAX_HSTATE];
>
>  #ifdef CONFIG_CMA
> @@ -3144,14 +3149,43 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
>         kfree(node_alloc_noretry);
>  }
>
> +static int compare_hstates_decreasing(const void *a, const void *b)
> +{
> +       const int shift_a = huge_page_shift((const struct hstate *)a);
> +       const int shift_b = huge_page_shift((const struct hstate *)b);
> +
> +       if (shift_a < shift_b)
> +               return 1;
> +       if (shift_a > shift_b)
> +               return -1;
> +       return 0;
> +}
> +
> +static void sort_hstates(void)

Maybe sort_hstates_descending(void) for extra clarity.

> +{
> +       unsigned long default_hstate_sz = huge_page_size(&default_hstate);
> +
> +       /* Sort from largest to smallest. */

I'd remove this redundant comment; it's somewhat obvious what the next
line does.

> +       sort(hstates, hugetlb_max_hstate, sizeof(*hstates),
> +            compare_hstates_decreasing, NULL);
> +
> +       /*
> +        * We may have changed the location of the default hstate, so we need to
> +        * update it.
> +        */
> +       default_hstate_idx = hstate_index(size_to_hstate(default_hstate_sz));
> +}
> +
>  static void __init hugetlb_init_hstates(void)
>  {
>         struct hstate *h, *h2;
>
> -       for_each_hstate(h) {
> -               if (minimum_order > huge_page_order(h))
> -                       minimum_order = huge_page_order(h);
> +       sort_hstates();
>
> +       /* The last hstate is now the smallest. */

Same, given that above is sort_hstates().

> +       minimum_order = huge_page_order(&hstates[hugetlb_max_hstate - 1]);
> +
> +       for_each_hstate(h) {
>                 /* oversize hugepages were init'ed in early boot */
>                 if (!hstate_is_gigantic(h))
>                         hugetlb_hstate_alloc_pages(h);
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>


* Re: [RFC PATCH 03/26] hugetlb: add make_huge_pte_with_shift
  2022-06-24 17:36 ` [RFC PATCH 03/26] hugetlb: add make_huge_pte_with_shift James Houghton
@ 2022-06-24 19:01   ` Mina Almasry
  2022-06-27 12:13   ` manish.mishra
  1 sibling, 0 replies; 128+ messages in thread
From: Mina Almasry @ 2022-06-24 19:01 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Fri, Jun 24, 2022 at 10:37 AM James Houghton <jthoughton@google.com> wrote:
>
> This allows us to make huge PTEs at shifts other than the hstate shift,
> which will be necessary for high-granularity mappings.
>

Can you elaborate on why?

> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>  mm/hugetlb.c | 33 ++++++++++++++++++++-------------
>  1 file changed, 20 insertions(+), 13 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5df838d86f32..0eec34edf3b2 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4686,23 +4686,30 @@ const struct vm_operations_struct hugetlb_vm_ops = {
>         .pagesize = hugetlb_vm_op_pagesize,
>  };
>
> +static pte_t make_huge_pte_with_shift(struct vm_area_struct *vma,
> +                                     struct page *page, int writable,
> +                                     int shift)
> +{
> +       bool huge = shift > PAGE_SHIFT;
> +       pte_t entry = huge ? mk_huge_pte(page, vma->vm_page_prot)
> +                          : mk_pte(page, vma->vm_page_prot);
> +
> +       if (writable)
> +               entry = huge ? huge_pte_mkwrite(entry) : pte_mkwrite(entry);
> +       else
> +               entry = huge ? huge_pte_wrprotect(entry) : pte_wrprotect(entry);
> +       entry = pte_mkyoung(entry);
> +       if (huge)
> +               entry = arch_make_huge_pte(entry, shift, vma->vm_flags);
> +       return entry;
> +}
> +
>  static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
> -                               int writable)
> +                          int writable)

Looks like an unnecessary diff?

>  {
> -       pte_t entry;
>         unsigned int shift = huge_page_shift(hstate_vma(vma));
>
> -       if (writable) {
> -               entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_huge_pte(page,
> -                                        vma->vm_page_prot)));

In this case there is an intermediate call to huge_pte_mkdirty() that
is not done in make_huge_pte_with_shift(). Why was this removed?

> -       } else {
> -               entry = huge_pte_wrprotect(mk_huge_pte(page,
> -                                          vma->vm_page_prot));
> -       }
> -       entry = pte_mkyoung(entry);
> -       entry = arch_make_huge_pte(entry, shift, vma->vm_flags);
> -
> -       return entry;
> +       return make_huge_pte_with_shift(vma, page, writable, shift);

I think it is marginally cleaner to calculate the shift inline:

  return make_huge_pte_with_shift(vma, page, writable,
huge_page_shift(hstate_vma(vma)));

>  }
>
>  static void set_huge_ptep_writable(struct vm_area_struct *vma,
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>


* Re: [RFC PATCH 18/26] hugetlb: use struct hugetlb_pte for walk_hugetlb_range
  2022-06-24 17:36 ` [RFC PATCH 18/26] hugetlb: use struct hugetlb_pte for walk_hugetlb_range James Houghton
@ 2022-06-25  2:58   ` kernel test robot
  2022-06-25  2:59   ` kernel test robot
  1 sibling, 0 replies; 128+ messages in thread
From: kernel test robot @ 2022-06-25  2:58 UTC (permalink / raw)
  To: James Houghton; +Cc: llvm, kbuild-all

[-- Attachment #1: Type: text/plain, Size: 9371 bytes --]

Hi James,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on kvm/queue]
[also build test WARNING on arnd-asm-generic/master arm64/for-next/core linus/master v5.19-rc3]
[cannot apply to next-20220624]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/James-Houghton/hugetlb-Introduce-HugeTLB-high-granularity-mapping/20220625-014117
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
config: s390-randconfig-r044-20220624
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 6fa9120080c35a5ff851c3fc3358692c4ef7ce0d)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install s390 cross compiling tool for clang build
        # apt-get install binutils-s390x-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/c404da72c79a05fc3ed42e3dd616734ab17c7093
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review James-Houghton/hugetlb-Introduce-HugeTLB-high-granularity-mapping/20220625-014117
        git checkout c404da72c79a05fc3ed42e3dd616734ab17c7093
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=s390 SHELL=/bin/bash arch/s390/mm/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from arch/s390/mm/gmap.c:12:
   In file included from include/linux/pagewalk.h:6:
   include/linux/hugetlb.h:1205:38: warning: declaration of 'struct hugetlb_pte' will not be visible outside of this function [-Wvisibility]
   int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
                                        ^
   include/linux/hugetlb.h:1212:58: warning: declaration of 'struct hugetlb_pte' will not be visible outside of this function [-Wvisibility]
   static inline int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
                                                            ^
   include/linux/hugetlb.h:1245:62: warning: declaration of 'struct hugetlb_pte' will not be visible outside of this function [-Wvisibility]
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
   include/linux/hugetlb.h:1248:14: error: incomplete definition of type 'struct hugetlb_pte'
           BUG_ON(!hpte->ptep);
                   ~~~~^
   include/asm-generic/bug.h:71:45: note: expanded from macro 'BUG_ON'
   #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                               ^~~~~~~~~
   include/linux/compiler.h:78:42: note: expanded from macro 'unlikely'
   # define unlikely(x)    __builtin_expect(!!(x), 0)
                                               ^
   include/linux/hugetlb.h:1245:62: note: forward declaration of 'struct hugetlb_pte'
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
   include/linux/hugetlb.h:1251:6: error: call to undeclared function 'hugetlb_pte_none'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           if (hugetlb_pte_none(hpte) || hugetlb_pte_present_leaf(hpte))
               ^
   include/linux/hugetlb.h:1251:32: error: call to undeclared function 'hugetlb_pte_present_leaf'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           if (hugetlb_pte_none(hpte) || hugetlb_pte_present_leaf(hpte))
                                         ^
   include/linux/hugetlb.h:1252:27: error: call to undeclared function 'hugetlb_pte_shift'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
                   return huge_pte_lockptr(hugetlb_pte_shift(hpte),
                                           ^
   include/linux/hugetlb.h:1253:13: error: incomplete definition of type 'struct hugetlb_pte'
                                   mm, hpte->ptep);
                                       ~~~~^
   include/linux/hugetlb.h:1245:62: note: forward declaration of 'struct hugetlb_pte'
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
   include/linux/hugetlb.h:1258:59: warning: declaration of 'struct hugetlb_pte' will not be visible outside of this function [-Wvisibility]
   spinlock_t *hugetlb_pte_lock(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                             ^
   include/linux/hugetlb.h:1260:44: error: incompatible pointer types passing 'struct hugetlb_pte *' to parameter of type 'struct hugetlb_pte *' [-Werror,-Wincompatible-pointer-types]
           spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte);
                                                     ^~~~
   include/linux/hugetlb.h:1245:75: note: passing argument to parameter 'hpte' here
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                             ^
   In file included from arch/s390/mm/gmap.c:12:
   include/linux/pagewalk.h:51:30: warning: declaration of 'struct hugetlb_pte' will not be visible outside of this function [-Wvisibility]
           int (*hugetlb_entry)(struct hugetlb_pte *hpte,
                                       ^
   In file included from arch/s390/mm/gmap.c:19:
   include/linux/mman.h:154:9: warning: division by zero is undefined [-Wdivision-by-zero]
                  _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mman.h:132:21: note: expanded from macro '_calc_vm_trans'
      : ((x) & (bit1)) / ((bit1) / (bit2))))
                       ^ ~~~~~~~~~~~~~~~~~
>> arch/s390/mm/gmap.c:2623:46: warning: declaration of 'struct hugetlb_pte' will not be visible outside of this function [-Wvisibility]
   static int __s390_enable_skey_hugetlb(struct hugetlb_pte *hpte,
                                                ^
   arch/s390/mm/gmap.c:2627:7: error: call to undeclared function 'hugetlb_pte_present_leaf'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           if (!hugetlb_pte_present_leaf(hpte) ||
                ^
   arch/s390/mm/gmap.c:2628:4: error: call to undeclared function 'hugetlb_pte_size'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
                           hugetlb_pte_size(hpte) != PMD_SIZE)
                           ^
   arch/s390/mm/gmap.c:2628:4: note: did you mean 'hugetlb_pte_lock'?
   include/linux/hugetlb.h:1258:13: note: 'hugetlb_pte_lock' declared here
   spinlock_t *hugetlb_pte_lock(struct mm_struct *mm, struct hugetlb_pte *hpte)
               ^
   arch/s390/mm/gmap.c:2631:24: error: use of undeclared identifier 'pte'
           pmd_t *pmd = (pmd_t *)pte;
                                 ^
>> arch/s390/mm/gmap.c:2631:9: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
           pmd_t *pmd = (pmd_t *)pte;
                  ^
   arch/s390/mm/gmap.c:2654:20: error: incompatible function pointer types initializing 'int (*)(struct hugetlb_pte *, unsigned long, unsigned long, struct mm_walk *)' with an expression of type 'int (struct hugetlb_pte *, unsigned long, unsigned long, struct mm_walk *)' [-Werror,-Wincompatible-function-pointer-types]
           .hugetlb_entry          = __s390_enable_skey_hugetlb,
                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~
   8 warnings and 10 errors generated.


vim +2623 arch/s390/mm/gmap.c

  2622	
> 2623	static int __s390_enable_skey_hugetlb(struct hugetlb_pte *hpte,
  2624					      unsigned long addr, unsigned long next,
  2625					      struct mm_walk *walk)
  2626	{
  2627		if (!hugetlb_pte_present_leaf(hpte) ||
  2628				hugetlb_pte_size(hpte) != PMD_SIZE)
  2629			return -EINVAL;
  2630	
> 2631		pmd_t *pmd = (pmd_t *)pte;
  2632		unsigned long start, end;
  2633		struct page *page = pmd_page(*pmd);
  2634	
  2635		/*
  2636		 * The write check makes sure we do not set a key on shared
  2637		 * memory. This is needed as the walker does not differentiate
  2638		 * between actual guest memory and the process executable or
  2639		 * shared libraries.
  2640		 */
  2641		if (pmd_val(*pmd) & _SEGMENT_ENTRY_INVALID ||
  2642		    !(pmd_val(*pmd) & _SEGMENT_ENTRY_WRITE))
  2643			return 0;
  2644	
  2645		start = pmd_val(*pmd) & HPAGE_MASK;
  2646		end = start + HPAGE_SIZE - 1;
  2647		__storage_key_init_range(start, end);
  2648		set_bit(PG_arch_1, &page->flags);
  2649		cond_resched();
  2650		return 0;
  2651	}
  2652	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

[-- Attachment #2: config --]
[-- Type: text/plain, Size: 80687 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/s390 5.19.0-rc1 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="clang version 15.0.0 (git://gitmirror/llvm_project 6fa9120080c35a5ff851c3fc3358692c4ef7ce0d)"
CONFIG_GCC_VERSION=0
CONFIG_CC_IS_CLANG=y
CONFIG_CLANG_VERSION=150000
CONFIG_AS_IS_LLVM=y
CONFIG_AS_VERSION=150000
CONFIG_LD_IS_BFD=y
CONFIG_LD_VERSION=23800
CONFIG_LLD_VERSION=0
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_CC_HAS_NO_PROFILE_FN_ATTR=y
CONFIG_PAHOLE_VERSION=123
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_WERROR is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
CONFIG_HAVE_KERNEL_UNCOMPRESSED=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_KERNEL_ZSTD is not set
CONFIG_KERNEL_UNCOMPRESSED=y
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
# CONFIG_SYSVIPC is not set
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_WATCH_QUEUE is not set
# CONFIG_CROSS_MEMORY_ATTACH is not set
CONFIG_USELIB=y
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_INJECTION=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_SIM=y
CONFIG_SPARSE_IRQ=y
CONFIG_GENERIC_IRQ_DEBUGFS=y
# end of IRQ subsystem

CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
# CONFIG_TIME_KUNIT_TEST is not set

#
# Timers subsystem
#
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
# end of Timers subsystem

CONFIG_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y

#
# BPF subsystem
#
# CONFIG_BPF_SYSCALL is not set
CONFIG_BPF_JIT=y
CONFIG_BPF_JIT_DEFAULT_ON=y
# end of BPF subsystem

CONFIG_PREEMPT_VOLUNTARY_BUILD=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y

#
# CPU/Task time and stats accounting
#
CONFIG_VIRT_CPU_ACCOUNTING=y
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
# CONFIG_TASKSTATS is not set
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

# CONFIG_CPU_ISOLATION is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# end of RCU Subsystem

CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
# CONFIG_IKCONFIG_PROC is not set
CONFIG_IKHEADERS=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
# CONFIG_PRINTK_INDEX is not set

#
# Scheduler features
#
# end of Scheduler features

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_CC_HAS_INT128=y
CONFIG_CC_IMPLICIT_FALLTHROUGH="-Wimplicit-fallthrough"
# CONFIG_CGROUPS is not set
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_TIME_NS=y
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
CONFIG_NET_NS=y
# CONFIG_CHECKPOINT_RESTORE is not set
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_RD_GZIP is not set
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
# CONFIG_RD_LZO is not set
# CONFIG_RD_LZ4 is not set
# CONFIG_RD_ZSTD is not set
# CONFIG_BOOT_CONFIG is not set
CONFIG_INITRAMFS_PRESERVE_MTIME=y
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
# CONFIG_EXPERT is not set
CONFIG_MULTIUSER=y
CONFIG_SYSFS_SYSCALL=y
CONFIG_FHANDLE=y
CONFIG_POSIX_TIMERS=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_IO_URING=y
CONFIG_ADVISE_SYSCALLS=y
CONFIG_MEMBARRIER=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_BASE_RELATIVE=y
CONFIG_RSEQ=y
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
# end of Kernel Performance Events And Counters

CONFIG_SYSTEM_DATA_VERIFICATION=y
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
# end of General setup

CONFIG_MMU=y
CONFIG_CPU_BIG_ENDIAN=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_PGSTE=y
CONFIG_AUDIT_ARCH=y
CONFIG_NO_IOPORT_MAP=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_KASAN_SHADOW_OFFSET=0x1C000000000000
CONFIG_S390=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_PGTABLE_LEVELS=5
CONFIG_HAVE_LIVEPATCH=y

#
# Processor type and features
#
CONFIG_HAVE_MARCH_Z10_FEATURES=y
CONFIG_HAVE_MARCH_Z196_FEATURES=y
# CONFIG_MARCH_Z10 is not set
CONFIG_MARCH_Z196=y
# CONFIG_MARCH_ZEC12 is not set
# CONFIG_MARCH_Z13 is not set
# CONFIG_MARCH_Z14 is not set
# CONFIG_MARCH_Z15 is not set
# CONFIG_MARCH_Z16 is not set
CONFIG_MARCH_Z196_TUNE=y
CONFIG_TUNE_DEFAULT=y
# CONFIG_TUNE_Z10 is not set
# CONFIG_TUNE_Z196 is not set
# CONFIG_TUNE_ZEC12 is not set
# CONFIG_TUNE_Z13 is not set
# CONFIG_TUNE_Z14 is not set
# CONFIG_TUNE_Z15 is not set
# CONFIG_TUNE_Z16 is not set
CONFIG_64BIT=y
CONFIG_COMMAND_LINE_SIZE=4096
CONFIG_SMP=y
CONFIG_NR_CPUS=64
CONFIG_HOTPLUG_CPU=y
# CONFIG_SCHED_TOPOLOGY is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_KEXEC=y
# CONFIG_ARCH_RANDOM is not set
CONFIG_KERNEL_NOBP=y
# CONFIG_RELOCATABLE is not set
# end of Processor type and features

#
# Memory setup
#
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_MAX_PHYSMEM_BITS=46
# CONFIG_CHECK_STACK is not set
# end of Memory setup

#
# I/O subsystem
#
CONFIG_QDIO=y
CONFIG_CHSC_SCH=m
# CONFIG_SCM_BUS is not set
# CONFIG_VFIO_CCW is not set
# end of I/O subsystem

#
# Dump support
#
CONFIG_CRASH_DUMP=y
# end of Dump support

CONFIG_CCW=y
CONFIG_HAVE_PNETID=m

#
# Virtualization
#
CONFIG_PROTECTED_VIRTUALIZATION_GUEST=y
CONFIG_PFAULT=y
CONFIG_CMM=m
# CONFIG_APPLDATA_BASE is not set
# CONFIG_S390_HYPFS_FS is not set
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQFD=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_KVM_ASYNC_PF_SYNC=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM_VFIO=y
CONFIG_HAVE_KVM_INVALID_WAKEUPS=y
CONFIG_HAVE_KVM_VCPU_ASYNC_IOCTL=y
CONFIG_HAVE_KVM_NO_POLL=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=y
# CONFIG_KVM_S390_UCONTROL is not set
CONFIG_S390_GUEST=y
# end of Virtualization

#
# Selftests
#
CONFIG_S390_UNWIND_SELFTEST=m
# CONFIG_S390_KPROBES_SANITY_TEST is not set
# CONFIG_S390_MODULES_SANITY_TEST is not set
# end of Selftests

#
# General architecture-dependent options
#
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_GENERIC_ENTRY=y
CONFIG_KPROBES=y
# CONFIG_JUMP_LABEL is not set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_KRETPROBES=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_ARCH_CORRECT_STACKTRACE_ON_KRETPROBE=y
CONFIG_HAVE_FUNCTION_ERROR_INJECTION=y
CONFIG_HAVE_NMI=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_CONTIGUOUS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_ARCH_HAS_FORTIFY_SOURCE=y
CONFIG_ARCH_HAS_SET_MEMORY=y
CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT=y
CONFIG_ARCH_WANTS_NO_INSTR=y
CONFIG_ARCH_32BIT_USTAT_F_TINODE=y
CONFIG_HAVE_ASM_MODVERSIONS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_RSEQ=y
CONFIG_HAVE_FUNCTION_ARG_ACCESS_API=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_HAVE_ARCH_JUMP_LABEL_RELATIVE=y
CONFIG_MMU_GATHER_TABLE_FREE=y
CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
CONFIG_MMU_GATHER_NO_GATHER=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_IPC_PARSE_VERSION=y
CONFIG_HAVE_ARCH_SECCOMP=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP=y
CONFIG_SECCOMP_FILTER=y
# CONFIG_SECCOMP_CACHE_DEBUG is not set
CONFIG_LTO_NONE=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_IDLE=y
CONFIG_ARCH_HAS_SCALED_CPUTIME=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_SOFT_DIRTY=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK=y
CONFIG_ALTERNATE_USER_ADDRESS_SPACE=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_PAGE_SIZE_LESS_THAN_64KB=y
CONFIG_PAGE_SIZE_LESS_THAN_256KB=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_CLONE_BACKWARDS2=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_OLD_SIGACTION=y
CONFIG_COMPAT_32BIT_TIME=y
CONFIG_HAVE_ARCH_VMAP_STACK=y
CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET=y
CONFIG_RANDOMIZE_KSTACK_OFFSET=y
# CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT is not set
CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_ARCH_HAS_STRICT_MODULE_RWX=y
CONFIG_STRICT_MODULE_RWX=y
# CONFIG_LOCK_EVENT_COUNTS is not set
CONFIG_ARCH_HAS_MEM_ENCRYPT=y
CONFIG_ARCH_HAS_VDSO_DATA=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y

#
# GCOV-based kernel profiling
#
CONFIG_GCOV_KERNEL=y
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
CONFIG_GCOV_PROFILE_ALL=y
# end of GCOV-based kernel profiling

CONFIG_HAVE_GCC_PLUGINS=y
# end of General architecture-dependent options

CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULE_SIG_FORMAT=y
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
# CONFIG_MODULE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_MODULE_SIG=y
# CONFIG_MODULE_SIG_FORCE is not set
CONFIG_MODULE_SIG_ALL=y
# CONFIG_MODULE_SIG_SHA1 is not set
# CONFIG_MODULE_SIG_SHA224 is not set
# CONFIG_MODULE_SIG_SHA256 is not set
# CONFIG_MODULE_SIG_SHA384 is not set
CONFIG_MODULE_SIG_SHA512=y
CONFIG_MODULE_SIG_HASH="sha512"
CONFIG_MODULE_COMPRESS_NONE=y
# CONFIG_MODULE_COMPRESS_GZIP is not set
# CONFIG_MODULE_COMPRESS_XZ is not set
# CONFIG_MODULE_COMPRESS_ZSTD is not set
# CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is not set
CONFIG_MODPROBE_PATH="/sbin/modprobe"
CONFIG_MODULES_TREE_LOOKUP=y
CONFIG_BLOCK=y
CONFIG_BLOCK_LEGACY_AUTOLOAD=y
CONFIG_BLK_DEV_BSG_COMMON=y
CONFIG_BLK_ICQ=y
CONFIG_BLK_DEV_BSGLIB=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_DEV_INTEGRITY_T10=m
CONFIG_BLK_DEV_ZONED=y
# CONFIG_BLK_WBT is not set
# CONFIG_BLK_DEBUG_FS is not set
CONFIG_BLK_SED_OPAL=y
CONFIG_BLK_INLINE_ENCRYPTION=y
CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK=y

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_EFI_PARTITION=y
# end of Partition Types

CONFIG_BLK_MQ_VIRTIO=y
CONFIG_BLOCK_HOLDER_DEPRECATED=y
CONFIG_BLK_MQ_STACKING=y

#
# IO Schedulers
#
CONFIG_MQ_IOSCHED_DEADLINE=y
CONFIG_MQ_IOSCHED_KYBER=y
CONFIG_IOSCHED_BFQ=y
# end of IO Schedulers

CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_PADATA=y
CONFIG_ASN1=y
CONFIG_ARCH_INLINE_SPIN_TRYLOCK=y
CONFIG_ARCH_INLINE_SPIN_TRYLOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_LOCK=y
CONFIG_ARCH_INLINE_SPIN_LOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_LOCK_IRQ=y
CONFIG_ARCH_INLINE_SPIN_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE=y
CONFIG_ARCH_INLINE_READ_TRYLOCK=y
CONFIG_ARCH_INLINE_READ_LOCK=y
CONFIG_ARCH_INLINE_READ_LOCK_BH=y
CONFIG_ARCH_INLINE_READ_LOCK_IRQ=y
CONFIG_ARCH_INLINE_READ_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_READ_UNLOCK=y
CONFIG_ARCH_INLINE_READ_UNLOCK_BH=y
CONFIG_ARCH_INLINE_READ_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_READ_UNLOCK_IRQRESTORE=y
CONFIG_ARCH_INLINE_WRITE_TRYLOCK=y
CONFIG_ARCH_INLINE_WRITE_LOCK=y
CONFIG_ARCH_INLINE_WRITE_LOCK_BH=y
CONFIG_ARCH_INLINE_WRITE_LOCK_IRQ=y
CONFIG_ARCH_INLINE_WRITE_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_BH=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE=y
CONFIG_UNINLINE_SPIN_UNLOCK=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_MUTEX_SPIN_ON_OWNER=y
CONFIG_RWSEM_SPIN_ON_OWNER=y
CONFIG_LOCK_SPIN_ON_OWNER=y
CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y

#
# Executable file formats
#
# CONFIG_BINFMT_ELF is not set
CONFIG_ARCH_BINFMT_ELF_STATE=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_BINFMT_MISC=y
CONFIG_COREDUMP=y
# end of Executable file formats

#
# Memory Management options
#
CONFIG_SWAP=y
# CONFIG_ZSWAP is not set

#
# SLAB allocator options
#
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLAB_MERGE_DEFAULT is not set
# CONFIG_SLAB_FREELIST_RANDOM is not set
# CONFIG_SLAB_FREELIST_HARDENED is not set
CONFIG_SLUB_STATS=y
CONFIG_SLUB_CPU_PARTIAL=y
# end of SLAB allocator options

CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
CONFIG_COMPAT_BRK=y
CONFIG_SPARSEMEM=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_MEMBLOCK_PHYS_MAP=y
CONFIG_HAVE_FAST_GUP=y
CONFIG_MEMORY_ISOLATION=y
CONFIG_EXCLUSIVE_SYSTEM_RAM=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
# CONFIG_MEMORY_HOTPLUG is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
CONFIG_COMPACTION=y
CONFIG_PAGE_REPORTING=y
CONFIG_MIGRATION=y
CONFIG_CONTIG_ALLOC=y
CONFIG_PHYS_ADDR_T_64BIT=y
# CONFIG_KSM is not set
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_TRANSPARENT_HUGEPAGE=y
# CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
CONFIG_READ_ONLY_THP_FOR_FS=y
CONFIG_CMA=y
# CONFIG_CMA_DEBUG is not set
CONFIG_CMA_DEBUGFS=y
# CONFIG_CMA_SYSFS is not set
CONFIG_CMA_AREAS=7
CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
# CONFIG_IDLE_PAGE_TRACKING is not set
CONFIG_ARCH_HAS_CURRENT_STACK_POINTER=y
CONFIG_ZONE_DMA=y
CONFIG_VM_EVENT_COUNTERS=y
# CONFIG_PERCPU_STATS is not set
# CONFIG_GUP_TEST is not set
CONFIG_ARCH_HAS_PTE_SPECIAL=y
# CONFIG_ANON_VMA_NAME is not set
CONFIG_USERFAULTFD=y

#
# Data Access Monitoring
#
# CONFIG_DAMON is not set
# end of Data Access Monitoring
# end of Memory Management options

CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_DIAG=m
# CONFIG_UNIX is not set
CONFIG_IUCV=y
CONFIG_AFIUCV=y
# CONFIG_INET is not set
CONFIG_NETWORK_SECMARK=y
CONFIG_NET_PTP_CLASSIFY=y
CONFIG_NETWORK_PHY_TIMESTAMPING=y
# CONFIG_NETFILTER is not set
CONFIG_ATM=y
CONFIG_ATM_LANE=y
CONFIG_STP=m
CONFIG_BRIDGE=m
CONFIG_BRIDGE_MRP=y
# CONFIG_BRIDGE_CFM is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
CONFIG_LLC=m
# CONFIG_LLC2 is not set
# CONFIG_ATALK is not set
CONFIG_X25=y
CONFIG_LAPB=y
# CONFIG_PHONET is not set
CONFIG_IEEE802154=m
CONFIG_IEEE802154_NL802154_EXPERIMENTAL=y
CONFIG_IEEE802154_SOCKET=m
CONFIG_MAC802154=m
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
# CONFIG_NET_SCH_HTB is not set
# CONFIG_NET_SCH_HFSC is not set
CONFIG_NET_SCH_ATM=m
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_MULTIQ=y
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFB=y
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
CONFIG_NET_SCH_CBS=y
CONFIG_NET_SCH_ETF=m
CONFIG_NET_SCH_TAPRIO=m
CONFIG_NET_SCH_GRED=m
# CONFIG_NET_SCH_DSMARK is not set
# CONFIG_NET_SCH_NETEM is not set
CONFIG_NET_SCH_DRR=y
CONFIG_NET_SCH_MQPRIO=m
# CONFIG_NET_SCH_SKBPRIO is not set
# CONFIG_NET_SCH_CHOKE is not set
# CONFIG_NET_SCH_QFQ is not set
# CONFIG_NET_SCH_CODEL is not set
# CONFIG_NET_SCH_FQ_CODEL is not set
CONFIG_NET_SCH_CAKE=y
# CONFIG_NET_SCH_FQ is not set
# CONFIG_NET_SCH_HHF is not set
CONFIG_NET_SCH_PIE=y
CONFIG_NET_SCH_FQ_PIE=m
# CONFIG_NET_SCH_PLUG is not set
CONFIG_NET_SCH_ETS=y
# CONFIG_NET_SCH_DEFAULT is not set

#
# Classification
#
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=y
CONFIG_NET_CLS_TCINDEX=y
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=y
CONFIG_CLS_U32_PERF=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=y
CONFIG_NET_CLS_FLOW=m
# CONFIG_NET_CLS_BPF is not set
CONFIG_NET_CLS_FLOWER=m
CONFIG_NET_CLS_MATCHALL=m
# CONFIG_NET_EMATCH is not set
# CONFIG_NET_CLS_ACT is not set
CONFIG_NET_SCH_FIFO=y
# CONFIG_DCB is not set
# CONFIG_DNS_RESOLVER is not set
CONFIG_BATMAN_ADV=y
# CONFIG_BATMAN_ADV_BATMAN_V is not set
CONFIG_BATMAN_ADV_NC=y
CONFIG_BATMAN_ADV_DEBUG=y
# CONFIG_BATMAN_ADV_TRACING is not set
CONFIG_VSOCKETS=y
CONFIG_VSOCKETS_DIAG=m
CONFIG_VSOCKETS_LOOPBACK=m
CONFIG_VIRTIO_VSOCKETS=y
CONFIG_VIRTIO_VSOCKETS_COMMON=y
CONFIG_NETLINK_DIAG=m
CONFIG_MPLS=y
CONFIG_NET_MPLS_GSO=y
CONFIG_MPLS_ROUTING=m
# CONFIG_NET_NSH is not set
# CONFIG_HSR is not set
# CONFIG_QRTR is not set
CONFIG_PCPU_DEV_REFCNT=y
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_SOCK_RX_QUEUE_MAPPING=y
CONFIG_XPS=y
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y
CONFIG_NET_FLOW_LIMIT=y

#
# Network testing
#
# end of Network testing
# end of Networking options

# CONFIG_CAN is not set
# CONFIG_MCTP is not set
CONFIG_RFKILL=y
CONFIG_RFKILL_LEDS=y
CONFIG_RFKILL_INPUT=y
CONFIG_RFKILL_GPIO=y
# CONFIG_NET_9P is not set
CONFIG_CAIF=y
CONFIG_CAIF_DEBUG=y
CONFIG_CAIF_NETDEV=m
CONFIG_CAIF_USB=m
CONFIG_NFC=m
CONFIG_NFC_DIGITAL=m
# CONFIG_NFC_NCI is not set
CONFIG_NFC_HCI=m
# CONFIG_NFC_SHDLC is not set

#
# Near Field Communication (NFC) devices
#
CONFIG_NFC_SIM=m
CONFIG_NFC_PN533=m
# CONFIG_NFC_PN533_I2C is not set
CONFIG_NFC_PN532_UART=m
# end of Near Field Communication (NFC) devices

CONFIG_PSAMPLE=m
CONFIG_NET_IFE=m
# CONFIG_LWTUNNEL is not set
CONFIG_FAILOVER=y
# CONFIG_ETHTOOL_NETLINK is not set
# CONFIG_NETDEV_ADDR_LIST_TEST is not set

#
# Device Drivers
#
CONFIG_HAVE_PCI=y
# CONFIG_PCI is not set
CONFIG_PCCARD=y
CONFIG_PCMCIA=m
CONFIG_PCMCIA_LOAD_CIS=y

#
# PC-card bridges
#

#
# Generic Driver Options
#
# CONFIG_UEVENT_HELPER is not set
CONFIG_DEVTMPFS=y
# CONFIG_DEVTMPFS_MOUNT is not set
# CONFIG_DEVTMPFS_SAFE is not set
# CONFIG_STANDALONE is not set
# CONFIG_PREVENT_FIRMWARE_BUILD is not set

#
# Firmware loader
#
CONFIG_FW_LOADER=y
CONFIG_FW_LOADER_PAGED_BUF=y
CONFIG_FW_LOADER_SYSFS=y
CONFIG_EXTRA_FIRMWARE=""
CONFIG_FW_LOADER_USER_HELPER=y
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
CONFIG_FW_LOADER_COMPRESS=y
CONFIG_FW_LOADER_COMPRESS_XZ=y
# CONFIG_FW_LOADER_COMPRESS_ZSTD is not set
# CONFIG_FW_UPLOAD is not set
# end of Firmware loader

CONFIG_ALLOW_DEV_COREDUMP=y
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
CONFIG_DEBUG_TEST_DRIVER_REMOVE=y
CONFIG_TEST_ASYNC_DRIVER_PROBE=m
CONFIG_GENERIC_CPU_AUTOPROBE=y
CONFIG_GENERIC_CPU_VULNERABILITIES=y
CONFIG_REGMAP=y
CONFIG_REGMAP_I2C=y
CONFIG_REGMAP_SPMI=m
CONFIG_REGMAP_IRQ=y
CONFIG_REGMAP_I3C=y
# end of Generic Driver Options

#
# Bus devices
#
CONFIG_MHI_BUS=y
# CONFIG_MHI_BUS_DEBUG is not set
# CONFIG_MHI_BUS_EP is not set
# end of Bus devices

CONFIG_CONNECTOR=m

#
# Firmware Drivers
#

#
# ARM System Control and Management Interface Protocol
#
# end of ARM System Control and Management Interface Protocol

# CONFIG_GOOGLE_FIRMWARE is not set

#
# Tegra firmware driver
#
# end of Tegra firmware driver
# end of Firmware Drivers

# CONFIG_GNSS is not set
CONFIG_MTD=m
CONFIG_MTD_TESTS=m

#
# Partition parsers
#
CONFIG_MTD_AR7_PARTS=m
# CONFIG_MTD_CMDLINE_PARTS is not set
CONFIG_MTD_OF_PARTS=m
CONFIG_MTD_REDBOOT_PARTS=m
CONFIG_MTD_REDBOOT_DIRECTORY_BLOCK=-1
# CONFIG_MTD_REDBOOT_PARTS_UNALLOCATED is not set
# CONFIG_MTD_REDBOOT_PARTS_READONLY is not set
# end of Partition parsers

#
# User Modules And Translation Layers
#
CONFIG_MTD_BLKDEVS=m
CONFIG_MTD_BLOCK=m
# CONFIG_MTD_BLOCK_RO is not set

#
# Note that in some cases UBI block is preferred. See MTD_UBI_BLOCK.
#
CONFIG_FTL=m
# CONFIG_NFTL is not set
CONFIG_INFTL=m
CONFIG_RFD_FTL=m
CONFIG_SSFDC=m
CONFIG_SM_FTL=m
CONFIG_MTD_OOPS=m
# CONFIG_MTD_SWAP is not set
# CONFIG_MTD_PARTITIONED_MASTER is not set

#
# RAM/ROM/Flash chip drivers
#
# CONFIG_MTD_CFI is not set
# CONFIG_MTD_JEDECPROBE is not set
CONFIG_MTD_MAP_BANK_WIDTH_1=y
CONFIG_MTD_MAP_BANK_WIDTH_2=y
CONFIG_MTD_MAP_BANK_WIDTH_4=y
CONFIG_MTD_CFI_I1=y
CONFIG_MTD_CFI_I2=y
# CONFIG_MTD_RAM is not set
CONFIG_MTD_ROM=m
# CONFIG_MTD_ABSENT is not set
# end of RAM/ROM/Flash chip drivers

#
# NAND
#
CONFIG_MTD_NAND_CORE=m
CONFIG_MTD_RAW_NAND=m

#
# Raw/parallel NAND flash controllers
#

#
# Misc
#
# CONFIG_MTD_NAND_NANDSIM is not set

#
# ECC engine support
#
CONFIG_MTD_NAND_ECC=y
CONFIG_MTD_NAND_ECC_SW_HAMMING=y
# CONFIG_MTD_NAND_ECC_SW_HAMMING_SMC is not set
# CONFIG_MTD_NAND_ECC_SW_BCH is not set
# end of ECC engine support
# end of NAND

#
# LPDDR & LPDDR2 PCM memory drivers
#
CONFIG_MTD_LPDDR=m
CONFIG_MTD_QINFO_PROBE=m
# end of LPDDR & LPDDR2 PCM memory drivers

CONFIG_MTD_UBI=m
CONFIG_MTD_UBI_WL_THRESHOLD=4096
CONFIG_MTD_UBI_BEB_LIMIT=20
CONFIG_MTD_UBI_FASTMAP=y
# CONFIG_MTD_UBI_GLUEBI is not set
# CONFIG_MTD_UBI_BLOCK is not set
CONFIG_DTC=y
CONFIG_OF=y
CONFIG_OF_UNITTEST=y
CONFIG_OF_FLATTREE=y
CONFIG_OF_EARLY_FLATTREE=y
CONFIG_OF_KOBJ=y
CONFIG_OF_DYNAMIC=y
CONFIG_OF_IRQ=y
CONFIG_OF_RESERVED_MEM=y
CONFIG_OF_RESOLVE=y
# CONFIG_OF_OVERLAY is not set
CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_NULL_BLK is not set
CONFIG_CDROM=m
# CONFIG_ZRAM is not set
# CONFIG_BLK_DEV_LOOP is not set

#
# DRBD disabled because PROC_FS or INET not selected
#
# CONFIG_BLK_DEV_NBD is not set
CONFIG_BLK_DEV_RAM=m
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=4096
# CONFIG_CDROM_PKTCDVD is not set
CONFIG_ATA_OVER_ETH=m

#
# S/390 block device drivers
#
# CONFIG_DCSSBLK is not set
CONFIG_DASD=m
# CONFIG_DASD_PROFILE is not set
# CONFIG_DASD_ECKD is not set
CONFIG_DASD_FBA=m
# CONFIG_DASD_DIAG is not set
# CONFIG_DASD_EER is not set
# CONFIG_VIRTIO_BLK is not set

#
# NVME Support
#
# CONFIG_NVME_FC is not set
# CONFIG_NVME_TARGET is not set
# end of NVME Support

#
# Misc devices
#
CONFIG_AD525X_DPOT=y
CONFIG_AD525X_DPOT_I2C=m
# CONFIG_DUMMY_IRQ is not set
CONFIG_ICS932S401=y
CONFIG_ENCLOSURE_SERVICES=m
# CONFIG_APDS9802ALS is not set
# CONFIG_ISL29003 is not set
CONFIG_ISL29020=y
# CONFIG_SENSORS_TSL2550 is not set
CONFIG_SENSORS_BH1770=y
# CONFIG_SENSORS_APDS990X is not set
CONFIG_HMC6352=m
CONFIG_DS1682=y
# CONFIG_OPEN_DICE is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
CONFIG_EEPROM_AT24=m
CONFIG_EEPROM_LEGACY=m
CONFIG_EEPROM_MAX6875=y
CONFIG_EEPROM_93CX6=m
CONFIG_EEPROM_IDT_89HPESX=m
# CONFIG_EEPROM_EE1004 is not set
# end of EEPROM support

#
# Texas Instruments shared transport line discipline
#
CONFIG_TI_ST=m
# end of Texas Instruments shared transport line discipline

# CONFIG_SENSORS_LIS3_I2C is not set
CONFIG_ALTERA_STAPL=m
CONFIG_ECHO=m
# CONFIG_UACCE is not set
# CONFIG_PVPANIC is not set
# end of Misc devices

#
# SCSI device support
#
CONFIG_SCSI_MOD=m
CONFIG_RAID_ATTRS=m
CONFIG_SCSI_COMMON=m
CONFIG_SCSI=m
CONFIG_SCSI_DMA=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=m
CONFIG_CHR_DEV_ST=m
CONFIG_BLK_DEV_SR=m
CONFIG_CHR_DEV_SG=m
CONFIG_BLK_DEV_BSG=y
CONFIG_CHR_DEV_SCH=m
CONFIG_SCSI_ENCLOSURE=m
# CONFIG_SCSI_CONSTANTS is not set
# CONFIG_SCSI_LOGGING is not set
CONFIG_SCSI_SCAN_ASYNC=y

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=m
# CONFIG_SCSI_FC_ATTRS is not set
# CONFIG_SCSI_ISCSI_ATTRS is not set
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
# CONFIG_SCSI_SAS_HOST_SMP is not set
CONFIG_SCSI_SRP_ATTRS=m
# end of SCSI Transports

# CONFIG_SCSI_LOWLEVEL is not set
# CONFIG_SCSI_DH is not set
# end of SCSI device support

CONFIG_MD=y
CONFIG_BLK_DEV_MD=m
CONFIG_MD_LINEAR=m
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
CONFIG_MD_RAID10=m
CONFIG_MD_RAID456=m
CONFIG_MD_MULTIPATH=m
CONFIG_MD_FAULTY=m
CONFIG_BCACHE=m
# CONFIG_BCACHE_DEBUG is not set
# CONFIG_BCACHE_CLOSURES_DEBUG is not set
CONFIG_BCACHE_ASYNC_REGISTRATION=y
CONFIG_BLK_DEV_DM_BUILTIN=y
CONFIG_BLK_DEV_DM=y
CONFIG_DM_DEBUG=y
CONFIG_DM_BUFIO=y
CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING=y
# CONFIG_DM_DEBUG_BLOCK_STACK_TRACING is not set
CONFIG_DM_BIO_PRISON=y
CONFIG_DM_PERSISTENT_DATA=y
CONFIG_DM_UNSTRIPED=m
CONFIG_DM_CRYPT=y
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=y
# CONFIG_DM_CACHE is not set
# CONFIG_DM_WRITECACHE is not set
CONFIG_DM_EBS=m
CONFIG_DM_ERA=y
CONFIG_DM_CLONE=m
CONFIG_DM_MIRROR=m
CONFIG_DM_LOG_USERSPACE=m
CONFIG_DM_RAID=m
# CONFIG_DM_ZERO is not set
CONFIG_DM_MULTIPATH=y
CONFIG_DM_MULTIPATH_QL=m
CONFIG_DM_MULTIPATH_ST=y
CONFIG_DM_MULTIPATH_HST=m
# CONFIG_DM_MULTIPATH_IOA is not set
CONFIG_DM_DELAY=y
CONFIG_DM_DUST=m
CONFIG_DM_INIT=y
CONFIG_DM_UEVENT=y
# CONFIG_DM_FLAKEY is not set
CONFIG_DM_VERITY=m
CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG=y
# CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG_SECONDARY_KEYRING is not set
CONFIG_DM_VERITY_FEC=y
CONFIG_DM_SWITCH=m
CONFIG_DM_LOG_WRITES=m
CONFIG_DM_INTEGRITY=y
# CONFIG_DM_ZONED is not set
CONFIG_DM_AUDIT=y
CONFIG_TARGET_CORE=m
CONFIG_TCM_IBLOCK=m
CONFIG_TCM_FILEIO=m
# CONFIG_TCM_PSCSI is not set
# CONFIG_LOOPBACK_TARGET is not set
CONFIG_NETDEVICES=y
# CONFIG_NET_CORE is not set
CONFIG_ARCNET=m
CONFIG_ARCNET_1201=m
CONFIG_ARCNET_1051=m
CONFIG_ARCNET_RAW=m
# CONFIG_ARCNET_CAP is not set
CONFIG_ARCNET_COM90xx=m
CONFIG_ARCNET_COM90xxIO=m
CONFIG_ARCNET_RIM_I=m
CONFIG_ARCNET_COM20020=m
CONFIG_ARCNET_COM20020_CS=m
# CONFIG_ATM_DRIVERS is not set
# CONFIG_CAIF_DRIVERS is not set
# CONFIG_ETHERNET is not set
CONFIG_PHYLIB=y
CONFIG_SWPHY=y
CONFIG_LED_TRIGGER_PHY=y
CONFIG_FIXED_PHY=y

#
# MII PHY device drivers
#
# CONFIG_AMD_PHY is not set
CONFIG_ADIN_PHY=m
# CONFIG_ADIN1100_PHY is not set
CONFIG_AQUANTIA_PHY=m
CONFIG_AX88796B_PHY=m
# CONFIG_BROADCOM_PHY is not set
# CONFIG_BCM54140_PHY is not set
CONFIG_BCM7XXX_PHY=y
# CONFIG_BCM84881_PHY is not set
# CONFIG_BCM87XX_PHY is not set
CONFIG_BCM_NET_PHYLIB=y
# CONFIG_CICADA_PHY is not set
CONFIG_CORTINA_PHY=y
CONFIG_DAVICOM_PHY=y
# CONFIG_ICPLUS_PHY is not set
CONFIG_LXT_PHY=y
# CONFIG_INTEL_XWAY_PHY is not set
CONFIG_LSI_ET1011C_PHY=y
CONFIG_MARVELL_PHY=y
CONFIG_MARVELL_10G_PHY=m
# CONFIG_MARVELL_88X2222_PHY is not set
CONFIG_MAXLINEAR_GPHY=m
# CONFIG_MEDIATEK_GE_PHY is not set
CONFIG_MICREL_PHY=y
# CONFIG_MICROCHIP_PHY is not set
# CONFIG_MICROCHIP_T1_PHY is not set
CONFIG_MICROSEMI_PHY=m
# CONFIG_MOTORCOMM_PHY is not set
# CONFIG_NATIONAL_PHY is not set
# CONFIG_NXP_C45_TJA11XX_PHY is not set
CONFIG_AT803X_PHY=m
CONFIG_QSEMI_PHY=y
# CONFIG_REALTEK_PHY is not set
CONFIG_RENESAS_PHY=m
CONFIG_ROCKCHIP_PHY=y
CONFIG_SMSC_PHY=m
CONFIG_STE10XP=y
# CONFIG_TERANETICS_PHY is not set
CONFIG_DP83822_PHY=y
# CONFIG_DP83TC811_PHY is not set
CONFIG_DP83848_PHY=m
# CONFIG_DP83867_PHY is not set
CONFIG_DP83869_PHY=m
# CONFIG_DP83TD510_PHY is not set
CONFIG_VITESSE_PHY=m
CONFIG_XILINX_GMII2RGMII=m
CONFIG_MDIO_DEVICE=y
CONFIG_MDIO_BUS=y
CONFIG_FWNODE_MDIO=y
CONFIG_OF_MDIO=y
CONFIG_MDIO_DEVRES=y
CONFIG_MDIO_BITBANG=m
# CONFIG_MDIO_GPIO is not set

#
# MDIO Multiplexers
#
# CONFIG_MDIO_BUS_MUX_MULTIPLEXER is not set

#
# PCS device drivers
#
CONFIG_PCS_XPCS=m
# end of PCS device drivers

CONFIG_PPP=y
# CONFIG_PPP_BSDCOMP is not set
CONFIG_PPP_DEFLATE=m
CONFIG_PPP_FILTER=y
CONFIG_PPP_MPPE=m
# CONFIG_PPP_MULTILINK is not set
CONFIG_PPPOATM=m
CONFIG_PPPOE=m
# CONFIG_PPP_ASYNC is not set
CONFIG_PPP_SYNC_TTY=m
CONFIG_SLIP=y
CONFIG_SLHC=y
# CONFIG_SLIP_COMPRESSED is not set
# CONFIG_SLIP_SMART is not set
CONFIG_SLIP_MODE_SLIP6=y

#
# S/390 network device drivers
#
CONFIG_CTCM=m
# CONFIG_NETIUCV is not set
# CONFIG_SMSGIUCV is not set
CONFIG_CCWGROUP=m
# end of S/390 network device drivers

#
# Host-side USB support is needed for USB Network Adapter support
#
# CONFIG_WAN is not set
CONFIG_IEEE802154_DRIVERS=m
# CONFIG_IEEE802154_FAKELB is not set
CONFIG_IEEE802154_HWSIM=m

#
# Wireless WAN
#
# CONFIG_WWAN is not set
# end of Wireless WAN

CONFIG_NET_FAILOVER=y

#
# Input device support
#
CONFIG_INPUT=y
# CONFIG_INPUT_LEDS is not set
CONFIG_INPUT_FF_MEMLESS=y
CONFIG_INPUT_SPARSEKMAP=m
# CONFIG_INPUT_MATRIXKMAP is not set
CONFIG_INPUT_VIVALDIFMAP=m

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_INPUT_JOYDEV=y
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
# CONFIG_INPUT_KEYBOARD is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_BYD=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_SYNAPTICS_SMBUS=y
CONFIG_MOUSE_PS2_CYPRESS=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
# CONFIG_MOUSE_PS2_ELANTECH is not set
CONFIG_MOUSE_PS2_SENTELIC=y
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
CONFIG_MOUSE_PS2_FOCALTECH=y
CONFIG_MOUSE_PS2_SMBUS=y
CONFIG_MOUSE_SERIAL=y
CONFIG_MOUSE_CYAPA=m
CONFIG_MOUSE_ELAN_I2C=m
# CONFIG_MOUSE_ELAN_I2C_I2C is not set
CONFIG_MOUSE_ELAN_I2C_SMBUS=y
CONFIG_MOUSE_VSXXXAA=y
# CONFIG_MOUSE_GPIO is not set
CONFIG_MOUSE_SYNAPTICS_I2C=m
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
CONFIG_INPUT_TOUCHSCREEN=y
# CONFIG_TOUCHSCREEN_AD7879 is not set
CONFIG_TOUCHSCREEN_ADC=y
CONFIG_TOUCHSCREEN_AR1021_I2C=y
CONFIG_TOUCHSCREEN_ATMEL_MXT=m
# CONFIG_TOUCHSCREEN_AUO_PIXCIR is not set
CONFIG_TOUCHSCREEN_BU21013=y
CONFIG_TOUCHSCREEN_BU21029=y
CONFIG_TOUCHSCREEN_CHIPONE_ICN8318=m
# CONFIG_TOUCHSCREEN_CY8CTMA140 is not set
# CONFIG_TOUCHSCREEN_CY8CTMG110 is not set
# CONFIG_TOUCHSCREEN_CYTTSP_CORE is not set
CONFIG_TOUCHSCREEN_CYTTSP4_CORE=m
CONFIG_TOUCHSCREEN_CYTTSP4_I2C=m
# CONFIG_TOUCHSCREEN_DYNAPRO is not set
CONFIG_TOUCHSCREEN_HAMPSHIRE=m
# CONFIG_TOUCHSCREEN_EETI is not set
CONFIG_TOUCHSCREEN_EGALAX=m
CONFIG_TOUCHSCREEN_EGALAX_SERIAL=m
CONFIG_TOUCHSCREEN_EXC3000=y
CONFIG_TOUCHSCREEN_FUJITSU=y
CONFIG_TOUCHSCREEN_GOODIX=m
CONFIG_TOUCHSCREEN_HIDEEP=y
# CONFIG_TOUCHSCREEN_HYCON_HY46XX is not set
CONFIG_TOUCHSCREEN_ILI210X=y
# CONFIG_TOUCHSCREEN_ILITEK is not set
# CONFIG_TOUCHSCREEN_S6SY761 is not set
CONFIG_TOUCHSCREEN_GUNZE=m
CONFIG_TOUCHSCREEN_EKTF2127=y
CONFIG_TOUCHSCREEN_ELAN=y
CONFIG_TOUCHSCREEN_ELO=m
CONFIG_TOUCHSCREEN_WACOM_W8001=y
# CONFIG_TOUCHSCREEN_WACOM_I2C is not set
CONFIG_TOUCHSCREEN_MAX11801=y
# CONFIG_TOUCHSCREEN_MCS5000 is not set
CONFIG_TOUCHSCREEN_MMS114=y
CONFIG_TOUCHSCREEN_MELFAS_MIP4=m
# CONFIG_TOUCHSCREEN_MSG2638 is not set
CONFIG_TOUCHSCREEN_MTOUCH=m
# CONFIG_TOUCHSCREEN_IMAGIS is not set
# CONFIG_TOUCHSCREEN_INEXIO is not set
CONFIG_TOUCHSCREEN_MK712=m
CONFIG_TOUCHSCREEN_PENMOUNT=m
CONFIG_TOUCHSCREEN_EDT_FT5X06=y
# CONFIG_TOUCHSCREEN_TOUCHRIGHT is not set
# CONFIG_TOUCHSCREEN_TOUCHWIN is not set
CONFIG_TOUCHSCREEN_PIXCIR=m
CONFIG_TOUCHSCREEN_WDT87XX_I2C=y
# CONFIG_TOUCHSCREEN_TOUCHIT213 is not set
# CONFIG_TOUCHSCREEN_TSC_SERIO is not set
CONFIG_TOUCHSCREEN_TSC200X_CORE=m
CONFIG_TOUCHSCREEN_TSC2004=m
CONFIG_TOUCHSCREEN_TSC2007=m
# CONFIG_TOUCHSCREEN_TSC2007_IIO is not set
# CONFIG_TOUCHSCREEN_RM_TS is not set
# CONFIG_TOUCHSCREEN_SILEAD is not set
# CONFIG_TOUCHSCREEN_SIS_I2C is not set
# CONFIG_TOUCHSCREEN_ST1232 is not set
# CONFIG_TOUCHSCREEN_STMFTS is not set
# CONFIG_TOUCHSCREEN_SX8654 is not set
CONFIG_TOUCHSCREEN_TPS6507X=m
CONFIG_TOUCHSCREEN_ZET6223=m
# CONFIG_TOUCHSCREEN_ZFORCE is not set
CONFIG_TOUCHSCREEN_ROHM_BU21023=y
CONFIG_TOUCHSCREEN_IQS5XX=y
CONFIG_TOUCHSCREEN_ZINITIX=m
# CONFIG_INPUT_MISC is not set
CONFIG_RMI4_CORE=m
CONFIG_RMI4_I2C=m
CONFIG_RMI4_SMB=m
CONFIG_RMI4_F03=y
CONFIG_RMI4_F03_SERIO=m
CONFIG_RMI4_2D_SENSOR=y
CONFIG_RMI4_F11=y
# CONFIG_RMI4_F12 is not set
# CONFIG_RMI4_F30 is not set
# CONFIG_RMI4_F34 is not set
CONFIG_RMI4_F3A=y
# CONFIG_RMI4_F55 is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_SERPORT=y
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
CONFIG_SERIO_PS2MULT=m
# CONFIG_SERIO_GPIO_PS2 is not set
# CONFIG_USERIO is not set
# CONFIG_GAMEPORT is not set
# end of Hardware I/O ports
# end of Input device support

#
# Character devices
#
CONFIG_TTY=y
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_UNIX98_PTYS=y
# CONFIG_LEGACY_PTYS is not set
# CONFIG_LDISC_AUTOLOAD is not set
CONFIG_N_GSM=y
# CONFIG_NULL_TTY is not set
CONFIG_HVC_DRIVER=y
CONFIG_HVC_IUCV=y
# CONFIG_RPMSG_TTY is not set
CONFIG_SERIAL_DEV_BUS=y
CONFIG_SERIAL_DEV_CTRL_TTYPORT=y
CONFIG_VIRTIO_CONSOLE=y
CONFIG_IPMB_DEVICE_INTERFACE=m
CONFIG_HW_RANDOM=m
# CONFIG_HW_RANDOM_VIRTIO is not set
# CONFIG_HW_RANDOM_S390 is not set

#
# PCMCIA character devices
#
CONFIG_SYNCLINK_CS=m
CONFIG_CARDMAN_4000=m
CONFIG_CARDMAN_4040=m
# CONFIG_SCR24X is not set
CONFIG_IPWIRELESS=m
# end of PCMCIA character devices

# CONFIG_DEVMEM is not set
# CONFIG_HANGCHECK_TIMER is not set

#
# S/390 character device drivers
#
# CONFIG_TN3270 is not set
CONFIG_TN3215=y
# CONFIG_TN3215_CONSOLE is not set
# CONFIG_SCLP_TTY is not set
CONFIG_SCLP_VT220_TTY=y
CONFIG_SCLP_VT220_CONSOLE=y
# CONFIG_HMC_DRV is not set
CONFIG_SCLP_OFB=y
CONFIG_S390_UV_UAPI=m
# CONFIG_S390_TAPE is not set
CONFIG_VMLOGRDR=m
CONFIG_VMCP=y
CONFIG_VMCP_CMA_SIZE=4
# CONFIG_MONREADER is not set
CONFIG_MONWRITER=y
CONFIG_S390_VMUR=m
CONFIG_XILLYBUS_CLASS=y
CONFIG_XILLYBUS=y
# CONFIG_XILLYBUS_OF is not set
CONFIG_RANDOM_TRUST_BOOTLOADER=y
# end of Character devices

#
# I2C support
#
CONFIG_I2C=y
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_COMPAT=y
# CONFIG_I2C_CHARDEV is not set
CONFIG_I2C_MUX=y

#
# Multiplexer I2C Chip support
#
CONFIG_I2C_ARB_GPIO_CHALLENGE=y
CONFIG_I2C_MUX_GPIO=m
# CONFIG_I2C_MUX_GPMUX is not set
# CONFIG_I2C_MUX_LTC4306 is not set
CONFIG_I2C_MUX_PCA9541=y
CONFIG_I2C_MUX_PCA954x=m
# CONFIG_I2C_MUX_PINCTRL is not set
CONFIG_I2C_DEMUX_PINCTRL=m
CONFIG_I2C_MUX_MLXCPLD=m
# end of Multiplexer I2C Chip support

CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_SMBUS=y
CONFIG_I2C_STUB=m
CONFIG_I2C_SLAVE=y
CONFIG_I2C_SLAVE_EEPROM=y
CONFIG_I2C_SLAVE_TESTUNIT=y
CONFIG_I2C_DEBUG_CORE=y
# CONFIG_I2C_DEBUG_ALGO is not set
# end of I2C support

CONFIG_I3C=y
CONFIG_SPMI=m
CONFIG_HSI=m
CONFIG_HSI_BOARDINFO=y

#
# HSI controllers
#

#
# HSI clients
#
CONFIG_HSI_CHAR=m
CONFIG_PPS=y
# CONFIG_PPS_DEBUG is not set
# CONFIG_NTP_PPS is not set

#
# PPS clients support
#
CONFIG_PPS_CLIENT_KTIMER=y
# CONFIG_PPS_CLIENT_LDISC is not set
CONFIG_PPS_CLIENT_GPIO=y

#
# PPS generators support
#

#
# PTP clock support
#
# CONFIG_PTP_1588_CLOCK is not set
CONFIG_PTP_1588_CLOCK_OPTIONAL=y
# end of PTP clock support

CONFIG_PINCTRL=y
CONFIG_PINCONF=y
CONFIG_GENERIC_PINCONF=y
CONFIG_DEBUG_PINCTRL=y
CONFIG_PINCTRL_MCP23S08_I2C=y
CONFIG_PINCTRL_MCP23S08=y
# CONFIG_PINCTRL_SX150X is not set

#
# Renesas pinctrl drivers
#
# end of Renesas pinctrl drivers

CONFIG_GPIOLIB=y
CONFIG_GPIOLIB_FASTPATH_LIMIT=512
CONFIG_GPIOLIB_IRQCHIP=y
# CONFIG_DEBUG_GPIO is not set
CONFIG_GPIO_CDEV=y
CONFIG_GPIO_CDEV_V1=y

#
# I2C GPIO expanders
#
CONFIG_GPIO_ADP5588=m
# CONFIG_GPIO_MAX7300 is not set
CONFIG_GPIO_MAX732X=m
# CONFIG_GPIO_PCA953X is not set
CONFIG_GPIO_PCA9570=m
CONFIG_GPIO_PCF857X=y
CONFIG_GPIO_TPIC2810=m
# end of I2C GPIO expanders

#
# MFD GPIO expanders
#
# end of MFD GPIO expanders

#
# Virtual GPIO drivers
#
CONFIG_GPIO_AGGREGATOR=y
CONFIG_GPIO_MOCKUP=m
# CONFIG_GPIO_VIRTIO is not set
# CONFIG_GPIO_SIM is not set
# end of Virtual GPIO drivers

# CONFIG_POWER_RESET is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
# CONFIG_GENERIC_ADC_BATTERY is not set
# CONFIG_IP5XXX_POWER is not set
# CONFIG_TEST_POWER is not set
CONFIG_CHARGER_ADP5061=y
CONFIG_BATTERY_CW2015=m
CONFIG_BATTERY_DS2782=m
# CONFIG_BATTERY_SAMSUNG_SDI is not set
# CONFIG_BATTERY_SBS is not set
# CONFIG_CHARGER_SBS is not set
CONFIG_MANAGER_SBS=y
# CONFIG_BATTERY_BQ27XXX is not set
# CONFIG_BATTERY_MAX17040 is not set
CONFIG_BATTERY_MAX17042=m
CONFIG_CHARGER_MAX8903=m
# CONFIG_CHARGER_LP8727 is not set
CONFIG_CHARGER_GPIO=y
# CONFIG_CHARGER_MANAGER is not set
CONFIG_CHARGER_LT3651=y
# CONFIG_CHARGER_LTC4162L is not set
# CONFIG_CHARGER_DETECTOR_MAX14656 is not set
# CONFIG_CHARGER_MAX77976 is not set
# CONFIG_CHARGER_BQ2415X is not set
# CONFIG_CHARGER_BQ24190 is not set
CONFIG_CHARGER_BQ24257=m
# CONFIG_CHARGER_BQ24735 is not set
CONFIG_CHARGER_BQ2515X=m
CONFIG_CHARGER_BQ25890=y
# CONFIG_CHARGER_BQ25980 is not set
# CONFIG_CHARGER_BQ256XX is not set
CONFIG_CHARGER_SMB347=m
CONFIG_BATTERY_GAUGE_LTC2941=y
CONFIG_BATTERY_RT5033=y
# CONFIG_CHARGER_RT9455 is not set
CONFIG_CHARGER_UCS1002=m
CONFIG_CHARGER_BD99954=m
# CONFIG_BATTERY_UG3105 is not set
CONFIG_THERMAL=y
CONFIG_THERMAL_NETLINK=y
# CONFIG_THERMAL_STATISTICS is not set
CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS=0
CONFIG_THERMAL_OF=y
CONFIG_THERMAL_WRITABLE_TRIPS=y
# CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE is not set
CONFIG_THERMAL_DEFAULT_GOV_FAIR_SHARE=y
# CONFIG_THERMAL_DEFAULT_GOV_USER_SPACE is not set
CONFIG_THERMAL_GOV_FAIR_SHARE=y
# CONFIG_THERMAL_GOV_STEP_WISE is not set
CONFIG_THERMAL_GOV_BANG_BANG=y
CONFIG_THERMAL_GOV_USER_SPACE=y
CONFIG_CPU_THERMAL=y
# CONFIG_THERMAL_EMULATION is not set
# CONFIG_GENERIC_ADC_THERMAL is not set
# CONFIG_WATCHDOG is not set
CONFIG_REGULATOR=y
CONFIG_REGULATOR_DEBUG=y
CONFIG_REGULATOR_FIXED_VOLTAGE=m
# CONFIG_REGULATOR_VIRTUAL_CONSUMER is not set
CONFIG_REGULATOR_USERSPACE_CONSUMER=m
CONFIG_REGULATOR_88PG86X=y
CONFIG_REGULATOR_ACT8865=y
CONFIG_REGULATOR_AD5398=m
# CONFIG_REGULATOR_DA9121 is not set
CONFIG_REGULATOR_DA9210=y
CONFIG_REGULATOR_DA9211=m
# CONFIG_REGULATOR_FAN53555 is not set
CONFIG_REGULATOR_FAN53880=m
CONFIG_REGULATOR_GPIO=y
CONFIG_REGULATOR_ISL9305=y
CONFIG_REGULATOR_ISL6271A=y
# CONFIG_REGULATOR_LP3971 is not set
CONFIG_REGULATOR_LP3972=m
# CONFIG_REGULATOR_LP872X is not set
CONFIG_REGULATOR_LP8755=y
CONFIG_REGULATOR_LTC3589=m
CONFIG_REGULATOR_LTC3676=y
# CONFIG_REGULATOR_MAX1586 is not set
CONFIG_REGULATOR_MAX8649=m
# CONFIG_REGULATOR_MAX8660 is not set
# CONFIG_REGULATOR_MAX8893 is not set
CONFIG_REGULATOR_MAX8952=m
CONFIG_REGULATOR_MAX8973=m
# CONFIG_REGULATOR_MAX20086 is not set
# CONFIG_REGULATOR_MAX77826 is not set
CONFIG_REGULATOR_MCP16502=m
CONFIG_REGULATOR_MP5416=m
CONFIG_REGULATOR_MP8859=m
CONFIG_REGULATOR_MP886X=y
CONFIG_REGULATOR_MPQ7920=m
CONFIG_REGULATOR_MT6311=m
# CONFIG_REGULATOR_MT6315 is not set
CONFIG_REGULATOR_PCA9450=y
# CONFIG_REGULATOR_PF8X00 is not set
# CONFIG_REGULATOR_PFUZE100 is not set
# CONFIG_REGULATOR_PV88060 is not set
CONFIG_REGULATOR_PV88080=y
CONFIG_REGULATOR_PV88090=y
CONFIG_REGULATOR_QCOM_SPMI=m
CONFIG_REGULATOR_QCOM_USB_VBUS=m
CONFIG_REGULATOR_RT4801=m
# CONFIG_REGULATOR_RT5190A is not set
# CONFIG_REGULATOR_RT5759 is not set
# CONFIG_REGULATOR_RT6160 is not set
# CONFIG_REGULATOR_RT6245 is not set
# CONFIG_REGULATOR_RTQ2134 is not set
CONFIG_REGULATOR_RTMV20=m
# CONFIG_REGULATOR_RTQ6752 is not set
CONFIG_REGULATOR_SLG51000=y
# CONFIG_REGULATOR_SY7636A is not set
CONFIG_REGULATOR_SY8106A=y
CONFIG_REGULATOR_SY8824X=y
# CONFIG_REGULATOR_SY8827N is not set
CONFIG_REGULATOR_TPS51632=y
CONFIG_REGULATOR_TPS62360=y
# CONFIG_REGULATOR_TPS6286X is not set
# CONFIG_REGULATOR_TPS65023 is not set
# CONFIG_REGULATOR_TPS6507X is not set
# CONFIG_REGULATOR_TPS65132 is not set
CONFIG_REGULATOR_VCTRL=m
# CONFIG_REGULATOR_QCOM_LABIBB is not set
CONFIG_RC_CORE=m
CONFIG_LIRC=y
CONFIG_RC_MAP=m
# CONFIG_RC_DECODERS is not set
# CONFIG_RC_DEVICES is not set

#
# CEC support
#
# CONFIG_MEDIA_CEC_SUPPORT is not set
# end of CEC support

#
# Graphics support
#

#
# Console display driver support
#
CONFIG_DUMMY_CONSOLE=y
CONFIG_DUMMY_CONSOLE_COLUMNS=80
CONFIG_DUMMY_CONSOLE_ROWS=25
# end of Console display driver support
# end of Graphics support

#
# HID support
#
CONFIG_HID=m
# CONFIG_HID_BATTERY_STRENGTH is not set
# CONFIG_HIDRAW is not set
CONFIG_UHID=m
# CONFIG_HID_GENERIC is not set

#
# Special HID drivers
#
CONFIG_HID_A4TECH=m
CONFIG_HID_ACRUX=m
# CONFIG_HID_ACRUX_FF is not set
CONFIG_HID_APPLE=m
CONFIG_HID_AUREAL=m
# CONFIG_HID_BELKIN is not set
# CONFIG_HID_CHERRY is not set
CONFIG_HID_COUGAR=m
CONFIG_HID_MACALLY=m
CONFIG_HID_CMEDIA=m
CONFIG_HID_CYPRESS=m
CONFIG_HID_DRAGONRISE=m
# CONFIG_DRAGONRISE_FF is not set
CONFIG_HID_EMS_FF=m
# CONFIG_HID_ELECOM is not set
CONFIG_HID_EZKEY=m
# CONFIG_HID_GEMBIRD is not set
# CONFIG_HID_GFRM is not set
# CONFIG_HID_GLORIOUS is not set
CONFIG_HID_VIVALDI_COMMON=m
CONFIG_HID_VIVALDI=m
CONFIG_HID_KEYTOUCH=m
CONFIG_HID_KYE=m
# CONFIG_HID_WALTOP is not set
CONFIG_HID_VIEWSONIC=m
# CONFIG_HID_XIAOMI is not set
# CONFIG_HID_GYRATION is not set
# CONFIG_HID_ICADE is not set
CONFIG_HID_ITE=m
CONFIG_HID_JABRA=m
CONFIG_HID_TWINHAN=m
CONFIG_HID_KENSINGTON=m
CONFIG_HID_LCPOWER=m
CONFIG_HID_LED=m
CONFIG_HID_LENOVO=m
CONFIG_HID_MAGICMOUSE=m
# CONFIG_HID_MALTRON is not set
# CONFIG_HID_MAYFLASH is not set
# CONFIG_HID_REDRAGON is not set
CONFIG_HID_MICROSOFT=m
# CONFIG_HID_MONTEREY is not set
# CONFIG_HID_MULTITOUCH is not set
# CONFIG_HID_NINTENDO is not set
CONFIG_HID_NTI=m
CONFIG_HID_ORTEK=m
# CONFIG_HID_PANTHERLORD is not set
CONFIG_HID_PETALYNX=m
# CONFIG_HID_PICOLCD is not set
CONFIG_HID_PLANTRONICS=m
# CONFIG_HID_PLAYSTATION is not set
# CONFIG_HID_RAZER is not set
CONFIG_HID_PRIMAX=m
CONFIG_HID_SAITEK=m
# CONFIG_HID_SEMITEK is not set
CONFIG_HID_SPEEDLINK=m
# CONFIG_HID_STEAM is not set
# CONFIG_HID_STEELSERIES is not set
CONFIG_HID_SUNPLUS=m
# CONFIG_HID_RMI is not set
CONFIG_HID_GREENASIA=m
# CONFIG_GREENASIA_FF is not set
CONFIG_HID_SMARTJOYPLUS=m
CONFIG_SMARTJOYPLUS_FF=y
CONFIG_HID_TIVO=m
CONFIG_HID_TOPSEED=m
# CONFIG_HID_THINGM is not set
CONFIG_HID_UDRAW_PS3=m
CONFIG_HID_WIIMOTE=m
# CONFIG_HID_XINMO is not set
# CONFIG_HID_ZEROPLUS is not set
# CONFIG_HID_ZYDACRON is not set
# CONFIG_HID_ALPS is not set
# end of Special HID drivers

#
# I2C HID support
#
# CONFIG_I2C_HID_OF is not set
# CONFIG_I2C_HID_OF_GOODIX is not set
# end of I2C HID support
# end of HID support

CONFIG_USB_OHCI_LITTLE_ENDIAN=y
# CONFIG_SCSI_UFSHCD is not set
CONFIG_MEMSTICK=m
CONFIG_MEMSTICK_DEBUG=y

#
# MemoryStick drivers
#
# CONFIG_MEMSTICK_UNSAFE_RESUME is not set
CONFIG_MSPRO_BLOCK=m
CONFIG_MS_BLOCK=m

#
# MemoryStick Host Controller Drivers
#
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
CONFIG_LEDS_CLASS_FLASH=m
CONFIG_LEDS_CLASS_MULTICOLOR=m
# CONFIG_LEDS_BRIGHTNESS_HW_CHANGED is not set

#
# LED drivers
#
# CONFIG_LEDS_AN30259A is not set
# CONFIG_LEDS_AW2013 is not set
CONFIG_LEDS_LM3530=y
CONFIG_LEDS_LM3532=y
CONFIG_LEDS_LM3642=y
# CONFIG_LEDS_LM3692X is not set
CONFIG_LEDS_PCA9532=y
CONFIG_LEDS_PCA9532_GPIO=y
CONFIG_LEDS_GPIO=y
CONFIG_LEDS_LP3944=m
CONFIG_LEDS_LP3952=y
CONFIG_LEDS_LP50XX=m
CONFIG_LEDS_LP55XX_COMMON=m
# CONFIG_LEDS_LP5521 is not set
# CONFIG_LEDS_LP5523 is not set
# CONFIG_LEDS_LP5562 is not set
CONFIG_LEDS_LP8501=m
CONFIG_LEDS_LP8860=m
CONFIG_LEDS_PCA955X=y
CONFIG_LEDS_PCA955X_GPIO=y
CONFIG_LEDS_PCA963X=m
CONFIG_LEDS_REGULATOR=m
CONFIG_LEDS_BD2802=y
CONFIG_LEDS_LT3593=m
CONFIG_LEDS_TCA6507=m
# CONFIG_LEDS_TLC591XX is not set
# CONFIG_LEDS_LM355x is not set
CONFIG_LEDS_IS31FL319X=m
# CONFIG_LEDS_IS31FL32XX is not set

#
# LED driver for blink(1) USB RGB LED is under Special HID drivers (HID_THINGM)
#
CONFIG_LEDS_BLINKM=y
# CONFIG_LEDS_MLXREG is not set
CONFIG_LEDS_USER=y
CONFIG_LEDS_TI_LMU_COMMON=y
# CONFIG_LEDS_LM3697 is not set

#
# Flash and Torch LED drivers
#
CONFIG_LEDS_AAT1290=m
# CONFIG_LEDS_AS3645A is not set
# CONFIG_LEDS_KTD2692 is not set
CONFIG_LEDS_LM3601X=m
# CONFIG_LEDS_RT4505 is not set
# CONFIG_LEDS_RT8515 is not set
CONFIG_LEDS_SGM3140=m

#
# RGB LED drivers
#

#
# LED Triggers
#
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=m
# CONFIG_LEDS_TRIGGER_ONESHOT is not set
CONFIG_LEDS_TRIGGER_MTD=y
# CONFIG_LEDS_TRIGGER_HEARTBEAT is not set
# CONFIG_LEDS_TRIGGER_BACKLIGHT is not set
# CONFIG_LEDS_TRIGGER_CPU is not set
CONFIG_LEDS_TRIGGER_ACTIVITY=m
# CONFIG_LEDS_TRIGGER_GPIO is not set
CONFIG_LEDS_TRIGGER_DEFAULT_ON=y

#
# iptables trigger is under Netfilter config (LED target)
#
CONFIG_LEDS_TRIGGER_TRANSIENT=m
# CONFIG_LEDS_TRIGGER_CAMERA is not set
CONFIG_LEDS_TRIGGER_PANIC=y
CONFIG_LEDS_TRIGGER_NETDEV=m
CONFIG_LEDS_TRIGGER_PATTERN=y
CONFIG_LEDS_TRIGGER_AUDIO=y
# CONFIG_LEDS_TRIGGER_TTY is not set

#
# Simple LED drivers
#
CONFIG_ACCESSIBILITY=y

#
# Speakup console speech
#
CONFIG_SPEAKUP=m
# CONFIG_SPEAKUP_SYNTH_ACNTSA is not set
CONFIG_SPEAKUP_SYNTH_APOLLO=m
CONFIG_SPEAKUP_SYNTH_AUDPTR=m
CONFIG_SPEAKUP_SYNTH_BNS=m
# CONFIG_SPEAKUP_SYNTH_DECTLK is not set
CONFIG_SPEAKUP_SYNTH_DECEXT=m
# CONFIG_SPEAKUP_SYNTH_LTLK is not set
CONFIG_SPEAKUP_SYNTH_SOFT=m
CONFIG_SPEAKUP_SYNTH_SPKOUT=m
CONFIG_SPEAKUP_SYNTH_TXPRT=m
# CONFIG_SPEAKUP_SYNTH_DUMMY is not set
# end of Speakup console speech

CONFIG_DMADEVICES=y
CONFIG_DMADEVICES_DEBUG=y
CONFIG_DMADEVICES_VDEBUG=y

#
# DMA Devices
#
CONFIG_DMA_ENGINE=y
CONFIG_DMA_VIRTUAL_CHANNELS=y
CONFIG_DMA_OF=y
CONFIG_FSL_EDMA=m
CONFIG_INTEL_IDMA64=y
# CONFIG_QCOM_HIDMA is not set

#
# DMA Clients
#
# CONFIG_ASYNC_TX_DMA is not set
CONFIG_DMATEST=m
CONFIG_DMA_ENGINE_RAID=y

#
# DMABUF options
#
# CONFIG_SYNC_FILE is not set
# CONFIG_DMABUF_HEAPS is not set
# end of DMABUF options

# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set
CONFIG_VFIO=m
CONFIG_VFIO_IOMMU_TYPE1=m
# CONFIG_VFIO_NOIOMMU is not set
CONFIG_VFIO_MDEV=m
# CONFIG_VIRT_DRIVERS is not set
CONFIG_VIRTIO=y
CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS=y
# CONFIG_VIRTIO_MENU is not set
# CONFIG_VDPA is not set
CONFIG_VHOST_IOTLB=m
CONFIG_VHOST=m
CONFIG_VHOST_MENU=y
# CONFIG_VHOST_NET is not set
CONFIG_VHOST_SCSI=m
# CONFIG_VHOST_VSOCK is not set
# CONFIG_VHOST_CROSS_ENDIAN_LEGACY is not set

#
# Microsoft Hyper-V guest support
#
# end of Microsoft Hyper-V guest support

CONFIG_GREYBUS=y
# CONFIG_COMEDI is not set
CONFIG_STAGING=y

#
# IIO staging drivers
#

#
# Accelerometers
#
# end of Accelerometers

#
# Analog to digital converters
#
# end of Analog to digital converters

#
# Analog digital bi-direction converters
#
CONFIG_ADT7316=y
# CONFIG_ADT7316_I2C is not set
# end of Analog digital bi-direction converters

#
# Capacitance to digital converters
#
# CONFIG_AD7746 is not set
# end of Capacitance to digital converters

#
# Direct Digital Synthesis
#
# end of Direct Digital Synthesis

#
# Network Analyzer, Impedance Converters
#
# CONFIG_AD5933 is not set
# end of Network Analyzer, Impedance Converters

#
# Active energy metering IC
#
# CONFIG_ADE7854 is not set
# end of Active energy metering IC

#
# Resolver to digital converters
#
# end of Resolver to digital converters
# end of IIO staging drivers

CONFIG_STAGING_MEDIA=y
CONFIG_GREYBUS_BOOTROM=m
CONFIG_GREYBUS_HID=m
# CONFIG_GREYBUS_LIGHT is not set
CONFIG_GREYBUS_LOG=y
CONFIG_GREYBUS_LOOPBACK=m
# CONFIG_GREYBUS_POWER is not set
CONFIG_GREYBUS_RAW=m
CONFIG_GREYBUS_VIBRATOR=m
CONFIG_GREYBUS_BRIDGED_PHY=y
CONFIG_GREYBUS_GPIO=y
CONFIG_GREYBUS_I2C=m
CONFIG_GREYBUS_UART=y
CONFIG_FIELDBUS_DEV=m
CONFIG_HMS_ANYBUSS_BUS=m
CONFIG_HMS_PROFINET=m

#
# VME Device Drivers
#
CONFIG_HAVE_CLK=y
CONFIG_HAVE_CLK_PREPARE=y
CONFIG_COMMON_CLK=y
CONFIG_COMMON_CLK_MAX9485=m
# CONFIG_COMMON_CLK_SI5341 is not set
CONFIG_COMMON_CLK_SI5351=y
CONFIG_COMMON_CLK_SI514=m
# CONFIG_COMMON_CLK_SI544 is not set
# CONFIG_COMMON_CLK_SI570 is not set
CONFIG_COMMON_CLK_CDCE706=m
CONFIG_COMMON_CLK_CDCE925=m
# CONFIG_COMMON_CLK_CS2000_CP is not set
# CONFIG_COMMON_CLK_RS9_PCIE is not set
# CONFIG_COMMON_CLK_VC5 is not set
CONFIG_COMMON_CLK_FIXED_MMIO=y
# CONFIG_CLK_KUNIT_TEST is not set
# CONFIG_CLK_GATE_KUNIT_TEST is not set
# CONFIG_HWSPINLOCK is not set

#
# Clock Source drivers
#
# CONFIG_MICROCHIP_PIT64B is not set
# end of Clock Source drivers

CONFIG_MAILBOX=y
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y

#
# Generic IOMMU Pagetable Support
#
# end of Generic IOMMU Pagetable Support

CONFIG_IOMMU_DEBUGFS=y
CONFIG_IOMMU_DEFAULT_DMA_STRICT=y
# CONFIG_IOMMU_DEFAULT_DMA_LAZY is not set
# CONFIG_IOMMU_DEFAULT_PASSTHROUGH is not set
CONFIG_OF_IOMMU=y
CONFIG_S390_CCW_IOMMU=y

#
# Remoteproc drivers
#
# CONFIG_REMOTEPROC is not set
# end of Remoteproc drivers

#
# Rpmsg drivers
#
CONFIG_RPMSG=m
# CONFIG_RPMSG_CHAR is not set
# CONFIG_RPMSG_CTRL is not set
CONFIG_RPMSG_NS=m
CONFIG_RPMSG_VIRTIO=m
# end of Rpmsg drivers

CONFIG_SOUNDWIRE=m

#
# SoundWire Devices
#

#
# SOC (System On Chip) specific Drivers
#

#
# Amlogic SoC drivers
#
# end of Amlogic SoC drivers

#
# Broadcom SoC drivers
#
# end of Broadcom SoC drivers

#
# NXP/Freescale QorIQ SoC drivers
#
# end of NXP/Freescale QorIQ SoC drivers

#
# i.MX SoC drivers
#
# end of i.MX SoC drivers

#
# Enable LiteX SoC Builder specific drivers
#
# end of Enable LiteX SoC Builder specific drivers

#
# Qualcomm SoC drivers
#
# end of Qualcomm SoC drivers

CONFIG_SOC_TI=y

#
# Xilinx SoC drivers
#
# end of Xilinx SoC drivers
# end of SOC (System On Chip) specific Drivers

# CONFIG_PM_DEVFREQ is not set
CONFIG_EXTCON=y

#
# Extcon Device Drivers
#
# CONFIG_EXTCON_ADC_JACK is not set
CONFIG_EXTCON_FSA9480=y
CONFIG_EXTCON_GPIO=y
CONFIG_EXTCON_MAX3355=y
# CONFIG_EXTCON_PTN5150 is not set
CONFIG_EXTCON_RT8973A=y
CONFIG_EXTCON_SM5502=y
CONFIG_EXTCON_USB_GPIO=y
# CONFIG_EXTCON_USBC_TUSB320 is not set
CONFIG_MEMORY=y
CONFIG_IIO=y
CONFIG_IIO_BUFFER=y
CONFIG_IIO_BUFFER_CB=y
# CONFIG_IIO_BUFFER_DMA is not set
# CONFIG_IIO_BUFFER_DMAENGINE is not set
CONFIG_IIO_BUFFER_HW_CONSUMER=y
CONFIG_IIO_KFIFO_BUF=y
CONFIG_IIO_TRIGGERED_BUFFER=y
CONFIG_IIO_CONFIGFS=y
CONFIG_IIO_TRIGGER=y
CONFIG_IIO_CONSUMERS_PER_TRIGGER=2
CONFIG_IIO_SW_DEVICE=y
# CONFIG_IIO_SW_TRIGGER is not set
CONFIG_IIO_TRIGGERED_EVENT=m

#
# Accelerometers
#
# CONFIG_ADXL313_I2C is not set
CONFIG_ADXL345=m
CONFIG_ADXL345_I2C=m
# CONFIG_ADXL355_I2C is not set
# CONFIG_ADXL367_I2C is not set
CONFIG_ADXL372=m
CONFIG_ADXL372_I2C=m
CONFIG_BMA180=m
CONFIG_BMA400=y
CONFIG_BMA400_I2C=y
CONFIG_BMC150_ACCEL=m
CONFIG_BMC150_ACCEL_I2C=m
# CONFIG_DA280 is not set
CONFIG_DA311=y
CONFIG_DMARD06=y
CONFIG_DMARD09=m
# CONFIG_DMARD10 is not set
# CONFIG_FXLS8962AF_I2C is not set
CONFIG_IIO_ST_ACCEL_3AXIS=m
CONFIG_IIO_ST_ACCEL_I2C_3AXIS=m
# CONFIG_KXSD9 is not set
# CONFIG_KXCJK1013 is not set
# CONFIG_MC3230 is not set
CONFIG_MMA7455=y
CONFIG_MMA7455_I2C=y
# CONFIG_MMA7660 is not set
# CONFIG_MMA8452 is not set
CONFIG_MMA9551_CORE=y
CONFIG_MMA9551=y
CONFIG_MMA9553=y
CONFIG_MXC4005=m
CONFIG_MXC6255=y
CONFIG_STK8312=y
# CONFIG_STK8BA50 is not set
# end of Accelerometers

#
# Analog to digital converters
#
CONFIG_AD7091R5=y
CONFIG_AD7291=m
CONFIG_AD799X=m
CONFIG_ENVELOPE_DETECTOR=m
CONFIG_HX711=y
# CONFIG_INA2XX_ADC is not set
CONFIG_LTC2471=m
CONFIG_LTC2485=y
# CONFIG_LTC2497 is not set
# CONFIG_MAX1363 is not set
CONFIG_MAX9611=m
CONFIG_MCP3422=y
CONFIG_NAU7802=y
CONFIG_QCOM_VADC_COMMON=m
# CONFIG_QCOM_SPMI_IADC is not set
CONFIG_QCOM_SPMI_VADC=m
CONFIG_QCOM_SPMI_ADC5=m
CONFIG_SD_ADC_MODULATOR=m
CONFIG_TI_ADC081C=m
CONFIG_TI_ADS1015=y
# end of Analog to digital converters

#
# Analog to digital and digital to analog converters
#
# end of Analog to digital and digital to analog converters

#
# Analog Front Ends
#
CONFIG_IIO_RESCALE=m
# end of Analog Front Ends

#
# Amplifiers
#
CONFIG_HMC425=m
# end of Amplifiers

#
# Capacitance to digital converters
#
CONFIG_AD7150=m
# end of Capacitance to digital converters

#
# Chemical Sensors
#
# CONFIG_ATLAS_PH_SENSOR is not set
CONFIG_ATLAS_EZO_SENSOR=m
CONFIG_BME680=m
CONFIG_BME680_I2C=m
CONFIG_CCS811=m
CONFIG_IAQCORE=m
CONFIG_PMS7003=m
# CONFIG_SCD30_CORE is not set
# CONFIG_SCD4X is not set
CONFIG_SENSIRION_SGP30=m
# CONFIG_SENSIRION_SGP40 is not set
# CONFIG_SPS30_I2C is not set
# CONFIG_SPS30_SERIAL is not set
# CONFIG_SENSEAIR_SUNRISE_CO2 is not set
CONFIG_VZ89X=y
# end of Chemical Sensors

#
# Hid Sensor IIO Common
#
# end of Hid Sensor IIO Common

CONFIG_IIO_MS_SENSORS_I2C=y

#
# IIO SCMI Sensors
#
# end of IIO SCMI Sensors

#
# SSP Sensor Common
#
# end of SSP Sensor Common

CONFIG_IIO_ST_SENSORS_I2C=m
CONFIG_IIO_ST_SENSORS_CORE=m

#
# Digital to analog converters
#
# CONFIG_AD5064 is not set
# CONFIG_AD5380 is not set
CONFIG_AD5446=y
CONFIG_AD5592R_BASE=m
CONFIG_AD5593R=m
# CONFIG_AD5696_I2C is not set
CONFIG_DPOT_DAC=m
CONFIG_DS4424=m
CONFIG_M62332=y
CONFIG_MAX517=y
# CONFIG_MAX5821 is not set
CONFIG_MCP4725=y
CONFIG_TI_DAC5571=y
# end of Digital to analog converters

#
# IIO dummy driver
#
CONFIG_IIO_SIMPLE_DUMMY=m
# CONFIG_IIO_SIMPLE_DUMMY_EVENTS is not set
CONFIG_IIO_SIMPLE_DUMMY_BUFFER=y
# end of IIO dummy driver

#
# Filters
#
# end of Filters

#
# Frequency Synthesizers DDS/PLL
#

#
# Clock Generator/Distribution
#
# end of Clock Generator/Distribution

#
# Phase-Locked Loop (PLL) frequency synthesizers
#
# end of Phase-Locked Loop (PLL) frequency synthesizers
# end of Frequency Synthesizers DDS/PLL

#
# Digital gyroscope sensors
#
# CONFIG_BMG160 is not set
# CONFIG_FXAS21002C is not set
CONFIG_MPU3050=y
CONFIG_MPU3050_I2C=y
# CONFIG_IIO_ST_GYRO_3AXIS is not set
# CONFIG_ITG3200 is not set
# end of Digital gyroscope sensors

#
# Health Sensors
#

#
# Heart Rate Monitors
#
CONFIG_AFE4404=m
# CONFIG_MAX30100 is not set
CONFIG_MAX30102=y
# end of Heart Rate Monitors
# end of Health Sensors

#
# Humidity sensors
#
CONFIG_AM2315=y
CONFIG_DHT11=m
CONFIG_HDC100X=y
CONFIG_HDC2010=m
# CONFIG_HTS221 is not set
CONFIG_HTU21=y
# CONFIG_SI7005 is not set
CONFIG_SI7020=m
# end of Humidity sensors

#
# Inertial measurement units
#
CONFIG_BMI160=m
CONFIG_BMI160_I2C=m
CONFIG_FXOS8700=m
CONFIG_FXOS8700_I2C=m
# CONFIG_KMX61 is not set
CONFIG_INV_ICM42600=m
CONFIG_INV_ICM42600_I2C=m
CONFIG_INV_MPU6050_IIO=m
CONFIG_INV_MPU6050_I2C=m
CONFIG_IIO_ST_LSM6DSX=y
CONFIG_IIO_ST_LSM6DSX_I2C=y
CONFIG_IIO_ST_LSM6DSX_I3C=y
# CONFIG_IIO_ST_LSM9DS0 is not set
# end of Inertial measurement units

#
# Light sensors
#
CONFIG_ADJD_S311=m
# CONFIG_ADUX1020 is not set
CONFIG_AL3010=m
CONFIG_AL3320A=m
# CONFIG_APDS9300 is not set
CONFIG_APDS9960=y
CONFIG_AS73211=m
# CONFIG_BH1750 is not set
# CONFIG_BH1780 is not set
CONFIG_CM32181=m
# CONFIG_CM3232 is not set
CONFIG_CM3323=y
CONFIG_CM3605=m
CONFIG_CM36651=m
CONFIG_GP2AP002=m
CONFIG_GP2AP020A00F=y
CONFIG_SENSORS_ISL29018=m
CONFIG_SENSORS_ISL29028=y
CONFIG_ISL29125=m
# CONFIG_JSA1212 is not set
CONFIG_RPR0521=m
CONFIG_LTR501=m
# CONFIG_LV0104CS is not set
# CONFIG_MAX44000 is not set
# CONFIG_MAX44009 is not set
CONFIG_NOA1305=m
CONFIG_OPT3001=m
CONFIG_PA12203001=y
CONFIG_SI1133=m
CONFIG_SI1145=y
# CONFIG_STK3310 is not set
CONFIG_ST_UVIS25=m
CONFIG_ST_UVIS25_I2C=m
CONFIG_TCS3414=m
CONFIG_TCS3472=y
CONFIG_SENSORS_TSL2563=y
CONFIG_TSL2583=y
# CONFIG_TSL2591 is not set
# CONFIG_TSL2772 is not set
# CONFIG_TSL4531 is not set
CONFIG_US5182D=y
# CONFIG_VCNL4000 is not set
CONFIG_VCNL4035=m
CONFIG_VEML6030=y
CONFIG_VEML6070=m
# CONFIG_VL6180 is not set
CONFIG_ZOPT2201=y
# end of Light sensors

#
# Magnetometer sensors
#
CONFIG_AK8974=y
# CONFIG_AK8975 is not set
# CONFIG_AK09911 is not set
CONFIG_BMC150_MAGN=y
CONFIG_BMC150_MAGN_I2C=y
CONFIG_MAG3110=m
CONFIG_MMC35240=m
# CONFIG_IIO_ST_MAGN_3AXIS is not set
# CONFIG_SENSORS_HMC5843_I2C is not set
CONFIG_SENSORS_RM3100=m
CONFIG_SENSORS_RM3100_I2C=m
# CONFIG_YAMAHA_YAS530 is not set
# end of Magnetometer sensors

#
# Multiplexers
#
CONFIG_IIO_MUX=m
# end of Multiplexers

#
# Inclinometer sensors
#
# end of Inclinometer sensors

#
# Triggers - standalone
#
# CONFIG_IIO_INTERRUPT_TRIGGER is not set
CONFIG_IIO_SYSFS_TRIGGER=y
# end of Triggers - standalone

#
# Linear and angular position sensors
#
# end of Linear and angular position sensors

#
# Digital potentiometers
#
# CONFIG_AD5110 is not set
CONFIG_AD5272=y
CONFIG_DS1803=m
CONFIG_MAX5432=m
CONFIG_MCP4018=y
# CONFIG_MCP4531 is not set
CONFIG_TPL0102=y
# end of Digital potentiometers

#
# Digital potentiostats
#
CONFIG_LMP91000=m
# end of Digital potentiostats

#
# Pressure sensors
#
CONFIG_ABP060MG=y
CONFIG_BMP280=m
CONFIG_BMP280_I2C=m
# CONFIG_DLHL60D is not set
CONFIG_DPS310=y
CONFIG_HP03=y
CONFIG_ICP10100=y
# CONFIG_MPL115_I2C is not set
CONFIG_MPL3115=m
# CONFIG_MS5611 is not set
CONFIG_MS5637=y
CONFIG_IIO_ST_PRESS=m
CONFIG_IIO_ST_PRESS_I2C=m
# CONFIG_T5403 is not set
CONFIG_HP206C=m
# CONFIG_ZPA2326 is not set
# end of Pressure sensors

#
# Lightning sensors
#
# end of Lightning sensors

#
# Proximity and distance sensors
#
CONFIG_ISL29501=m
# CONFIG_LIDAR_LITE_V2 is not set
# CONFIG_MB1232 is not set
# CONFIG_PING is not set
CONFIG_RFD77402=m
CONFIG_SRF04=y
CONFIG_SX_COMMON=y
CONFIG_SX9310=y
# CONFIG_SX9324 is not set
# CONFIG_SX9360 is not set
# CONFIG_SX9500 is not set
CONFIG_SRF08=y
# CONFIG_VCNL3020 is not set
# CONFIG_VL53L0X_I2C is not set
# end of Proximity and distance sensors

#
# Resolver to digital converters
#
# end of Resolver to digital converters

#
# Temperature sensors
#
CONFIG_MLX90614=m
CONFIG_MLX90632=y
# CONFIG_TMP006 is not set
CONFIG_TMP007=m
# CONFIG_TMP117 is not set
# CONFIG_TSYS01 is not set
# CONFIG_TSYS02D is not set
# end of Temperature sensors

# CONFIG_PWM is not set

#
# IRQ chip support
#
CONFIG_IRQCHIP=y
# CONFIG_AL_FIC is not set
# end of IRQ chip support

# CONFIG_RESET_CONTROLLER is not set

#
# PHY Subsystem
#
CONFIG_GENERIC_PHY=y
# CONFIG_PHY_CAN_TRANSCEIVER is not set

#
# PHY drivers for Broadcom platforms
#
# end of PHY drivers for Broadcom platforms
# end of PHY Subsystem

CONFIG_POWERCAP=y
# CONFIG_DTPM is not set

#
# Performance monitor support
#
# end of Performance monitor support

# CONFIG_RAS is not set

#
# Android
#
CONFIG_ANDROID=y
CONFIG_ANDROID_BINDER_IPC=y
CONFIG_ANDROID_BINDERFS=y
CONFIG_ANDROID_BINDER_DEVICES="binder,hwbinder,vndbinder"
CONFIG_ANDROID_BINDER_IPC_SELFTEST=y
# end of Android

CONFIG_DAX=y
CONFIG_DEV_DAX=y
CONFIG_NVMEM=y
CONFIG_NVMEM_SYSFS=y
CONFIG_NVMEM_SPMI_SDAM=m

#
# HW tracing support
#
# CONFIG_STM is not set
# end of HW tracing support

CONFIG_FPGA=m
# CONFIG_ALTERA_PR_IP_CORE is not set
CONFIG_FPGA_BRIDGE=m
# CONFIG_FPGA_REGION is not set
CONFIG_FSI=m
# CONFIG_FSI_NEW_DEV_NODE is not set
# CONFIG_FSI_MASTER_GPIO is not set
CONFIG_FSI_MASTER_HUB=m
# CONFIG_FSI_SCOM is not set
CONFIG_MULTIPLEXER=m

#
# Multiplexer drivers
#
CONFIG_MUX_ADG792A=m
CONFIG_MUX_GPIO=m
# CONFIG_MUX_MMIO is not set
# end of Multiplexer drivers

# CONFIG_SIOX is not set
# CONFIG_SLIMBUS is not set
CONFIG_INTERCONNECT=y
# CONFIG_COUNTER is not set
# CONFIG_MOST is not set
# CONFIG_PECI is not set
# CONFIG_HTE is not set
# end of Device Drivers

#
# File systems
#
CONFIG_VALIDATE_FS_PARSER=y
CONFIG_FS_IOMAP=y
# CONFIG_EXT2_FS is not set
CONFIG_EXT3_FS=m
CONFIG_EXT3_FS_POSIX_ACL=y
# CONFIG_EXT3_FS_SECURITY is not set
CONFIG_EXT4_FS=y
# CONFIG_EXT4_USE_FOR_EXT2 is not set
CONFIG_EXT4_FS_POSIX_ACL=y
# CONFIG_EXT4_FS_SECURITY is not set
CONFIG_EXT4_DEBUG=y
# CONFIG_EXT4_KUNIT_TESTS is not set
CONFIG_JBD2=y
# CONFIG_JBD2_DEBUG is not set
CONFIG_FS_MBCACHE=y
CONFIG_REISERFS_FS=y
CONFIG_REISERFS_CHECK=y
# CONFIG_REISERFS_PROC_INFO is not set
# CONFIG_REISERFS_FS_XATTR is not set
# CONFIG_JFS_FS is not set
CONFIG_XFS_FS=y
# CONFIG_XFS_SUPPORT_V4 is not set
CONFIG_XFS_QUOTA=y
# CONFIG_XFS_POSIX_ACL is not set
# CONFIG_XFS_RT is not set
# CONFIG_XFS_ONLINE_SCRUB is not set
# CONFIG_XFS_WARN is not set
# CONFIG_XFS_DEBUG is not set
CONFIG_GFS2_FS=m
CONFIG_BTRFS_FS=y
CONFIG_BTRFS_FS_POSIX_ACL=y
# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y
CONFIG_BTRFS_DEBUG=y
CONFIG_BTRFS_ASSERT=y
CONFIG_BTRFS_FS_REF_VERIFY=y
CONFIG_NILFS2_FS=m
CONFIG_F2FS_FS=m
# CONFIG_F2FS_STAT_FS is not set
CONFIG_F2FS_FS_XATTR=y
# CONFIG_F2FS_FS_POSIX_ACL is not set
CONFIG_F2FS_FS_SECURITY=y
CONFIG_F2FS_CHECK_FS=y
# CONFIG_F2FS_FAULT_INJECTION is not set
CONFIG_F2FS_FS_COMPRESSION=y
# CONFIG_F2FS_FS_LZO is not set
CONFIG_F2FS_FS_LZ4=y
CONFIG_F2FS_FS_LZ4HC=y
# CONFIG_F2FS_FS_ZSTD is not set
CONFIG_F2FS_IOSTAT=y
CONFIG_ZONEFS_FS=y
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
# CONFIG_EXPORTFS_BLOCK_OPS is not set
CONFIG_FILE_LOCKING=y
# CONFIG_FS_ENCRYPTION is not set
CONFIG_FS_VERITY=y
# CONFIG_FS_VERITY_DEBUG is not set
CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
# CONFIG_INOTIFY_USER is not set
# CONFIG_FANOTIFY is not set
# CONFIG_QUOTA is not set
# CONFIG_QUOTA_NETLINK_INTERFACE is not set
CONFIG_QUOTACTL=y
# CONFIG_AUTOFS4_FS is not set
CONFIG_AUTOFS_FS=y
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
CONFIG_VIRTIO_FS=m
CONFIG_OVERLAY_FS=m
# CONFIG_OVERLAY_FS_REDIRECT_DIR is not set
CONFIG_OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW=y
# CONFIG_OVERLAY_FS_INDEX is not set
CONFIG_OVERLAY_FS_XINO_AUTO=y
# CONFIG_OVERLAY_FS_METACOPY is not set

#
# Caches
#
CONFIG_NETFS_SUPPORT=m
# CONFIG_NETFS_STATS is not set
CONFIG_FSCACHE=m
# CONFIG_FSCACHE_STATS is not set
CONFIG_FSCACHE_DEBUG=y
# CONFIG_CACHEFILES is not set
# end of Caches

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=m
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=m
# end of CD-ROM/DVD Filesystems

#
# DOS/FAT/EXFAT/NT Filesystems
#
CONFIG_FAT_FS=y
# CONFIG_MSDOS_FS is not set
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
CONFIG_FAT_DEFAULT_UTF8=y
# CONFIG_FAT_KUNIT_TEST is not set
CONFIG_EXFAT_FS=m
CONFIG_EXFAT_DEFAULT_IOCHARSET="utf8"
CONFIG_NTFS_FS=y
# CONFIG_NTFS_DEBUG is not set
CONFIG_NTFS_RW=y
# CONFIG_NTFS3_FS is not set
# end of DOS/FAT/EXFAT/NT Filesystems

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
# CONFIG_PROC_KCORE is not set
# CONFIG_PROC_VMCORE is not set
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
# CONFIG_PROC_CHILDREN is not set
CONFIG_KERNFS=y
CONFIG_SYSFS=y
# CONFIG_TMPFS is not set
CONFIG_ARCH_SUPPORTS_HUGETLBFS=y
# CONFIG_HUGETLBFS is not set
CONFIG_ARCH_HAS_GIGANTIC_PAGE=y
CONFIG_CONFIGFS_FS=y
# end of Pseudo filesystems

# CONFIG_MISC_FILESYSTEMS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
# CONFIG_NLS_CODEPAGE_437 is not set
CONFIG_NLS_CODEPAGE_737=y
CONFIG_NLS_CODEPAGE_775=m
CONFIG_NLS_CODEPAGE_850=y
CONFIG_NLS_CODEPAGE_852=y
# CONFIG_NLS_CODEPAGE_855 is not set
CONFIG_NLS_CODEPAGE_857=m
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
CONFIG_NLS_CODEPAGE_864=y
CONFIG_NLS_CODEPAGE_865=m
CONFIG_NLS_CODEPAGE_866=y
# CONFIG_NLS_CODEPAGE_869 is not set
CONFIG_NLS_CODEPAGE_936=y
CONFIG_NLS_CODEPAGE_950=y
CONFIG_NLS_CODEPAGE_932=m
CONFIG_NLS_CODEPAGE_949=m
CONFIG_NLS_CODEPAGE_874=y
# CONFIG_NLS_ISO8859_8 is not set
CONFIG_NLS_CODEPAGE_1250=m
CONFIG_NLS_CODEPAGE_1251=m
CONFIG_NLS_ASCII=m
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_ISO8859_2=y
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
CONFIG_NLS_ISO8859_6=m
CONFIG_NLS_ISO8859_7=m
CONFIG_NLS_ISO8859_9=m
# CONFIG_NLS_ISO8859_13 is not set
CONFIG_NLS_ISO8859_14=y
CONFIG_NLS_ISO8859_15=m
CONFIG_NLS_KOI8_R=y
CONFIG_NLS_KOI8_U=m
CONFIG_NLS_MAC_ROMAN=m
# CONFIG_NLS_MAC_CELTIC is not set
CONFIG_NLS_MAC_CENTEURO=m
CONFIG_NLS_MAC_CROATIAN=y
CONFIG_NLS_MAC_CYRILLIC=m
# CONFIG_NLS_MAC_GAELIC is not set
CONFIG_NLS_MAC_GREEK=m
CONFIG_NLS_MAC_ICELAND=y
CONFIG_NLS_MAC_INUIT=y
CONFIG_NLS_MAC_ROMANIAN=m
# CONFIG_NLS_MAC_TURKISH is not set
CONFIG_NLS_UTF8=y
CONFIG_UNICODE=y
CONFIG_UNICODE_NORMALIZATION_SELFTEST=m
CONFIG_IO_WQ=y
# end of File systems

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_KEYS_REQUEST_CACHE is not set
CONFIG_PERSISTENT_KEYRINGS=y
# CONFIG_TRUSTED_KEYS is not set
# CONFIG_ENCRYPTED_KEYS is not set
CONFIG_KEY_DH_OPERATIONS=y
# CONFIG_SECURITY_DMESG_RESTRICT is not set
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
CONFIG_SECURITY_PATH=y
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
CONFIG_HARDENED_USERCOPY=y
# CONFIG_FORTIFY_SOURCE is not set
CONFIG_STATIC_USERMODEHELPER=y
CONFIG_STATIC_USERMODEHELPER_PATH="/sbin/usermode-helper"
CONFIG_SECURITY_TOMOYO=y
CONFIG_SECURITY_TOMOYO_MAX_ACCEPT_ENTRY=2048
CONFIG_SECURITY_TOMOYO_MAX_AUDIT_LOG=1024
# CONFIG_SECURITY_TOMOYO_OMIT_USERSPACE_LOADER is not set
CONFIG_SECURITY_TOMOYO_POLICY_LOADER="/sbin/tomoyo-init"
CONFIG_SECURITY_TOMOYO_ACTIVATION_TRIGGER="/sbin/init"
# CONFIG_SECURITY_TOMOYO_INSECURE_BUILTIN_SETTING is not set
CONFIG_SECURITY_APPARMOR=y
CONFIG_SECURITY_APPARMOR_HASH=y
# CONFIG_SECURITY_APPARMOR_HASH_DEFAULT is not set
# CONFIG_SECURITY_APPARMOR_DEBUG is not set
CONFIG_SECURITY_LOADPIN=y
CONFIG_SECURITY_LOADPIN_ENFORCE=y
# CONFIG_SECURITY_YAMA is not set
# CONFIG_SECURITY_SAFESETID is not set
CONFIG_SECURITY_LOCKDOWN_LSM=y
CONFIG_SECURITY_LOCKDOWN_LSM_EARLY=y
CONFIG_LOCK_DOWN_KERNEL_FORCE_NONE=y
# CONFIG_LOCK_DOWN_KERNEL_FORCE_INTEGRITY is not set
# CONFIG_LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY is not set
# CONFIG_SECURITY_LANDLOCK is not set
# CONFIG_INTEGRITY is not set
# CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT is not set
CONFIG_DEFAULT_SECURITY_TOMOYO=y
# CONFIG_DEFAULT_SECURITY_APPARMOR is not set
# CONFIG_DEFAULT_SECURITY_DAC is not set
CONFIG_LSM="lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf"

#
# Kernel hardening options
#

#
# Memory initialization
#
CONFIG_CC_HAS_AUTO_VAR_INIT_PATTERN=y
CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO=y
# CONFIG_INIT_STACK_NONE is not set
CONFIG_INIT_STACK_ALL_PATTERN=y
# CONFIG_INIT_STACK_ALL_ZERO is not set
# CONFIG_INIT_ON_ALLOC_DEFAULT_ON is not set
CONFIG_INIT_ON_FREE_DEFAULT_ON=y
# end of Memory initialization

CONFIG_CC_HAS_RANDSTRUCT=y
CONFIG_RANDSTRUCT_NONE=y
# CONFIG_RANDSTRUCT_FULL is not set
# end of Kernel hardening options
# end of Security options

CONFIG_XOR_BLOCKS=y
CONFIG_ASYNC_CORE=y
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=y
CONFIG_ASYNC_PQ=m
CONFIG_ASYNC_RAID6_RECOV=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_FIPS=y
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_SKCIPHER=y
CONFIG_CRYPTO_SKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_RNG_DEFAULT=y
CONFIG_CRYPTO_AKCIPHER2=y
CONFIG_CRYPTO_AKCIPHER=y
CONFIG_CRYPTO_KPP2=y
CONFIG_CRYPTO_KPP=y
CONFIG_CRYPTO_ACOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
# CONFIG_CRYPTO_USER is not set
# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
# CONFIG_CRYPTO_MANAGER_EXTRA_TESTS is not set
CONFIG_CRYPTO_GF128MUL=m
CONFIG_CRYPTO_NULL=y
CONFIG_CRYPTO_NULL2=y
CONFIG_CRYPTO_PCRYPT=y
# CONFIG_CRYPTO_CRYPTD is not set
CONFIG_CRYPTO_AUTHENC=y
# CONFIG_CRYPTO_TEST is not set

#
# Public-key cryptography
#
CONFIG_CRYPTO_RSA=y
CONFIG_CRYPTO_DH=y
# CONFIG_CRYPTO_DH_RFC7919_GROUPS is not set
# CONFIG_CRYPTO_ECDH is not set
# CONFIG_CRYPTO_ECDSA is not set
# CONFIG_CRYPTO_ECRDSA is not set
CONFIG_CRYPTO_SM2=y
CONFIG_CRYPTO_CURVE25519=m

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=y
# CONFIG_CRYPTO_GCM is not set
CONFIG_CRYPTO_CHACHA20POLY1305=y
CONFIG_CRYPTO_AEGIS128=y
CONFIG_CRYPTO_SEQIV=y
# CONFIG_CRYPTO_ECHAINIV is not set

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CFB=y
CONFIG_CRYPTO_CTR=y
CONFIG_CRYPTO_CTS=y
CONFIG_CRYPTO_ECB=m
# CONFIG_CRYPTO_LRW is not set
CONFIG_CRYPTO_OFB=y
# CONFIG_CRYPTO_PCBC is not set
CONFIG_CRYPTO_XTS=m
CONFIG_CRYPTO_KEYWRAP=y
# CONFIG_CRYPTO_ADIANTUM is not set
CONFIG_CRYPTO_ESSIV=y

#
# Hash modes
#
CONFIG_CRYPTO_CMAC=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=m
CONFIG_CRYPTO_VMAC=m

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32=y
CONFIG_CRYPTO_XXHASH=y
CONFIG_CRYPTO_BLAKE2B=y
CONFIG_CRYPTO_BLAKE2S=m
CONFIG_CRYPTO_CRCT10DIF=m
CONFIG_CRYPTO_CRC64_ROCKSOFT=m
CONFIG_CRYPTO_GHASH=m
CONFIG_CRYPTO_POLY1305=y
CONFIG_CRYPTO_MD4=m
# CONFIG_CRYPTO_MD5 is not set
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_RMD160=y
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=y
# CONFIG_CRYPTO_SHA3 is not set
CONFIG_CRYPTO_SM3=y
# CONFIG_CRYPTO_SM3_GENERIC is not set
CONFIG_CRYPTO_STREEBOG=y
CONFIG_CRYPTO_WP512=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_AES_TI=m
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_BLOWFISH_COMMON=m
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAST_COMMON=y
CONFIG_CRYPTO_CAST5=y
CONFIG_CRYPTO_CAST6=y
CONFIG_CRYPTO_DES=y
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_CHACHA20=y
CONFIG_CRYPTO_SERPENT=m
# CONFIG_CRYPTO_SM4_GENERIC is not set
CONFIG_CRYPTO_TWOFISH=y
CONFIG_CRYPTO_TWOFISH_COMMON=y

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=y
# CONFIG_CRYPTO_LZO is not set
CONFIG_CRYPTO_842=m
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=y
CONFIG_CRYPTO_ZSTD=m

#
# Random Number Generation
#
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_MENU=y
CONFIG_CRYPTO_DRBG_HMAC=y
# CONFIG_CRYPTO_DRBG_HASH is not set
CONFIG_CRYPTO_DRBG_CTR=y
CONFIG_CRYPTO_DRBG=y
CONFIG_CRYPTO_JITTERENTROPY=y
CONFIG_CRYPTO_KDF800108_CTR=y
CONFIG_CRYPTO_USER_API=y
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_CRYPTO_USER_API_SKCIPHER=m
CONFIG_CRYPTO_USER_API_RNG=m
# CONFIG_CRYPTO_USER_API_RNG_CAVP is not set
# CONFIG_CRYPTO_USER_API_AEAD is not set
# CONFIG_CRYPTO_USER_API_ENABLE_OBSOLETE is not set
CONFIG_CRYPTO_HASH_INFO=y
# CONFIG_CRYPTO_HW is not set
CONFIG_ASYMMETRIC_KEY_TYPE=y
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
CONFIG_X509_CERTIFICATE_PARSER=y
CONFIG_PKCS8_PRIVATE_KEY_PARSER=m
CONFIG_PKCS7_MESSAGE_PARSER=y
# CONFIG_PKCS7_TEST_KEY is not set
CONFIG_SIGNED_PE_FILE_VERIFICATION=y

#
# Certificates for signature checking
#
CONFIG_MODULE_SIG_KEY="certs/signing_key.pem"
CONFIG_MODULE_SIG_KEY_TYPE_RSA=y
# CONFIG_MODULE_SIG_KEY_TYPE_ECDSA is not set
CONFIG_SYSTEM_TRUSTED_KEYRING=y
CONFIG_SYSTEM_TRUSTED_KEYS=""
# CONFIG_SYSTEM_EXTRA_CERTIFICATE is not set
CONFIG_SECONDARY_TRUSTED_KEYRING=y
CONFIG_SYSTEM_BLACKLIST_KEYRING=y
CONFIG_SYSTEM_BLACKLIST_HASH_LIST=""
CONFIG_SYSTEM_REVOCATION_LIST=y
CONFIG_SYSTEM_REVOCATION_KEYS=""
# CONFIG_SYSTEM_BLACKLIST_AUTH_UPDATE is not set
# end of Certificates for signature checking

CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_RAID6_PQ=y
# CONFIG_RAID6_PQ_BENCHMARK is not set
CONFIG_LINEAR_RANGES=y
CONFIG_PACKING=y
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_NET_UTILS=y
CONFIG_CORDIC=y
CONFIG_PRIME_NUMBERS=m
CONFIG_RATIONAL=y
CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y

#
# Crypto library routines
#
CONFIG_CRYPTO_LIB_AES=y
CONFIG_CRYPTO_LIB_ARC4=m
CONFIG_CRYPTO_LIB_BLAKE2S_GENERIC=y
CONFIG_CRYPTO_LIB_CHACHA_GENERIC=y
# CONFIG_CRYPTO_LIB_CHACHA is not set
CONFIG_CRYPTO_LIB_CURVE25519_GENERIC=m
# CONFIG_CRYPTO_LIB_CURVE25519 is not set
CONFIG_CRYPTO_LIB_DES=y
CONFIG_CRYPTO_LIB_POLY1305_RSIZE=1
CONFIG_CRYPTO_LIB_POLY1305_GENERIC=y
# CONFIG_CRYPTO_LIB_POLY1305 is not set
# CONFIG_CRYPTO_LIB_CHACHA20POLY1305 is not set
CONFIG_CRYPTO_LIB_SHA256=y
# end of Crypto library routines

CONFIG_CRC_CCITT=y
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=m
CONFIG_CRC64_ROCKSOFT=m
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
# CONFIG_CRC32_SLICEBY8 is not set
# CONFIG_CRC32_SLICEBY4 is not set
CONFIG_CRC32_SARWATE=y
# CONFIG_CRC32_BIT is not set
CONFIG_CRC64=y
CONFIG_CRC4=m
CONFIG_CRC7=y
CONFIG_LIBCRC32C=y
CONFIG_CRC8=y
CONFIG_XXHASH=y
CONFIG_RANDOM32_SELFTEST=y
CONFIG_842_COMPRESS=m
CONFIG_842_DECOMPRESS=m
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
# CONFIG_ZLIB_DFLTCC is not set
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4_COMPRESS=m
CONFIG_LZ4HC_COMPRESS=y
CONFIG_LZ4_DECOMPRESS=y
CONFIG_ZSTD_COMPRESS=y
CONFIG_ZSTD_DECOMPRESS=y
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
CONFIG_XZ_DEC_POWERPC=y
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
# CONFIG_XZ_DEC_MICROLZMA is not set
CONFIG_XZ_DEC_BCJ=y
CONFIG_XZ_DEC_TEST=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_REED_SOLOMON=m
CONFIG_REED_SOLOMON_DEC8=y
CONFIG_REED_SOLOMON_ENC16=y
CONFIG_REED_SOLOMON_DEC16=y
CONFIG_INTERVAL_TREE=y
CONFIG_XARRAY_MULTI=y
CONFIG_ASSOCIATIVE_ARRAY=y
CONFIG_HAS_DMA=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_DMA_DECLARE_COHERENT=y
CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED=y
CONFIG_SWIOTLB=y
# CONFIG_DMA_RESTRICTED_POOL is not set
CONFIG_DMA_CMA=y
# CONFIG_DMA_PERNUMA_CMA is not set

#
# Default contiguous memory area size:
#
CONFIG_CMA_SIZE_MBYTES=16
CONFIG_CMA_SIZE_PERCENTAGE=10
# CONFIG_CMA_SIZE_SEL_MBYTES is not set
# CONFIG_CMA_SIZE_SEL_PERCENTAGE is not set
# CONFIG_CMA_SIZE_SEL_MIN is not set
CONFIG_CMA_SIZE_SEL_MAX=y
CONFIG_CMA_ALIGNMENT=8
CONFIG_DMA_API_DEBUG=y
# CONFIG_DMA_API_DEBUG_SG is not set
# CONFIG_DMA_MAP_BENCHMARK is not set
CONFIG_SGL_ALLOC=y
CONFIG_CPU_RMAP=y
CONFIG_DQL=y
CONFIG_GLOB=y
CONFIG_GLOB_SELFTEST=m
CONFIG_NLATTR=y
CONFIG_CLZ_TAB=y
CONFIG_IRQ_POLL=y
CONFIG_MPILIB=y
CONFIG_LIBFDT=y
CONFIG_OID_REGISTRY=y
CONFIG_HAVE_GENERIC_VDSO=y
CONFIG_GENERIC_GETTIMEOFDAY=y
CONFIG_GENERIC_VDSO_TIME_NS=y
CONFIG_SG_POOL=y
CONFIG_ARCH_STACKWALK=y
CONFIG_STACKDEPOT=y
CONFIG_STACKDEPOT_ALWAYS_INIT=y
CONFIG_STACK_HASH_ORDER=20
CONFIG_SBITMAP=y
# end of Library routines

#
# Kernel hacking
#

#
# printk and dmesg options
#
# CONFIG_PRINTK_TIME is not set
CONFIG_PRINTK_CALLER=y
# CONFIG_STACKTRACE_BUILD_ID is not set
CONFIG_CONSOLE_LOGLEVEL_DEFAULT=7
CONFIG_CONSOLE_LOGLEVEL_QUIET=4
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
# CONFIG_DYNAMIC_DEBUG is not set
CONFIG_DYNAMIC_DEBUG_CORE=y
# CONFIG_SYMBOLIC_ERRNAME is not set
CONFIG_DEBUG_BUGVERBOSE=y
# end of printk and dmesg options

CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_MISC=y

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO_NONE=y
# CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT is not set
# CONFIG_DEBUG_INFO_DWARF4 is not set
# CONFIG_DEBUG_INFO_DWARF5 is not set
CONFIG_FRAME_WARN=8192
CONFIG_STRIP_ASM_SYMS=y
# CONFIG_HEADERS_INSTALL is not set
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
# end of Compile-time checks and compiler options

#
# Generic Kernel Debugging Instruments
#
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
# CONFIG_MAGIC_SYSRQ_SERIAL is not set
CONFIG_DEBUG_FS=y
# CONFIG_DEBUG_FS_ALLOW_ALL is not set
CONFIG_DEBUG_FS_DISALLOW_MOUNT=y
# CONFIG_DEBUG_FS_ALLOW_NONE is not set
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
CONFIG_UBSAN=y
# CONFIG_UBSAN_TRAP is not set
CONFIG_CC_HAS_UBSAN_BOUNDS=y
CONFIG_CC_HAS_UBSAN_ARRAY_BOUNDS=y
CONFIG_UBSAN_BOUNDS=y
CONFIG_UBSAN_ARRAY_BOUNDS=y
CONFIG_UBSAN_SHIFT=y
# CONFIG_UBSAN_DIV_ZERO is not set
# CONFIG_UBSAN_UNREACHABLE is not set
CONFIG_UBSAN_BOOL=y
CONFIG_UBSAN_ENUM=y
# CONFIG_UBSAN_ALIGNMENT is not set
# CONFIG_UBSAN_SANITIZE_ALL is not set
CONFIG_TEST_UBSAN=m
CONFIG_HAVE_ARCH_KCSAN=y
CONFIG_HAVE_KCSAN_COMPILER=y
# end of Generic Kernel Debugging Instruments

#
# Networking Debugging
#
# CONFIG_NET_DEV_REFCNT_TRACKER is not set
# CONFIG_NET_NS_REFCNT_TRACKER is not set
# CONFIG_DEBUG_NET is not set
# end of Networking Debugging

#
# Memory Debugging
#
CONFIG_PAGE_EXTENSION=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB_DEBUG_ON=y
# CONFIG_PAGE_OWNER is not set
CONFIG_PAGE_POISONING=y
# CONFIG_DEBUG_PAGE_REF is not set
# CONFIG_DEBUG_RODATA_TEST is not set
CONFIG_ARCH_HAS_DEBUG_WX=y
# CONFIG_DEBUG_WX is not set
CONFIG_GENERIC_PTDUMP=y
# CONFIG_PTDUMP_DEBUGFS is not set
# CONFIG_DEBUG_OBJECTS is not set
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=16000
# CONFIG_DEBUG_KMEMLEAK_TEST is not set
CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y
CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_SCHED_STACK_END_CHECK is not set
CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
CONFIG_DEBUG_VM=y
# CONFIG_DEBUG_VM_VMACACHE is not set
CONFIG_DEBUG_VM_RB=y
CONFIG_DEBUG_VM_PGFLAGS=y
# CONFIG_DEBUG_VM_PGTABLE is not set
CONFIG_DEBUG_MEMORY_INIT=y
# CONFIG_DEBUG_PER_CPU_MAPS is not set
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_KASAN_OUTLINE=y
# CONFIG_KASAN_INLINE is not set
# CONFIG_KASAN_STACK is not set
# CONFIG_KASAN_VMALLOC is not set
CONFIG_KASAN_KUNIT_TEST=m
# CONFIG_KASAN_MODULE_TEST is not set
CONFIG_HAVE_ARCH_KFENCE=y
# CONFIG_KFENCE is not set
# end of Memory Debugging

# CONFIG_DEBUG_SHIRQ is not set

#
# Debug Oops, Lockups and Hangs
#
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_ON_OOPS_VALUE=1
CONFIG_PANIC_TIMEOUT=0
# CONFIG_DETECT_HUNG_TASK is not set
CONFIG_WQ_WATCHDOG=y
# CONFIG_TEST_LOCKUP is not set
# end of Debug Oops, Lockups and Hangs

#
# Scheduler Debugging
#
CONFIG_SCHED_DEBUG=y
# CONFIG_SCHEDSTATS is not set
# end of Scheduler Debugging

CONFIG_DEBUG_TIMEKEEPING=y

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
CONFIG_PROVE_LOCKING=y
CONFIG_PROVE_RAW_LOCK_NESTING=y
# CONFIG_LOCK_STAT is not set
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
CONFIG_LOCKDEP_BITS=15
CONFIG_LOCKDEP_CHAINS_BITS=16
CONFIG_LOCKDEP_STACK_TRACE_BITS=19
CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14
CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
CONFIG_DEBUG_LOCKDEP=y
# CONFIG_DEBUG_ATOMIC_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_LOCK_TORTURE_TEST=y
CONFIG_WW_MUTEX_SELFTEST=m
CONFIG_SCF_TORTURE_TEST=m
# CONFIG_CSD_LOCK_WAIT_DEBUG is not set
# end of Lock Debugging (spinlocks, mutexes, etc...)

CONFIG_TRACE_IRQFLAGS=y
CONFIG_DEBUG_IRQFLAGS=y
CONFIG_STACKTRACE=y
# CONFIG_WARN_ALL_UNSEEDED_RANDOM is not set
# CONFIG_DEBUG_KOBJECT is not set

#
# Debug kernel data structures
#
# CONFIG_DEBUG_LIST is not set
CONFIG_DEBUG_PLIST=y
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
# CONFIG_BUG_ON_DATA_CORRUPTION is not set
# end of Debug kernel data structures

CONFIG_DEBUG_CREDENTIALS=y

#
# RCU Debugging
#
CONFIG_PROVE_RCU=y
CONFIG_TORTURE_TEST=y
CONFIG_RCU_SCALE_TEST=y
CONFIG_RCU_TORTURE_TEST=m
# CONFIG_RCU_REF_SCALE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=21
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20
CONFIG_RCU_TRACE=y
CONFIG_RCU_EQS_DEBUG=y
# end of RCU Debugging

# CONFIG_DEBUG_WQ_FORCE_RR_CPU is not set
# CONFIG_CPU_HOTPLUG_STATE_CONTROL is not set
# CONFIG_LATENCYTOP is not set
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_FENTRY=y
CONFIG_HAVE_NOP_MCOUNT=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_PREEMPTIRQ_TRACEPOINTS=y
CONFIG_TRACING=y
CONFIG_TRACING_SUPPORT=y
# CONFIG_FTRACE is not set
# CONFIG_SAMPLES is not set
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT=y
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT_MULTI=y
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
# CONFIG_STRICT_DEVMEM is not set

#
# s390 Debugging
#
CONFIG_EARLY_PRINTK=y
# CONFIG_DEBUG_ENTRY is not set
# CONFIG_CIO_INJECT is not set
# end of s390 Debugging

#
# Kernel Testing and Coverage
#
CONFIG_KUNIT=m
# CONFIG_KUNIT_DEBUGFS is not set
CONFIG_KUNIT_TEST=m
CONFIG_KUNIT_EXAMPLE_TEST=m
# CONFIG_KUNIT_ALL_TESTS is not set
# CONFIG_NOTIFIER_ERROR_INJECTION is not set
CONFIG_FUNCTION_ERROR_INJECTION=y
CONFIG_FAULT_INJECTION=y
CONFIG_FAILSLAB=y
CONFIG_FAIL_PAGE_ALLOC=y
CONFIG_FAULT_INJECTION_USERCOPY=y
# CONFIG_FAIL_MAKE_REQUEST is not set
CONFIG_FAIL_IO_TIMEOUT=y
CONFIG_FAIL_FUTEX=y
# CONFIG_FAULT_INJECTION_DEBUG_FS is not set
CONFIG_ARCH_HAS_KCOV=y
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
CONFIG_KCOV=y
CONFIG_KCOV_ENABLE_COMPARISONS=y
# CONFIG_KCOV_INSTRUMENT_ALL is not set
CONFIG_KCOV_IRQ_AREA_SIZE=0x40000
CONFIG_RUNTIME_TESTING_MENU=y
CONFIG_LKDTM=y
CONFIG_TEST_LIST_SORT=m
CONFIG_TEST_MIN_HEAP=m
CONFIG_TEST_SORT=m
# CONFIG_TEST_DIV64 is not set
CONFIG_KPROBES_SANITY_TEST=m
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_TEST_REF_TRACKER is not set
CONFIG_RBTREE_TEST=y
CONFIG_REED_SOLOMON_TEST=m
CONFIG_INTERVAL_TREE_TEST=m
CONFIG_PERCPU_TEST=m
CONFIG_ATOMIC64_SELFTEST=y
CONFIG_ASYNC_RAID6_TEST=m
CONFIG_TEST_HEXDUMP=m
CONFIG_STRING_SELFTEST=y
# CONFIG_TEST_STRING_HELPERS is not set
CONFIG_TEST_STRSCPY=y
CONFIG_TEST_KSTRTOX=y
CONFIG_TEST_PRINTF=y
# CONFIG_TEST_SCANF is not set
CONFIG_TEST_BITMAP=y
CONFIG_TEST_UUID=m
CONFIG_TEST_XARRAY=y
CONFIG_TEST_RHASHTABLE=y
# CONFIG_TEST_SIPHASH is not set
CONFIG_TEST_IDA=m
CONFIG_TEST_LKM=m
CONFIG_TEST_BITOPS=m
CONFIG_TEST_VMALLOC=m
CONFIG_TEST_USER_COPY=m
CONFIG_TEST_BPF=m
CONFIG_TEST_BLACKHOLE_DEV=m
CONFIG_FIND_BIT_BENCHMARK=y
CONFIG_TEST_FIRMWARE=y
CONFIG_TEST_SYSCTL=m
# CONFIG_BITFIELD_KUNIT is not set
# CONFIG_HASH_KUNIT_TEST is not set
# CONFIG_RESOURCE_KUNIT_TEST is not set
CONFIG_SYSCTL_KUNIT_TEST=m
CONFIG_LIST_KUNIT_TEST=m
CONFIG_LINEAR_RANGES_TEST=m
# CONFIG_CMDLINE_KUNIT_TEST is not set
CONFIG_BITS_TEST=m
# CONFIG_SLUB_KUNIT_TEST is not set
# CONFIG_RATIONAL_KUNIT_TEST is not set
# CONFIG_MEMCPY_KUNIT_TEST is not set
# CONFIG_OVERFLOW_KUNIT_TEST is not set
# CONFIG_STACKINIT_KUNIT_TEST is not set
CONFIG_TEST_UDELAY=y
CONFIG_TEST_STATIC_KEYS=m
CONFIG_TEST_MEMCAT_P=m
CONFIG_TEST_MEMINIT=y
CONFIG_TEST_FREE_PAGES=y
# end of Kernel Testing and Coverage
# end of Kernel hacking

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries
  2022-06-24 17:36 ` [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries James Houghton
@ 2022-06-25  2:59   ` kernel test robot
  2022-06-27 12:47   ` manish.mishra
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 128+ messages in thread
From: kernel test robot @ 2022-06-25  2:59 UTC (permalink / raw)
  To: James Houghton; +Cc: llvm, kbuild-all

[-- Attachment #1: Type: text/plain, Size: 30087 bytes --]

Hi James,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on kvm/queue]
[also build test ERROR on arnd-asm-generic/master arm64/for-next/core linus/master v5.19-rc3]
[cannot apply to next-20220624]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patches, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/James-Houghton/hugetlb-Introduce-HugeTLB-high-granularity-mapping/20220625-014117
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
config: hexagon-randconfig-r041-20220624
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 6fa9120080c35a5ff851c3fc3358692c4ef7ce0d)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/dde229f56a775b9af56a84d32ddaa63f4e573dca
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review James-Houghton/hugetlb-Introduce-HugeTLB-high-granularity-mapping/20220625-014117
        git checkout dde229f56a775b9af56a84d32ddaa63f4e573dca
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash kernel/events/ mm/

If you fix the issue, kindly add the following tag where applicable:
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   In file included from mm/filemap.c:36:
>> include/linux/hugetlb.h:1196:62: warning: declaration of 'struct hugetlb_pte' will not be visible outside of this function [-Wvisibility]
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
>> include/linux/hugetlb.h:1199:14: error: incomplete definition of type 'struct hugetlb_pte'
           BUG_ON(!hpte->ptep);
                   ~~~~^
   include/asm-generic/bug.h:162:45: note: expanded from macro 'BUG_ON'
   #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                               ^~~~~~~~~
   include/linux/compiler.h:48:41: note: expanded from macro 'unlikely'
   #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
                                             ^
   include/linux/compiler.h:33:34: note: expanded from macro '__branch_check__'
                           ______r = __builtin_expect(!!(x), expect);      \
                                                         ^
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                                                 ^~~~
   include/linux/compiler.h:58:52: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                      ^~~~
   include/linux/hugetlb.h:1196:62: note: forward declaration of 'struct hugetlb_pte'
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
>> include/linux/hugetlb.h:1199:14: error: incomplete definition of type 'struct hugetlb_pte'
           BUG_ON(!hpte->ptep);
                   ~~~~^
   include/asm-generic/bug.h:162:45: note: expanded from macro 'BUG_ON'
   #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                               ^~~~~~~~~
   include/linux/compiler.h:48:68: note: expanded from macro 'unlikely'
   #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
                                                                        ^
   include/linux/compiler.h:35:19: note: expanded from macro '__branch_check__'
                                                expect, is_constant);      \
                                                        ^~~~~~~~~~~
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                                                 ^~~~
   include/linux/compiler.h:58:52: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                      ^~~~
   include/linux/hugetlb.h:1196:62: note: forward declaration of 'struct hugetlb_pte'
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
>> include/linux/hugetlb.h:1199:14: error: incomplete definition of type 'struct hugetlb_pte'
           BUG_ON(!hpte->ptep);
                   ~~~~^
   include/asm-generic/bug.h:162:45: note: expanded from macro 'BUG_ON'
   #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                               ^~~~~~~~~
   include/linux/compiler.h:48:41: note: expanded from macro 'unlikely'
   #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
                                             ^
   include/linux/compiler.h:33:34: note: expanded from macro '__branch_check__'
                           ______r = __builtin_expect(!!(x), expect);      \
                                                         ^
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                                                 ^~~~
   include/linux/compiler.h:58:61: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                               ^~~~
   include/linux/hugetlb.h:1196:62: note: forward declaration of 'struct hugetlb_pte'
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
>> include/linux/hugetlb.h:1199:14: error: incomplete definition of type 'struct hugetlb_pte'
           BUG_ON(!hpte->ptep);
                   ~~~~^
   include/asm-generic/bug.h:162:45: note: expanded from macro 'BUG_ON'
   #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                               ^~~~~~~~~
   include/linux/compiler.h:48:68: note: expanded from macro 'unlikely'
   #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
                                                                        ^
   include/linux/compiler.h:35:19: note: expanded from macro '__branch_check__'
                                                expect, is_constant);      \
                                                        ^~~~~~~~~~~
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                                                 ^~~~
   include/linux/compiler.h:58:61: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                               ^~~~
   include/linux/hugetlb.h:1196:62: note: forward declaration of 'struct hugetlb_pte'
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
>> include/linux/hugetlb.h:1199:14: error: incomplete definition of type 'struct hugetlb_pte'
           BUG_ON(!hpte->ptep);
                   ~~~~^
   include/asm-generic/bug.h:162:45: note: expanded from macro 'BUG_ON'
   #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                               ^~~~~~~~~
   include/linux/compiler.h:48:41: note: expanded from macro 'unlikely'
   #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
                                             ^
   include/linux/compiler.h:33:34: note: expanded from macro '__branch_check__'
                           ______r = __builtin_expect(!!(x), expect);      \
                                                         ^
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                                                 ^~~~
   include/linux/compiler.h:58:86: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                                                        ^~~~
   include/linux/compiler.h:69:3: note: expanded from macro '__trace_if_value'
           (cond) ?                                        \
            ^~~~
   include/linux/hugetlb.h:1196:62: note: forward declaration of 'struct hugetlb_pte'
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
>> include/linux/hugetlb.h:1199:14: error: incomplete definition of type 'struct hugetlb_pte'
           BUG_ON(!hpte->ptep);
                   ~~~~^
   include/asm-generic/bug.h:162:45: note: expanded from macro 'BUG_ON'
   #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                               ^~~~~~~~~
   include/linux/compiler.h:48:68: note: expanded from macro 'unlikely'
   #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
                                                                        ^
   include/linux/compiler.h:35:19: note: expanded from macro '__branch_check__'
                                                expect, is_constant);      \
                                                        ^~~~~~~~~~~
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                                                 ^~~~
   include/linux/compiler.h:58:86: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                                                        ^~~~
   include/linux/compiler.h:69:3: note: expanded from macro '__trace_if_value'
           (cond) ?                                        \
            ^~~~
   include/linux/hugetlb.h:1196:62: note: forward declaration of 'struct hugetlb_pte'
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
>> include/linux/hugetlb.h:1202:6: error: call to undeclared function 'hugetlb_pte_none'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           if (hugetlb_pte_none(hpte) || hugetlb_pte_present_leaf(hpte))
               ^
>> include/linux/hugetlb.h:1202:32: error: call to undeclared function 'hugetlb_pte_present_leaf'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           if (hugetlb_pte_none(hpte) || hugetlb_pte_present_leaf(hpte))
                                         ^
>> include/linux/hugetlb.h:1203:27: error: call to undeclared function 'hugetlb_pte_shift'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
                   return huge_pte_lockptr(hugetlb_pte_shift(hpte),
                                           ^
   include/linux/hugetlb.h:1204:13: error: incomplete definition of type 'struct hugetlb_pte'
                                   mm, hpte->ptep);
                                       ~~~~^
   include/linux/hugetlb.h:1196:62: note: forward declaration of 'struct hugetlb_pte'
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
   include/linux/hugetlb.h:1209:59: warning: declaration of 'struct hugetlb_pte' will not be visible outside of this function [-Wvisibility]
   spinlock_t *hugetlb_pte_lock(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                             ^
>> include/linux/hugetlb.h:1211:44: error: incompatible pointer types passing 'struct hugetlb_pte *' to parameter of type 'struct hugetlb_pte *' [-Werror,-Wincompatible-pointer-types]
           spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte);
                                                     ^~~~
   include/linux/hugetlb.h:1196:75: note: passing argument to parameter 'hpte' here
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                             ^
   2 warnings and 11 errors generated.
--
   In file included from mm/shmem.c:37:
>> include/linux/hugetlb.h:1196:62: warning: declaration of 'struct hugetlb_pte' will not be visible outside of this function [-Wvisibility]
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
>> include/linux/hugetlb.h:1199:14: error: incomplete definition of type 'struct hugetlb_pte'
           BUG_ON(!hpte->ptep);
                   ~~~~^
   include/asm-generic/bug.h:162:45: note: expanded from macro 'BUG_ON'
   #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                               ^~~~~~~~~
   include/linux/compiler.h:48:41: note: expanded from macro 'unlikely'
   #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
                                             ^
   include/linux/compiler.h:33:34: note: expanded from macro '__branch_check__'
                           ______r = __builtin_expect(!!(x), expect);      \
                                                         ^
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                                                 ^~~~
   include/linux/compiler.h:58:52: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                      ^~~~
   include/linux/hugetlb.h:1196:62: note: forward declaration of 'struct hugetlb_pte'
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
>> include/linux/hugetlb.h:1199:14: error: incomplete definition of type 'struct hugetlb_pte'
           BUG_ON(!hpte->ptep);
                   ~~~~^
   include/asm-generic/bug.h:162:45: note: expanded from macro 'BUG_ON'
   #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                               ^~~~~~~~~
   include/linux/compiler.h:48:68: note: expanded from macro 'unlikely'
   #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
                                                                        ^
   include/linux/compiler.h:35:19: note: expanded from macro '__branch_check__'
                                                expect, is_constant);      \
                                                        ^~~~~~~~~~~
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                                                 ^~~~
   include/linux/compiler.h:58:52: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                      ^~~~
   include/linux/hugetlb.h:1196:62: note: forward declaration of 'struct hugetlb_pte'
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
>> include/linux/hugetlb.h:1199:14: error: incomplete definition of type 'struct hugetlb_pte'
           BUG_ON(!hpte->ptep);
                   ~~~~^
   include/asm-generic/bug.h:162:45: note: expanded from macro 'BUG_ON'
   #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                               ^~~~~~~~~
   include/linux/compiler.h:48:41: note: expanded from macro 'unlikely'
   #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
                                             ^
   include/linux/compiler.h:33:34: note: expanded from macro '__branch_check__'
                           ______r = __builtin_expect(!!(x), expect);      \
                                                         ^
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                                                 ^~~~
   include/linux/compiler.h:58:61: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                               ^~~~
   include/linux/hugetlb.h:1196:62: note: forward declaration of 'struct hugetlb_pte'
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
>> include/linux/hugetlb.h:1199:14: error: incomplete definition of type 'struct hugetlb_pte'
           BUG_ON(!hpte->ptep);
                   ~~~~^
   include/asm-generic/bug.h:162:45: note: expanded from macro 'BUG_ON'
   #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                               ^~~~~~~~~
   include/linux/compiler.h:48:68: note: expanded from macro 'unlikely'
   #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
                                                                        ^
   include/linux/compiler.h:35:19: note: expanded from macro '__branch_check__'
                                                expect, is_constant);      \
                                                        ^~~~~~~~~~~
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                                                 ^~~~
   include/linux/compiler.h:58:61: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                               ^~~~
   include/linux/hugetlb.h:1196:62: note: forward declaration of 'struct hugetlb_pte'
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
>> include/linux/hugetlb.h:1199:14: error: incomplete definition of type 'struct hugetlb_pte'
           BUG_ON(!hpte->ptep);
                   ~~~~^
   include/asm-generic/bug.h:162:45: note: expanded from macro 'BUG_ON'
   #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                               ^~~~~~~~~
   include/linux/compiler.h:48:41: note: expanded from macro 'unlikely'
   #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
                                             ^
   include/linux/compiler.h:33:34: note: expanded from macro '__branch_check__'
                           ______r = __builtin_expect(!!(x), expect);      \
                                                         ^
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                                                 ^~~~
   include/linux/compiler.h:58:86: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                                                        ^~~~
   include/linux/compiler.h:69:3: note: expanded from macro '__trace_if_value'
           (cond) ?                                        \
            ^~~~
   include/linux/hugetlb.h:1196:62: note: forward declaration of 'struct hugetlb_pte'
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
>> include/linux/hugetlb.h:1199:14: error: incomplete definition of type 'struct hugetlb_pte'
           BUG_ON(!hpte->ptep);
                   ~~~~^
   include/asm-generic/bug.h:162:45: note: expanded from macro 'BUG_ON'
   #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                               ^~~~~~~~~
   include/linux/compiler.h:48:68: note: expanded from macro 'unlikely'
   #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
                                                                        ^
   include/linux/compiler.h:35:19: note: expanded from macro '__branch_check__'
                                                expect, is_constant);      \
                                                        ^~~~~~~~~~~
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                                                 ^~~~
   include/linux/compiler.h:58:86: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                                                        ^~~~
   include/linux/compiler.h:69:3: note: expanded from macro '__trace_if_value'
           (cond) ?                                        \
            ^~~~
   include/linux/hugetlb.h:1196:62: note: forward declaration of 'struct hugetlb_pte'
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
>> include/linux/hugetlb.h:1202:6: error: call to undeclared function 'hugetlb_pte_none'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           if (hugetlb_pte_none(hpte) || hugetlb_pte_present_leaf(hpte))
               ^
>> include/linux/hugetlb.h:1202:32: error: call to undeclared function 'hugetlb_pte_present_leaf'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           if (hugetlb_pte_none(hpte) || hugetlb_pte_present_leaf(hpte))
                                         ^
>> include/linux/hugetlb.h:1203:27: error: call to undeclared function 'hugetlb_pte_shift'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
                   return huge_pte_lockptr(hugetlb_pte_shift(hpte),
                                           ^
   include/linux/hugetlb.h:1204:13: error: incomplete definition of type 'struct hugetlb_pte'
                                   mm, hpte->ptep);
                                       ~~~~^
   include/linux/hugetlb.h:1196:62: note: forward declaration of 'struct hugetlb_pte'
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
   include/linux/hugetlb.h:1209:59: warning: declaration of 'struct hugetlb_pte' will not be visible outside of this function [-Wvisibility]
   spinlock_t *hugetlb_pte_lock(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                             ^
>> include/linux/hugetlb.h:1211:44: error: incompatible pointer types passing 'struct hugetlb_pte *' to parameter of type 'struct hugetlb_pte *' [-Werror,-Wincompatible-pointer-types]
           spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte);
                                                     ^~~~
   include/linux/hugetlb.h:1196:75: note: passing argument to parameter 'hpte' here
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                             ^
   mm/shmem.c:4031:13: warning: no previous prototype for function 'shmem_init' [-Wmissing-prototypes]
   void __init shmem_init(void)
               ^
   mm/shmem.c:4031:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void __init shmem_init(void)
   ^
   static 
   mm/shmem.c:4039:5: warning: no previous prototype for function 'shmem_unuse' [-Wmissing-prototypes]
   int shmem_unuse(unsigned int type)
       ^
   mm/shmem.c:4039:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int shmem_unuse(unsigned int type)
   ^
   static 
   mm/shmem.c:4044:5: warning: no previous prototype for function 'shmem_lock' [-Wmissing-prototypes]
   int shmem_lock(struct file *file, int lock, struct ucounts *ucounts)
       ^
   mm/shmem.c:4044:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int shmem_lock(struct file *file, int lock, struct ucounts *ucounts)
   ^
   static 
   mm/shmem.c:4049:6: warning: no previous prototype for function 'shmem_unlock_mapping' [-Wmissing-prototypes]
   void shmem_unlock_mapping(struct address_space *mapping)
        ^
   mm/shmem.c:4049:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void shmem_unlock_mapping(struct address_space *mapping)
   ^
   static 
   mm/shmem.c:4054:15: warning: no previous prototype for function 'shmem_get_unmapped_area' [-Wmissing-prototypes]
   unsigned long shmem_get_unmapped_area(struct file *file,
                 ^
   mm/shmem.c:4054:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   unsigned long shmem_get_unmapped_area(struct file *file,
   ^
   static 
   mm/shmem.c:4062:6: warning: no previous prototype for function 'shmem_truncate_range' [-Wmissing-prototypes]
   void shmem_truncate_range(struct inode *inode, loff_t lstart, loff_t lend)
        ^
   mm/shmem.c:4062:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void shmem_truncate_range(struct inode *inode, loff_t lstart, loff_t lend)
   ^
   static 
   mm/shmem.c:4121:14: warning: no previous prototype for function 'shmem_kernel_file_setup' [-Wmissing-prototypes]
   struct file *shmem_kernel_file_setup(const char *name, loff_t size, unsigned long flags)
                ^
   mm/shmem.c:4121:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   struct file *shmem_kernel_file_setup(const char *name, loff_t size, unsigned long flags)
   ^
   static 
   mm/shmem.c:4132:14: warning: no previous prototype for function 'shmem_file_setup' [-Wmissing-prototypes]
   struct file *shmem_file_setup(const char *name, loff_t size, unsigned long flags)
                ^
   mm/shmem.c:4132:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   struct file *shmem_file_setup(const char *name, loff_t size, unsigned long flags)
   ^
   static 
   mm/shmem.c:4145:14: warning: no previous prototype for function 'shmem_file_setup_with_mnt' [-Wmissing-prototypes]
   struct file *shmem_file_setup_with_mnt(struct vfsmount *mnt, const char *name,
                ^
   mm/shmem.c:4145:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   struct file *shmem_file_setup_with_mnt(struct vfsmount *mnt, const char *name,
   ^
   static 
   mm/shmem.c:4156:5: warning: no previous prototype for function 'shmem_zero_setup' [-Wmissing-prototypes]
   int shmem_zero_setup(struct vm_area_struct *vma)
       ^
   mm/shmem.c:4156:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int shmem_zero_setup(struct vm_area_struct *vma)
   ^
   static 
   mm/shmem.c:4194:14: warning: no previous prototype for function 'shmem_read_mapping_page_gfp' [-Wmissing-prototypes]
   struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
                ^
   mm/shmem.c:4194:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
   ^
   static 
   13 warnings and 11 errors generated.
..


vim +1199 include/linux/hugetlb.h

  1194	
  1195	static inline
> 1196	spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
  1197	{
  1198	
> 1199		BUG_ON(!hpte->ptep);
  1200		// Only use huge_pte_lockptr if we are at leaf-level. Otherwise use
  1201		// the regular page table lock.
> 1202		if (hugetlb_pte_none(hpte) || hugetlb_pte_present_leaf(hpte))
> 1203			return huge_pte_lockptr(hugetlb_pte_shift(hpte),
  1204					mm, hpte->ptep);
  1205		return &mm->page_table_lock;
  1206	}
  1207	
  1208	static inline
  1209	spinlock_t *hugetlb_pte_lock(struct mm_struct *mm, struct hugetlb_pte *hpte)
  1210	{
> 1211		spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte);
  1212	
  1213		spin_lock(ptl);
  1214		return ptl;
  1215	}
  1216	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

[-- Attachment #2: config --]
[-- Type: text/plain, Size: 119522 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/hexagon 5.19.0-rc1 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="clang version 15.0.0 (git://gitmirror/llvm_project 6fa9120080c35a5ff851c3fc3358692c4ef7ce0d)"
CONFIG_GCC_VERSION=0
CONFIG_CC_IS_CLANG=y
CONFIG_CLANG_VERSION=150000
CONFIG_AS_IS_LLVM=y
CONFIG_AS_VERSION=150000
CONFIG_LD_VERSION=0
CONFIG_LD_IS_LLD=y
CONFIG_LLD_VERSION=150000
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y
CONFIG_TOOLS_SUPPORT_RELR=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_CC_HAS_NO_PROFILE_FN_ATTR=y
CONFIG_PAHOLE_VERSION=123
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_COMPILE_TEST=y
# CONFIG_WERROR is not set
CONFIG_LOCALVERSION=""
CONFIG_BUILD_SALT=""
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
# CONFIG_SYSVIPC is not set
CONFIG_WATCH_QUEUE=y
# CONFIG_CROSS_MEMORY_ATTACH is not set
# CONFIG_USELIB is not set

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_SIM=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
# CONFIG_TIME_KUNIT_TEST is not set

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

CONFIG_BPF=y

#
# BPF subsystem
#
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_UNPRIV_DEFAULT_OFF=y
# end of BPF subsystem

CONFIG_PREEMPT_NONE_BUILD=y
CONFIG_PREEMPT_NONE=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_PSI=y
CONFIG_PSI_DEFAULT_DISABLED=y
# end of CPU/Task time and stats accounting

CONFIG_CPU_ISOLATION=y

#
# RCU Subsystem
#
CONFIG_TINY_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TINY_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_TRACE_RCU=y
CONFIG_RCU_NEED_SEGCBLIST=y
# end of RCU Subsystem

# CONFIG_IKCONFIG is not set
# CONFIG_IKHEADERS is not set

#
# Scheduler features
#
# end of Scheduler features

CONFIG_CC_IMPLICIT_FALLTHROUGH="-Wimplicit-fallthrough"
CONFIG_CGROUPS=y
CONFIG_PAGE_COUNTER=y
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
# CONFIG_BLK_CGROUP is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
# CONFIG_CGROUP_PIDS is not set
CONFIG_CGROUP_RDMA=y
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_DEVICE is not set
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
# CONFIG_CGROUP_BPF is not set
# CONFIG_CGROUP_MISC is not set
# CONFIG_CGROUP_DEBUG is not set
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
CONFIG_CHECKPOINT_RESTORE=y
CONFIG_SCHED_AUTOGROUP=y
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_SYSFS_DEPRECATED_V2 is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
# CONFIG_RD_BZIP2 is not set
# CONFIG_RD_LZMA is not set
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
# CONFIG_RD_LZ4 is not set
CONFIG_RD_ZSTD=y
# CONFIG_BOOT_CONFIG is not set
CONFIG_INITRAMFS_PRESERVE_MTIME=y
# CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_LD_ORPHAN_WARN=y
CONFIG_EXPERT=y
CONFIG_MULTIUSER=y
# CONFIG_SGETMASK_SYSCALL is not set
CONFIG_SYSFS_SYSCALL=y
CONFIG_FHANDLE=y
# CONFIG_POSIX_TIMERS is not set
# CONFIG_PRINTK is not set
# CONFIG_BUG is not set
# CONFIG_BASE_FULL is not set
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
# CONFIG_TIMERFD is not set
CONFIG_EVENTFD=y
# CONFIG_SHMEM is not set
# CONFIG_AIO is not set
CONFIG_IO_URING=y
CONFIG_ADVISE_SYSCALLS=y
CONFIG_MEMBARRIER=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_BASE_RELATIVE=y
CONFIG_KCMP=y
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_PERF_USE_VMALLOC=y
CONFIG_PC104=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
CONFIG_DEBUG_PERF_USE_VMALLOC=y
# end of Kernel Performance Events And Counters

CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
# end of General setup

#
# Linux Kernel Configuration for Hexagon
#
CONFIG_HEXAGON=y
CONFIG_HEXAGON_PHYS_OFFSET=y
CONFIG_FRAME_POINTER=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_EARLY_PRINTK=y
CONFIG_MMU=y
CONFIG_GENERIC_CSUM=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_STACKTRACE_SUPPORT=y

#
# Machine selection
#
CONFIG_HEXAGON_COMET=y
CONFIG_HEXAGON_ARCH_VERSION=2
CONFIG_CMDLINE=""
# CONFIG_SMP is not set
CONFIG_NR_CPUS=1
# CONFIG_PAGE_SIZE_4KB is not set
# CONFIG_PAGE_SIZE_16KB is not set
CONFIG_PAGE_SIZE_64KB=y
# CONFIG_PAGE_SIZE_256KB is not set
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_SCHED_HRTICK=y
# end of Machine selection

#
# General architecture-dependent options
#
CONFIG_CRASH_CORE=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_ARCH_32BIT_OFF_T=y
CONFIG_LTO_NONE=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_PGTABLE_LEVELS=2
CONFIG_PAGE_SIZE_LESS_THAN_256KB=y
CONFIG_COMPAT_32BIT_TIME=y
CONFIG_ARCH_NO_PREEMPT=y
# CONFIG_LOCK_EVENT_COUNTS is not set
CONFIG_ARCH_WANT_LD_ORPHAN_WARN=y

#
# GCOV-based kernel profiling
#
CONFIG_GCOV_KERNEL=y
# end of GCOV-based kernel profiling
# end of General architecture-dependent options

CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=1
# CONFIG_MODULES is not set
CONFIG_MODULES_TREE_LOOKUP=y
CONFIG_BLOCK=y
CONFIG_BLOCK_LEGACY_AUTOLOAD=y
CONFIG_BLK_DEV_BSG_COMMON=y
CONFIG_BLK_DEV_BSGLIB=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_DEV_INTEGRITY_T10=y
CONFIG_BLK_DEV_ZONED=y
CONFIG_BLK_WBT=y
# CONFIG_BLK_WBT_MQ is not set
# CONFIG_BLK_DEBUG_FS is not set
# CONFIG_BLK_SED_OPAL is not set
CONFIG_BLK_INLINE_ENCRYPTION=y
CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK=y

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_EFI_PARTITION=y
# end of Partition Types

CONFIG_BLK_MQ_VIRTIO=y

#
# IO Schedulers
#
CONFIG_MQ_IOSCHED_DEADLINE=y
CONFIG_MQ_IOSCHED_KYBER=y
# CONFIG_IOSCHED_BFQ is not set
# end of IO Schedulers

CONFIG_ASN1=y
CONFIG_UNINLINE_SPIN_UNLOCK=y

#
# Executable file formats
#
# CONFIG_BINFMT_ELF is not set
# CONFIG_BINFMT_SCRIPT is not set
# CONFIG_BINFMT_MISC is not set
# CONFIG_COREDUMP is not set
# end of Executable file formats

#
# Memory Management options
#
CONFIG_SWAP=y
# CONFIG_ZSWAP is not set

#
# SLAB allocator options
#
# CONFIG_SLAB is not set
# CONFIG_SLUB is not set
CONFIG_SLOB=y
# end of SLAB allocator options

# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_FLATMEM=y
CONFIG_MEMORY_ISOLATION=y
CONFIG_SPLIT_PTLOCK_CPUS=4
# CONFIG_COMPACTION is not set
CONFIG_PAGE_REPORTING=y
CONFIG_MIGRATION=y
CONFIG_CONTIG_ALLOC=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_NEED_PER_CPU_KM=y
CONFIG_CMA=y
CONFIG_CMA_DEBUG=y
# CONFIG_CMA_DEBUGFS is not set
# CONFIG_CMA_SYSFS is not set
CONFIG_CMA_AREAS=7
# CONFIG_IDLE_PAGE_TRACKING is not set
# CONFIG_VM_EVENT_COUNTERS is not set
CONFIG_PERCPU_STATS=y
# CONFIG_GUP_TEST is not set
# CONFIG_ANON_VMA_NAME is not set
# CONFIG_USERFAULTFD is not set

#
# Data Access Monitoring
#
# CONFIG_DAMON is not set
# end of Data Access Monitoring
# end of Memory Management options

# CONFIG_NET is not set

#
# Device Drivers
#
# CONFIG_PCCARD is not set

#
# Generic Driver Options
#
# CONFIG_UEVENT_HELPER is not set
# CONFIG_DEVTMPFS is not set
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y

#
# Firmware loader
#
CONFIG_FW_LOADER=y
CONFIG_FW_LOADER_PAGED_BUF=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_FW_LOADER_USER_HELPER is not set
CONFIG_FW_LOADER_COMPRESS=y
CONFIG_FW_LOADER_COMPRESS_XZ=y
# CONFIG_FW_LOADER_COMPRESS_ZSTD is not set
# CONFIG_FW_UPLOAD is not set
# end of Firmware loader

# CONFIG_ALLOW_DEV_COREDUMP is not set
CONFIG_DEBUG_DRIVER=y
CONFIG_DEBUG_DEVRES=y
CONFIG_DEBUG_TEST_DRIVER_REMOVE=y
# CONFIG_PM_QOS_KUNIT_TEST is not set
# CONFIG_DRIVER_PE_KUNIT_TEST is not set
CONFIG_GENERIC_CPU_DEVICES=y
CONFIG_REGMAP=y
CONFIG_REGMAP_I2C=y
CONFIG_REGMAP_W1=y
CONFIG_REGMAP_MMIO=y
CONFIG_REGMAP_IRQ=y
CONFIG_REGMAP_SOUNDWIRE=y
CONFIG_DMA_SHARED_BUFFER=y
# CONFIG_DMA_FENCE_TRACE is not set
# end of Generic Driver Options

#
# Bus devices
#
# CONFIG_ARM_INTEGRATOR_LM is not set
# CONFIG_BT1_APB is not set
# CONFIG_BT1_AXI is not set
# CONFIG_INTEL_IXP4XX_EB is not set
# CONFIG_QCOM_EBI2 is not set
CONFIG_MHI_BUS=y
# CONFIG_MHI_BUS_DEBUG is not set
# CONFIG_MHI_BUS_EP is not set
# end of Bus devices

#
# Firmware Drivers
#

#
# ARM System Control and Management Interface Protocol
#
# CONFIG_ARM_SCMI_PROTOCOL is not set
CONFIG_ARM_SCMI_POWER_DOMAIN=y
# end of ARM System Control and Management Interface Protocol

# CONFIG_ARM_SCPI_PROTOCOL is not set
CONFIG_ARM_SCPI_POWER_DOMAIN=y
# CONFIG_FIRMWARE_MEMMAP is not set
# CONFIG_TURRIS_MOX_RWTM is not set
# CONFIG_BCM47XX_NVRAM is not set
# CONFIG_GOOGLE_FIRMWARE is not set

#
# Tegra firmware driver
#
# end of Tegra firmware driver
# end of Firmware Drivers

CONFIG_GNSS=y
CONFIG_GNSS_SERIAL=y
CONFIG_GNSS_MTK_SERIAL=y
CONFIG_GNSS_SIRF_SERIAL=y
# CONFIG_GNSS_UBX_SERIAL is not set
# CONFIG_GNSS_USB is not set
CONFIG_MTD=y

#
# Partition parsers
#
CONFIG_MTD_AR7_PARTS=y
# CONFIG_MTD_BCM63XX_PARTS is not set
CONFIG_MTD_CMDLINE_PARTS=y
CONFIG_MTD_OF_PARTS=y
# CONFIG_MTD_OF_PARTS_BCM4908 is not set
# CONFIG_MTD_OF_PARTS_LINKSYS_NS is not set
# CONFIG_MTD_PARSER_IMAGETAG is not set
# CONFIG_MTD_PARSER_TRX is not set
# CONFIG_MTD_SHARPSL_PARTS is not set
# CONFIG_MTD_REDBOOT_PARTS is not set
# end of Partition parsers

#
# User Modules And Translation Layers
#
CONFIG_MTD_BLKDEVS=y
CONFIG_MTD_BLOCK=y

#
# Note that in some cases UBI block is preferred. See MTD_UBI_BLOCK.
#
CONFIG_FTL=y
CONFIG_NFTL=y
# CONFIG_NFTL_RW is not set
CONFIG_INFTL=y
CONFIG_RFD_FTL=y
# CONFIG_SSFDC is not set
CONFIG_SM_FTL=y
CONFIG_MTD_OOPS=y
# CONFIG_MTD_SWAP is not set
CONFIG_MTD_PARTITIONED_MASTER=y

#
# RAM/ROM/Flash chip drivers
#
# CONFIG_MTD_CFI is not set
CONFIG_MTD_JEDECPROBE=y
CONFIG_MTD_GEN_PROBE=y
CONFIG_MTD_CFI_ADV_OPTIONS=y
# CONFIG_MTD_CFI_NOSWAP is not set
CONFIG_MTD_CFI_BE_BYTE_SWAP=y
# CONFIG_MTD_CFI_LE_BYTE_SWAP is not set
# CONFIG_MTD_CFI_GEOMETRY is not set
CONFIG_MTD_MAP_BANK_WIDTH_1=y
CONFIG_MTD_MAP_BANK_WIDTH_2=y
CONFIG_MTD_MAP_BANK_WIDTH_4=y
CONFIG_MTD_CFI_I1=y
CONFIG_MTD_CFI_I2=y
# CONFIG_MTD_OTP is not set
# CONFIG_MTD_CFI_INTELEXT is not set
CONFIG_MTD_CFI_AMDSTD=y
CONFIG_MTD_CFI_STAA=y
CONFIG_MTD_CFI_UTIL=y
CONFIG_MTD_RAM=y
CONFIG_MTD_ROM=y
# CONFIG_MTD_ABSENT is not set
# end of RAM/ROM/Flash chip drivers

#
# Mapping drivers for chip access
#
# CONFIG_MTD_COMPLEX_MAPPINGS is not set
# CONFIG_MTD_PHYSMAP is not set
# CONFIG_MTD_TS5500 is not set
CONFIG_MTD_PLATRAM=y
# end of Mapping drivers for chip access

#
# Self-contained MTD device drivers
#
CONFIG_MTD_SPEAR_SMI=y
CONFIG_MTD_SLRAM=y
CONFIG_MTD_PHRAM=y
# CONFIG_MTD_MTDRAM is not set
CONFIG_MTD_BLOCK2MTD=y

#
# Disk-On-Chip Device Drivers
#
# CONFIG_MTD_DOCG3 is not set
# end of Self-contained MTD device drivers

#
# NAND
#
CONFIG_MTD_NAND_CORE=y
CONFIG_MTD_ONENAND=y
# CONFIG_MTD_ONENAND_VERIFY_WRITE is not set
CONFIG_MTD_ONENAND_GENERIC=y
# CONFIG_MTD_ONENAND_SAMSUNG is not set
CONFIG_MTD_ONENAND_OTP=y
# CONFIG_MTD_ONENAND_2X_PROGRAM is not set
# CONFIG_MTD_RAW_NAND is not set

#
# ECC engine support
#
CONFIG_MTD_NAND_ECC=y
CONFIG_MTD_NAND_ECC_SW_HAMMING=y
# CONFIG_MTD_NAND_ECC_SW_HAMMING_SMC is not set
# CONFIG_MTD_NAND_ECC_SW_BCH is not set
# CONFIG_MTD_NAND_ECC_MXIC is not set
# CONFIG_MTD_NAND_ECC_MEDIATEK is not set
# end of ECC engine support
# end of NAND

#
# LPDDR & LPDDR2 PCM memory drivers
#
CONFIG_MTD_LPDDR=y
CONFIG_MTD_QINFO_PROBE=y
# end of LPDDR & LPDDR2 PCM memory drivers

# CONFIG_MTD_UBI is not set
# CONFIG_MTD_HYPERBUS is not set
CONFIG_DTC=y
CONFIG_OF=y
CONFIG_OF_UNITTEST=y
# CONFIG_OF_ALL_DTBS is not set
CONFIG_OF_FLATTREE=y
CONFIG_OF_EARLY_FLATTREE=y
CONFIG_OF_KOBJ=y
# CONFIG_OF_DYNAMIC is not set
CONFIG_OF_ADDRESS=y
CONFIG_OF_IRQ=y
CONFIG_OF_RESERVED_MEM=y
CONFIG_OF_RESOLVE=y
# CONFIG_OF_OVERLAY is not set
# CONFIG_PARPORT is not set
CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_NULL_BLK is not set
CONFIG_CDROM=y
# CONFIG_ZRAM is not set
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_LOOP_MIN_COUNT=8

#
# DRBD disabled because PROC_FS or INET not selected
#
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=4096
# CONFIG_CDROM_PKTCDVD is not set
# CONFIG_VIRTIO_BLK is not set

#
# NVME Support
#
CONFIG_NVME_CORE=y
CONFIG_NVME_MULTIPATH=y
# CONFIG_NVME_VERBOSE_ERRORS is not set
# CONFIG_NVME_HWMON is not set
CONFIG_NVME_FABRICS=y
CONFIG_NVME_FC=y
# CONFIG_NVME_TARGET is not set
# end of NVME Support

#
# Misc devices
#
CONFIG_SENSORS_LIS3LV02D=y
CONFIG_AD525X_DPOT=y
CONFIG_AD525X_DPOT_I2C=y
# CONFIG_DUMMY_IRQ is not set
CONFIG_ICS932S401=y
# CONFIG_ATMEL_SSC is not set
CONFIG_ENCLOSURE_SERVICES=y
# CONFIG_QCOM_COINCELL is not set
# CONFIG_QCOM_FASTRPC is not set
CONFIG_APDS9802ALS=y
CONFIG_ISL29003=y
CONFIG_ISL29020=y
# CONFIG_SENSORS_TSL2550 is not set
CONFIG_SENSORS_BH1770=y
CONFIG_SENSORS_APDS990X=y
# CONFIG_HMC6352 is not set
CONFIG_DS1682=y
# CONFIG_SRAM is not set
CONFIG_XILINX_SDFEC=y
CONFIG_HISI_HIKEY_USB=y
# CONFIG_OPEN_DICE is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_AT24 is not set
CONFIG_EEPROM_LEGACY=y
CONFIG_EEPROM_MAX6875=y
CONFIG_EEPROM_93CX6=y
CONFIG_EEPROM_IDT_89HPESX=y
CONFIG_EEPROM_EE1004=y
# end of EEPROM support

#
# Texas Instruments shared transport line discipline
#
# end of Texas Instruments shared transport line discipline

CONFIG_SENSORS_LIS3_I2C=y
# CONFIG_ALTERA_STAPL is not set
CONFIG_ECHO=y
# CONFIG_MISC_RTSX_USB is not set
# CONFIG_PVPANIC is not set
# end of Misc devices

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
CONFIG_RAID_ATTRS=y
CONFIG_SCSI_COMMON=y
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=y
CONFIG_BLK_DEV_SR=y
CONFIG_CHR_DEV_SG=y
CONFIG_BLK_DEV_BSG=y
CONFIG_CHR_DEV_SCH=y
# CONFIG_SCSI_ENCLOSURE is not set
# CONFIG_SCSI_CONSTANTS is not set
CONFIG_SCSI_LOGGING=y
# CONFIG_SCSI_SCAN_ASYNC is not set

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=y
CONFIG_SCSI_SAS_ATTRS=y
# CONFIG_SCSI_SAS_LIBSAS is not set
CONFIG_SCSI_SRP_ATTRS=y
# end of SCSI Transports

# CONFIG_SCSI_LOWLEVEL is not set
# CONFIG_SCSI_DH is not set
# end of SCSI device support

CONFIG_ATA=y
CONFIG_SATA_HOST=y
# CONFIG_ATA_VERBOSE_ERROR is not set
# CONFIG_ATA_FORCE is not set
CONFIG_SATA_PMP=y

#
# Controllers with non-SFF native interface
#
CONFIG_SATA_AHCI_PLATFORM=y
# CONFIG_AHCI_BRCM is not set
# CONFIG_AHCI_DA850 is not set
# CONFIG_AHCI_DM816 is not set
CONFIG_AHCI_CEVA=y
# CONFIG_AHCI_MTK is not set
# CONFIG_AHCI_MVEBU is not set
# CONFIG_AHCI_SUNXI is not set
# CONFIG_AHCI_TEGRA is not set
# CONFIG_AHCI_XGENE is not set
# CONFIG_AHCI_QORIQ is not set
# CONFIG_SATA_FSL is not set
# CONFIG_SATA_GEMINI is not set
# CONFIG_SATA_AHCI_SEATTLE is not set
# CONFIG_ATA_SFF is not set
# CONFIG_MD is not set
CONFIG_TARGET_CORE=y
CONFIG_TCM_IBLOCK=y
# CONFIG_TCM_FILEIO is not set
# CONFIG_TCM_PSCSI is not set
CONFIG_LOOPBACK_TARGET=y

#
# IEEE 1394 (FireWire) support
#
# CONFIG_FIREWIRE is not set
# end of IEEE 1394 (FireWire) support

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_LEDS=y
CONFIG_INPUT_FF_MEMLESS=y
CONFIG_INPUT_SPARSEKMAP=y
CONFIG_INPUT_MATRIXKMAP=y
CONFIG_INPUT_VIVALDIFMAP=y

#
# Userland interfaces
#
# CONFIG_INPUT_MOUSEDEV is not set
CONFIG_INPUT_JOYDEV=y
# CONFIG_INPUT_EVDEV is not set
CONFIG_INPUT_EVBUG=y

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
# CONFIG_KEYBOARD_ADP5520 is not set
CONFIG_KEYBOARD_ADP5588=y
CONFIG_KEYBOARD_ADP5589=y
# CONFIG_KEYBOARD_ATKBD is not set
CONFIG_KEYBOARD_QT1050=y
CONFIG_KEYBOARD_QT1070=y
CONFIG_KEYBOARD_QT2160=y
# CONFIG_KEYBOARD_CLPS711X is not set
CONFIG_KEYBOARD_DLINK_DIR685=y
CONFIG_KEYBOARD_LKKBD=y
# CONFIG_KEYBOARD_EP93XX is not set
# CONFIG_KEYBOARD_GPIO is not set
# CONFIG_KEYBOARD_GPIO_POLLED is not set
CONFIG_KEYBOARD_TCA6416=y
CONFIG_KEYBOARD_TCA8418=y
CONFIG_KEYBOARD_MATRIX=y
# CONFIG_KEYBOARD_LM8323 is not set
CONFIG_KEYBOARD_LM8333=y
CONFIG_KEYBOARD_MAX7359=y
# CONFIG_KEYBOARD_MCS is not set
# CONFIG_KEYBOARD_MPR121 is not set
# CONFIG_KEYBOARD_SNVS_PWRKEY is not set
# CONFIG_KEYBOARD_IMX is not set
CONFIG_KEYBOARD_NEWTON=y
CONFIG_KEYBOARD_OPENCORES=y
# CONFIG_KEYBOARD_GOLDFISH_EVENTS is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
# CONFIG_KEYBOARD_ST_KEYSCAN is not set
CONFIG_KEYBOARD_SUNKBD=y
# CONFIG_KEYBOARD_SH_KEYSC is not set
CONFIG_KEYBOARD_STMPE=y
CONFIG_KEYBOARD_IQS62X=y
CONFIG_KEYBOARD_OMAP4=y
# CONFIG_KEYBOARD_TC3589X is not set
CONFIG_KEYBOARD_TM2_TOUCHKEY=y
CONFIG_KEYBOARD_XTKBD=y
# CONFIG_KEYBOARD_CAP11XX is not set
# CONFIG_KEYBOARD_MT6779 is not set
CONFIG_KEYBOARD_MTK_PMIC=y
# CONFIG_KEYBOARD_CYPRESS_SF is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
# CONFIG_MOUSE_PS2_ALPS is not set
CONFIG_MOUSE_PS2_BYD=y
# CONFIG_MOUSE_PS2_LOGIPS2PP is not set
CONFIG_MOUSE_PS2_SYNAPTICS=y
# CONFIG_MOUSE_PS2_SYNAPTICS_SMBUS is not set
CONFIG_MOUSE_PS2_CYPRESS=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
CONFIG_MOUSE_PS2_ELANTECH=y
CONFIG_MOUSE_PS2_ELANTECH_SMBUS=y
CONFIG_MOUSE_PS2_SENTELIC=y
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
CONFIG_MOUSE_PS2_FOCALTECH=y
CONFIG_MOUSE_PS2_SMBUS=y
CONFIG_MOUSE_SERIAL=y
CONFIG_MOUSE_APPLETOUCH=y
CONFIG_MOUSE_BCM5974=y
CONFIG_MOUSE_CYAPA=y
CONFIG_MOUSE_ELAN_I2C=y
# CONFIG_MOUSE_ELAN_I2C_I2C is not set
CONFIG_MOUSE_ELAN_I2C_SMBUS=y
# CONFIG_MOUSE_VSXXXAA is not set
# CONFIG_MOUSE_GPIO is not set
CONFIG_MOUSE_SYNAPTICS_I2C=y
# CONFIG_MOUSE_SYNAPTICS_USB is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
CONFIG_INPUT_MISC=y
CONFIG_INPUT_AD714X=y
CONFIG_INPUT_AD714X_I2C=y
CONFIG_INPUT_ARIZONA_HAPTICS=y
# CONFIG_INPUT_ATMEL_CAPTOUCH is not set
CONFIG_INPUT_BMA150=y
CONFIG_INPUT_E3X0_BUTTON=y
CONFIG_INPUT_MAX77693_HAPTIC=y
# CONFIG_INPUT_MAX8925_ONKEY is not set
# CONFIG_INPUT_MAX8997_HAPTIC is not set
# CONFIG_INPUT_MC13783_PWRBUTTON is not set
CONFIG_INPUT_MMA8450=y
CONFIG_INPUT_GPIO_BEEPER=y
# CONFIG_INPUT_GPIO_DECODER is not set
CONFIG_INPUT_GPIO_VIBRA=y
# CONFIG_INPUT_ATI_REMOTE2 is not set
CONFIG_INPUT_KEYSPAN_REMOTE=y
# CONFIG_INPUT_KXTJ9 is not set
CONFIG_INPUT_POWERMATE=y
# CONFIG_INPUT_YEALINK is not set
CONFIG_INPUT_CM109=y
# CONFIG_INPUT_REGULATOR_HAPTIC is not set
# CONFIG_INPUT_RETU_PWRBUTTON is not set
CONFIG_INPUT_TPS65218_PWRBUTTON=y
CONFIG_INPUT_AXP20X_PEK=y
# CONFIG_INPUT_UINPUT is not set
CONFIG_INPUT_PCF8574=y
# CONFIG_INPUT_PWM_BEEPER is not set
CONFIG_INPUT_PWM_VIBRA=y
CONFIG_INPUT_RK805_PWRKEY=y
CONFIG_INPUT_GPIO_ROTARY_ENCODER=y
# CONFIG_INPUT_DA7280_HAPTICS is not set
CONFIG_INPUT_DA9055_ONKEY=y
CONFIG_INPUT_DA9063_ONKEY=y
CONFIG_INPUT_ADXL34X=y
CONFIG_INPUT_ADXL34X_I2C=y
# CONFIG_INPUT_IMS_PCU is not set
CONFIG_INPUT_IQS269A=y
# CONFIG_INPUT_IQS626A is not set
# CONFIG_INPUT_IQS7222 is not set
CONFIG_INPUT_CMA3000=y
# CONFIG_INPUT_CMA3000_I2C is not set
CONFIG_INPUT_DRV260X_HAPTICS=y
CONFIG_INPUT_DRV2665_HAPTICS=y
# CONFIG_INPUT_DRV2667_HAPTICS is not set
# CONFIG_INPUT_HISI_POWERKEY is not set
# CONFIG_INPUT_RAVE_SP_PWRBUTTON is not set
# CONFIG_INPUT_SC27XX_VIBRA is not set
CONFIG_INPUT_STPMIC1_ONKEY=y
CONFIG_RMI4_CORE=y
CONFIG_RMI4_I2C=y
CONFIG_RMI4_SMB=y
# CONFIG_RMI4_F03 is not set
CONFIG_RMI4_2D_SENSOR=y
CONFIG_RMI4_F11=y
# CONFIG_RMI4_F12 is not set
CONFIG_RMI4_F30=y
CONFIG_RMI4_F34=y
CONFIG_RMI4_F3A=y
CONFIG_RMI4_F55=y

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
# CONFIG_SERIO_ALTERA_PS2 is not set
# CONFIG_SERIO_PS2MULT is not set
CONFIG_SERIO_ARC_PS2=y
CONFIG_SERIO_APBPS2=y
# CONFIG_SERIO_OLPC_APSP is not set
# CONFIG_SERIO_SUN4I_PS2 is not set
CONFIG_SERIO_GPIO_PS2=y
CONFIG_USERIO=y
CONFIG_GAMEPORT=y
CONFIG_GAMEPORT_NS558=y
CONFIG_GAMEPORT_L4=y
# end of Hardware I/O ports
# end of Input device support

#
# Character devices
#
# CONFIG_TTY is not set
CONFIG_SERIAL_DEV_BUS=y
CONFIG_IPMI_HANDLER=y
CONFIG_IPMI_PLAT_DATA=y
# CONFIG_IPMI_PANIC_EVENT is not set
CONFIG_IPMI_DEVICE_INTERFACE=y
CONFIG_IPMI_SI=y
# CONFIG_IPMI_SSIF is not set
# CONFIG_IPMI_IPMB is not set
CONFIG_IPMI_WATCHDOG=y
CONFIG_IPMI_POWEROFF=y
# CONFIG_ASPEED_KCS_IPMI_BMC is not set
# CONFIG_NPCM7XX_KCS_IPMI_BMC is not set
CONFIG_IPMB_DEVICE_INTERFACE=y
CONFIG_HW_RANDOM=y
CONFIG_HW_RANDOM_TIMERIOMEM=y
# CONFIG_HW_RANDOM_BA431 is not set
CONFIG_HW_RANDOM_BCM2835=y
CONFIG_HW_RANDOM_IPROC_RNG200=y
CONFIG_HW_RANDOM_IXP4XX=y
CONFIG_HW_RANDOM_OMAP=y
CONFIG_HW_RANDOM_OMAP3_ROM=y
CONFIG_HW_RANDOM_VIRTIO=y
CONFIG_HW_RANDOM_NOMADIK=y
CONFIG_HW_RANDOM_STM32=y
CONFIG_HW_RANDOM_MESON=y
CONFIG_HW_RANDOM_MTK=y
CONFIG_HW_RANDOM_EXYNOS=y
CONFIG_HW_RANDOM_NPCM=y
CONFIG_HW_RANDOM_KEYSTONE=y
# CONFIG_HW_RANDOM_CCTRNG is not set
CONFIG_HW_RANDOM_XIPHERA=y
CONFIG_DEVMEM=y
CONFIG_TCG_TPM=y
CONFIG_HW_RANDOM_TPM=y
CONFIG_TCG_TIS_CORE=y
CONFIG_TCG_TIS=y
# CONFIG_TCG_TIS_SYNQUACER is not set
# CONFIG_TCG_TIS_I2C_CR50 is not set
# CONFIG_TCG_TIS_I2C_ATMEL is not set
# CONFIG_TCG_TIS_I2C_INFINEON is not set
# CONFIG_TCG_TIS_I2C_NUVOTON is not set
CONFIG_TCG_ATMEL=y
# CONFIG_TCG_VTPM_PROXY is not set
CONFIG_TCG_TIS_ST33ZP24=y
CONFIG_TCG_TIS_ST33ZP24_I2C=y
CONFIG_XILLYBUS_CLASS=y
CONFIG_XILLYBUS=y
CONFIG_XILLYBUS_OF=y
# CONFIG_XILLYUSB is not set
CONFIG_RANDOM_TRUST_BOOTLOADER=y
# end of Character devices

#
# I2C support
#
CONFIG_I2C=y
CONFIG_I2C_BOARDINFO=y
# CONFIG_I2C_COMPAT is not set
# CONFIG_I2C_CHARDEV is not set
CONFIG_I2C_MUX=y

#
# Multiplexer I2C Chip support
#
CONFIG_I2C_ARB_GPIO_CHALLENGE=y
CONFIG_I2C_MUX_GPIO=y
# CONFIG_I2C_MUX_GPMUX is not set
CONFIG_I2C_MUX_LTC4306=y
# CONFIG_I2C_MUX_PCA9541 is not set
CONFIG_I2C_MUX_PCA954x=y
CONFIG_I2C_MUX_PINCTRL=y
CONFIG_I2C_MUX_REG=y
# CONFIG_I2C_DEMUX_PINCTRL is not set
CONFIG_I2C_MUX_MLXCPLD=y
# end of Multiplexer I2C Chip support

CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_SMBUS=y
CONFIG_I2C_ALGOBIT=y
CONFIG_I2C_ALGOPCA=y

#
# I2C Hardware Bus support
#
# CONFIG_I2C_HIX5HD2 is not set

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
# CONFIG_I2C_ALTERA is not set
# CONFIG_I2C_ASPEED is not set
# CONFIG_I2C_AT91 is not set
# CONFIG_I2C_AXXIA is not set
# CONFIG_I2C_BCM_IPROC is not set
# CONFIG_I2C_BCM_KONA is not set
CONFIG_I2C_BRCMSTB=y
# CONFIG_I2C_CADENCE is not set
# CONFIG_I2C_CBUS_GPIO is not set
# CONFIG_I2C_DAVINCI is not set
CONFIG_I2C_DESIGNWARE_CORE=y
# CONFIG_I2C_DESIGNWARE_SLAVE is not set
CONFIG_I2C_DESIGNWARE_PLATFORM=y
# CONFIG_I2C_DIGICOLOR is not set
# CONFIG_I2C_EXYNOS5 is not set
CONFIG_I2C_GPIO=y
# CONFIG_I2C_GPIO_FAULT_INJECTOR is not set
# CONFIG_I2C_HIGHLANDER is not set
# CONFIG_I2C_HISI is not set
# CONFIG_I2C_IMG is not set
# CONFIG_I2C_IMX is not set
# CONFIG_I2C_IMX_LPI2C is not set
# CONFIG_I2C_IOP3XX is not set
# CONFIG_I2C_JZ4780 is not set
# CONFIG_I2C_LPC2K is not set
# CONFIG_I2C_MT65XX is not set
# CONFIG_I2C_MT7621 is not set
# CONFIG_I2C_MV64XXX is not set
# CONFIG_I2C_MXS is not set
# CONFIG_I2C_NPCM7XX is not set
CONFIG_I2C_OCORES=y
# CONFIG_I2C_OMAP is not set
# CONFIG_I2C_OWL is not set
# CONFIG_I2C_APPLE is not set
CONFIG_I2C_PCA_PLATFORM=y
# CONFIG_I2C_PNX is not set
# CONFIG_I2C_PXA is not set
# CONFIG_I2C_QCOM_CCI is not set
# CONFIG_I2C_QUP is not set
# CONFIG_I2C_RIIC is not set
# CONFIG_I2C_S3C2410 is not set
# CONFIG_I2C_SH_MOBILE is not set
# CONFIG_I2C_SIMTEC is not set
# CONFIG_I2C_ST is not set
# CONFIG_I2C_STM32F4 is not set
# CONFIG_I2C_STM32F7 is not set
# CONFIG_I2C_SUN6I_P2WI is not set
# CONFIG_I2C_SYNQUACER is not set
# CONFIG_I2C_TEGRA_BPMP is not set
# CONFIG_I2C_UNIPHIER is not set
# CONFIG_I2C_UNIPHIER_F is not set
# CONFIG_I2C_VERSATILE is not set
# CONFIG_I2C_WMT is not set
# CONFIG_I2C_XILINX is not set
# CONFIG_I2C_XLP9XX is not set
# CONFIG_I2C_RCAR is not set

#
# External I2C/SMBus adapter drivers
#
CONFIG_I2C_DIOLAN_U2C=y
# CONFIG_I2C_DLN2 is not set
# CONFIG_I2C_CP2615 is not set
CONFIG_I2C_ROBOTFUZZ_OSIF=y
CONFIG_I2C_TINY_USB=y
CONFIG_I2C_VIPERBOARD=y

#
# Other I2C/SMBus bus drivers
#
# CONFIG_I2C_MLXCPLD is not set
CONFIG_I2C_FSI=y
# CONFIG_I2C_VIRTIO is not set
# end of I2C Hardware Bus support

CONFIG_I2C_SLAVE=y
CONFIG_I2C_SLAVE_EEPROM=y
CONFIG_I2C_SLAVE_TESTUNIT=y
CONFIG_I2C_DEBUG_CORE=y
# CONFIG_I2C_DEBUG_ALGO is not set
CONFIG_I2C_DEBUG_BUS=y
# end of I2C support

CONFIG_I3C=y
CONFIG_CDNS_I3C_MASTER=y
# CONFIG_DW_I3C_MASTER is not set
# CONFIG_SVC_I3C_MASTER is not set
# CONFIG_MIPI_I3C_HCI is not set
# CONFIG_SPI is not set
# CONFIG_SPMI is not set
# CONFIG_HSI is not set
CONFIG_PPS=y
# CONFIG_PPS_DEBUG is not set
CONFIG_NTP_PPS=y

#
# PPS clients support
#
# CONFIG_PPS_CLIENT_KTIMER is not set
CONFIG_PPS_CLIENT_GPIO=y

#
# PPS generators support
#

#
# PTP clock support
#
CONFIG_PTP_1588_CLOCK_OPTIONAL=y

#
# Enable PHYLIB and NETWORK_PHY_TIMESTAMPING to see the additional clocks.
#
# end of PTP clock support

CONFIG_PINCTRL=y
CONFIG_GENERIC_PINCTRL_GROUPS=y
CONFIG_PINMUX=y
CONFIG_GENERIC_PINMUX_FUNCTIONS=y
CONFIG_PINCONF=y
CONFIG_GENERIC_PINCONF=y
CONFIG_DEBUG_PINCTRL=y
# CONFIG_PINCTRL_AMD is not set
CONFIG_PINCTRL_AS3722=y
# CONFIG_PINCTRL_AT91PIO4 is not set
CONFIG_PINCTRL_AXP209=y
# CONFIG_PINCTRL_BM1880 is not set
# CONFIG_PINCTRL_DA850_PUPD is not set
CONFIG_PINCTRL_DA9062=y
# CONFIG_PINCTRL_EQUILIBRIUM is not set
# CONFIG_PINCTRL_INGENIC is not set
# CONFIG_PINCTRL_LPC18XX is not set
CONFIG_PINCTRL_MAX77620=y
CONFIG_PINCTRL_MCP23S08_I2C=y
CONFIG_PINCTRL_MCP23S08=y
# CONFIG_PINCTRL_MICROCHIP_SGPIO is not set
# CONFIG_PINCTRL_OCELOT is not set
# CONFIG_PINCTRL_PISTACHIO is not set
CONFIG_PINCTRL_RK805=y
# CONFIG_PINCTRL_ROCKCHIP is not set
CONFIG_PINCTRL_SINGLE=y
# CONFIG_PINCTRL_STARFIVE is not set
# CONFIG_PINCTRL_STMFX is not set
CONFIG_PINCTRL_SX150X=y
# CONFIG_PINCTRL_OWL is not set
# CONFIG_PINCTRL_ASPEED_G4 is not set
# CONFIG_PINCTRL_ASPEED_G5 is not set
# CONFIG_PINCTRL_ASPEED_G6 is not set
# CONFIG_PINCTRL_BCM281XX is not set
# CONFIG_PINCTRL_BCM2835 is not set
# CONFIG_PINCTRL_BCM4908 is not set
# CONFIG_PINCTRL_BCM6318 is not set
# CONFIG_PINCTRL_BCM6328 is not set
# CONFIG_PINCTRL_BCM6358 is not set
# CONFIG_PINCTRL_BCM6362 is not set
# CONFIG_PINCTRL_BCM6368 is not set
# CONFIG_PINCTRL_BCM63268 is not set
# CONFIG_PINCTRL_IPROC_GPIO is not set
# CONFIG_PINCTRL_CYGNUS_MUX is not set
# CONFIG_PINCTRL_NS is not set
# CONFIG_PINCTRL_NSP_GPIO is not set
# CONFIG_PINCTRL_NS2_MUX is not set
# CONFIG_PINCTRL_NSP_MUX is not set
# CONFIG_PINCTRL_AS370 is not set
# CONFIG_PINCTRL_BERLIN_BG4CT is not set
# CONFIG_PINCTRL_LOCHNAGAR is not set
CONFIG_PINCTRL_MADERA=y
CONFIG_PINCTRL_CS47L35=y

#
# Intel pinctrl drivers
#
# end of Intel pinctrl drivers

#
# MediaTek pinctrl drivers
#
CONFIG_EINT_MTK=y
CONFIG_PINCTRL_MTK=y
# CONFIG_PINCTRL_MT2701 is not set
# CONFIG_PINCTRL_MT7623 is not set
# CONFIG_PINCTRL_MT7629 is not set
# CONFIG_PINCTRL_MT8135 is not set
# CONFIG_PINCTRL_MT8127 is not set
# CONFIG_PINCTRL_MT2712 is not set
# CONFIG_PINCTRL_MT6765 is not set
# CONFIG_PINCTRL_MT6779 is not set
# CONFIG_PINCTRL_MT6795 is not set
# CONFIG_PINCTRL_MT6797 is not set
# CONFIG_PINCTRL_MT7622 is not set
# CONFIG_PINCTRL_MT7986 is not set
# CONFIG_PINCTRL_MT8167 is not set
# CONFIG_PINCTRL_MT8173 is not set
# CONFIG_PINCTRL_MT8183 is not set
# CONFIG_PINCTRL_MT8186 is not set
# CONFIG_PINCTRL_MT8192 is not set
# CONFIG_PINCTRL_MT8195 is not set
# CONFIG_PINCTRL_MT8365 is not set
# CONFIG_PINCTRL_MT8516 is not set
CONFIG_PINCTRL_MT6397=y
# end of MediaTek pinctrl drivers

CONFIG_PINCTRL_MESON=y
# CONFIG_PINCTRL_WPCM450 is not set
# CONFIG_PINCTRL_NPCM7XX is not set
# CONFIG_PINCTRL_PXA25X is not set
# CONFIG_PINCTRL_PXA27X is not set
# CONFIG_PINCTRL_MSM is not set
# CONFIG_PINCTRL_QCOM_SSBI_PMIC is not set
# CONFIG_PINCTRL_SM8450 is not set
# CONFIG_PINCTRL_LPASS_LPI is not set

#
# Renesas pinctrl drivers
#
# CONFIG_PINCTRL_RENESAS is not set
# CONFIG_PINCTRL_PFC_EMEV2 is not set
# CONFIG_PINCTRL_PFC_R8A77995 is not set
# CONFIG_PINCTRL_PFC_R8A7794 is not set
# CONFIG_PINCTRL_PFC_R8A77990 is not set
# CONFIG_PINCTRL_PFC_R8A7779 is not set
# CONFIG_PINCTRL_PFC_R8A7790 is not set
# CONFIG_PINCTRL_PFC_R8A77950 is not set
# CONFIG_PINCTRL_PFC_R8A77951 is not set
# CONFIG_PINCTRL_PFC_R8A7778 is not set
# CONFIG_PINCTRL_PFC_R8A7793 is not set
# CONFIG_PINCTRL_PFC_R8A7791 is not set
# CONFIG_PINCTRL_PFC_R8A77965 is not set
# CONFIG_PINCTRL_PFC_R8A77960 is not set
# CONFIG_PINCTRL_PFC_R8A77961 is not set
# CONFIG_PINCTRL_PFC_R8A779F0 is not set
# CONFIG_PINCTRL_PFC_R8A7792 is not set
# CONFIG_PINCTRL_PFC_R8A77980 is not set
# CONFIG_PINCTRL_PFC_R8A77970 is not set
# CONFIG_PINCTRL_PFC_R8A779A0 is not set
# CONFIG_PINCTRL_PFC_R8A7740 is not set
# CONFIG_PINCTRL_PFC_R8A73A4 is not set
# CONFIG_PINCTRL_RZA1 is not set
# CONFIG_PINCTRL_RZA2 is not set
# CONFIG_PINCTRL_RZG2L is not set
# CONFIG_PINCTRL_PFC_R8A77470 is not set
# CONFIG_PINCTRL_PFC_R8A7745 is not set
# CONFIG_PINCTRL_PFC_R8A7742 is not set
# CONFIG_PINCTRL_PFC_R8A7743 is not set
# CONFIG_PINCTRL_PFC_R8A7744 is not set
# CONFIG_PINCTRL_PFC_R8A774C0 is not set
# CONFIG_PINCTRL_PFC_R8A774E1 is not set
# CONFIG_PINCTRL_PFC_R8A774A1 is not set
# CONFIG_PINCTRL_PFC_R8A774B1 is not set
# CONFIG_PINCTRL_RZN1 is not set
# CONFIG_PINCTRL_PFC_SH7203 is not set
# CONFIG_PINCTRL_PFC_SH7264 is not set
# CONFIG_PINCTRL_PFC_SH7269 is not set
# CONFIG_PINCTRL_PFC_SH7720 is not set
# CONFIG_PINCTRL_PFC_SH7722 is not set
# CONFIG_PINCTRL_PFC_SH7734 is not set
# CONFIG_PINCTRL_PFC_SH7757 is not set
# CONFIG_PINCTRL_PFC_SH7785 is not set
# CONFIG_PINCTRL_PFC_SH7786 is not set
# CONFIG_PINCTRL_PFC_SH73A0 is not set
# CONFIG_PINCTRL_PFC_SH7723 is not set
# CONFIG_PINCTRL_PFC_SH7724 is not set
# CONFIG_PINCTRL_PFC_SHX3 is not set
# end of Renesas pinctrl drivers

# CONFIG_PINCTRL_EXYNOS is not set
# CONFIG_PINCTRL_S3C24XX is not set
# CONFIG_PINCTRL_S3C64XX is not set
# CONFIG_PINCTRL_SPRD_SC9860 is not set
# CONFIG_PINCTRL_STM32F429 is not set
# CONFIG_PINCTRL_STM32F469 is not set
# CONFIG_PINCTRL_STM32F746 is not set
# CONFIG_PINCTRL_STM32F769 is not set
# CONFIG_PINCTRL_STM32H743 is not set
# CONFIG_PINCTRL_STM32MP135 is not set
# CONFIG_PINCTRL_STM32MP157 is not set
# CONFIG_PINCTRL_TI_IODELAY is not set
# CONFIG_PINCTRL_TMPV7700 is not set
CONFIG_GPIOLIB=y
CONFIG_GPIOLIB_FASTPATH_LIMIT=512
CONFIG_OF_GPIO=y
CONFIG_GPIOLIB_IRQCHIP=y
CONFIG_DEBUG_GPIO=y
# CONFIG_GPIO_SYSFS is not set
# CONFIG_GPIO_CDEV is not set
CONFIG_GPIO_GENERIC=y

#
# Memory mapped GPIO drivers
#
CONFIG_GPIO_74XX_MMIO=y
CONFIG_GPIO_ALTERA=y
# CONFIG_GPIO_ASPEED is not set
# CONFIG_GPIO_ASPEED_SGPIO is not set
# CONFIG_GPIO_ATH79 is not set
# CONFIG_GPIO_RASPBERRYPI_EXP is not set
# CONFIG_GPIO_BCM_KONA is not set
# CONFIG_GPIO_BCM_XGS_IPROC is not set
# CONFIG_GPIO_BRCMSTB is not set
CONFIG_GPIO_CADENCE=y
# CONFIG_GPIO_CLPS711X is not set
CONFIG_GPIO_DWAPB=y
# CONFIG_GPIO_EIC_SPRD is not set
# CONFIG_GPIO_EM is not set
# CONFIG_GPIO_FTGPIO010 is not set
CONFIG_GPIO_GENERIC_PLATFORM=y
CONFIG_GPIO_GRGPIO=y
# CONFIG_GPIO_HISI is not set
# CONFIG_GPIO_HLWD is not set
# CONFIG_GPIO_IOP is not set
# CONFIG_GPIO_LPC18XX is not set
# CONFIG_GPIO_LPC32XX is not set
# CONFIG_GPIO_MB86S7X is not set
# CONFIG_GPIO_MPC8XXX is not set
# CONFIG_GPIO_MT7621 is not set
# CONFIG_GPIO_MXC is not set
# CONFIG_GPIO_MXS is not set
# CONFIG_GPIO_PMIC_EIC_SPRD is not set
# CONFIG_GPIO_PXA is not set
# CONFIG_GPIO_RCAR is not set
# CONFIG_GPIO_RDA is not set
# CONFIG_GPIO_ROCKCHIP is not set
CONFIG_GPIO_SIFIVE=y
# CONFIG_GPIO_SIOX is not set
# CONFIG_GPIO_SNPS_CREG is not set
# CONFIG_GPIO_SPRD is not set
# CONFIG_GPIO_STP_XWAY is not set
# CONFIG_GPIO_TEGRA is not set
# CONFIG_GPIO_TEGRA186 is not set
# CONFIG_GPIO_TS4800 is not set
# CONFIG_GPIO_UNIPHIER is not set
# CONFIG_GPIO_VISCONTI is not set
# CONFIG_GPIO_XGENE_SB is not set
CONFIG_GPIO_XILINX=y
# CONFIG_GPIO_XLP is not set
CONFIG_GPIO_AMD_FCH=y
# CONFIG_GPIO_IDT3243X is not set
# end of Memory mapped GPIO drivers

#
# I2C GPIO expanders
#
# CONFIG_GPIO_ADP5588 is not set
# CONFIG_GPIO_ADNP is not set
# CONFIG_GPIO_GW_PLD is not set
# CONFIG_GPIO_MAX7300 is not set
# CONFIG_GPIO_MAX732X is not set
# CONFIG_GPIO_PCA953X is not set
CONFIG_GPIO_PCA9570=y
# CONFIG_GPIO_PCF857X is not set
CONFIG_GPIO_TPIC2810=y
# CONFIG_GPIO_TS4900 is not set
# end of I2C GPIO expanders

#
# MFD GPIO expanders
#
# CONFIG_GPIO_ADP5520 is not set
CONFIG_GPIO_ARIZONA=y
# CONFIG_GPIO_BD71815 is not set
CONFIG_GPIO_BD71828=y
# CONFIG_GPIO_DA9055 is not set
CONFIG_GPIO_DLN2=y
# CONFIG_GPIO_LP3943 is not set
CONFIG_GPIO_LP873X=y
CONFIG_GPIO_LP87565=y
CONFIG_GPIO_MADERA=y
# CONFIG_GPIO_MAX77620 is not set
# CONFIG_GPIO_RC5T583 is not set
# CONFIG_GPIO_SL28CPLD is not set
CONFIG_GPIO_STMPE=y
CONFIG_GPIO_TC3589X=y
CONFIG_GPIO_TPS65086=y
CONFIG_GPIO_TPS65218=y
# CONFIG_GPIO_TPS65910 is not set
CONFIG_GPIO_TQMX86=y
CONFIG_GPIO_WM8350=y
# end of MFD GPIO expanders

#
# USB GPIO expanders
#
CONFIG_GPIO_VIPERBOARD=y
# end of USB GPIO expanders

#
# Virtual GPIO drivers
#
CONFIG_GPIO_AGGREGATOR=y
CONFIG_GPIO_MOCKUP=y
# CONFIG_GPIO_VIRTIO is not set
# CONFIG_GPIO_SIM is not set
# end of Virtual GPIO drivers

CONFIG_W1=y

#
# 1-wire Bus Masters
#
# CONFIG_W1_MASTER_DS2490 is not set
# CONFIG_W1_MASTER_DS2482 is not set
# CONFIG_W1_MASTER_MXC is not set
CONFIG_W1_MASTER_DS1WM=y
CONFIG_W1_MASTER_GPIO=y
CONFIG_W1_MASTER_SGI=y
# end of 1-wire Bus Masters

#
# 1-wire Slaves
#
CONFIG_W1_SLAVE_THERM=y
# CONFIG_W1_SLAVE_SMEM is not set
# CONFIG_W1_SLAVE_DS2405 is not set
CONFIG_W1_SLAVE_DS2408=y
# CONFIG_W1_SLAVE_DS2408_READBACK is not set
CONFIG_W1_SLAVE_DS2413=y
# CONFIG_W1_SLAVE_DS2406 is not set
CONFIG_W1_SLAVE_DS2423=y
CONFIG_W1_SLAVE_DS2805=y
CONFIG_W1_SLAVE_DS2430=y
# CONFIG_W1_SLAVE_DS2431 is not set
CONFIG_W1_SLAVE_DS2433=y
CONFIG_W1_SLAVE_DS2433_CRC=y
# CONFIG_W1_SLAVE_DS2438 is not set
CONFIG_W1_SLAVE_DS250X=y
CONFIG_W1_SLAVE_DS2780=y
CONFIG_W1_SLAVE_DS2781=y
CONFIG_W1_SLAVE_DS28E04=y
CONFIG_W1_SLAVE_DS28E17=y
# end of 1-wire Slaves

# CONFIG_POWER_RESET is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
CONFIG_POWER_SUPPLY_HWMON=y
# CONFIG_PDA_POWER is not set
# CONFIG_IP5XXX_POWER is not set
CONFIG_MAX8925_POWER=y
CONFIG_WM8350_POWER=y
CONFIG_TEST_POWER=y
CONFIG_CHARGER_ADP5061=y
# CONFIG_BATTERY_ACT8945A is not set
# CONFIG_BATTERY_CW2015 is not set
CONFIG_BATTERY_DS2760=y
CONFIG_BATTERY_DS2780=y
CONFIG_BATTERY_DS2781=y
CONFIG_BATTERY_DS2782=y
# CONFIG_BATTERY_SAMSUNG_SDI is not set
CONFIG_BATTERY_SBS=y
CONFIG_CHARGER_SBS=y
CONFIG_MANAGER_SBS=y
# CONFIG_BATTERY_BQ27XXX is not set
# CONFIG_BATTERY_DA9150 is not set
# CONFIG_BATTERY_MAX17040 is not set
CONFIG_BATTERY_MAX17042=y
CONFIG_BATTERY_MAX1721X=y
CONFIG_CHARGER_ISP1704=y
CONFIG_CHARGER_MAX8903=y
CONFIG_CHARGER_LP8727=y
# CONFIG_CHARGER_GPIO is not set
# CONFIG_CHARGER_MANAGER is not set
CONFIG_CHARGER_LT3651=y
# CONFIG_CHARGER_LTC4162L is not set
CONFIG_CHARGER_MAX14577=y
CONFIG_CHARGER_DETECTOR_MAX14656=y
# CONFIG_CHARGER_MAX77693 is not set
# CONFIG_CHARGER_MAX77976 is not set
# CONFIG_CHARGER_QCOM_SMBB is not set
# CONFIG_CHARGER_BQ2415X is not set
CONFIG_CHARGER_BQ24190=y
CONFIG_CHARGER_BQ24257=y
# CONFIG_CHARGER_BQ24735 is not set
CONFIG_CHARGER_BQ2515X=y
CONFIG_CHARGER_BQ25890=y
CONFIG_CHARGER_BQ25980=y
# CONFIG_CHARGER_BQ256XX is not set
# CONFIG_CHARGER_SMB347 is not set
# CONFIG_CHARGER_TPS65090 is not set
# CONFIG_BATTERY_GAUGE_LTC2941 is not set
CONFIG_BATTERY_GOLDFISH=y
CONFIG_BATTERY_RT5033=y
CONFIG_CHARGER_RT9455=y
# CONFIG_CHARGER_SC2731 is not set
CONFIG_CHARGER_UCS1002=y
CONFIG_CHARGER_BD99954=y
# CONFIG_BATTERY_UG3105 is not set
CONFIG_HWMON=y
CONFIG_HWMON_VID=y
# CONFIG_HWMON_DEBUG_CHIP is not set

#
# Native drivers
#
CONFIG_SENSORS_AD7414=y
# CONFIG_SENSORS_AD7418 is not set
CONFIG_SENSORS_ADM1021=y
# CONFIG_SENSORS_ADM1025 is not set
# CONFIG_SENSORS_ADM1026 is not set
CONFIG_SENSORS_ADM1029=y
# CONFIG_SENSORS_ADM1031 is not set
CONFIG_SENSORS_ADM1177=y
CONFIG_SENSORS_ADM9240=y
CONFIG_SENSORS_ADT7X10=y
CONFIG_SENSORS_ADT7410=y
CONFIG_SENSORS_ADT7411=y
CONFIG_SENSORS_ADT7462=y
# CONFIG_SENSORS_ADT7470 is not set
# CONFIG_SENSORS_ADT7475 is not set
# CONFIG_SENSORS_AHT10 is not set
CONFIG_SENSORS_AS370=y
CONFIG_SENSORS_ASC7621=y
CONFIG_SENSORS_AXI_FAN_CONTROL=y
CONFIG_SENSORS_ASPEED=y
CONFIG_SENSORS_ATXP1=y
# CONFIG_SENSORS_BT1_PVT is not set
CONFIG_SENSORS_CORSAIR_CPRO=y
# CONFIG_SENSORS_CORSAIR_PSU is not set
CONFIG_SENSORS_DRIVETEMP=y
# CONFIG_SENSORS_DS620 is not set
CONFIG_SENSORS_DS1621=y
CONFIG_SENSORS_DA9055=y
# CONFIG_SENSORS_SPARX5 is not set
# CONFIG_SENSORS_F71805F is not set
CONFIG_SENSORS_F71882FG=y
CONFIG_SENSORS_F75375S=y
# CONFIG_SENSORS_GSC is not set
# CONFIG_SENSORS_MC13783_ADC is not set
# CONFIG_SENSORS_GL518SM is not set
# CONFIG_SENSORS_GL520SM is not set
CONFIG_SENSORS_G760A=y
# CONFIG_SENSORS_G762 is not set
# CONFIG_SENSORS_GPIO_FAN is not set
CONFIG_SENSORS_HIH6130=y
CONFIG_SENSORS_IBMAEM=y
CONFIG_SENSORS_IBMPEX=y
CONFIG_SENSORS_IT87=y
# CONFIG_SENSORS_JC42 is not set
CONFIG_SENSORS_POWR1220=y
# CONFIG_SENSORS_LAN966X is not set
CONFIG_SENSORS_LINEAGE=y
# CONFIG_SENSORS_LOCHNAGAR is not set
CONFIG_SENSORS_LTC2945=y
CONFIG_SENSORS_LTC2947=y
CONFIG_SENSORS_LTC2947_I2C=y
CONFIG_SENSORS_LTC2990=y
# CONFIG_SENSORS_LTC2992 is not set
CONFIG_SENSORS_LTC4151=y
# CONFIG_SENSORS_LTC4215 is not set
CONFIG_SENSORS_LTC4222=y
CONFIG_SENSORS_LTC4245=y
CONFIG_SENSORS_LTC4260=y
CONFIG_SENSORS_LTC4261=y
# CONFIG_SENSORS_MAX127 is not set
# CONFIG_SENSORS_MAX16065 is not set
CONFIG_SENSORS_MAX1619=y
CONFIG_SENSORS_MAX1668=y
# CONFIG_SENSORS_MAX197 is not set
# CONFIG_SENSORS_MAX31730 is not set
# CONFIG_SENSORS_MAX6620 is not set
CONFIG_SENSORS_MAX6621=y
# CONFIG_SENSORS_MAX6639 is not set
# CONFIG_SENSORS_MAX6642 is not set
# CONFIG_SENSORS_MAX6650 is not set
# CONFIG_SENSORS_MAX6697 is not set
CONFIG_SENSORS_MAX31790=y
CONFIG_SENSORS_MCP3021=y
CONFIG_SENSORS_TC654=y
# CONFIG_SENSORS_TPS23861 is not set
# CONFIG_SENSORS_MENF21BMC_HWMON is not set
CONFIG_SENSORS_MR75203=y
# CONFIG_SENSORS_LM63 is not set
CONFIG_SENSORS_LM73=y
CONFIG_SENSORS_LM75=y
CONFIG_SENSORS_LM77=y
CONFIG_SENSORS_LM78=y
# CONFIG_SENSORS_LM80 is not set
CONFIG_SENSORS_LM83=y
# CONFIG_SENSORS_LM85 is not set
CONFIG_SENSORS_LM87=y
CONFIG_SENSORS_LM90=y
# CONFIG_SENSORS_LM92 is not set
CONFIG_SENSORS_LM93=y
CONFIG_SENSORS_LM95234=y
CONFIG_SENSORS_LM95241=y
# CONFIG_SENSORS_LM95245 is not set
CONFIG_SENSORS_PC87360=y
# CONFIG_SENSORS_PC87427 is not set
CONFIG_SENSORS_NCT6683=y
CONFIG_SENSORS_NCT6775_CORE=y
CONFIG_SENSORS_NCT6775=y
# CONFIG_SENSORS_NCT6775_I2C is not set
CONFIG_SENSORS_NCT7802=y
CONFIG_SENSORS_NPCM7XX=y
# CONFIG_SENSORS_NSA320 is not set
# CONFIG_SENSORS_OCC_P8_I2C is not set
# CONFIG_SENSORS_OCC_P9_SBE is not set
# CONFIG_SENSORS_PCF8591 is not set
CONFIG_PMBUS=y
CONFIG_SENSORS_PMBUS=y
CONFIG_SENSORS_ADM1266=y
# CONFIG_SENSORS_ADM1275 is not set
# CONFIG_SENSORS_BEL_PFE is not set
# CONFIG_SENSORS_BPA_RS600 is not set
# CONFIG_SENSORS_DELTA_AHE50DC_FAN is not set
# CONFIG_SENSORS_FSP_3Y is not set
# CONFIG_SENSORS_IBM_CFFPS is not set
# CONFIG_SENSORS_DPS920AB is not set
CONFIG_SENSORS_INSPUR_IPSPS=y
CONFIG_SENSORS_IR35221=y
# CONFIG_SENSORS_IR36021 is not set
# CONFIG_SENSORS_IR38064 is not set
# CONFIG_SENSORS_IRPS5401 is not set
CONFIG_SENSORS_ISL68137=y
# CONFIG_SENSORS_LM25066 is not set
CONFIG_SENSORS_LTC2978=y
# CONFIG_SENSORS_LTC2978_REGULATOR is not set
# CONFIG_SENSORS_LTC3815 is not set
# CONFIG_SENSORS_MAX15301 is not set
# CONFIG_SENSORS_MAX16064 is not set
CONFIG_SENSORS_MAX16601=y
CONFIG_SENSORS_MAX20730=y
CONFIG_SENSORS_MAX20751=y
CONFIG_SENSORS_MAX31785=y
CONFIG_SENSORS_MAX34440=y
CONFIG_SENSORS_MAX8688=y
# CONFIG_SENSORS_MP2888 is not set
CONFIG_SENSORS_MP2975=y
# CONFIG_SENSORS_MP5023 is not set
# CONFIG_SENSORS_PIM4328 is not set
# CONFIG_SENSORS_PLI1209BC is not set
# CONFIG_SENSORS_PM6764TR is not set
CONFIG_SENSORS_PXE1610=y
# CONFIG_SENSORS_Q54SJ108A2 is not set
# CONFIG_SENSORS_STPDDC60 is not set
CONFIG_SENSORS_TPS40422=y
CONFIG_SENSORS_TPS53679=y
CONFIG_SENSORS_UCD9000=y
CONFIG_SENSORS_UCD9200=y
# CONFIG_SENSORS_XDPE152 is not set
# CONFIG_SENSORS_XDPE122 is not set
CONFIG_SENSORS_ZL6100=y
# CONFIG_SENSORS_PWM_FAN is not set
# CONFIG_SENSORS_RASPBERRYPI_HWMON is not set
# CONFIG_SENSORS_SL28CPLD is not set
# CONFIG_SENSORS_SBTSI is not set
# CONFIG_SENSORS_SBRMI is not set
CONFIG_SENSORS_SHT15=y
CONFIG_SENSORS_SHT21=y
CONFIG_SENSORS_SHT3x=y
# CONFIG_SENSORS_SHT4x is not set
# CONFIG_SENSORS_SHTC1 is not set
# CONFIG_SENSORS_SY7636A is not set
CONFIG_SENSORS_DME1737=y
# CONFIG_SENSORS_EMC1403 is not set
CONFIG_SENSORS_EMC2103=y
CONFIG_SENSORS_EMC6W201=y
CONFIG_SENSORS_SMSC47M1=y
CONFIG_SENSORS_SMSC47M192=y
CONFIG_SENSORS_SMSC47B397=y
CONFIG_SENSORS_STTS751=y
CONFIG_SENSORS_SMM665=y
CONFIG_SENSORS_ADC128D818=y
# CONFIG_SENSORS_ADS7828 is not set
CONFIG_SENSORS_AMC6821=y
CONFIG_SENSORS_INA209=y
CONFIG_SENSORS_INA2XX=y
# CONFIG_SENSORS_INA238 is not set
# CONFIG_SENSORS_INA3221 is not set
CONFIG_SENSORS_TC74=y
# CONFIG_SENSORS_THMC50 is not set
CONFIG_SENSORS_TMP102=y
CONFIG_SENSORS_TMP103=y
CONFIG_SENSORS_TMP108=y
# CONFIG_SENSORS_TMP401 is not set
CONFIG_SENSORS_TMP421=y
# CONFIG_SENSORS_TMP464 is not set
CONFIG_SENSORS_TMP513=y
CONFIG_SENSORS_VT1211=y
CONFIG_SENSORS_W83773G=y
CONFIG_SENSORS_W83781D=y
CONFIG_SENSORS_W83791D=y
CONFIG_SENSORS_W83792D=y
# CONFIG_SENSORS_W83793 is not set
# CONFIG_SENSORS_W83795 is not set
# CONFIG_SENSORS_W83L785TS is not set
CONFIG_SENSORS_W83L786NG=y
CONFIG_SENSORS_W83627HF=y
CONFIG_SENSORS_W83627EHF=y
# CONFIG_SENSORS_WM8350 is not set
CONFIG_THERMAL=y
# CONFIG_THERMAL_STATISTICS is not set
CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS=0
# CONFIG_THERMAL_HWMON is not set
CONFIG_THERMAL_OF=y
CONFIG_THERMAL_WRITABLE_TRIPS=y
# CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE is not set
# CONFIG_THERMAL_DEFAULT_GOV_FAIR_SHARE is not set
CONFIG_THERMAL_DEFAULT_GOV_USER_SPACE=y
CONFIG_THERMAL_GOV_FAIR_SHARE=y
# CONFIG_THERMAL_GOV_STEP_WISE is not set
CONFIG_THERMAL_GOV_BANG_BANG=y
CONFIG_THERMAL_GOV_USER_SPACE=y
CONFIG_CPU_THERMAL=y
# CONFIG_THERMAL_EMULATION is not set
CONFIG_THERMAL_MMIO=y
CONFIG_HISI_THERMAL=y
# CONFIG_IMX8MM_THERMAL is not set
# CONFIG_K3_THERMAL is not set
CONFIG_MAX77620_THERMAL=y
# CONFIG_QORIQ_THERMAL is not set
# CONFIG_SPEAR_THERMAL is not set
# CONFIG_ROCKCHIP_THERMAL is not set
# CONFIG_RCAR_THERMAL is not set
# CONFIG_RCAR_GEN3_THERMAL is not set
# CONFIG_RZG2L_THERMAL is not set
# CONFIG_KIRKWOOD_THERMAL is not set
# CONFIG_DOVE_THERMAL is not set
# CONFIG_ARMADA_THERMAL is not set
# CONFIG_DA9062_THERMAL is not set
CONFIG_MTK_THERMAL=y

#
# Intel thermal drivers
#

#
# ACPI INT340X thermal drivers
#
# end of ACPI INT340X thermal drivers
# end of Intel thermal drivers

#
# Broadcom thermal drivers
#
# CONFIG_BCM2835_THERMAL is not set
# CONFIG_BRCMSTB_THERMAL is not set
# CONFIG_BCM_NS_THERMAL is not set
# CONFIG_BCM_SR_THERMAL is not set
# end of Broadcom thermal drivers

#
# Texas Instruments thermal drivers
#
# CONFIG_TI_SOC_THERMAL is not set
# end of Texas Instruments thermal drivers

#
# Samsung thermal drivers
#
# CONFIG_EXYNOS_THERMAL is not set
# end of Samsung thermal drivers

#
# NVIDIA Tegra thermal drivers
#
# CONFIG_TEGRA_SOCTHERM is not set
# CONFIG_TEGRA_BPMP_THERMAL is not set
# CONFIG_TEGRA30_TSENSOR is not set
# end of NVIDIA Tegra thermal drivers

#
# Qualcomm thermal drivers
#
# end of Qualcomm thermal drivers

# CONFIG_SPRD_THERMAL is not set
# CONFIG_WATCHDOG is not set
CONFIG_SSB_POSSIBLE=y
CONFIG_SSB=y
CONFIG_SSB_SDIOHOST_POSSIBLE=y
# CONFIG_SSB_SDIOHOST is not set
CONFIG_SSB_DRIVER_GPIO=y
CONFIG_BCMA_POSSIBLE=y
CONFIG_BCMA=y
# CONFIG_BCMA_HOST_SOC is not set
# CONFIG_BCMA_DRIVER_MIPS is not set
# CONFIG_BCMA_DRIVER_GMAC_CMN is not set
# CONFIG_BCMA_DRIVER_GPIO is not set
# CONFIG_BCMA_DEBUG is not set

#
# Multifunction device drivers
#
CONFIG_MFD_CORE=y
CONFIG_MFD_ACT8945A=y
# CONFIG_MFD_SUN4I_GPADC is not set
# CONFIG_MFD_AS3711 is not set
CONFIG_MFD_AS3722=y
CONFIG_PMIC_ADP5520=y
CONFIG_MFD_AAT2870_CORE=y
# CONFIG_MFD_AT91_USART is not set
# CONFIG_MFD_ATMEL_FLEXCOM is not set
CONFIG_MFD_ATMEL_HLCDC=y
CONFIG_MFD_BCM590XX=y
# CONFIG_MFD_BD9571MWV is not set
CONFIG_MFD_AXP20X=y
CONFIG_MFD_AXP20X_I2C=y
CONFIG_MFD_MADERA=y
CONFIG_MFD_MADERA_I2C=y
# CONFIG_MFD_CS47L15 is not set
CONFIG_MFD_CS47L35=y
# CONFIG_MFD_CS47L85 is not set
# CONFIG_MFD_CS47L90 is not set
# CONFIG_MFD_CS47L92 is not set
# CONFIG_MFD_ASIC3 is not set
# CONFIG_PMIC_DA903X is not set
# CONFIG_MFD_DA9052_I2C is not set
CONFIG_MFD_DA9055=y
CONFIG_MFD_DA9062=y
CONFIG_MFD_DA9063=y
CONFIG_MFD_DA9150=y
CONFIG_MFD_DLN2=y
# CONFIG_MFD_ENE_KB3930 is not set
# CONFIG_MFD_EXYNOS_LPASS is not set
CONFIG_MFD_GATEWORKS_GSC=y
CONFIG_MFD_MC13XXX=y
CONFIG_MFD_MC13XXX_I2C=y
CONFIG_MFD_MP2629=y
# CONFIG_MFD_MXS_LRADC is not set
# CONFIG_MFD_MX25_TSADC is not set
# CONFIG_MFD_HI6421_PMIC is not set
# CONFIG_MFD_HI655X_PMIC is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_HTC_I2CPLD is not set
CONFIG_MFD_IQS62X=y
# CONFIG_MFD_KEMPLD is not set
# CONFIG_MFD_88PM800 is not set
# CONFIG_MFD_88PM805 is not set
# CONFIG_MFD_88PM860X is not set
CONFIG_MFD_MAX14577=y
CONFIG_MFD_MAX77620=y
# CONFIG_MFD_MAX77650 is not set
CONFIG_MFD_MAX77686=y
CONFIG_MFD_MAX77693=y
# CONFIG_MFD_MAX77714 is not set
# CONFIG_MFD_MAX77843 is not set
# CONFIG_MFD_MAX8907 is not set
CONFIG_MFD_MAX8925=y
CONFIG_MFD_MAX8997=y
# CONFIG_MFD_MAX8998 is not set
# CONFIG_MFD_MT6360 is not set
CONFIG_MFD_MT6397=y
CONFIG_MFD_MENF21BMC=y
CONFIG_MFD_VIPERBOARD=y
# CONFIG_MFD_NTXEC is not set
CONFIG_MFD_RETU=y
# CONFIG_MFD_PCF50633 is not set
# CONFIG_MFD_PM8XXX is not set
# CONFIG_MFD_RT4831 is not set
CONFIG_MFD_RT5033=y
CONFIG_MFD_RC5T583=y
CONFIG_MFD_RK808=y
CONFIG_MFD_RN5T618=y
CONFIG_MFD_SEC_CORE=y
CONFIG_MFD_SI476X_CORE=y
# CONFIG_MFD_SIMPLE_MFD_I2C is not set
# CONFIG_MFD_SL28CPLD is not set
CONFIG_MFD_SM501=y
# CONFIG_MFD_SM501_GPIO is not set
CONFIG_MFD_SKY81452=y
CONFIG_ABX500_CORE=y
CONFIG_MFD_STMPE=y

#
# STMicroelectronics STMPE Interface Drivers
#
CONFIG_STMPE_I2C=y
# end of STMicroelectronics STMPE Interface Drivers

# CONFIG_MFD_SUN6I_PRCM is not set
# CONFIG_MFD_SYSCON is not set
CONFIG_MFD_TI_AM335X_TSCADC=y
CONFIG_MFD_LP3943=y
# CONFIG_MFD_LP8788 is not set
CONFIG_MFD_TI_LMU=y
# CONFIG_MFD_PALMAS is not set
# CONFIG_TPS6105X is not set
CONFIG_TPS65010=y
CONFIG_TPS6507X=y
CONFIG_MFD_TPS65086=y
CONFIG_MFD_TPS65090=y
# CONFIG_MFD_TPS65217 is not set
CONFIG_MFD_TI_LP873X=y
CONFIG_MFD_TI_LP87565=y
CONFIG_MFD_TPS65218=y
# CONFIG_MFD_TPS6586X is not set
CONFIG_MFD_TPS65910=y
# CONFIG_MFD_TPS65912_I2C is not set
# CONFIG_TWL4030_CORE is not set
# CONFIG_TWL6040_CORE is not set
CONFIG_MFD_WL1273_CORE=y
# CONFIG_MFD_LM3533 is not set
CONFIG_MFD_TC3589X=y
CONFIG_MFD_TQMX86=y
CONFIG_MFD_LOCHNAGAR=y
CONFIG_MFD_ARIZONA=y
CONFIG_MFD_ARIZONA_I2C=y
CONFIG_MFD_CS47L24=y
CONFIG_MFD_WM5102=y
# CONFIG_MFD_WM5110 is not set
CONFIG_MFD_WM8997=y
CONFIG_MFD_WM8998=y
# CONFIG_MFD_WM8400 is not set
# CONFIG_MFD_WM831X_I2C is not set
CONFIG_MFD_WM8350=y
CONFIG_MFD_WM8350_I2C=y
# CONFIG_MFD_WM8994 is not set
# CONFIG_MFD_STW481X is not set
# CONFIG_MFD_ROHM_BD718XX is not set
CONFIG_MFD_ROHM_BD71828=y
# CONFIG_MFD_ROHM_BD957XMUF is not set
# CONFIG_MFD_STM32_LPTIMER is not set
# CONFIG_MFD_STM32_TIMERS is not set
CONFIG_MFD_STPMIC1=y
CONFIG_MFD_STMFX=y
# CONFIG_MFD_ATC260X_I2C is not set
# CONFIG_MFD_KHADAS_MCU is not set
# CONFIG_MFD_ACER_A500_EC is not set
# CONFIG_MFD_QCOM_PM8008 is not set
CONFIG_RAVE_SP_CORE=y
# CONFIG_MFD_RSMU_I2C is not set
# end of Multifunction device drivers

CONFIG_REGULATOR=y
# CONFIG_REGULATOR_DEBUG is not set
CONFIG_REGULATOR_FIXED_VOLTAGE=y
CONFIG_REGULATOR_VIRTUAL_CONSUMER=y
# CONFIG_REGULATOR_USERSPACE_CONSUMER is not set
CONFIG_REGULATOR_88PG86X=y
# CONFIG_REGULATOR_ACT8865 is not set
# CONFIG_REGULATOR_ACT8945A is not set
CONFIG_REGULATOR_AD5398=y
CONFIG_REGULATOR_AAT2870=y
CONFIG_REGULATOR_ARIZONA_LDO1=y
CONFIG_REGULATOR_ARIZONA_MICSUPP=y
CONFIG_REGULATOR_AS3722=y
CONFIG_REGULATOR_AXP20X=y
CONFIG_REGULATOR_BCM590XX=y
# CONFIG_REGULATOR_BD71815 is not set
# CONFIG_REGULATOR_BD71828 is not set
CONFIG_REGULATOR_DA9055=y
CONFIG_REGULATOR_DA9062=y
CONFIG_REGULATOR_DA9063=y
# CONFIG_REGULATOR_DA9121 is not set
CONFIG_REGULATOR_DA9210=y
CONFIG_REGULATOR_DA9211=y
# CONFIG_REGULATOR_FAN53555 is not set
CONFIG_REGULATOR_FAN53880=y
CONFIG_REGULATOR_GPIO=y
CONFIG_REGULATOR_ISL9305=y
# CONFIG_REGULATOR_ISL6271A is not set
CONFIG_REGULATOR_LM363X=y
CONFIG_REGULATOR_LOCHNAGAR=y
CONFIG_REGULATOR_LP3971=y
CONFIG_REGULATOR_LP3972=y
CONFIG_REGULATOR_LP872X=y
# CONFIG_REGULATOR_LP873X is not set
CONFIG_REGULATOR_LP8755=y
CONFIG_REGULATOR_LP87565=y
CONFIG_REGULATOR_LTC3589=y
CONFIG_REGULATOR_LTC3676=y
CONFIG_REGULATOR_MAX14577=y
CONFIG_REGULATOR_MAX1586=y
CONFIG_REGULATOR_MAX77620=y
# CONFIG_REGULATOR_MAX77650 is not set
CONFIG_REGULATOR_MAX8649=y
CONFIG_REGULATOR_MAX8660=y
# CONFIG_REGULATOR_MAX8893 is not set
# CONFIG_REGULATOR_MAX8907 is not set
CONFIG_REGULATOR_MAX8925=y
CONFIG_REGULATOR_MAX8952=y
CONFIG_REGULATOR_MAX8973=y
# CONFIG_REGULATOR_MAX8997 is not set
# CONFIG_REGULATOR_MAX20086 is not set
# CONFIG_REGULATOR_MAX77686 is not set
# CONFIG_REGULATOR_MAX77693 is not set
CONFIG_REGULATOR_MAX77802=y
CONFIG_REGULATOR_MAX77826=y
CONFIG_REGULATOR_MC13XXX_CORE=y
CONFIG_REGULATOR_MC13783=y
CONFIG_REGULATOR_MC13892=y
CONFIG_REGULATOR_MCP16502=y
CONFIG_REGULATOR_MP5416=y
CONFIG_REGULATOR_MP8859=y
# CONFIG_REGULATOR_MP886X is not set
# CONFIG_REGULATOR_MPQ7920 is not set
CONFIG_REGULATOR_MT6311=y
CONFIG_REGULATOR_MT6323=y
CONFIG_REGULATOR_MT6358=y
# CONFIG_REGULATOR_MT6359 is not set
CONFIG_REGULATOR_MT6397=y
CONFIG_REGULATOR_PCA9450=y
# CONFIG_REGULATOR_PF8X00 is not set
CONFIG_REGULATOR_PFUZE100=y
CONFIG_REGULATOR_PV88060=y
# CONFIG_REGULATOR_PV88080 is not set
# CONFIG_REGULATOR_PV88090 is not set
# CONFIG_REGULATOR_PWM is not set
# CONFIG_REGULATOR_QCOM_RPMH is not set
# CONFIG_REGULATOR_QCOM_SPMI is not set
# CONFIG_REGULATOR_QCOM_USB_VBUS is not set
# CONFIG_REGULATOR_RASPBERRYPI_TOUCHSCREEN_ATTINY is not set
# CONFIG_REGULATOR_RC5T583 is not set
CONFIG_REGULATOR_RK808=y
# CONFIG_REGULATOR_RN5T618 is not set
# CONFIG_REGULATOR_RT4801 is not set
CONFIG_REGULATOR_RT5033=y
# CONFIG_REGULATOR_RT5190A is not set
# CONFIG_REGULATOR_RT5759 is not set
# CONFIG_REGULATOR_RT6160 is not set
# CONFIG_REGULATOR_RT6245 is not set
# CONFIG_REGULATOR_RTQ2134 is not set
CONFIG_REGULATOR_RTMV20=y
# CONFIG_REGULATOR_RTQ6752 is not set
CONFIG_REGULATOR_S2MPA01=y
CONFIG_REGULATOR_S2MPS11=y
CONFIG_REGULATOR_S5M8767=y
# CONFIG_REGULATOR_SC2731 is not set
# CONFIG_REGULATOR_SKY81452 is not set
# CONFIG_REGULATOR_SLG51000 is not set
# CONFIG_REGULATOR_STM32_BOOSTER is not set
# CONFIG_REGULATOR_STM32_VREFBUF is not set
# CONFIG_REGULATOR_STM32_PWR is not set
# CONFIG_REGULATOR_STPMIC1 is not set
# CONFIG_REGULATOR_TI_ABB is not set
# CONFIG_REGULATOR_STW481X_VMMC is not set
# CONFIG_REGULATOR_SY7636A is not set
CONFIG_REGULATOR_SY8106A=y
CONFIG_REGULATOR_SY8824X=y
# CONFIG_REGULATOR_SY8827N is not set
# CONFIG_REGULATOR_TPS51632 is not set
CONFIG_REGULATOR_TPS62360=y
# CONFIG_REGULATOR_TPS6286X is not set
# CONFIG_REGULATOR_TPS65023 is not set
# CONFIG_REGULATOR_TPS6507X is not set
CONFIG_REGULATOR_TPS65086=y
# CONFIG_REGULATOR_TPS65090 is not set
# CONFIG_REGULATOR_TPS65132 is not set
CONFIG_REGULATOR_TPS65218=y
CONFIG_REGULATOR_TPS65910=y
# CONFIG_REGULATOR_TPS68470 is not set
# CONFIG_REGULATOR_UNIPHIER is not set
# CONFIG_REGULATOR_VCTRL is not set
CONFIG_REGULATOR_WM8350=y
# CONFIG_REGULATOR_QCOM_LABIBB is not set
CONFIG_RC_CORE=y
# CONFIG_BPF_LIRC_MODE2 is not set
CONFIG_LIRC=y
# CONFIG_RC_MAP is not set
CONFIG_RC_DECODERS=y
CONFIG_IR_IMON_DECODER=y
CONFIG_IR_JVC_DECODER=y
# CONFIG_IR_MCE_KBD_DECODER is not set
CONFIG_IR_NEC_DECODER=y
CONFIG_IR_RC5_DECODER=y
# CONFIG_IR_RC6_DECODER is not set
CONFIG_IR_RCMM_DECODER=y
# CONFIG_IR_SANYO_DECODER is not set
# CONFIG_IR_SHARP_DECODER is not set
CONFIG_IR_SONY_DECODER=y
CONFIG_IR_XMP_DECODER=y
# CONFIG_RC_DEVICES is not set

#
# CEC support
#
# CONFIG_MEDIA_CEC_SUPPORT is not set
# end of CEC support

CONFIG_MEDIA_SUPPORT=y
# CONFIG_MEDIA_SUPPORT_FILTER is not set
CONFIG_MEDIA_SUBDRV_AUTOSELECT=y

#
# Media device types
#
CONFIG_MEDIA_CAMERA_SUPPORT=y
CONFIG_MEDIA_ANALOG_TV_SUPPORT=y
CONFIG_MEDIA_DIGITAL_TV_SUPPORT=y
CONFIG_MEDIA_RADIO_SUPPORT=y
CONFIG_MEDIA_SDR_SUPPORT=y
CONFIG_MEDIA_PLATFORM_SUPPORT=y
CONFIG_MEDIA_TEST_SUPPORT=y
# end of Media device types

#
# Media core support
#
# CONFIG_VIDEO_DEV is not set
# CONFIG_MEDIA_CONTROLLER is not set
CONFIG_DVB_CORE=y
# end of Media core support

#
# Digital TV options
#
CONFIG_DVB_MAX_ADAPTERS=16
# CONFIG_DVB_DYNAMIC_MINORS is not set
# CONFIG_DVB_DEMUX_SECTION_LOSS_LOG is not set
CONFIG_DVB_ULE_DEBUG=y
# end of Digital TV options

#
# Media drivers
#

#
# Media drivers
#
# CONFIG_MEDIA_USB_SUPPORT is not set
CONFIG_MEDIA_PLATFORM_DRIVERS=y
CONFIG_V4L_PLATFORM_DRIVERS=y
CONFIG_SDR_PLATFORM_DRIVERS=y
# CONFIG_DVB_PLATFORM_DRIVERS is not set

#
# Allegro DVT media platform drivers
#

#
# Amlogic media platform drivers
#

#
# Amphion drivers
#

#
# Aspeed media platform drivers
#

#
# Atmel media platform drivers
#

#
# Cadence media platform drivers
#

#
# Chips&Media media platform drivers
#

#
# Intel media platform drivers
#

#
# Marvell media platform drivers
#

#
# Mediatek media platform drivers
#

#
# NVidia media platform drivers
#

#
# NXP media platform drivers
#

#
# Qualcomm media platform drivers
#

#
# Renesas media platform drivers
#

#
# Rockchip media platform drivers
#

#
# Samsung media platform drivers
#

#
# STMicroelectronics media platform drivers
#

#
# Sunxi media platform drivers
#

#
# Texas Instruments drivers
#

#
# VIA media platform drivers
#

#
# Xilinx media platform drivers
#

#
# MMC/SDIO DVB adapters
#
CONFIG_SMS_SDIO_DRV=y
# CONFIG_DVB_TEST_DRIVERS is not set
CONFIG_MEDIA_COMMON_OPTIONS=y

#
# common driver options
#
CONFIG_SMS_SIANO_MDTV=y
# CONFIG_SMS_SIANO_RC is not set
# end of Media drivers

#
# Media ancillary drivers
#
CONFIG_MEDIA_TUNER=y

#
# Customize TV tuners
#
CONFIG_MEDIA_TUNER_FC0011=y
CONFIG_MEDIA_TUNER_FC0012=y
CONFIG_MEDIA_TUNER_FC0013=y
CONFIG_MEDIA_TUNER_IT913X=y
# CONFIG_MEDIA_TUNER_M88RS6000T is not set
CONFIG_MEDIA_TUNER_MAX2165=y
CONFIG_MEDIA_TUNER_MC44S803=y
CONFIG_MEDIA_TUNER_MT2060=y
CONFIG_MEDIA_TUNER_MT2063=y
CONFIG_MEDIA_TUNER_MT20XX=y
CONFIG_MEDIA_TUNER_MT2131=y
CONFIG_MEDIA_TUNER_MT2266=y
# CONFIG_MEDIA_TUNER_MXL301RF is not set
# CONFIG_MEDIA_TUNER_MXL5005S is not set
CONFIG_MEDIA_TUNER_MXL5007T=y
# CONFIG_MEDIA_TUNER_QM1D1B0004 is not set
# CONFIG_MEDIA_TUNER_QM1D1C0042 is not set
CONFIG_MEDIA_TUNER_QT1010=y
CONFIG_MEDIA_TUNER_R820T=y
CONFIG_MEDIA_TUNER_SI2157=y
CONFIG_MEDIA_TUNER_SIMPLE=y
CONFIG_MEDIA_TUNER_TDA18212=y
CONFIG_MEDIA_TUNER_TDA18218=y
CONFIG_MEDIA_TUNER_TDA18250=y
CONFIG_MEDIA_TUNER_TDA18271=y
CONFIG_MEDIA_TUNER_TDA827X=y
CONFIG_MEDIA_TUNER_TDA8290=y
CONFIG_MEDIA_TUNER_TDA9887=y
CONFIG_MEDIA_TUNER_TEA5761=y
CONFIG_MEDIA_TUNER_TEA5767=y
# CONFIG_MEDIA_TUNER_TUA9001 is not set
CONFIG_MEDIA_TUNER_XC2028=y
CONFIG_MEDIA_TUNER_XC4000=y
CONFIG_MEDIA_TUNER_XC5000=y
# end of Customize TV tuners

#
# Customise DVB Frontends
#

#
# Multistandard (satellite) frontends
#
CONFIG_DVB_M88DS3103=y
CONFIG_DVB_MXL5XX=y
CONFIG_DVB_STB0899=y
CONFIG_DVB_STB6100=y
# CONFIG_DVB_STV090x is not set
CONFIG_DVB_STV0910=y
CONFIG_DVB_STV6110x=y
# CONFIG_DVB_STV6111 is not set

#
# Multistandard (cable + terrestrial) frontends
#
CONFIG_DVB_DRXK=y
CONFIG_DVB_MN88472=y
# CONFIG_DVB_MN88473 is not set
CONFIG_DVB_SI2165=y
# CONFIG_DVB_TDA18271C2DD is not set

#
# DVB-S (satellite) frontends
#
CONFIG_DVB_CX24110=y
CONFIG_DVB_CX24116=y
CONFIG_DVB_CX24117=y
CONFIG_DVB_CX24120=y
# CONFIG_DVB_CX24123 is not set
# CONFIG_DVB_DS3000 is not set
CONFIG_DVB_MB86A16=y
CONFIG_DVB_MT312=y
CONFIG_DVB_S5H1420=y
CONFIG_DVB_SI21XX=y
CONFIG_DVB_STB6000=y
CONFIG_DVB_STV0288=y
CONFIG_DVB_STV0299=y
CONFIG_DVB_STV0900=y
CONFIG_DVB_STV6110=y
# CONFIG_DVB_TDA10071 is not set
CONFIG_DVB_TDA10086=y
CONFIG_DVB_TDA8083=y
CONFIG_DVB_TDA8261=y
# CONFIG_DVB_TDA826X is not set
# CONFIG_DVB_TS2020 is not set
CONFIG_DVB_TUA6100=y
# CONFIG_DVB_TUNER_CX24113 is not set
# CONFIG_DVB_TUNER_ITD1000 is not set
# CONFIG_DVB_VES1X93 is not set
# CONFIG_DVB_ZL10036 is not set
# CONFIG_DVB_ZL10039 is not set

#
# DVB-T (terrestrial) frontends
#
CONFIG_DVB_AF9013=y
# CONFIG_DVB_CX22700 is not set
CONFIG_DVB_CX22702=y
CONFIG_DVB_CXD2820R=y
# CONFIG_DVB_CXD2841ER is not set
# CONFIG_DVB_DIB3000MB is not set
# CONFIG_DVB_DIB3000MC is not set
# CONFIG_DVB_DIB7000M is not set
CONFIG_DVB_DIB7000P=y
CONFIG_DVB_DIB9000=y
# CONFIG_DVB_DRXD is not set
# CONFIG_DVB_EC100 is not set
CONFIG_DVB_L64781=y
CONFIG_DVB_MT352=y
# CONFIG_DVB_NXT6000 is not set
# CONFIG_DVB_RTL2830 is not set
CONFIG_DVB_RTL2832=y
CONFIG_DVB_S5H1432=y
CONFIG_DVB_SI2168=y
CONFIG_DVB_SP887X=y
CONFIG_DVB_STV0367=y
# CONFIG_DVB_TDA10048 is not set
CONFIG_DVB_TDA1004X=y
# CONFIG_DVB_ZD1301_DEMOD is not set
CONFIG_DVB_ZL10353=y

#
# DVB-C (cable) frontends
#
# CONFIG_DVB_STV0297 is not set
CONFIG_DVB_TDA10021=y
# CONFIG_DVB_TDA10023 is not set
CONFIG_DVB_VES1820=y

#
# ATSC (North American/Korean Terrestrial/Cable DTV) frontends
#
CONFIG_DVB_AU8522=y
CONFIG_DVB_AU8522_DTV=y
CONFIG_DVB_BCM3510=y
CONFIG_DVB_LG2160=y
# CONFIG_DVB_LGDT3305 is not set
CONFIG_DVB_LGDT3306A=y
CONFIG_DVB_LGDT330X=y
# CONFIG_DVB_MXL692 is not set
CONFIG_DVB_NXT200X=y
# CONFIG_DVB_OR51132 is not set
CONFIG_DVB_OR51211=y
CONFIG_DVB_S5H1409=y
CONFIG_DVB_S5H1411=y

#
# ISDB-T (terrestrial) frontends
#
CONFIG_DVB_DIB8000=y
CONFIG_DVB_MB86A20S=y
CONFIG_DVB_S921=y

#
# ISDB-S (satellite) & ISDB-T (terrestrial) frontends
#
CONFIG_DVB_MN88443X=y
CONFIG_DVB_TC90522=y

#
# Digital terrestrial only tuners/PLL
#
CONFIG_DVB_PLL=y
CONFIG_DVB_TUNER_DIB0070=y
CONFIG_DVB_TUNER_DIB0090=y

#
# SEC control devices for DVB-S
#
CONFIG_DVB_A8293=y
CONFIG_DVB_AF9033=y
CONFIG_DVB_ASCOT2E=y
CONFIG_DVB_ATBM8830=y
CONFIG_DVB_HELENE=y
# CONFIG_DVB_HORUS3A is not set
CONFIG_DVB_ISL6405=y
CONFIG_DVB_ISL6421=y
CONFIG_DVB_ISL6423=y
CONFIG_DVB_IX2505V=y
CONFIG_DVB_LGS8GL5=y
CONFIG_DVB_LGS8GXX=y
CONFIG_DVB_LNBH25=y
# CONFIG_DVB_LNBH29 is not set
CONFIG_DVB_LNBP21=y
CONFIG_DVB_LNBP22=y
CONFIG_DVB_M88RS2000=y
CONFIG_DVB_TDA665x=y
# CONFIG_DVB_DRX39XYJ is not set

#
# Common Interface (EN50221) controller drivers
#
CONFIG_DVB_CXD2099=y
CONFIG_DVB_SP2=y
# end of Customise DVB Frontends

#
# Tools to develop new frontends
#
CONFIG_DVB_DUMMY_FE=y
# end of Media ancillary drivers

#
# Graphics support
#
# CONFIG_IMX_IPUV3_CORE is not set
# CONFIG_DRM is not set
CONFIG_DRM_DEBUG_MODESET_LOCK=y

#
# ARM devices
#
# end of ARM devices

#
# Frame buffer Devices
#
CONFIG_FB_CMDLINE=y
CONFIG_FB_NOTIFY=y
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
CONFIG_FB_SYS_FILLRECT=y
CONFIG_FB_SYS_COPYAREA=y
CONFIG_FB_SYS_IMAGEBLIT=y
CONFIG_FB_FOREIGN_ENDIAN=y
# CONFIG_FB_BOTH_ENDIAN is not set
CONFIG_FB_BIG_ENDIAN=y
# CONFIG_FB_LITTLE_ENDIAN is not set
CONFIG_FB_SYS_FOPS=y
CONFIG_FB_DEFERRED_IO=y
CONFIG_FB_MODE_HELPERS=y
# CONFIG_FB_TILEBLITTING is not set

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CLPS711X is not set
# CONFIG_FB_ARC is not set
# CONFIG_FB_CONTROL is not set
# CONFIG_FB_GBE is not set
# CONFIG_FB_PVR2 is not set
CONFIG_FB_OPENCORES=y
CONFIG_FB_S1D13XXX=y
# CONFIG_FB_WM8505 is not set
# CONFIG_FB_W100 is not set
# CONFIG_FB_TMIO is not set
# CONFIG_FB_SM501 is not set
CONFIG_FB_SMSCUFX=y
CONFIG_FB_UDL=y
# CONFIG_FB_IBM_GXT4500 is not set
CONFIG_FB_GOLDFISH=y
# CONFIG_FB_VIRTUAL is not set
CONFIG_FB_METRONOME=y
# CONFIG_FB_BROADSHEET is not set
# CONFIG_FB_SIMPLE is not set
# CONFIG_FB_SSD1307 is not set
CONFIG_FB_OMAP_LCD_H3=y
# CONFIG_FB_OMAP2 is not set
# CONFIG_MMP_DISP is not set
# end of Frame buffer Devices

#
# Backlight & LCD device support
#
CONFIG_LCD_CLASS_DEVICE=y
# CONFIG_LCD_PLATFORM is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
CONFIG_BACKLIGHT_KTD253=y
# CONFIG_BACKLIGHT_OMAP1 is not set
CONFIG_BACKLIGHT_PWM=y
# CONFIG_BACKLIGHT_MAX8925 is not set
# CONFIG_BACKLIGHT_QCOM_WLED is not set
CONFIG_BACKLIGHT_ADP5520=y
CONFIG_BACKLIGHT_ADP8860=y
# CONFIG_BACKLIGHT_ADP8870 is not set
CONFIG_BACKLIGHT_AAT2870=y
CONFIG_BACKLIGHT_LM3630A=y
CONFIG_BACKLIGHT_LM3639=y
CONFIG_BACKLIGHT_LP855X=y
# CONFIG_BACKLIGHT_SKY81452 is not set
CONFIG_BACKLIGHT_GPIO=y
# CONFIG_BACKLIGHT_LV5207LP is not set
CONFIG_BACKLIGHT_BD6107=y
CONFIG_BACKLIGHT_ARCXCNN=y
CONFIG_BACKLIGHT_RAVE_SP=y
CONFIG_BACKLIGHT_LED=y
# end of Backlight & LCD device support

CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
CONFIG_LOGO_LINUX_CLUT224=y
# end of Graphics support

CONFIG_SOUND=y
CONFIG_SND=y
CONFIG_SND_TIMER=y
CONFIG_SND_PCM=y
CONFIG_SND_DMAENGINE_PCM=y
CONFIG_SND_SEQ_DEVICE=y
CONFIG_SND_RAWMIDI=y
CONFIG_SND_JACK=y
CONFIG_SND_JACK_INPUT_DEV=y
# CONFIG_SND_OSSEMUL is not set
CONFIG_SND_PCM_TIMER=y
CONFIG_SND_HRTIMER=y
# CONFIG_SND_DYNAMIC_MINORS is not set
CONFIG_SND_SUPPORT_OLD_API=y
CONFIG_SND_PROC_FS=y
# CONFIG_SND_VERBOSE_PROCFS is not set
CONFIG_SND_VERBOSE_PRINTK=y
CONFIG_SND_DEBUG=y
# CONFIG_SND_DEBUG_VERBOSE is not set
# CONFIG_SND_CTL_VALIDATION is not set
# CONFIG_SND_JACK_INJECTION_DEBUG is not set
CONFIG_SND_SEQUENCER=y
CONFIG_SND_SEQ_DUMMY=y
# CONFIG_SND_SEQ_HRTIMER_DEFAULT is not set
CONFIG_SND_SEQ_MIDI_EVENT=y
CONFIG_SND_SEQ_MIDI=y
CONFIG_SND_SEQ_VIRMIDI=y
CONFIG_SND_MPU401_UART=y
CONFIG_SND_DRIVERS=y
# CONFIG_SND_DUMMY is not set
CONFIG_SND_ALOOP=y
CONFIG_SND_VIRMIDI=y
# CONFIG_SND_MTPAV is not set
# CONFIG_SND_SERIAL_U16550 is not set
# CONFIG_SND_SERIAL_GENERIC is not set
CONFIG_SND_MPU401=y

#
# HD-Audio
#
# end of HD-Audio

CONFIG_SND_HDA_PREALLOC_SIZE=64
# CONFIG_SND_USB is not set
CONFIG_SND_SOC=y
CONFIG_SND_SOC_GENERIC_DMAENGINE_PCM=y
# CONFIG_SND_SOC_UTILS_KUNIT_TEST is not set
# CONFIG_SND_SOC_ADI is not set
# CONFIG_SND_SOC_AMD_ACP is not set
# CONFIG_SND_AMD_ACP_CONFIG is not set
# CONFIG_SND_ATMEL_SOC is not set
# CONFIG_SND_BCM2835_SOC_I2S is not set
# CONFIG_SND_SOC_CYGNUS is not set
CONFIG_SND_BCM63XX_I2S_WHISTLER=y
# CONFIG_SND_EP93XX_SOC is not set

#
# SoC Audio for Freescale CPUs
#

#
# Common SoC Audio options for Freescale CPUs:
#
CONFIG_SND_SOC_FSL_ASRC=y
CONFIG_SND_SOC_FSL_SAI=y
CONFIG_SND_SOC_FSL_MQS=y
# CONFIG_SND_SOC_FSL_AUDMIX is not set
# CONFIG_SND_SOC_FSL_SSI is not set
# CONFIG_SND_SOC_FSL_SPDIF is not set
CONFIG_SND_SOC_FSL_ESAI=y
# CONFIG_SND_SOC_FSL_MICFIL is not set
# CONFIG_SND_SOC_FSL_EASRC is not set
# CONFIG_SND_SOC_FSL_XCVR is not set
# CONFIG_SND_SOC_FSL_AUD2HTX is not set
CONFIG_SND_SOC_IMX_AUDMUX=y
# CONFIG_SND_IMX_SOC is not set
# end of SoC Audio for Freescale CPUs

CONFIG_SND_I2S_HI6210_I2S=y
# CONFIG_SND_JZ4740_SOC_I2S is not set
# CONFIG_SND_KIRKWOOD_SOC is not set
CONFIG_SND_SOC_IMG=y
CONFIG_SND_SOC_IMG_I2S_IN=y
CONFIG_SND_SOC_IMG_I2S_OUT=y
CONFIG_SND_SOC_IMG_PARALLEL_OUT=y
# CONFIG_SND_SOC_IMG_SPDIF_IN is not set
CONFIG_SND_SOC_IMG_SPDIF_OUT=y
CONFIG_SND_SOC_IMG_PISTACHIO_INTERNAL_DAC=y
CONFIG_SND_SOC_INTEL_SST_TOPLEVEL=y
CONFIG_SND_SOC_INTEL_MACH=y
# CONFIG_SND_SOC_INTEL_USER_FRIENDLY_LONG_NAMES is not set
# CONFIG_SND_SOC_MTK_BTCVSD is not set
# CONFIG_SND_PXA2XX_SOC is not set
# CONFIG_SND_SOC_QCOM is not set
# CONFIG_SND_SOC_ROCKCHIP is not set

#
# SoC Audio support for Renesas SoCs
#
# CONFIG_SND_SOC_RZ is not set
# end of SoC Audio support for Renesas SoCs

CONFIG_SND_SOC_SOF_TOPLEVEL=y
# CONFIG_SND_SOC_SOF_ACPI is not set
CONFIG_SND_SOC_SOF_OF=y
# CONFIG_SND_SOC_SOF_AMD_TOPLEVEL is not set
# CONFIG_SND_SOC_SOF_IMX_TOPLEVEL is not set
# CONFIG_SND_SOC_SOF_INTEL_TOPLEVEL is not set
# CONFIG_SND_SOC_SOF_MTK_TOPLEVEL is not set
# CONFIG_SND_SOC_SPRD is not set
# CONFIG_SND_SOC_STI is not set

#
# STMicroelectronics STM32 SOC audio support
#
# CONFIG_SND_SOC_STM32_SPDIFRX is not set
# end of STMicroelectronics STM32 SOC audio support

#
# Allwinner SoC Audio support
#
# CONFIG_SND_SUN4I_CODEC is not set
# CONFIG_SND_SUN8I_CODEC_ANALOG is not set
# CONFIG_SND_SUN50I_CODEC_ANALOG is not set
# CONFIG_SND_SUN4I_I2S is not set
# CONFIG_SND_SUN4I_SPDIF is not set
# end of Allwinner SoC Audio support

#
# Audio support for Texas Instruments SoCs
#

#
# Texas Instruments DAI support for:
#
# CONFIG_SND_SOC_DAVINCI_ASP is not set
# CONFIG_SND_SOC_DAVINCI_VCIF is not set
# CONFIG_SND_SOC_OMAP_MCPDM is not set

#
# Audio support for boards with Texas Instruments SoCs
#
# CONFIG_SND_SOC_OMAP_HDMI is not set
# end of Audio support for Texas Instruments SoCs

# CONFIG_SND_SOC_UNIPHIER is not set
# CONFIG_SND_SOC_XILINX_I2S is not set
CONFIG_SND_SOC_XILINX_AUDIO_FORMATTER=y
# CONFIG_SND_SOC_XILINX_SPDIF is not set
CONFIG_SND_SOC_XTFPGA_I2S=y
CONFIG_SND_SOC_I2C_AND_SPI=y

#
# CODEC drivers
#
# CONFIG_SND_SOC_ALL_CODECS is not set
# CONFIG_SND_SOC_AC97_CODEC is not set
# CONFIG_SND_SOC_ADAU1372_I2C is not set
CONFIG_SND_SOC_ADAU1701=y
# CONFIG_SND_SOC_ADAU1761_I2C is not set
CONFIG_SND_SOC_ADAU7002=y
CONFIG_SND_SOC_ADAU7118=y
CONFIG_SND_SOC_ADAU7118_HW=y
CONFIG_SND_SOC_ADAU7118_I2C=y
# CONFIG_SND_SOC_AK4118 is not set
# CONFIG_SND_SOC_AK4375 is not set
CONFIG_SND_SOC_AK4458=y
# CONFIG_SND_SOC_AK4554 is not set
# CONFIG_SND_SOC_AK4613 is not set
CONFIG_SND_SOC_AK4642=y
# CONFIG_SND_SOC_AK5386 is not set
# CONFIG_SND_SOC_AK5558 is not set
# CONFIG_SND_SOC_ALC5623 is not set
# CONFIG_SND_SOC_AW8738 is not set
# CONFIG_SND_SOC_BD28623 is not set
# CONFIG_SND_SOC_BT_SCO is not set
CONFIG_SND_SOC_CS35L32=y
CONFIG_SND_SOC_CS35L33=y
# CONFIG_SND_SOC_CS35L34 is not set
CONFIG_SND_SOC_CS35L35=y
CONFIG_SND_SOC_CS35L36=y
# CONFIG_SND_SOC_CS35L41_I2C is not set
# CONFIG_SND_SOC_CS35L45_I2C is not set
CONFIG_SND_SOC_CS42L42=y
CONFIG_SND_SOC_CS42L51=y
CONFIG_SND_SOC_CS42L51_I2C=y
CONFIG_SND_SOC_CS42L52=y
CONFIG_SND_SOC_CS42L56=y
# CONFIG_SND_SOC_CS42L73 is not set
CONFIG_SND_SOC_CS4234=y
# CONFIG_SND_SOC_CS4265 is not set
# CONFIG_SND_SOC_CS4270 is not set
# CONFIG_SND_SOC_CS4271_I2C is not set
CONFIG_SND_SOC_CS42XX8=y
CONFIG_SND_SOC_CS42XX8_I2C=y
# CONFIG_SND_SOC_CS43130 is not set
# CONFIG_SND_SOC_CS4341 is not set
# CONFIG_SND_SOC_CS4349 is not set
CONFIG_SND_SOC_CS53L30=y
CONFIG_SND_SOC_CX2072X=y
# CONFIG_SND_SOC_JZ4740_CODEC is not set
# CONFIG_SND_SOC_JZ4725B_CODEC is not set
# CONFIG_SND_SOC_JZ4760_CODEC is not set
# CONFIG_SND_SOC_JZ4770_CODEC is not set
CONFIG_SND_SOC_DA7213=y
CONFIG_SND_SOC_DMIC=y
# CONFIG_SND_SOC_ES7134 is not set
CONFIG_SND_SOC_ES7241=y
# CONFIG_SND_SOC_ES8316 is not set
CONFIG_SND_SOC_ES8328=y
CONFIG_SND_SOC_ES8328_I2C=y
# CONFIG_SND_SOC_GTM601 is not set
# CONFIG_SND_SOC_ICS43432 is not set
CONFIG_SND_SOC_INNO_RK3036=y
CONFIG_SND_SOC_LOCHNAGAR_SC=y
CONFIG_SND_SOC_MAX98088=y
CONFIG_SND_SOC_MAX98357A=y
# CONFIG_SND_SOC_MAX98504 is not set
CONFIG_SND_SOC_MAX9867=y
# CONFIG_SND_SOC_MAX98927 is not set
# CONFIG_SND_SOC_MAX98520 is not set
CONFIG_SND_SOC_MAX98373=y
# CONFIG_SND_SOC_MAX98373_I2C is not set
CONFIG_SND_SOC_MAX98373_SDW=y
CONFIG_SND_SOC_MAX98390=y
# CONFIG_SND_SOC_MAX98396 is not set
# CONFIG_SND_SOC_MAX9860 is not set
# CONFIG_SND_SOC_MSM8916_WCD_ANALOG is not set
CONFIG_SND_SOC_MSM8916_WCD_DIGITAL=y
# CONFIG_SND_SOC_PCM1681 is not set
# CONFIG_SND_SOC_PCM1789_I2C is not set
CONFIG_SND_SOC_PCM179X=y
CONFIG_SND_SOC_PCM179X_I2C=y
CONFIG_SND_SOC_PCM186X=y
CONFIG_SND_SOC_PCM186X_I2C=y
CONFIG_SND_SOC_PCM3060=y
CONFIG_SND_SOC_PCM3060_I2C=y
CONFIG_SND_SOC_PCM3168A=y
CONFIG_SND_SOC_PCM3168A_I2C=y
# CONFIG_SND_SOC_PCM5102A is not set
CONFIG_SND_SOC_PCM512x=y
CONFIG_SND_SOC_PCM512x_I2C=y
CONFIG_SND_SOC_RK3328=y
# CONFIG_SND_SOC_RK817 is not set
CONFIG_SND_SOC_RL6231=y
# CONFIG_SND_SOC_RT1308_SDW is not set
# CONFIG_SND_SOC_RT1316_SDW is not set
CONFIG_SND_SOC_RT5616=y
# CONFIG_SND_SOC_RT5631 is not set
# CONFIG_SND_SOC_RT5640 is not set
# CONFIG_SND_SOC_RT5659 is not set
CONFIG_SND_SOC_RT5682=y
CONFIG_SND_SOC_RT5682_SDW=y
CONFIG_SND_SOC_RT700=y
CONFIG_SND_SOC_RT700_SDW=y
CONFIG_SND_SOC_RT711=y
CONFIG_SND_SOC_RT711_SDW=y
# CONFIG_SND_SOC_RT711_SDCA_SDW is not set
# CONFIG_SND_SOC_RT715_SDW is not set
# CONFIG_SND_SOC_RT715_SDCA_SDW is not set
# CONFIG_SND_SOC_RT9120 is not set
# CONFIG_SND_SOC_SDW_MOCKUP is not set
# CONFIG_SND_SOC_SGTL5000 is not set
CONFIG_SND_SOC_SIGMADSP=y
CONFIG_SND_SOC_SIGMADSP_I2C=y
CONFIG_SND_SOC_SIMPLE_AMPLIFIER=y
# CONFIG_SND_SOC_SIMPLE_MUX is not set
# CONFIG_SND_SOC_SPDIF is not set
CONFIG_SND_SOC_SSM2305=y
# CONFIG_SND_SOC_SSM2518 is not set
CONFIG_SND_SOC_SSM2602=y
CONFIG_SND_SOC_SSM2602_I2C=y
CONFIG_SND_SOC_SSM4567=y
# CONFIG_SND_SOC_STA32X is not set
# CONFIG_SND_SOC_STA350 is not set
CONFIG_SND_SOC_STI_SAS=y
CONFIG_SND_SOC_TAS2552=y
CONFIG_SND_SOC_TAS2562=y
CONFIG_SND_SOC_TAS2764=y
# CONFIG_SND_SOC_TAS2770 is not set
CONFIG_SND_SOC_TAS5086=y
# CONFIG_SND_SOC_TAS571X is not set
CONFIG_SND_SOC_TAS5720=y
# CONFIG_SND_SOC_TAS5805M is not set
CONFIG_SND_SOC_TAS6424=y
CONFIG_SND_SOC_TDA7419=y
# CONFIG_SND_SOC_TFA9879 is not set
# CONFIG_SND_SOC_TFA989X is not set
# CONFIG_SND_SOC_TLV320ADC3XXX is not set
# CONFIG_SND_SOC_TLV320AIC23_I2C is not set
# CONFIG_SND_SOC_TLV320AIC31XX is not set
# CONFIG_SND_SOC_TLV320AIC3X_I2C is not set
# CONFIG_SND_SOC_TLV320ADCX140 is not set
CONFIG_SND_SOC_TS3A227E=y
CONFIG_SND_SOC_TSCS42XX=y
# CONFIG_SND_SOC_TSCS454 is not set
CONFIG_SND_SOC_UDA1334=y
# CONFIG_SND_SOC_WCD938X_SDW is not set
CONFIG_SND_SOC_WM8510=y
CONFIG_SND_SOC_WM8523=y
CONFIG_SND_SOC_WM8524=y
# CONFIG_SND_SOC_WM8580 is not set
CONFIG_SND_SOC_WM8711=y
# CONFIG_SND_SOC_WM8728 is not set
# CONFIG_SND_SOC_WM8731_I2C is not set
CONFIG_SND_SOC_WM8737=y
CONFIG_SND_SOC_WM8741=y
# CONFIG_SND_SOC_WM8750 is not set
CONFIG_SND_SOC_WM8753=y
CONFIG_SND_SOC_WM8776=y
# CONFIG_SND_SOC_WM8782 is not set
CONFIG_SND_SOC_WM8804=y
CONFIG_SND_SOC_WM8804_I2C=y
CONFIG_SND_SOC_WM8903=y
CONFIG_SND_SOC_WM8904=y
# CONFIG_SND_SOC_WM8940 is not set
CONFIG_SND_SOC_WM8960=y
CONFIG_SND_SOC_WM8962=y
# CONFIG_SND_SOC_WM8974 is not set
CONFIG_SND_SOC_WM8978=y
CONFIG_SND_SOC_WM8985=y
# CONFIG_SND_SOC_WSA881X is not set
CONFIG_SND_SOC_MAX9759=y
# CONFIG_SND_SOC_MT6351 is not set
# CONFIG_SND_SOC_MT6358 is not set
CONFIG_SND_SOC_MT6660=y
# CONFIG_SND_SOC_NAU8315 is not set
CONFIG_SND_SOC_NAU8540=y
CONFIG_SND_SOC_NAU8810=y
# CONFIG_SND_SOC_NAU8821 is not set
CONFIG_SND_SOC_NAU8822=y
CONFIG_SND_SOC_NAU8824=y
CONFIG_SND_SOC_TPA6130A2=y
# end of CODEC drivers

CONFIG_SND_SIMPLE_CARD_UTILS=y
CONFIG_SND_SIMPLE_CARD=y
CONFIG_SND_AUDIO_GRAPH_CARD=y
# CONFIG_SND_AUDIO_GRAPH_CARD2 is not set
# CONFIG_SND_TEST_COMPONENT is not set
# CONFIG_SND_VIRTIO is not set

#
# HID support
#
CONFIG_HID=y
CONFIG_HID_BATTERY_STRENGTH=y
CONFIG_HIDRAW=y
# CONFIG_UHID is not set
CONFIG_HID_GENERIC=y

#
# Special HID drivers
#
CONFIG_HID_A4TECH=y
# CONFIG_HID_ACRUX is not set
CONFIG_HID_APPLE=y
CONFIG_HID_AUREAL=y
CONFIG_HID_BELKIN=y
CONFIG_HID_CHERRY=y
# CONFIG_HID_COUGAR is not set
# CONFIG_HID_MACALLY is not set
CONFIG_HID_CMEDIA=y
CONFIG_HID_CYPRESS=y
# CONFIG_HID_DRAGONRISE is not set
CONFIG_HID_EMS_FF=y
# CONFIG_HID_ELECOM is not set
CONFIG_HID_EZKEY=y
CONFIG_HID_GEMBIRD=y
# CONFIG_HID_GFRM is not set
CONFIG_HID_GLORIOUS=y
CONFIG_HID_VIVALDI_COMMON=y
CONFIG_HID_VIVALDI=y
CONFIG_HID_KEYTOUCH=y
CONFIG_HID_KYE=y
CONFIG_HID_WALTOP=y
CONFIG_HID_VIEWSONIC=y
# CONFIG_HID_XIAOMI is not set
CONFIG_HID_GYRATION=y
CONFIG_HID_ICADE=y
# CONFIG_HID_ITE is not set
# CONFIG_HID_JABRA is not set
CONFIG_HID_TWINHAN=y
# CONFIG_HID_KENSINGTON is not set
# CONFIG_HID_LCPOWER is not set
CONFIG_HID_LED=y
# CONFIG_HID_LENOVO is not set
CONFIG_HID_MAGICMOUSE=y
CONFIG_HID_MALTRON=y
# CONFIG_HID_MAYFLASH is not set
CONFIG_HID_REDRAGON=y
CONFIG_HID_MICROSOFT=y
CONFIG_HID_MONTEREY=y
CONFIG_HID_MULTITOUCH=y
# CONFIG_HID_NINTENDO is not set
CONFIG_HID_NTI=y
CONFIG_HID_ORTEK=y
CONFIG_HID_PANTHERLORD=y
CONFIG_PANTHERLORD_FF=y
CONFIG_HID_PETALYNX=y
CONFIG_HID_PICOLCD=y
# CONFIG_HID_PICOLCD_FB is not set
CONFIG_HID_PICOLCD_BACKLIGHT=y
CONFIG_HID_PICOLCD_LCD=y
CONFIG_HID_PICOLCD_LEDS=y
# CONFIG_HID_PICOLCD_CIR is not set
CONFIG_HID_PLANTRONICS=y
# CONFIG_HID_PLAYSTATION is not set
# CONFIG_HID_RAZER is not set
CONFIG_HID_PRIMAX=y
CONFIG_HID_SAITEK=y
# CONFIG_HID_SEMITEK is not set
CONFIG_HID_SPEEDLINK=y
# CONFIG_HID_STEAM is not set
CONFIG_HID_STEELSERIES=y
# CONFIG_HID_SUNPLUS is not set
# CONFIG_HID_RMI is not set
# CONFIG_HID_GREENASIA is not set
CONFIG_HID_SMARTJOYPLUS=y
# CONFIG_SMARTJOYPLUS_FF is not set
CONFIG_HID_TIVO=y
# CONFIG_HID_TOPSEED is not set
CONFIG_HID_THINGM=y
# CONFIG_HID_UDRAW_PS3 is not set
CONFIG_HID_WIIMOTE=y
CONFIG_HID_XINMO=y
CONFIG_HID_ZEROPLUS=y
# CONFIG_ZEROPLUS_FF is not set
CONFIG_HID_ZYDACRON=y
# CONFIG_HID_SENSOR_HUB is not set
CONFIG_HID_ALPS=y
# end of Special HID drivers

#
# USB HID support
#
# CONFIG_USB_HID is not set
# CONFIG_HID_PID is not set

#
# USB HID Boot Protocol drivers
#
CONFIG_USB_KBD=y
CONFIG_USB_MOUSE=y
# end of USB HID Boot Protocol drivers
# end of USB HID support

#
# I2C HID support
#
# CONFIG_I2C_HID_OF is not set
# CONFIG_I2C_HID_OF_GOODIX is not set
# end of I2C HID support
# end of HID support

CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_SUPPORT=y
CONFIG_USB_COMMON=y
CONFIG_USB_ULPI_BUS=y
CONFIG_USB_CONN_GPIO=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB=y
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y

#
# Miscellaneous USB options
#
CONFIG_USB_DEFAULT_PERSIST=y
CONFIG_USB_FEW_INIT_RETRIES=y
CONFIG_USB_DYNAMIC_MINORS=y
# CONFIG_USB_OTG_PRODUCTLIST is not set
# CONFIG_USB_OTG_DISABLE_EXTERNAL_HUB is not set
CONFIG_USB_AUTOSUSPEND_DELAY=2
CONFIG_USB_MON=y

#
# USB Host Controller Drivers
#
CONFIG_USB_C67X00_HCD=y
# CONFIG_USB_XHCI_HCD is not set
# CONFIG_USB_BRCMSTB is not set
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_EHCI_TT_NEWSCHED=y
CONFIG_USB_EHCI_FSL=y
# CONFIG_USB_EHCI_HCD_NPCM7XX is not set
CONFIG_USB_EHCI_HCD_OMAP=y
# CONFIG_USB_EHCI_HCD_ORION is not set
# CONFIG_USB_EHCI_HCD_SPEAR is not set
# CONFIG_USB_EHCI_HCD_STI is not set
# CONFIG_USB_EHCI_HCD_AT91 is not set
# CONFIG_USB_EHCI_SH is not set
# CONFIG_USB_EHCI_EXYNOS is not set
# CONFIG_USB_EHCI_MV is not set
# CONFIG_USB_CNS3XXX_EHCI is not set
CONFIG_USB_EHCI_HCD_PLATFORM=y
CONFIG_USB_OXU210HP_HCD=y
# CONFIG_USB_ISP116X_HCD is not set
# CONFIG_USB_ISP1362_HCD is not set
# CONFIG_USB_FOTG210_HCD is not set
# CONFIG_USB_OHCI_HCD is not set
# CONFIG_USB_U132_HCD is not set
CONFIG_USB_SL811_HCD=y
# CONFIG_USB_SL811_HCD_ISO is not set
# CONFIG_USB_R8A66597_HCD is not set
CONFIG_USB_HCD_BCMA=y
# CONFIG_USB_HCD_SSB is not set
# CONFIG_USB_HCD_TEST_MODE is not set
# CONFIG_USB_RENESAS_USBHS is not set

#
# USB Device Class drivers
#
# CONFIG_USB_PRINTER is not set
CONFIG_USB_WDM=y
CONFIG_USB_TMC=y

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
#       also be needed; see USB_STORAGE Help for more info
#
CONFIG_USB_STORAGE=y
# CONFIG_USB_STORAGE_DEBUG is not set
CONFIG_USB_STORAGE_REALTEK=y
CONFIG_USB_STORAGE_DATAFAB=y
CONFIG_USB_STORAGE_FREECOM=y
CONFIG_USB_STORAGE_ISD200=y
CONFIG_USB_STORAGE_USBAT=y
# CONFIG_USB_STORAGE_SDDR09 is not set
CONFIG_USB_STORAGE_SDDR55=y
CONFIG_USB_STORAGE_JUMPSHOT=y
# CONFIG_USB_STORAGE_ALAUDA is not set
CONFIG_USB_STORAGE_ONETOUCH=y
# CONFIG_USB_STORAGE_KARMA is not set
CONFIG_USB_STORAGE_CYPRESS_ATACB=y
# CONFIG_USB_STORAGE_ENE_UB6250 is not set
# CONFIG_USB_UAS is not set

#
# USB Imaging devices
#
CONFIG_USB_MDC800=y
CONFIG_USB_MICROTEK=y
# CONFIG_USB_CDNS_SUPPORT is not set
# CONFIG_USB_MTU3 is not set
CONFIG_USB_MUSB_HDRC=y
# CONFIG_USB_MUSB_HOST is not set
# CONFIG_USB_MUSB_GADGET is not set
CONFIG_USB_MUSB_DUAL_ROLE=y

#
# Platform Glue Layer
#
# CONFIG_USB_MUSB_TUSB6010 is not set
# CONFIG_USB_MUSB_DSPS is not set
# CONFIG_USB_MUSB_UX500 is not set
# CONFIG_USB_MUSB_MEDIATEK is not set

#
# MUSB DMA mode
#
CONFIG_MUSB_PIO_ONLY=y
# CONFIG_USB_DWC3 is not set
# CONFIG_USB_DWC2 is not set
CONFIG_USB_CHIPIDEA=y
# CONFIG_USB_CHIPIDEA_UDC is not set
# CONFIG_USB_CHIPIDEA_HOST is not set
CONFIG_USB_CHIPIDEA_MSM=y
CONFIG_USB_CHIPIDEA_IMX=y
CONFIG_USB_CHIPIDEA_GENERIC=y
CONFIG_USB_CHIPIDEA_TEGRA=y
# CONFIG_USB_ISP1760 is not set

#
# USB port drivers
#

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
CONFIG_USB_EMI26=y
CONFIG_USB_ADUTUX=y
CONFIG_USB_SEVSEG=y
CONFIG_USB_LEGOTOWER=y
CONFIG_USB_LCD=y
CONFIG_USB_CYPRESS_CY7C63=y
CONFIG_USB_CYTHERM=y
CONFIG_USB_IDMOUSE=y
CONFIG_USB_FTDI_ELAN=y
# CONFIG_USB_APPLEDISPLAY is not set
# CONFIG_USB_QCOM_EUD is not set
CONFIG_APPLE_MFI_FASTCHARGE=y
CONFIG_USB_SISUSBVGA=y
# CONFIG_USB_LD is not set
CONFIG_USB_TRANCEVIBRATOR=y
# CONFIG_USB_IOWARRIOR is not set
CONFIG_USB_TEST=y
CONFIG_USB_EHSET_TEST_FIXTURE=y
CONFIG_USB_ISIGHTFW=y
# CONFIG_USB_YUREX is not set
CONFIG_USB_EZUSB_FX2=y
# CONFIG_USB_HUB_USB251XB is not set
# CONFIG_USB_HSIC_USB3503 is not set
# CONFIG_USB_HSIC_USB4604 is not set
# CONFIG_USB_LINK_LAYER_TEST is not set
# CONFIG_USB_CHAOSKEY is not set
# CONFIG_BRCM_USB_PINMAP is not set

#
# USB Physical Layer drivers
#
CONFIG_USB_PHY=y
# CONFIG_KEYSTONE_USB_PHY is not set
CONFIG_NOP_USB_XCEIV=y
# CONFIG_AM335X_PHY_USB is not set
CONFIG_USB_GPIO_VBUS=y
CONFIG_TAHVO_USB=y
CONFIG_TAHVO_USB_HOST_BY_DEFAULT=y
CONFIG_USB_ISP1301=y
# CONFIG_USB_TEGRA_PHY is not set
# CONFIG_USB_ULPI is not set
# CONFIG_JZ4770_PHY is not set
# end of USB Physical Layer drivers

CONFIG_USB_GADGET=y
CONFIG_USB_GADGET_DEBUG=y
CONFIG_USB_GADGET_VERBOSE=y
# CONFIG_USB_GADGET_DEBUG_FILES is not set
CONFIG_USB_GADGET_DEBUG_FS=y
CONFIG_USB_GADGET_VBUS_DRAW=2
CONFIG_USB_GADGET_STORAGE_NUM_BUFFERS=2

#
# USB Peripheral Controller
#
# CONFIG_USB_LPC32XX is not set
CONFIG_USB_FUSB300=y
# CONFIG_USB_FOTG210_UDC is not set
# CONFIG_USB_GR_UDC is not set
CONFIG_USB_R8A66597=y
# CONFIG_USB_RENESAS_USB3 is not set
# CONFIG_USB_PXA27X is not set
CONFIG_USB_MV_UDC=y
CONFIG_USB_MV_U3D=y
# CONFIG_USB_SNP_UDC_PLAT is not set
# CONFIG_USB_M66592 is not set
CONFIG_USB_BDC_UDC=y
CONFIG_USB_NET2272=y
CONFIG_USB_NET2272_DMA=y
CONFIG_USB_GADGET_XILINX=y
# CONFIG_USB_ASPEED_VHUB is not set
CONFIG_USB_DUMMY_HCD=y
# end of USB Peripheral Controller

CONFIG_USB_LIBCOMPOSITE=y
CONFIG_USB_F_MASS_STORAGE=y
CONFIG_USB_F_FS=y
CONFIG_USB_F_UAC1_LEGACY=y
CONFIG_USB_F_MIDI=y
CONFIG_USB_F_HID=y
CONFIG_USB_F_TCM=y
# CONFIG_USB_CONFIGFS is not set

#
# USB Gadget precomposed configurations
#
# CONFIG_USB_ZERO is not set
CONFIG_USB_AUDIO=y
CONFIG_GADGET_UAC1=y
CONFIG_GADGET_UAC1_LEGACY=y
CONFIG_USB_GADGETFS=y
CONFIG_USB_FUNCTIONFS=y
CONFIG_USB_FUNCTIONFS_GENERIC=y
CONFIG_USB_MASS_STORAGE=y
CONFIG_USB_GADGET_TARGET=y
CONFIG_USB_MIDI_GADGET=y
# CONFIG_USB_G_PRINTER is not set
CONFIG_USB_G_HID=y
# CONFIG_USB_RAW_GADGET is not set
# end of USB Gadget precomposed configurations

# CONFIG_TYPEC is not set
CONFIG_USB_ROLE_SWITCH=y
CONFIG_MMC=y
CONFIG_PWRSEQ_EMMC=y
CONFIG_PWRSEQ_SIMPLE=y
# CONFIG_MMC_BLOCK is not set
# CONFIG_MMC_TEST is not set
# CONFIG_MMC_CRYPTO is not set

#
# MMC/SD/SDIO Host Controller Drivers
#
# CONFIG_MMC_DEBUG is not set
# CONFIG_MMC_SDHCI is not set
# CONFIG_MMC_MOXART is not set
# CONFIG_MMC_OMAP_HS is not set
# CONFIG_MMC_DAVINCI is not set
# CONFIG_MMC_S3C is not set
# CONFIG_MMC_TMIO is not set
# CONFIG_MMC_SDHI is not set
# CONFIG_MMC_UNIPHIER is not set
# CONFIG_MMC_DW is not set
# CONFIG_MMC_SH_MMCIF is not set
CONFIG_MMC_VUB300=y
# CONFIG_MMC_USHC is not set
CONFIG_MMC_USDHI6ROL0=y
CONFIG_MMC_CQHCI=y
CONFIG_MMC_HSQ=y
# CONFIG_MMC_BCM2835 is not set
# CONFIG_MMC_OWL is not set
# CONFIG_MMC_LITEX is not set
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
CONFIG_LEDS_CLASS_FLASH=y
CONFIG_LEDS_CLASS_MULTICOLOR=y
# CONFIG_LEDS_BRIGHTNESS_HW_CHANGED is not set

#
# LED drivers
#
CONFIG_LEDS_AN30259A=y
# CONFIG_LEDS_ARIEL is not set
# CONFIG_LEDS_AW2013 is not set
# CONFIG_LEDS_BCM6328 is not set
CONFIG_LEDS_BCM6358=y
# CONFIG_LEDS_TURRIS_OMNIA is not set
CONFIG_LEDS_LM3530=y
CONFIG_LEDS_LM3532=y
CONFIG_LEDS_LM3642=y
CONFIG_LEDS_LM3692X=y
CONFIG_LEDS_MT6323=y
# CONFIG_LEDS_S3C24XX is not set
# CONFIG_LEDS_COBALT_QUBE is not set
# CONFIG_LEDS_COBALT_RAQ is not set
CONFIG_LEDS_PCA9532=y
CONFIG_LEDS_PCA9532_GPIO=y
CONFIG_LEDS_GPIO=y
# CONFIG_LEDS_LP3944 is not set
CONFIG_LEDS_LP3952=y
CONFIG_LEDS_LP50XX=y
# CONFIG_LEDS_LP55XX_COMMON is not set
CONFIG_LEDS_LP8860=y
CONFIG_LEDS_PCA955X=y
CONFIG_LEDS_PCA955X_GPIO=y
# CONFIG_LEDS_PCA963X is not set
CONFIG_LEDS_WM8350=y
CONFIG_LEDS_PWM=y
# CONFIG_LEDS_REGULATOR is not set
CONFIG_LEDS_BD2802=y
CONFIG_LEDS_LT3593=y
CONFIG_LEDS_ADP5520=y
CONFIG_LEDS_MC13783=y
CONFIG_LEDS_NS2=y
CONFIG_LEDS_NETXBIG=y
CONFIG_LEDS_TCA6507=y
CONFIG_LEDS_TLC591XX=y
CONFIG_LEDS_MAX8997=y
CONFIG_LEDS_LM355x=y
# CONFIG_LEDS_OT200 is not set
CONFIG_LEDS_MENF21BMC=y
# CONFIG_LEDS_IS31FL319X is not set
CONFIG_LEDS_IS31FL32XX=y

#
# LED driver for blink(1) USB RGB LED is under Special HID drivers (HID_THINGM)
#
CONFIG_LEDS_BLINKM=y
CONFIG_LEDS_MLXREG=y
CONFIG_LEDS_USER=y
CONFIG_LEDS_TI_LMU_COMMON=y
# CONFIG_LEDS_LM3697 is not set
# CONFIG_LEDS_LM36274 is not set
# CONFIG_LEDS_IP30 is not set

#
# Flash and Torch LED drivers
#
# CONFIG_LEDS_AAT1290 is not set
# CONFIG_LEDS_AS3645A is not set
CONFIG_LEDS_KTD2692=y
CONFIG_LEDS_LM3601X=y
CONFIG_LEDS_MAX77693=y
# CONFIG_LEDS_RT4505 is not set
# CONFIG_LEDS_RT8515 is not set
CONFIG_LEDS_SGM3140=y

#
# RGB LED drivers
#
# CONFIG_LEDS_PWM_MULTICOLOR is not set

#
# LED Triggers
#
# CONFIG_LEDS_TRIGGERS is not set

#
# Simple LED drivers
#
CONFIG_ACCESSIBILITY=y

#
# Speakup console speech
#
# end of Speakup console speech

# CONFIG_RTC_CLASS is not set
# CONFIG_DMADEVICES is not set

#
# DMABUF options
#
CONFIG_SYNC_FILE=y
CONFIG_SW_SYNC=y
# CONFIG_UDMABUF is not set
# CONFIG_DMABUF_MOVE_NOTIFY is not set
# CONFIG_DMABUF_DEBUG is not set
CONFIG_DMABUF_SELFTESTS=y
# CONFIG_DMABUF_HEAPS is not set
CONFIG_DMABUF_SYSFS_STATS=y
# end of DMABUF options

# CONFIG_AUXDISPLAY is not set
CONFIG_UIO=y
CONFIG_UIO_PDRV_GENIRQ=y
# CONFIG_UIO_DMEM_GENIRQ is not set
CONFIG_UIO_PRUSS=y
# CONFIG_VFIO is not set
# CONFIG_VIRT_DRIVERS is not set
CONFIG_VIRTIO=y
CONFIG_VIRTIO_MENU=y
# CONFIG_VIRTIO_BALLOON is not set
# CONFIG_VIRTIO_INPUT is not set
CONFIG_VIRTIO_MMIO=y
# CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES is not set
CONFIG_VHOST_IOTLB=y
CONFIG_VHOST=y
CONFIG_VHOST_MENU=y
CONFIG_VHOST_SCSI=y
# CONFIG_VHOST_CROSS_ENDIAN_LEGACY is not set

#
# Microsoft Hyper-V guest support
#
# end of Microsoft Hyper-V guest support

CONFIG_GREYBUS=y
CONFIG_GREYBUS_ES2=y
CONFIG_COMEDI=y
CONFIG_COMEDI_DEBUG=y
CONFIG_COMEDI_DEFAULT_BUF_SIZE_KB=2048
CONFIG_COMEDI_DEFAULT_BUF_MAXSIZE_KB=20480
CONFIG_COMEDI_MISC_DRIVERS=y
CONFIG_COMEDI_BOND=y
# CONFIG_COMEDI_TEST is not set
CONFIG_COMEDI_PARPORT=y
# CONFIG_COMEDI_SSV_DNP is not set
# CONFIG_COMEDI_ISA_DRIVERS is not set
CONFIG_COMEDI_USB_DRIVERS=y
CONFIG_COMEDI_DT9812=y
CONFIG_COMEDI_NI_USB6501=y
# CONFIG_COMEDI_USBDUX is not set
CONFIG_COMEDI_USBDUXFAST=y
CONFIG_COMEDI_USBDUXSIGMA=y
# CONFIG_COMEDI_VMK80XX is not set
# CONFIG_COMEDI_8255_SA is not set
CONFIG_COMEDI_KCOMEDILIB=y
# CONFIG_COMEDI_TESTS is not set
CONFIG_STAGING=y
# CONFIG_USB_EMXX is not set
CONFIG_STAGING_MEDIA=y
# CONFIG_VIDEO_SUNXI is not set
CONFIG_MOST_COMPONENTS=y
# CONFIG_MOST_DIM2 is not set
CONFIG_MOST_I2C=y
CONFIG_GREYBUS_AUDIO=y
CONFIG_GREYBUS_AUDIO_APB_CODEC=y
CONFIG_GREYBUS_BOOTROM=y
CONFIG_GREYBUS_HID=y
# CONFIG_GREYBUS_LIGHT is not set
CONFIG_GREYBUS_LOG=y
# CONFIG_GREYBUS_LOOPBACK is not set
CONFIG_GREYBUS_POWER=y
CONFIG_GREYBUS_RAW=y
CONFIG_GREYBUS_VIBRATOR=y
CONFIG_GREYBUS_BRIDGED_PHY=y
CONFIG_GREYBUS_GPIO=y
# CONFIG_GREYBUS_I2C is not set
CONFIG_GREYBUS_PWM=y
# CONFIG_GREYBUS_SDIO is not set
# CONFIG_GREYBUS_USB is not set
# CONFIG_GREYBUS_ARCHE is not set
CONFIG_BCM_VIDEOCORE=y
# CONFIG_BCM2835_VCHIQ is not set
# CONFIG_SND_BCM2835 is not set
# CONFIG_XIL_AXIS_FIFO is not set
CONFIG_FIELDBUS_DEV=y
CONFIG_HMS_ANYBUSS_BUS=y
# CONFIG_ARCX_ANYBUS_CONTROLLER is not set
# CONFIG_HMS_PROFINET is not set

#
# VME Device Drivers
#
CONFIG_GOLDFISH=y
CONFIG_GOLDFISH_PIPE=y
# CONFIG_CHROME_PLATFORMS is not set
# CONFIG_MELLANOX_PLATFORM is not set
# CONFIG_OLPC_XO175 is not set
CONFIG_SURFACE_PLATFORMS=y
# CONFIG_COMMON_CLK is not set
# CONFIG_HWSPINLOCK is not set

#
# Clock Source drivers
#
# CONFIG_BCM2835_TIMER is not set
# CONFIG_BCM_KONA_TIMER is not set
# CONFIG_DAVINCI_TIMER is not set
# CONFIG_DIGICOLOR_TIMER is not set
# CONFIG_DW_APB_TIMER is not set
# CONFIG_FTTMR010_TIMER is not set
# CONFIG_IXP4XX_TIMER is not set
# CONFIG_MESON6_TIMER is not set
# CONFIG_OWL_TIMER is not set
# CONFIG_RDA_TIMER is not set
# CONFIG_SUN4I_TIMER is not set
# CONFIG_TEGRA_TIMER is not set
# CONFIG_VT8500_TIMER is not set
# CONFIG_NPCM7XX_TIMER is not set
# CONFIG_ASM9260_TIMER is not set
# CONFIG_CLKSRC_DBX500_PRCMU is not set
# CONFIG_CLPS711X_TIMER is not set
# CONFIG_MXS_TIMER is not set
# CONFIG_NSPIRE_TIMER is not set
# CONFIG_INTEGRATOR_AP_TIMER is not set
# CONFIG_CLKSRC_PISTACHIO is not set
# CONFIG_CLKSRC_STM32_LP is not set
# CONFIG_ARMV7M_SYSTICK is not set
# CONFIG_ATMEL_PIT is not set
# CONFIG_ATMEL_ST is not set
# CONFIG_CLKSRC_SAMSUNG_PWM is not set
# CONFIG_FSL_FTM_TIMER is not set
# CONFIG_OXNAS_RPS_TIMER is not set
# CONFIG_MTK_TIMER is not set
# CONFIG_SPRD_TIMER is not set
# CONFIG_CLKSRC_JCORE_PIT is not set
# CONFIG_SH_TIMER_CMT is not set
# CONFIG_SH_TIMER_MTU2 is not set
# CONFIG_RENESAS_OSTM is not set
# CONFIG_SH_TIMER_TMU is not set
# CONFIG_EM_TIMER_STI is not set
# CONFIG_CLKSRC_PXA is not set
# CONFIG_TIMER_IMX_SYS_CTR is not set
# CONFIG_CLKSRC_ST_LPC is not set
# CONFIG_GXP_TIMER is not set
# CONFIG_MSC313E_TIMER is not set
# CONFIG_MICROCHIP_PIT64B is not set
# end of Clock Source drivers

CONFIG_MAILBOX=y
# CONFIG_IMX_MBOX is not set
CONFIG_PLATFORM_MHU=y
# CONFIG_ARMADA_37XX_RWTM_MBOX is not set
# CONFIG_ROCKCHIP_MBOX is not set
# CONFIG_ALTERA_MBOX is not set
# CONFIG_HI3660_MBOX is not set
# CONFIG_HI6220_MBOX is not set
CONFIG_MAILBOX_TEST=y
# CONFIG_POLARFIRE_SOC_MAILBOX is not set
# CONFIG_QCOM_APCS_IPC is not set
# CONFIG_BCM_PDC_MBOX is not set
# CONFIG_STM32_IPCC is not set
# CONFIG_MTK_ADSP_MBOX is not set
# CONFIG_MTK_CMDQ_MBOX is not set
# CONFIG_SUN6I_MSGBOX is not set
# CONFIG_SPRD_MBOX is not set
# CONFIG_QCOM_IPCC is not set
CONFIG_IOMMU_SUPPORT=y

#
# Generic IOMMU Pagetable Support
#
# CONFIG_IOMMU_IO_PGTABLE_ARMV7S is not set
# end of Generic IOMMU Pagetable Support

CONFIG_IOMMU_DEBUGFS=y
# CONFIG_OMAP_IOMMU is not set
# CONFIG_ROCKCHIP_IOMMU is not set
# CONFIG_SUN50I_IOMMU is not set
# CONFIG_EXYNOS_IOMMU is not set
# CONFIG_S390_CCW_IOMMU is not set
# CONFIG_S390_AP_IOMMU is not set
# CONFIG_MTK_IOMMU is not set
# CONFIG_SPRD_IOMMU is not set

#
# Remoteproc drivers
#
# CONFIG_REMOTEPROC is not set
# end of Remoteproc drivers

#
# Rpmsg drivers
#
CONFIG_RPMSG=y
# CONFIG_RPMSG_CTRL is not set
# CONFIG_RPMSG_NS is not set
CONFIG_RPMSG_QCOM_GLINK=y
CONFIG_RPMSG_QCOM_GLINK_RPM=y
# CONFIG_RPMSG_VIRTIO is not set
# end of Rpmsg drivers

CONFIG_SOUNDWIRE=y

#
# SoundWire Devices
#
CONFIG_SOUNDWIRE_QCOM=y

#
# SOC (System On Chip) specific Drivers
#

#
# Amlogic SoC drivers
#
# CONFIG_MESON_CANVAS is not set
# CONFIG_MESON_CLK_MEASURE is not set
# CONFIG_MESON_GX_SOCINFO is not set
# CONFIG_MESON_MX_SOCINFO is not set
# end of Amlogic SoC drivers

#
# Apple SoC drivers
#
# CONFIG_APPLE_RTKIT is not set
# CONFIG_APPLE_SART is not set
# end of Apple SoC drivers

#
# ASPEED SoC drivers
#
# CONFIG_ASPEED_LPC_CTRL is not set
# CONFIG_ASPEED_LPC_SNOOP is not set
# CONFIG_ASPEED_UART_ROUTING is not set
# CONFIG_ASPEED_P2A_CTRL is not set
# CONFIG_ASPEED_SOCINFO is not set
# end of ASPEED SoC drivers

# CONFIG_AT91_SOC_ID is not set
# CONFIG_AT91_SOC_SFR is not set

#
# Broadcom SoC drivers
#
# CONFIG_BCM2835_POWER is not set
# CONFIG_SOC_BCM63XX is not set
# CONFIG_SOC_BRCMSTB is not set
# CONFIG_BCM_PMB is not set
# end of Broadcom SoC drivers

#
# NXP/Freescale QorIQ SoC drivers
#
# CONFIG_QUICC_ENGINE is not set
CONFIG_DPAA2_CONSOLE=y
# end of NXP/Freescale QorIQ SoC drivers

#
# i.MX SoC drivers
#
# CONFIG_SOC_IMX8M is not set
# end of i.MX SoC drivers

#
# IXP4xx SoC drivers
#
# CONFIG_IXP4XX_QMGR is not set
# CONFIG_IXP4XX_NPE is not set
# end of IXP4xx SoC drivers

#
# Enable LiteX SoC Builder specific drivers
#
# CONFIG_LITEX_SOC_CONTROLLER is not set
# end of Enable LiteX SoC Builder specific drivers

#
# MediaTek SoC drivers
#
# CONFIG_MTK_CMDQ is not set
# CONFIG_MTK_DEVAPC is not set
# CONFIG_MTK_INFRACFG is not set
# CONFIG_MTK_PMIC_WRAP is not set
# CONFIG_MTK_SCPSYS is not set
# CONFIG_MTK_MMSYS is not set
# end of MediaTek SoC drivers

#
# Qualcomm SoC drivers
#
# CONFIG_QCOM_COMMAND_DB is not set
# CONFIG_QCOM_GENI_SE is not set
# CONFIG_QCOM_GSBI is not set
# CONFIG_QCOM_LLCC is not set
# CONFIG_QCOM_RPMH is not set
# CONFIG_QCOM_SMD_RPM is not set
# CONFIG_QCOM_SPM is not set
# CONFIG_QCOM_WCNSS_CTRL is not set
# end of Qualcomm SoC drivers

# CONFIG_SOC_RENESAS is not set
# CONFIG_ROCKCHIP_GRF is not set
# CONFIG_ROCKCHIP_IODOMAIN is not set
# CONFIG_SOC_SAMSUNG is not set
# CONFIG_SOC_TEGRA20_VOLTAGE_COUPLER is not set
# CONFIG_SOC_TEGRA30_VOLTAGE_COUPLER is not set
CONFIG_SOC_TI=y
# CONFIG_UX500_SOC_ID is not set

#
# Xilinx SoC drivers
#
# end of Xilinx SoC drivers
# end of SOC (System On Chip) specific Drivers

# CONFIG_PM_DEVFREQ is not set
CONFIG_EXTCON=y

#
# Extcon Device Drivers
#
CONFIG_EXTCON_FSA9480=y
CONFIG_EXTCON_GPIO=y
# CONFIG_EXTCON_MAX14577 is not set
CONFIG_EXTCON_MAX3355=y
# CONFIG_EXTCON_MAX77693 is not set
# CONFIG_EXTCON_MAX8997 is not set
CONFIG_EXTCON_PTN5150=y
# CONFIG_EXTCON_QCOM_SPMI_MISC is not set
# CONFIG_EXTCON_RT8973A is not set
CONFIG_EXTCON_SM5502=y
# CONFIG_EXTCON_USB_GPIO is not set
# CONFIG_EXTCON_USBC_TUSB320 is not set
# CONFIG_MEMORY is not set
# CONFIG_IIO is not set
CONFIG_PWM=y
CONFIG_PWM_SYSFS=y
CONFIG_PWM_DEBUG=y
# CONFIG_PWM_ATMEL is not set
# CONFIG_PWM_ATMEL_TCB is not set
# CONFIG_PWM_BCM2835 is not set
# CONFIG_PWM_BERLIN is not set
# CONFIG_PWM_BRCMSTB is not set
# CONFIG_PWM_CLPS711X is not set
# CONFIG_PWM_EP93XX is not set
CONFIG_PWM_FSL_FTM=y
# CONFIG_PWM_HIBVT is not set
# CONFIG_PWM_IMX1 is not set
# CONFIG_PWM_IMX27 is not set
# CONFIG_PWM_INTEL_LGM is not set
CONFIG_PWM_IQS620A=y
CONFIG_PWM_LP3943=y
# CONFIG_PWM_LPC18XX_SCT is not set
# CONFIG_PWM_LPC32XX is not set
# CONFIG_PWM_LPSS_PLATFORM is not set
# CONFIG_PWM_MTK_DISP is not set
# CONFIG_PWM_MEDIATEK is not set
# CONFIG_PWM_MXS is not set
# CONFIG_PWM_OMAP_DMTIMER is not set
# CONFIG_PWM_PCA9685 is not set
# CONFIG_PWM_PXA is not set
# CONFIG_PWM_RASPBERRYPI_POE is not set
# CONFIG_PWM_RCAR is not set
# CONFIG_PWM_RENESAS_TPU is not set
# CONFIG_PWM_ROCKCHIP is not set
# CONFIG_PWM_SAMSUNG is not set
# CONFIG_PWM_SL28CPLD is not set
# CONFIG_PWM_SPEAR is not set
# CONFIG_PWM_SPRD is not set
# CONFIG_PWM_STI is not set
# CONFIG_PWM_STM32 is not set
# CONFIG_PWM_STM32_LP is not set
CONFIG_PWM_STMPE=y
# CONFIG_PWM_SUNPLUS is not set
# CONFIG_PWM_TEGRA is not set
# CONFIG_PWM_TIECAP is not set
# CONFIG_PWM_TIEHRPWM is not set
# CONFIG_PWM_VISCONTI is not set
# CONFIG_PWM_VT8500 is not set

#
# IRQ chip support
#
CONFIG_IRQCHIP=y
# CONFIG_AL_FIC is not set
CONFIG_MADERA_IRQ=y
# CONFIG_JCORE_AIC is not set
# CONFIG_RENESAS_INTC_IRQPIN is not set
# CONFIG_RENESAS_IRQC is not set
# CONFIG_RENESAS_RZA1_IRQC is not set
# CONFIG_SL28CPLD_INTC is not set
# CONFIG_TS4800_IRQ is not set
# CONFIG_INGENIC_TCU_IRQ is not set
# CONFIG_IRQ_UNIPHIER_AIDET is not set
# CONFIG_MESON_IRQ_GPIO is not set
# CONFIG_IMX_IRQSTEER is not set
# CONFIG_IMX_INTMUX is not set
# CONFIG_EXYNOS_IRQ_COMBINER is not set
# CONFIG_LOONGSON_PCH_PIC is not set
# CONFIG_MST_IRQ is not set
# CONFIG_MCHP_EIC is not set
# end of IRQ chip support

CONFIG_IPACK_BUS=y
CONFIG_RESET_CONTROLLER=y
# CONFIG_RESET_ATH79 is not set
# CONFIG_RESET_AXS10X is not set
# CONFIG_RESET_BCM6345 is not set
# CONFIG_RESET_BERLIN is not set
# CONFIG_RESET_BRCMSTB is not set
# CONFIG_RESET_BRCMSTB_RESCAL is not set
# CONFIG_RESET_HSDK is not set
# CONFIG_RESET_IMX7 is not set
# CONFIG_RESET_INTEL_GW is not set
# CONFIG_RESET_K210 is not set
# CONFIG_RESET_LANTIQ is not set
# CONFIG_RESET_LPC18XX is not set
# CONFIG_RESET_MCHP_SPARX5 is not set
# CONFIG_RESET_MESON is not set
# CONFIG_RESET_MESON_AUDIO_ARB is not set
# CONFIG_RESET_NPCM is not set
# CONFIG_RESET_PISTACHIO is not set
# CONFIG_RESET_QCOM_AOSS is not set
# CONFIG_RESET_QCOM_PDC is not set
# CONFIG_RESET_RASPBERRYPI is not set
# CONFIG_RESET_RZG2L_USBPHY_CTRL is not set
# CONFIG_RESET_SCMI is not set
# CONFIG_RESET_SIMPLE is not set
# CONFIG_RESET_SOCFPGA is not set
# CONFIG_RESET_STARFIVE_JH7100 is not set
# CONFIG_RESET_SUNXI is not set
# CONFIG_RESET_TI_SCI is not set
# CONFIG_RESET_TI_SYSCON is not set
# CONFIG_RESET_TN48M_CPLD is not set
# CONFIG_RESET_UNIPHIER_GLUE is not set
# CONFIG_RESET_ZYNQ is not set
# CONFIG_COMMON_RESET_HI3660 is not set
# CONFIG_COMMON_RESET_HI6220 is not set

#
# PHY Subsystem
#
CONFIG_GENERIC_PHY=y
CONFIG_GENERIC_PHY_MIPI_DPHY=y
# CONFIG_PHY_PISTACHIO_USB is not set
# CONFIG_PHY_XGENE is not set
# CONFIG_USB_LGM_PHY is not set
# CONFIG_PHY_CAN_TRANSCEIVER is not set
# CONFIG_PHY_SUN4I_USB is not set
# CONFIG_PHY_SUN9I_USB is not set
# CONFIG_PHY_SUN50I_USB3 is not set
# CONFIG_PHY_MESON8_HDMI_TX is not set
# CONFIG_PHY_MESON8B_USB2 is not set
# CONFIG_PHY_MESON_GXL_USB2 is not set
# CONFIG_PHY_MESON_G12A_USB2 is not set
# CONFIG_PHY_MESON_G12A_USB3_PCIE is not set
# CONFIG_PHY_MESON_AXG_PCIE is not set
# CONFIG_PHY_MESON_AXG_MIPI_PCIE_ANALOG is not set
# CONFIG_PHY_MESON_AXG_MIPI_DPHY is not set

#
# PHY drivers for Broadcom platforms
#
# CONFIG_PHY_BCM63XX_USBH is not set
# CONFIG_PHY_CYGNUS_PCIE is not set
# CONFIG_PHY_BCM_SR_USB is not set
CONFIG_BCM_KONA_USB2_PHY=y
# CONFIG_PHY_BCM_NS_USB2 is not set
# CONFIG_PHY_NS2_USB_DRD is not set
# CONFIG_PHY_BRCM_SATA is not set
# CONFIG_PHY_BRCM_USB is not set
# CONFIG_PHY_BCM_SR_PCIE is not set
# end of PHY drivers for Broadcom platforms

CONFIG_PHY_CADENCE_DPHY=y
# CONFIG_PHY_CADENCE_DPHY_RX is not set
CONFIG_PHY_CADENCE_SALVO=y
# CONFIG_PHY_FSL_IMX8MQ_USB is not set
# CONFIG_PHY_MIXEL_MIPI_DPHY is not set
# CONFIG_PHY_FSL_IMX8M_PCIE is not set
# CONFIG_PHY_FSL_LYNX_28G is not set
# CONFIG_PHY_HI6220_USB is not set
# CONFIG_PHY_HI3660_USB is not set
# CONFIG_PHY_HI3670_USB is not set
# CONFIG_PHY_HI3670_PCIE is not set
# CONFIG_PHY_HISTB_COMBPHY is not set
# CONFIG_PHY_HISI_INNO_USB2 is not set
# CONFIG_PHY_INGENIC_USB is not set
# CONFIG_PHY_LANTIQ_VRX200_PCIE is not set
# CONFIG_PHY_LANTIQ_RCU_USB2 is not set
# CONFIG_ARMADA375_USBCLUSTER_PHY is not set
# CONFIG_PHY_BERLIN_SATA is not set
# CONFIG_PHY_BERLIN_USB is not set
CONFIG_PHY_MVEBU_A3700_UTMI=y
# CONFIG_PHY_MVEBU_A38X_COMPHY is not set
# CONFIG_PHY_MVEBU_CP110_UTMI is not set
CONFIG_PHY_PXA_28NM_HSIC=y
CONFIG_PHY_PXA_28NM_USB2=y
# CONFIG_PHY_PXA_USB is not set
# CONFIG_PHY_MMP3_USB is not set
# CONFIG_PHY_MMP3_HSIC is not set
# CONFIG_PHY_MTK_TPHY is not set
# CONFIG_PHY_MTK_UFS is not set
# CONFIG_PHY_MTK_XSPHY is not set
# CONFIG_PHY_SPARX5_SERDES is not set
CONFIG_PHY_MAPPHONE_MDM6600=y
CONFIG_PHY_ATH79_USB=y
# CONFIG_PHY_QCOM_IPQ4019_USB is not set
# CONFIG_PHY_QCOM_QUSB2 is not set
CONFIG_PHY_QCOM_USB_HS=y
# CONFIG_PHY_QCOM_USB_SNPS_FEMTO_V2 is not set
CONFIG_PHY_QCOM_USB_HSIC=y
# CONFIG_PHY_QCOM_USB_HS_28NM is not set
# CONFIG_PHY_QCOM_USB_SS is not set
# CONFIG_PHY_QCOM_IPQ806X_USB is not set
# CONFIG_PHY_MT7621_PCI is not set
# CONFIG_PHY_RALINK_USB is not set
# CONFIG_PHY_RCAR_GEN3_USB3 is not set
# CONFIG_PHY_ROCKCHIP_DPHY_RX0 is not set
# CONFIG_PHY_ROCKCHIP_INNO_CSIDPHY is not set
# CONFIG_PHY_ROCKCHIP_INNO_DSIDPHY is not set
# CONFIG_PHY_ROCKCHIP_PCIE is not set
# CONFIG_PHY_ROCKCHIP_TYPEC is not set
# CONFIG_PHY_EXYNOS_DP_VIDEO is not set
# CONFIG_PHY_EXYNOS_MIPI_VIDEO is not set
# CONFIG_PHY_EXYNOS_PCIE is not set
# CONFIG_PHY_SAMSUNG_UFS is not set
# CONFIG_PHY_SAMSUNG_USB2 is not set
# CONFIG_PHY_UNIPHIER_USB2 is not set
# CONFIG_PHY_UNIPHIER_USB3 is not set
# CONFIG_PHY_UNIPHIER_PCIE is not set
CONFIG_PHY_UNIPHIER_AHCI=y
# CONFIG_PHY_ST_SPEAR1310_MIPHY is not set
# CONFIG_PHY_ST_SPEAR1340_MIPHY is not set
# CONFIG_PHY_STIH407_USB is not set
# CONFIG_PHY_TEGRA194_P2U is not set
# CONFIG_PHY_DA8XX_USB is not set
# CONFIG_PHY_DM816X_USB is not set
# CONFIG_OMAP_CONTROL_PHY is not set
# CONFIG_TI_PIPE3 is not set
CONFIG_PHY_TUSB1210=y
# CONFIG_PHY_INTEL_KEEMBAY_EMMC is not set
# CONFIG_PHY_INTEL_KEEMBAY_USB is not set
# CONFIG_PHY_INTEL_LGM_COMBO is not set
# CONFIG_PHY_INTEL_LGM_EMMC is not set
# CONFIG_PHY_INTEL_THUNDERBAY_EMMC is not set
# CONFIG_PHY_XILINX_ZYNQMP is not set
# end of PHY Subsystem

# CONFIG_POWERCAP is not set
# CONFIG_MCB is not set

#
# Performance monitor support
#
# CONFIG_ARM_CCN is not set
# CONFIG_ARM_CMN is not set
# CONFIG_FSL_IMX8_DDR_PMU is not set
# CONFIG_ARM_DMC620_PMU is not set
# end of Performance monitor support

CONFIG_RAS=y

#
# Android
#
CONFIG_ANDROID=y
CONFIG_ANDROID_BINDER_IPC=y
# CONFIG_ANDROID_BINDERFS is not set
CONFIG_ANDROID_BINDER_DEVICES="binder,hwbinder,vndbinder"
# CONFIG_ANDROID_BINDER_IPC_SELFTEST is not set
# end of Android

# CONFIG_DAX is not set
# CONFIG_NVMEM is not set

#
# HW tracing support
#
CONFIG_STM=y
CONFIG_STM_PROTO_BASIC=y
# CONFIG_STM_PROTO_SYS_T is not set
# CONFIG_STM_DUMMY is not set
CONFIG_STM_SOURCE_CONSOLE=y
# CONFIG_STM_SOURCE_HEARTBEAT is not set
CONFIG_STM_SOURCE_FTRACE=y
CONFIG_INTEL_TH=y
CONFIG_INTEL_TH_GTH=y
# CONFIG_INTEL_TH_STH is not set
# CONFIG_INTEL_TH_MSU is not set
CONFIG_INTEL_TH_PTI=y
# CONFIG_INTEL_TH_DEBUG is not set
# end of HW tracing support

CONFIG_FPGA=y
# CONFIG_FPGA_MGR_SOCFPGA is not set
# CONFIG_FPGA_MGR_SOCFPGA_A10 is not set
CONFIG_ALTERA_PR_IP_CORE=y
# CONFIG_ALTERA_PR_IP_CORE_PLAT is not set
# CONFIG_FPGA_MGR_ZYNQ_FPGA is not set
# CONFIG_FPGA_BRIDGE is not set
# CONFIG_FPGA_DFL is not set
# CONFIG_FPGA_MGR_ZYNQMP_FPGA is not set
# CONFIG_FPGA_MGR_VERSAL_FPGA is not set
CONFIG_FSI=y
# CONFIG_FSI_NEW_DEV_NODE is not set
CONFIG_FSI_MASTER_GPIO=y
CONFIG_FSI_MASTER_HUB=y
CONFIG_FSI_MASTER_ASPEED=y
CONFIG_FSI_SCOM=y
CONFIG_FSI_SBEFIFO=y
CONFIG_FSI_OCC=y
# CONFIG_TEE is not set
CONFIG_SIOX=y
# CONFIG_SIOX_BUS_GPIO is not set
# CONFIG_SLIMBUS is not set
CONFIG_INTERCONNECT=y
# CONFIG_INTERCONNECT_IMX is not set
# CONFIG_INTERCONNECT_QCOM_OSM_L3 is not set
# CONFIG_INTERCONNECT_SAMSUNG is not set
CONFIG_COUNTER=y
# CONFIG_104_QUAD_8 is not set
# CONFIG_INTERRUPT_CNT is not set
# CONFIG_STM32_TIMER_CNT is not set
# CONFIG_STM32_LPTIMER_CNT is not set
# CONFIG_TI_EQEP is not set
CONFIG_FTM_QUADDEC=y
CONFIG_MICROCHIP_TCB_CAPTURE=y
CONFIG_MOST=y
CONFIG_MOST_USB_HDM=y
# CONFIG_MOST_CDEV is not set
# CONFIG_MOST_SND is not set
# CONFIG_PECI is not set
# CONFIG_HTE is not set
# end of Device Drivers

#
# File systems
#
CONFIG_VALIDATE_FS_PARSER=y
CONFIG_FS_IOMAP=y
# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
# CONFIG_EXT4_FS_POSIX_ACL is not set
# CONFIG_EXT4_FS_SECURITY is not set
# CONFIG_EXT4_DEBUG is not set
CONFIG_EXT4_KUNIT_TESTS=y
CONFIG_JBD2=y
# CONFIG_JBD2_DEBUG is not set
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
CONFIG_JFS_FS=y
# CONFIG_JFS_POSIX_ACL is not set
CONFIG_JFS_SECURITY=y
# CONFIG_JFS_DEBUG is not set
# CONFIG_JFS_STATISTICS is not set
CONFIG_XFS_FS=y
# CONFIG_XFS_SUPPORT_V4 is not set
# CONFIG_XFS_QUOTA is not set
CONFIG_XFS_POSIX_ACL=y
# CONFIG_XFS_RT is not set
CONFIG_XFS_ONLINE_SCRUB=y
CONFIG_XFS_ONLINE_REPAIR=y
CONFIG_XFS_DEBUG=y
CONFIG_XFS_ASSERT_FATAL=y
CONFIG_GFS2_FS=y
CONFIG_BTRFS_FS=y
# CONFIG_BTRFS_FS_POSIX_ACL is not set
CONFIG_BTRFS_FS_CHECK_INTEGRITY=y
# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
# CONFIG_BTRFS_DEBUG is not set
# CONFIG_BTRFS_ASSERT is not set
# CONFIG_BTRFS_FS_REF_VERIFY is not set
CONFIG_NILFS2_FS=y
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
# CONFIG_F2FS_FS_POSIX_ACL is not set
CONFIG_F2FS_FS_SECURITY=y
# CONFIG_F2FS_CHECK_FS is not set
CONFIG_F2FS_FAULT_INJECTION=y
# CONFIG_F2FS_FS_COMPRESSION is not set
CONFIG_F2FS_IOSTAT=y
CONFIG_ZONEFS_FS=y
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
# CONFIG_EXPORTFS_BLOCK_OPS is not set
# CONFIG_FILE_LOCKING is not set
CONFIG_FS_ENCRYPTION=y
CONFIG_FS_ENCRYPTION_ALGS=y
# CONFIG_FS_ENCRYPTION_INLINE_CRYPT is not set
# CONFIG_FS_VERITY is not set
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
# CONFIG_INOTIFY_USER is not set
CONFIG_FANOTIFY=y
# CONFIG_FANOTIFY_ACCESS_PERMISSIONS is not set
# CONFIG_QUOTA is not set
CONFIG_QUOTACTL=y
# CONFIG_AUTOFS4_FS is not set
# CONFIG_AUTOFS_FS is not set
CONFIG_FUSE_FS=y
# CONFIG_CUSE is not set
CONFIG_VIRTIO_FS=y
CONFIG_OVERLAY_FS=y
CONFIG_OVERLAY_FS_REDIRECT_DIR=y
# CONFIG_OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW is not set
# CONFIG_OVERLAY_FS_INDEX is not set
# CONFIG_OVERLAY_FS_METACOPY is not set

#
# Caches
#
# CONFIG_FSCACHE is not set
# end of Caches

#
# CD-ROM/DVD Filesystems
#
# CONFIG_ISO9660_FS is not set
CONFIG_UDF_FS=y
# end of CD-ROM/DVD Filesystems

#
# DOS/FAT/EXFAT/NT Filesystems
#
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
CONFIG_FAT_DEFAULT_UTF8=y
# CONFIG_FAT_KUNIT_TEST is not set
# CONFIG_EXFAT_FS is not set
# CONFIG_NTFS_FS is not set
# CONFIG_NTFS3_FS is not set
# end of DOS/FAT/EXFAT/NT Filesystems

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
# CONFIG_PROC_SYSCTL is not set
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_PROC_CHILDREN=y
CONFIG_KERNFS=y
CONFIG_SYSFS=y
CONFIG_CONFIGFS_FS=y
# end of Pseudo filesystems

CONFIG_MISC_FILESYSTEMS=y
CONFIG_ORANGEFS_FS=y
CONFIG_ADFS_FS=y
# CONFIG_ADFS_FS_RW is not set
# CONFIG_AFFS_FS is not set
# CONFIG_ECRYPT_FS is not set
# CONFIG_HFS_FS is not set
CONFIG_HFSPLUS_FS=y
# CONFIG_BEFS_FS is not set
CONFIG_BFS_FS=y
# CONFIG_EFS_FS is not set
CONFIG_JFFS2_FS=y
CONFIG_JFFS2_FS_DEBUG=0
CONFIG_JFFS2_FS_WRITEBUFFER=y
CONFIG_JFFS2_FS_WBUF_VERIFY=y
# CONFIG_JFFS2_SUMMARY is not set
CONFIG_JFFS2_FS_XATTR=y
# CONFIG_JFFS2_FS_POSIX_ACL is not set
CONFIG_JFFS2_FS_SECURITY=y
# CONFIG_JFFS2_COMPRESSION_OPTIONS is not set
CONFIG_JFFS2_ZLIB=y
CONFIG_JFFS2_RTIME=y
CONFIG_CRAMFS=y
# CONFIG_CRAMFS_BLOCKDEV is not set
CONFIG_CRAMFS_MTD=y
# CONFIG_SQUASHFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
CONFIG_OMFS_FS=y
# CONFIG_HPFS_FS is not set
CONFIG_QNX4FS_FS=y
CONFIG_QNX6FS_FS=y
# CONFIG_QNX6FS_DEBUG is not set
CONFIG_ROMFS_FS=y
# CONFIG_ROMFS_BACKED_BY_BLOCK is not set
CONFIG_ROMFS_BACKED_BY_MTD=y
# CONFIG_ROMFS_BACKED_BY_BOTH is not set
CONFIG_ROMFS_ON_MTD=y
# CONFIG_PSTORE is not set
CONFIG_SYSV_FS=y
CONFIG_UFS_FS=y
CONFIG_UFS_FS_WRITE=y
# CONFIG_UFS_DEBUG is not set
# CONFIG_EROFS_FS is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
# CONFIG_NLS_CODEPAGE_437 is not set
CONFIG_NLS_CODEPAGE_737=y
# CONFIG_NLS_CODEPAGE_775 is not set
CONFIG_NLS_CODEPAGE_850=y
CONFIG_NLS_CODEPAGE_852=y
CONFIG_NLS_CODEPAGE_855=y
CONFIG_NLS_CODEPAGE_857=y
# CONFIG_NLS_CODEPAGE_860 is not set
CONFIG_NLS_CODEPAGE_861=y
# CONFIG_NLS_CODEPAGE_862 is not set
CONFIG_NLS_CODEPAGE_863=y
CONFIG_NLS_CODEPAGE_864=y
CONFIG_NLS_CODEPAGE_865=y
# CONFIG_NLS_CODEPAGE_866 is not set
CONFIG_NLS_CODEPAGE_869=y
# CONFIG_NLS_CODEPAGE_936 is not set
CONFIG_NLS_CODEPAGE_950=y
CONFIG_NLS_CODEPAGE_932=y
# CONFIG_NLS_CODEPAGE_949 is not set
CONFIG_NLS_CODEPAGE_874=y
# CONFIG_NLS_ISO8859_8 is not set
CONFIG_NLS_CODEPAGE_1250=y
CONFIG_NLS_CODEPAGE_1251=y
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_ISO8859_2=y
CONFIG_NLS_ISO8859_3=y
# CONFIG_NLS_ISO8859_4 is not set
CONFIG_NLS_ISO8859_5=y
CONFIG_NLS_ISO8859_6=y
CONFIG_NLS_ISO8859_7=y
CONFIG_NLS_ISO8859_9=y
CONFIG_NLS_ISO8859_13=y
CONFIG_NLS_ISO8859_14=y
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
# CONFIG_NLS_MAC_ROMAN is not set
CONFIG_NLS_MAC_CELTIC=y
CONFIG_NLS_MAC_CENTEURO=y
# CONFIG_NLS_MAC_CROATIAN is not set
# CONFIG_NLS_MAC_CYRILLIC is not set
# CONFIG_NLS_MAC_GAELIC is not set
CONFIG_NLS_MAC_GREEK=y
# CONFIG_NLS_MAC_ICELAND is not set
CONFIG_NLS_MAC_INUIT=y
# CONFIG_NLS_MAC_ROMANIAN is not set
CONFIG_NLS_MAC_TURKISH=y
CONFIG_NLS_UTF8=y
# CONFIG_UNICODE is not set
CONFIG_IO_WQ=y
# end of File systems

#
# Security options
#
CONFIG_KEYS=y
CONFIG_KEYS_REQUEST_CACHE=y
CONFIG_PERSISTENT_KEYRINGS=y
# CONFIG_TRUSTED_KEYS is not set
CONFIG_ENCRYPTED_KEYS=y
# CONFIG_USER_DECRYPTED_DATA is not set
CONFIG_KEY_DH_OPERATIONS=y
# CONFIG_KEY_NOTIFICATIONS is not set
# CONFIG_SECURITY_DMESG_RESTRICT is not set
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
# CONFIG_SECURITY_NETWORK is not set
# CONFIG_SECURITY_PATH is not set
CONFIG_STATIC_USERMODEHELPER=y
CONFIG_STATIC_USERMODEHELPER_PATH="/sbin/usermode-helper"
# CONFIG_SECURITY_LOADPIN is not set
CONFIG_SECURITY_YAMA=y
# CONFIG_SECURITY_SAFESETID is not set
CONFIG_SECURITY_LOCKDOWN_LSM=y
CONFIG_SECURITY_LOCKDOWN_LSM_EARLY=y
CONFIG_LOCK_DOWN_KERNEL_FORCE_NONE=y
# CONFIG_LOCK_DOWN_KERNEL_FORCE_INTEGRITY is not set
# CONFIG_LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY is not set
# CONFIG_SECURITY_LANDLOCK is not set
# CONFIG_INTEGRITY is not set
CONFIG_DEFAULT_SECURITY_DAC=y
CONFIG_LSM="lockdown,yama,loadpin,safesetid,integrity,bpf"

#
# Kernel hardening options
#

#
# Memory initialization
#
CONFIG_CC_HAS_AUTO_VAR_INIT_PATTERN=y
CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO=y
CONFIG_INIT_STACK_NONE=y
# CONFIG_INIT_STACK_ALL_PATTERN is not set
# CONFIG_INIT_STACK_ALL_ZERO is not set
CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
CONFIG_INIT_ON_FREE_DEFAULT_ON=y
# end of Memory initialization

CONFIG_CC_HAS_RANDSTRUCT=y
# CONFIG_RANDSTRUCT_NONE is not set
CONFIG_RANDSTRUCT_FULL=y
CONFIG_RANDSTRUCT=y
# end of Kernel hardening options
# end of Security options

CONFIG_XOR_BLOCKS=y
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_SKCIPHER=y
CONFIG_CRYPTO_SKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_RNG_DEFAULT=y
CONFIG_CRYPTO_AKCIPHER2=y
CONFIG_CRYPTO_AKCIPHER=y
CONFIG_CRYPTO_KPP2=y
CONFIG_CRYPTO_KPP=y
CONFIG_CRYPTO_ACOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
CONFIG_CRYPTO_GF128MUL=y
CONFIG_CRYPTO_NULL=y
CONFIG_CRYPTO_NULL2=y
CONFIG_CRYPTO_CRYPTD=y
CONFIG_CRYPTO_AUTHENC=y
# CONFIG_CRYPTO_TEST is not set
CONFIG_CRYPTO_ENGINE=y

#
# Public-key cryptography
#
CONFIG_CRYPTO_RSA=y
CONFIG_CRYPTO_DH=y
# CONFIG_CRYPTO_DH_RFC7919_GROUPS is not set
CONFIG_CRYPTO_ECC=y
CONFIG_CRYPTO_ECDH=y
# CONFIG_CRYPTO_ECDSA is not set
CONFIG_CRYPTO_ECRDSA=y
CONFIG_CRYPTO_SM2=y
# CONFIG_CRYPTO_CURVE25519 is not set

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=y
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_CHACHA20POLY1305=y
CONFIG_CRYPTO_AEGIS128=y
# CONFIG_CRYPTO_SEQIV is not set
CONFIG_CRYPTO_ECHAINIV=y

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
# CONFIG_CRYPTO_CFB is not set
CONFIG_CRYPTO_CTR=y
CONFIG_CRYPTO_CTS=y
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_LRW=y
# CONFIG_CRYPTO_OFB is not set
CONFIG_CRYPTO_PCBC=y
CONFIG_CRYPTO_XTS=y
# CONFIG_CRYPTO_KEYWRAP is not set
CONFIG_CRYPTO_NHPOLY1305=y
CONFIG_CRYPTO_ADIANTUM=y
CONFIG_CRYPTO_ESSIV=y

#
# Hash modes
#
CONFIG_CRYPTO_CMAC=y
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=y
# CONFIG_CRYPTO_VMAC is not set

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32=y
CONFIG_CRYPTO_XXHASH=y
CONFIG_CRYPTO_BLAKE2B=y
CONFIG_CRYPTO_BLAKE2S=y
CONFIG_CRYPTO_CRCT10DIF=y
CONFIG_CRYPTO_CRC64_ROCKSOFT=y
CONFIG_CRYPTO_GHASH=y
CONFIG_CRYPTO_POLY1305=y
CONFIG_CRYPTO_MD4=y
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=y
# CONFIG_CRYPTO_RMD160 is not set
# CONFIG_CRYPTO_SHA1 is not set
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=y
CONFIG_CRYPTO_SHA3=y
CONFIG_CRYPTO_SM3=y
# CONFIG_CRYPTO_SM3_GENERIC is not set
CONFIG_CRYPTO_STREEBOG=y
CONFIG_CRYPTO_WP512=y

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_AES_TI=y
CONFIG_CRYPTO_BLOWFISH=y
CONFIG_CRYPTO_BLOWFISH_COMMON=y
CONFIG_CRYPTO_CAMELLIA=y
CONFIG_CRYPTO_CAST_COMMON=y
CONFIG_CRYPTO_CAST5=y
CONFIG_CRYPTO_CAST6=y
CONFIG_CRYPTO_DES=y
# CONFIG_CRYPTO_FCRYPT is not set
CONFIG_CRYPTO_CHACHA20=y
CONFIG_CRYPTO_SERPENT=y
# CONFIG_CRYPTO_SM4_GENERIC is not set
# CONFIG_CRYPTO_TWOFISH is not set

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=y
# CONFIG_CRYPTO_LZO is not set
# CONFIG_CRYPTO_842 is not set
CONFIG_CRYPTO_LZ4=y
CONFIG_CRYPTO_LZ4HC=y
CONFIG_CRYPTO_ZSTD=y

#
# Random Number Generation
#
# CONFIG_CRYPTO_ANSI_CPRNG is not set
CONFIG_CRYPTO_DRBG_MENU=y
CONFIG_CRYPTO_DRBG_HMAC=y
# CONFIG_CRYPTO_DRBG_HASH is not set
CONFIG_CRYPTO_DRBG_CTR=y
CONFIG_CRYPTO_DRBG=y
CONFIG_CRYPTO_JITTERENTROPY=y
CONFIG_CRYPTO_KDF800108_CTR=y
CONFIG_CRYPTO_HASH_INFO=y
CONFIG_CRYPTO_HW=y
# CONFIG_CRYPTO_DEV_ALLWINNER is not set
# CONFIG_CRYPTO_DEV_EXYNOS_RNG is not set
# CONFIG_CRYPTO_DEV_S5P is not set
# CONFIG_CRYPTO_DEV_ATMEL_AES is not set
# CONFIG_CRYPTO_DEV_ATMEL_TDES is not set
# CONFIG_CRYPTO_DEV_ATMEL_SHA is not set
CONFIG_CRYPTO_DEV_ATMEL_I2C=y
CONFIG_CRYPTO_DEV_ATMEL_ECC=y
CONFIG_CRYPTO_DEV_ATMEL_SHA204A=y
# CONFIG_CRYPTO_DEV_QCE is not set
# CONFIG_CRYPTO_DEV_QCOM_RNG is not set
# CONFIG_CRYPTO_DEV_IMGTEC_HASH is not set
# CONFIG_CRYPTO_DEV_ZYNQMP_AES is not set
# CONFIG_CRYPTO_DEV_ZYNQMP_SHA3 is not set
CONFIG_CRYPTO_DEV_VIRTIO=y
# CONFIG_CRYPTO_DEV_SAFEXCEL is not set
# CONFIG_CRYPTO_DEV_CCREE is not set
# CONFIG_CRYPTO_DEV_HISI_SEC is not set
CONFIG_CRYPTO_DEV_AMLOGIC_GXL=y
# CONFIG_CRYPTO_DEV_AMLOGIC_GXL_DEBUG is not set
# CONFIG_CRYPTO_DEV_SA2UL is not set
# CONFIG_CRYPTO_DEV_KEEMBAY_OCS_AES_SM4 is not set
# CONFIG_CRYPTO_DEV_KEEMBAY_OCS_ECC is not set
# CONFIG_CRYPTO_DEV_KEEMBAY_OCS_HCU is not set
CONFIG_ASYMMETRIC_KEY_TYPE=y
# CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE is not set

#
# Certificates for signature checking
#
CONFIG_SYSTEM_TRUSTED_KEYRING=y
CONFIG_SYSTEM_TRUSTED_KEYS=""
CONFIG_SYSTEM_EXTRA_CERTIFICATE=y
CONFIG_SYSTEM_EXTRA_CERTIFICATE_SIZE=4096
CONFIG_SECONDARY_TRUSTED_KEYRING=y
CONFIG_SYSTEM_BLACKLIST_KEYRING=y
CONFIG_SYSTEM_BLACKLIST_HASH_LIST=""
# end of Certificates for signature checking

CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_RAID6_PQ=y
CONFIG_RAID6_PQ_BENCHMARK=y
CONFIG_LINEAR_RANGES=y
CONFIG_PACKING=y
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_CORDIC=y
CONFIG_PRIME_NUMBERS=y
CONFIG_GENERIC_PCI_IOMAP=y
CONFIG_GENERIC_IOMAP=y

#
# Crypto library routines
#
CONFIG_CRYPTO_LIB_AES=y
CONFIG_CRYPTO_LIB_BLAKE2S_GENERIC=y
CONFIG_CRYPTO_LIB_CHACHA_GENERIC=y
CONFIG_CRYPTO_LIB_CHACHA=y
CONFIG_CRYPTO_LIB_CURVE25519_GENERIC=y
CONFIG_CRYPTO_LIB_CURVE25519=y
CONFIG_CRYPTO_LIB_DES=y
CONFIG_CRYPTO_LIB_POLY1305_RSIZE=1
CONFIG_CRYPTO_LIB_POLY1305_GENERIC=y
CONFIG_CRYPTO_LIB_POLY1305=y
CONFIG_CRYPTO_LIB_CHACHA20POLY1305=y
CONFIG_CRYPTO_LIB_SHA256=y
# end of Crypto library routines

CONFIG_CRC_CCITT=y
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=y
CONFIG_CRC64_ROCKSOFT=y
CONFIG_CRC_ITU_T=y
CONFIG_CRC32=y
CONFIG_CRC32_SELFTEST=y
# CONFIG_CRC32_SLICEBY8 is not set
CONFIG_CRC32_SLICEBY4=y
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
CONFIG_CRC64=y
CONFIG_CRC4=y
CONFIG_CRC7=y
CONFIG_LIBCRC32C=y
CONFIG_CRC8=y
CONFIG_XXHASH=y
CONFIG_RANDOM32_SELFTEST=y
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4_COMPRESS=y
CONFIG_LZ4HC_COMPRESS=y
CONFIG_LZ4_DECOMPRESS=y
CONFIG_ZSTD_COMPRESS=y
CONFIG_ZSTD_DECOMPRESS=y
CONFIG_XZ_DEC=y
# CONFIG_XZ_DEC_X86 is not set
# CONFIG_XZ_DEC_POWERPC is not set
CONFIG_XZ_DEC_IA64=y
# CONFIG_XZ_DEC_ARM is not set
# CONFIG_XZ_DEC_ARMTHUMB is not set
# CONFIG_XZ_DEC_SPARC is not set
# CONFIG_XZ_DEC_MICROLZMA is not set
CONFIG_XZ_DEC_BCJ=y
CONFIG_XZ_DEC_TEST=y
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_DECOMPRESS_ZSTD=y
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_REED_SOLOMON=y
CONFIG_REED_SOLOMON_ENC16=y
CONFIG_REED_SOLOMON_DEC16=y
CONFIG_ASSOCIATIVE_ARRAY=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT_MAP=y
CONFIG_HAS_DMA=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_DMA_DECLARE_COHERENT=y
CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE=y
CONFIG_DMA_GLOBAL_POOL=y
CONFIG_DMA_API_DEBUG=y
# CONFIG_DMA_API_DEBUG_SG is not set
# CONFIG_DMA_MAP_BENCHMARK is not set
CONFIG_SGL_ALLOC=y
CONFIG_GLOB=y
# CONFIG_GLOB_SELFTEST is not set
CONFIG_GENERIC_ATOMIC64=y
CONFIG_CLZ_TAB=y
# CONFIG_IRQ_POLL is not set
CONFIG_MPILIB=y
CONFIG_LIBFDT=y
CONFIG_OID_REGISTRY=y
CONFIG_SG_POOL=y
CONFIG_STACKDEPOT=y
CONFIG_STACK_HASH_ORDER=20
CONFIG_SBITMAP=y
# CONFIG_PARMAN is not set
# CONFIG_OBJAGG is not set
# end of Library routines

#
# Kernel hacking
#

#
# printk and dmesg options
#
CONFIG_CONSOLE_LOGLEVEL_DEFAULT=7
CONFIG_CONSOLE_LOGLEVEL_QUIET=4
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
CONFIG_SYMBOLIC_ERRNAME=y
# end of printk and dmesg options

CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_MISC is not set

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_INFO_NONE is not set
# CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT is not set
CONFIG_DEBUG_INFO_DWARF4=y
# CONFIG_DEBUG_INFO_DWARF5 is not set
CONFIG_DEBUG_INFO_REDUCED=y
CONFIG_DEBUG_INFO_SPLIT=y
CONFIG_PAHOLE_HAS_SPLIT_BTF=y
CONFIG_PAHOLE_HAS_BTF_TAG=y
# CONFIG_GDB_SCRIPTS is not set
CONFIG_FRAME_WARN=1024
CONFIG_STRIP_ASM_SYMS=y
# CONFIG_HEADERS_INSTALL is not set
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
# CONFIG_VMLINUX_MAP is not set
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
# end of Compile-time checks and compiler options

#
# Generic Kernel Debugging Instruments
#
# CONFIG_MAGIC_SYSRQ is not set
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
# CONFIG_DEBUG_FS_DISALLOW_MOUNT is not set
# CONFIG_DEBUG_FS_ALLOW_NONE is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
CONFIG_UBSAN=y
CONFIG_CC_HAS_UBSAN_BOUNDS=y
CONFIG_CC_HAS_UBSAN_ARRAY_BOUNDS=y
CONFIG_UBSAN_BOUNDS=y
CONFIG_UBSAN_ARRAY_BOUNDS=y
CONFIG_UBSAN_SHIFT=y
# CONFIG_UBSAN_DIV_ZERO is not set
# CONFIG_UBSAN_UNREACHABLE is not set
CONFIG_UBSAN_BOOL=y
CONFIG_UBSAN_ENUM=y
# end of Generic Kernel Debugging Instruments

#
# Networking Debugging
#
# CONFIG_NET_DEV_REFCNT_TRACKER is not set
# CONFIG_NET_NS_REFCNT_TRACKER is not set
# end of Networking Debugging

#
# Memory Debugging
#
# CONFIG_PAGE_EXTENSION is not set
# CONFIG_DEBUG_PAGEALLOC is not set
# CONFIG_PAGE_OWNER is not set
# CONFIG_PAGE_POISONING is not set
CONFIG_DEBUG_PAGE_REF=y
# CONFIG_DEBUG_OBJECTS is not set
CONFIG_DEBUG_STACK_USAGE=y
CONFIG_SCHED_STACK_END_CHECK=y
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_VMACACHE=y
# CONFIG_DEBUG_VM_RB is not set
CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
# end of Memory Debugging

CONFIG_DEBUG_SHIRQ=y

#
# Debug Oops, Lockups and Hangs
#
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_ON_OOPS_VALUE=1
CONFIG_PANIC_TIMEOUT=0
CONFIG_LOCKUP_DETECTOR=y
CONFIG_SOFTLOCKUP_DETECTOR=y
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
# CONFIG_DETECT_HUNG_TASK is not set
CONFIG_WQ_WATCHDOG=y
# end of Debug Oops, Lockups and Hangs

#
# Scheduler Debugging
#
CONFIG_SCHED_DEBUG=y
CONFIG_SCHED_INFO=y
CONFIG_SCHEDSTATS=y
# end of Scheduler Debugging

CONFIG_DEBUG_TIMEKEEPING=y

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
# CONFIG_PROVE_LOCKING is not set
CONFIG_LOCK_STAT=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
CONFIG_LOCKDEP_BITS=15
CONFIG_LOCKDEP_CHAINS_BITS=16
CONFIG_LOCKDEP_STACK_TRACE_BITS=19
CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14
CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
CONFIG_DEBUG_LOCKDEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_LOCK_TORTURE_TEST is not set
CONFIG_WW_MUTEX_SELFTEST=y
# CONFIG_SCF_TORTURE_TEST is not set
# end of Lock Debugging (spinlocks, mutexes, etc...)

CONFIG_DEBUG_IRQFLAGS=y
CONFIG_STACKTRACE=y
# CONFIG_WARN_ALL_UNSEEDED_RANDOM is not set
CONFIG_DEBUG_KOBJECT=y

#
# Debug kernel data structures
#
# CONFIG_DEBUG_LIST is not set
# CONFIG_DEBUG_PLIST is not set
CONFIG_DEBUG_SG=y
CONFIG_DEBUG_NOTIFIERS=y
# CONFIG_BUG_ON_DATA_CORRUPTION is not set
# end of Debug kernel data structures

CONFIG_DEBUG_CREDENTIALS=y

#
# RCU Debugging
#
CONFIG_TORTURE_TEST=y
CONFIG_RCU_SCALE_TEST=y
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_REF_SCALE_TEST is not set
# CONFIG_RCU_TRACE is not set
CONFIG_RCU_EQS_DEBUG=y
# end of RCU Debugging

CONFIG_DEBUG_WQ_FORCE_RR_CPU=y
CONFIG_LATENCYTOP=y
CONFIG_NOP_TRACER=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
# CONFIG_BOOTTIME_TRACING is not set
# CONFIG_IRQSOFF_TRACER is not set
# CONFIG_SCHED_TRACER is not set
CONFIG_HWLAT_TRACER=y
# CONFIG_OSNOISE_TRACER is not set
# CONFIG_TIMERLAT_TRACER is not set
CONFIG_TRACER_SNAPSHOT=y
# CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP is not set
CONFIG_TRACE_BRANCH_PROFILING=y
# CONFIG_BRANCH_PROFILE_NONE is not set
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
CONFIG_PROFILE_ALL_BRANCHES=y
# CONFIG_BRANCH_TRACER is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
CONFIG_DYNAMIC_EVENTS=y
CONFIG_SYNTH_EVENTS=y
# CONFIG_USER_EVENTS is not set
# CONFIG_TRACE_EVENT_INJECT is not set
CONFIG_TRACEPOINT_BENCHMARK=y
CONFIG_RING_BUFFER_BENCHMARK=y
# CONFIG_TRACE_EVAL_MAP_FILE is not set
# CONFIG_GCOV_PROFILE_FTRACE is not set
# CONFIG_FTRACE_STARTUP_TEST is not set
CONFIG_RING_BUFFER_STARTUP_TEST=y
# CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS is not set
# CONFIG_SYNTH_EVENT_GEN_TEST is not set
# CONFIG_SAMPLES is not set

#
# hexagon Debugging
#
# end of hexagon Debugging

#
# Kernel Testing and Coverage
#
CONFIG_KUNIT=y
CONFIG_KUNIT_DEBUGFS=y
CONFIG_KUNIT_TEST=y
CONFIG_KUNIT_EXAMPLE_TEST=y
# CONFIG_KUNIT_ALL_TESTS is not set
CONFIG_NOTIFIER_ERROR_INJECTION=y
# CONFIG_FAULT_INJECTION is not set
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
CONFIG_RUNTIME_TESTING_MENU=y
# CONFIG_LKDTM is not set
CONFIG_TEST_LIST_SORT=y
CONFIG_TEST_MIN_HEAP=y
# CONFIG_TEST_SORT is not set
# CONFIG_TEST_DIV64 is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_TEST_REF_TRACKER is not set
CONFIG_RBTREE_TEST=y
CONFIG_REED_SOLOMON_TEST=y
# CONFIG_INTERVAL_TREE_TEST is not set
CONFIG_ATOMIC64_SELFTEST=y
# CONFIG_TEST_HEXDUMP is not set
CONFIG_STRING_SELFTEST=y
CONFIG_TEST_STRING_HELPERS=y
CONFIG_TEST_STRSCPY=y
CONFIG_TEST_KSTRTOX=y
CONFIG_TEST_PRINTF=y
# CONFIG_TEST_SCANF is not set
CONFIG_TEST_BITMAP=y
CONFIG_TEST_UUID=y
# CONFIG_TEST_XARRAY is not set
CONFIG_TEST_RHASHTABLE=y
# CONFIG_TEST_SIPHASH is not set
# CONFIG_TEST_IDA is not set
CONFIG_FIND_BIT_BENCHMARK=y
CONFIG_TEST_FIRMWARE=y
CONFIG_BITFIELD_KUNIT=y
# CONFIG_HASH_KUNIT_TEST is not set
# CONFIG_RESOURCE_KUNIT_TEST is not set
CONFIG_SYSCTL_KUNIT_TEST=y
# CONFIG_LIST_KUNIT_TEST is not set
CONFIG_LINEAR_RANGES_TEST=y
# CONFIG_CMDLINE_KUNIT_TEST is not set
CONFIG_BITS_TEST=y
# CONFIG_MEMCPY_KUNIT_TEST is not set
# CONFIG_OVERFLOW_KUNIT_TEST is not set
# CONFIG_STACKINIT_KUNIT_TEST is not set
CONFIG_TEST_UDELAY=y
CONFIG_TEST_MEMCAT_P=y
CONFIG_TEST_MEMINIT=y
# CONFIG_TEST_FREE_PAGES is not set
# end of Kernel Testing and Coverage

# CONFIG_WARN_MISSING_DOCUMENTS is not set
# CONFIG_WARN_ABI_ERRORS is not set
# end of Kernel hacking


* Re: [RFC PATCH 18/26] hugetlb: use struct hugetlb_pte for walk_hugetlb_range
  2022-06-24 17:36 ` [RFC PATCH 18/26] hugetlb: use struct hugetlb_pte for walk_hugetlb_range James Houghton
  2022-06-25  2:58   ` kernel test robot
@ 2022-06-25  2:59   ` kernel test robot
  1 sibling, 0 replies; 128+ messages in thread
From: kernel test robot @ 2022-06-25  2:59 UTC (permalink / raw)
  To: James Houghton; +Cc: llvm, kbuild-all

[-- Attachment #1: Type: text/plain, Size: 9606 bytes --]

Hi James,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on kvm/queue]
[also build test ERROR on arnd-asm-generic/master arm64/for-next/core linus/master v5.19-rc3]
[cannot apply to next-20220624]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patches, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/James-Houghton/hugetlb-Introduce-HugeTLB-high-granularity-mapping/20220625-014117
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
config: s390-randconfig-r044-20220624
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 6fa9120080c35a5ff851c3fc3358692c4ef7ce0d)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install s390 cross compiling tool for clang build
        # apt-get install binutils-s390x-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/c404da72c79a05fc3ed42e3dd616734ab17c7093
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review James-Houghton/hugetlb-Introduce-HugeTLB-high-granularity-mapping/20220625-014117
        git checkout c404da72c79a05fc3ed42e3dd616734ab17c7093
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=s390 SHELL=/bin/bash

If you fix the issue, kindly add the following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from arch/s390/mm/gmap.c:12:
   In file included from include/linux/pagewalk.h:6:
   include/linux/hugetlb.h:1205:38: warning: declaration of 'struct hugetlb_pte' will not be visible outside of this function [-Wvisibility]
   int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
                                        ^
   include/linux/hugetlb.h:1212:58: warning: declaration of 'struct hugetlb_pte' will not be visible outside of this function [-Wvisibility]
   static inline int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
                                                            ^
   include/linux/hugetlb.h:1245:62: warning: declaration of 'struct hugetlb_pte' will not be visible outside of this function [-Wvisibility]
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
   include/linux/hugetlb.h:1248:14: error: incomplete definition of type 'struct hugetlb_pte'
           BUG_ON(!hpte->ptep);
                   ~~~~^
   include/asm-generic/bug.h:71:45: note: expanded from macro 'BUG_ON'
   #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                               ^~~~~~~~~
   include/linux/compiler.h:78:42: note: expanded from macro 'unlikely'
   # define unlikely(x)    __builtin_expect(!!(x), 0)
                                               ^
   include/linux/hugetlb.h:1245:62: note: forward declaration of 'struct hugetlb_pte'
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
   include/linux/hugetlb.h:1251:6: error: call to undeclared function 'hugetlb_pte_none'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           if (hugetlb_pte_none(hpte) || hugetlb_pte_present_leaf(hpte))
               ^
   include/linux/hugetlb.h:1251:32: error: call to undeclared function 'hugetlb_pte_present_leaf'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           if (hugetlb_pte_none(hpte) || hugetlb_pte_present_leaf(hpte))
                                         ^
   include/linux/hugetlb.h:1252:27: error: call to undeclared function 'hugetlb_pte_shift'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
                   return huge_pte_lockptr(hugetlb_pte_shift(hpte),
                                           ^
   include/linux/hugetlb.h:1253:13: error: incomplete definition of type 'struct hugetlb_pte'
                                   mm, hpte->ptep);
                                       ~~~~^
   include/linux/hugetlb.h:1245:62: note: forward declaration of 'struct hugetlb_pte'
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                ^
   include/linux/hugetlb.h:1258:59: warning: declaration of 'struct hugetlb_pte' will not be visible outside of this function [-Wvisibility]
   spinlock_t *hugetlb_pte_lock(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                             ^
   include/linux/hugetlb.h:1260:44: error: incompatible pointer types passing 'struct hugetlb_pte *' to parameter of type 'struct hugetlb_pte *' [-Werror,-Wincompatible-pointer-types]
           spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte);
                                                     ^~~~
   include/linux/hugetlb.h:1245:75: note: passing argument to parameter 'hpte' here
   spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
                                                                             ^
   In file included from arch/s390/mm/gmap.c:12:
   include/linux/pagewalk.h:51:30: warning: declaration of 'struct hugetlb_pte' will not be visible outside of this function [-Wvisibility]
           int (*hugetlb_entry)(struct hugetlb_pte *hpte,
                                       ^
   In file included from arch/s390/mm/gmap.c:19:
   include/linux/mman.h:154:9: warning: division by zero is undefined [-Wdivision-by-zero]
                  _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mman.h:132:21: note: expanded from macro '_calc_vm_trans'
      : ((x) & (bit1)) / ((bit1) / (bit2))))
                       ^ ~~~~~~~~~~~~~~~~~
   arch/s390/mm/gmap.c:2623:46: warning: declaration of 'struct hugetlb_pte' will not be visible outside of this function [-Wvisibility]
   static int __s390_enable_skey_hugetlb(struct hugetlb_pte *hpte,
                                                ^
>> arch/s390/mm/gmap.c:2627:7: error: call to undeclared function 'hugetlb_pte_present_leaf'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           if (!hugetlb_pte_present_leaf(hpte) ||
                ^
>> arch/s390/mm/gmap.c:2628:4: error: call to undeclared function 'hugetlb_pte_size'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
                           hugetlb_pte_size(hpte) != PMD_SIZE)
                           ^
   arch/s390/mm/gmap.c:2628:4: note: did you mean 'hugetlb_pte_lock'?
   include/linux/hugetlb.h:1258:13: note: 'hugetlb_pte_lock' declared here
   spinlock_t *hugetlb_pte_lock(struct mm_struct *mm, struct hugetlb_pte *hpte)
               ^
>> arch/s390/mm/gmap.c:2631:24: error: use of undeclared identifier 'pte'
           pmd_t *pmd = (pmd_t *)pte;
                                 ^
   arch/s390/mm/gmap.c:2631:9: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
           pmd_t *pmd = (pmd_t *)pte;
                  ^
>> arch/s390/mm/gmap.c:2654:20: error: incompatible function pointer types initializing 'int (*)(struct hugetlb_pte *, unsigned long, unsigned long, struct mm_walk *)' with an expression of type 'int (struct hugetlb_pte *, unsigned long, unsigned long, struct mm_walk *)' [-Werror,-Wincompatible-function-pointer-types]
           .hugetlb_entry          = __s390_enable_skey_hugetlb,
                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~
   8 warnings and 10 errors generated.


vim +/hugetlb_pte_present_leaf +2627 arch/s390/mm/gmap.c

  2622	
  2623	static int __s390_enable_skey_hugetlb(struct hugetlb_pte *hpte,
  2624					      unsigned long addr, unsigned long next,
  2625					      struct mm_walk *walk)
  2626	{
> 2627		if (!hugetlb_pte_present_leaf(hpte) ||
> 2628				hugetlb_pte_size(hpte) != PMD_SIZE)
  2629			return -EINVAL;
  2630	
> 2631		pmd_t *pmd = (pmd_t *)pte;
  2632		unsigned long start, end;
  2633		struct page *page = pmd_page(*pmd);
  2634	
  2635		/*
  2636		 * The write check makes sure we do not set a key on shared
  2637		 * memory. This is needed as the walker does not differentiate
  2638		 * between actual guest memory and the process executable or
  2639		 * shared libraries.
  2640		 */
  2641		if (pmd_val(*pmd) & _SEGMENT_ENTRY_INVALID ||
  2642		    !(pmd_val(*pmd) & _SEGMENT_ENTRY_WRITE))
  2643			return 0;
  2644	
  2645		start = pmd_val(*pmd) & HPAGE_MASK;
  2646		end = start + HPAGE_SIZE - 1;
  2647		__storage_key_init_range(start, end);
  2648		set_bit(PG_arch_1, &page->flags);
  2649		cond_resched();
  2650		return 0;
  2651	}
  2652	
  2653	static const struct mm_walk_ops enable_skey_walk_ops = {
> 2654		.hugetlb_entry		= __s390_enable_skey_hugetlb,
  2655		.pte_entry		= __s390_enable_skey_pte,
  2656		.pmd_entry		= __s390_enable_skey_pmd,
  2657	};
  2658	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

[-- Attachment #2: config --]
[-- Type: text/plain, Size: 80687 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/s390 5.19.0-rc1 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="clang version 15.0.0 (git://gitmirror/llvm_project 6fa9120080c35a5ff851c3fc3358692c4ef7ce0d)"
CONFIG_GCC_VERSION=0
CONFIG_CC_IS_CLANG=y
CONFIG_CLANG_VERSION=150000
CONFIG_AS_IS_LLVM=y
CONFIG_AS_VERSION=150000
CONFIG_LD_IS_BFD=y
CONFIG_LD_VERSION=23800
CONFIG_LLD_VERSION=0
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_CC_HAS_NO_PROFILE_FN_ATTR=y
CONFIG_PAHOLE_VERSION=123
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_WERROR is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
CONFIG_HAVE_KERNEL_UNCOMPRESSED=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_KERNEL_ZSTD is not set
CONFIG_KERNEL_UNCOMPRESSED=y
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
# CONFIG_SYSVIPC is not set
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_WATCH_QUEUE is not set
# CONFIG_CROSS_MEMORY_ATTACH is not set
CONFIG_USELIB=y
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_INJECTION=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_SIM=y
CONFIG_SPARSE_IRQ=y
CONFIG_GENERIC_IRQ_DEBUGFS=y
# end of IRQ subsystem

CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
# CONFIG_TIME_KUNIT_TEST is not set

#
# Timers subsystem
#
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
# end of Timers subsystem

CONFIG_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y

#
# BPF subsystem
#
# CONFIG_BPF_SYSCALL is not set
CONFIG_BPF_JIT=y
CONFIG_BPF_JIT_DEFAULT_ON=y
# end of BPF subsystem

CONFIG_PREEMPT_VOLUNTARY_BUILD=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y

#
# CPU/Task time and stats accounting
#
CONFIG_VIRT_CPU_ACCOUNTING=y
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
# CONFIG_TASKSTATS is not set
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

# CONFIG_CPU_ISOLATION is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# end of RCU Subsystem

CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
# CONFIG_IKCONFIG_PROC is not set
CONFIG_IKHEADERS=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
# CONFIG_PRINTK_INDEX is not set

#
# Scheduler features
#
# end of Scheduler features

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_CC_HAS_INT128=y
CONFIG_CC_IMPLICIT_FALLTHROUGH="-Wimplicit-fallthrough"
# CONFIG_CGROUPS is not set
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_TIME_NS=y
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
CONFIG_NET_NS=y
# CONFIG_CHECKPOINT_RESTORE is not set
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_RD_GZIP is not set
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
# CONFIG_RD_LZO is not set
# CONFIG_RD_LZ4 is not set
# CONFIG_RD_ZSTD is not set
# CONFIG_BOOT_CONFIG is not set
CONFIG_INITRAMFS_PRESERVE_MTIME=y
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
# CONFIG_EXPERT is not set
CONFIG_MULTIUSER=y
CONFIG_SYSFS_SYSCALL=y
CONFIG_FHANDLE=y
CONFIG_POSIX_TIMERS=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_IO_URING=y
CONFIG_ADVISE_SYSCALLS=y
CONFIG_MEMBARRIER=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_BASE_RELATIVE=y
CONFIG_RSEQ=y
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
# end of Kernel Performance Events And Counters

CONFIG_SYSTEM_DATA_VERIFICATION=y
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
# end of General setup

CONFIG_MMU=y
CONFIG_CPU_BIG_ENDIAN=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_PGSTE=y
CONFIG_AUDIT_ARCH=y
CONFIG_NO_IOPORT_MAP=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_KASAN_SHADOW_OFFSET=0x1C000000000000
CONFIG_S390=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_PGTABLE_LEVELS=5
CONFIG_HAVE_LIVEPATCH=y

#
# Processor type and features
#
CONFIG_HAVE_MARCH_Z10_FEATURES=y
CONFIG_HAVE_MARCH_Z196_FEATURES=y
# CONFIG_MARCH_Z10 is not set
CONFIG_MARCH_Z196=y
# CONFIG_MARCH_ZEC12 is not set
# CONFIG_MARCH_Z13 is not set
# CONFIG_MARCH_Z14 is not set
# CONFIG_MARCH_Z15 is not set
# CONFIG_MARCH_Z16 is not set
CONFIG_MARCH_Z196_TUNE=y
CONFIG_TUNE_DEFAULT=y
# CONFIG_TUNE_Z10 is not set
# CONFIG_TUNE_Z196 is not set
# CONFIG_TUNE_ZEC12 is not set
# CONFIG_TUNE_Z13 is not set
# CONFIG_TUNE_Z14 is not set
# CONFIG_TUNE_Z15 is not set
# CONFIG_TUNE_Z16 is not set
CONFIG_64BIT=y
CONFIG_COMMAND_LINE_SIZE=4096
CONFIG_SMP=y
CONFIG_NR_CPUS=64
CONFIG_HOTPLUG_CPU=y
# CONFIG_SCHED_TOPOLOGY is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_KEXEC=y
# CONFIG_ARCH_RANDOM is not set
CONFIG_KERNEL_NOBP=y
# CONFIG_RELOCATABLE is not set
# end of Processor type and features

#
# Memory setup
#
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_MAX_PHYSMEM_BITS=46
# CONFIG_CHECK_STACK is not set
# end of Memory setup

#
# I/O subsystem
#
CONFIG_QDIO=y
CONFIG_CHSC_SCH=m
# CONFIG_SCM_BUS is not set
# CONFIG_VFIO_CCW is not set
# end of I/O subsystem

#
# Dump support
#
CONFIG_CRASH_DUMP=y
# end of Dump support

CONFIG_CCW=y
CONFIG_HAVE_PNETID=m

#
# Virtualization
#
CONFIG_PROTECTED_VIRTUALIZATION_GUEST=y
CONFIG_PFAULT=y
CONFIG_CMM=m
# CONFIG_APPLDATA_BASE is not set
# CONFIG_S390_HYPFS_FS is not set
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQFD=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_KVM_ASYNC_PF_SYNC=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM_VFIO=y
CONFIG_HAVE_KVM_INVALID_WAKEUPS=y
CONFIG_HAVE_KVM_VCPU_ASYNC_IOCTL=y
CONFIG_HAVE_KVM_NO_POLL=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=y
# CONFIG_KVM_S390_UCONTROL is not set
CONFIG_S390_GUEST=y
# end of Virtualization

#
# Selftests
#
CONFIG_S390_UNWIND_SELFTEST=m
# CONFIG_S390_KPROBES_SANITY_TEST is not set
# CONFIG_S390_MODULES_SANITY_TEST is not set
# end of Selftests

#
# General architecture-dependent options
#
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_GENERIC_ENTRY=y
CONFIG_KPROBES=y
# CONFIG_JUMP_LABEL is not set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_KRETPROBES=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_ARCH_CORRECT_STACKTRACE_ON_KRETPROBE=y
CONFIG_HAVE_FUNCTION_ERROR_INJECTION=y
CONFIG_HAVE_NMI=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_CONTIGUOUS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_ARCH_HAS_FORTIFY_SOURCE=y
CONFIG_ARCH_HAS_SET_MEMORY=y
CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT=y
CONFIG_ARCH_WANTS_NO_INSTR=y
CONFIG_ARCH_32BIT_USTAT_F_TINODE=y
CONFIG_HAVE_ASM_MODVERSIONS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_RSEQ=y
CONFIG_HAVE_FUNCTION_ARG_ACCESS_API=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_HAVE_ARCH_JUMP_LABEL_RELATIVE=y
CONFIG_MMU_GATHER_TABLE_FREE=y
CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
CONFIG_MMU_GATHER_NO_GATHER=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_IPC_PARSE_VERSION=y
CONFIG_HAVE_ARCH_SECCOMP=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP=y
CONFIG_SECCOMP_FILTER=y
# CONFIG_SECCOMP_CACHE_DEBUG is not set
CONFIG_LTO_NONE=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_IDLE=y
CONFIG_ARCH_HAS_SCALED_CPUTIME=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_SOFT_DIRTY=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK=y
CONFIG_ALTERNATE_USER_ADDRESS_SPACE=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_PAGE_SIZE_LESS_THAN_64KB=y
CONFIG_PAGE_SIZE_LESS_THAN_256KB=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_CLONE_BACKWARDS2=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_OLD_SIGACTION=y
CONFIG_COMPAT_32BIT_TIME=y
CONFIG_HAVE_ARCH_VMAP_STACK=y
CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET=y
CONFIG_RANDOMIZE_KSTACK_OFFSET=y
# CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT is not set
CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_ARCH_HAS_STRICT_MODULE_RWX=y
CONFIG_STRICT_MODULE_RWX=y
# CONFIG_LOCK_EVENT_COUNTS is not set
CONFIG_ARCH_HAS_MEM_ENCRYPT=y
CONFIG_ARCH_HAS_VDSO_DATA=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y

#
# GCOV-based kernel profiling
#
CONFIG_GCOV_KERNEL=y
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
CONFIG_GCOV_PROFILE_ALL=y
# end of GCOV-based kernel profiling

CONFIG_HAVE_GCC_PLUGINS=y
# end of General architecture-dependent options

CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULE_SIG_FORMAT=y
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
# CONFIG_MODULE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_MODULE_SIG=y
# CONFIG_MODULE_SIG_FORCE is not set
CONFIG_MODULE_SIG_ALL=y
# CONFIG_MODULE_SIG_SHA1 is not set
# CONFIG_MODULE_SIG_SHA224 is not set
# CONFIG_MODULE_SIG_SHA256 is not set
# CONFIG_MODULE_SIG_SHA384 is not set
CONFIG_MODULE_SIG_SHA512=y
CONFIG_MODULE_SIG_HASH="sha512"
CONFIG_MODULE_COMPRESS_NONE=y
# CONFIG_MODULE_COMPRESS_GZIP is not set
# CONFIG_MODULE_COMPRESS_XZ is not set
# CONFIG_MODULE_COMPRESS_ZSTD is not set
# CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is not set
CONFIG_MODPROBE_PATH="/sbin/modprobe"
CONFIG_MODULES_TREE_LOOKUP=y
CONFIG_BLOCK=y
CONFIG_BLOCK_LEGACY_AUTOLOAD=y
CONFIG_BLK_DEV_BSG_COMMON=y
CONFIG_BLK_ICQ=y
CONFIG_BLK_DEV_BSGLIB=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_DEV_INTEGRITY_T10=m
CONFIG_BLK_DEV_ZONED=y
# CONFIG_BLK_WBT is not set
# CONFIG_BLK_DEBUG_FS is not set
CONFIG_BLK_SED_OPAL=y
CONFIG_BLK_INLINE_ENCRYPTION=y
CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK=y

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_EFI_PARTITION=y
# end of Partition Types

CONFIG_BLK_MQ_VIRTIO=y
CONFIG_BLOCK_HOLDER_DEPRECATED=y
CONFIG_BLK_MQ_STACKING=y

#
# IO Schedulers
#
CONFIG_MQ_IOSCHED_DEADLINE=y
CONFIG_MQ_IOSCHED_KYBER=y
CONFIG_IOSCHED_BFQ=y
# end of IO Schedulers

CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_PADATA=y
CONFIG_ASN1=y
CONFIG_ARCH_INLINE_SPIN_TRYLOCK=y
CONFIG_ARCH_INLINE_SPIN_TRYLOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_LOCK=y
CONFIG_ARCH_INLINE_SPIN_LOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_LOCK_IRQ=y
CONFIG_ARCH_INLINE_SPIN_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE=y
CONFIG_ARCH_INLINE_READ_TRYLOCK=y
CONFIG_ARCH_INLINE_READ_LOCK=y
CONFIG_ARCH_INLINE_READ_LOCK_BH=y
CONFIG_ARCH_INLINE_READ_LOCK_IRQ=y
CONFIG_ARCH_INLINE_READ_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_READ_UNLOCK=y
CONFIG_ARCH_INLINE_READ_UNLOCK_BH=y
CONFIG_ARCH_INLINE_READ_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_READ_UNLOCK_IRQRESTORE=y
CONFIG_ARCH_INLINE_WRITE_TRYLOCK=y
CONFIG_ARCH_INLINE_WRITE_LOCK=y
CONFIG_ARCH_INLINE_WRITE_LOCK_BH=y
CONFIG_ARCH_INLINE_WRITE_LOCK_IRQ=y
CONFIG_ARCH_INLINE_WRITE_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_BH=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE=y
CONFIG_UNINLINE_SPIN_UNLOCK=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_MUTEX_SPIN_ON_OWNER=y
CONFIG_RWSEM_SPIN_ON_OWNER=y
CONFIG_LOCK_SPIN_ON_OWNER=y
CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y

#
# Executable file formats
#
# CONFIG_BINFMT_ELF is not set
CONFIG_ARCH_BINFMT_ELF_STATE=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_BINFMT_MISC=y
CONFIG_COREDUMP=y
# end of Executable file formats

#
# Memory Management options
#
CONFIG_SWAP=y
# CONFIG_ZSWAP is not set

#
# SLAB allocator options
#
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLAB_MERGE_DEFAULT is not set
# CONFIG_SLAB_FREELIST_RANDOM is not set
# CONFIG_SLAB_FREELIST_HARDENED is not set
CONFIG_SLUB_STATS=y
CONFIG_SLUB_CPU_PARTIAL=y
# end of SLAB allocator options

CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
CONFIG_COMPAT_BRK=y
CONFIG_SPARSEMEM=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_MEMBLOCK_PHYS_MAP=y
CONFIG_HAVE_FAST_GUP=y
CONFIG_MEMORY_ISOLATION=y
CONFIG_EXCLUSIVE_SYSTEM_RAM=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
# CONFIG_MEMORY_HOTPLUG is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
CONFIG_COMPACTION=y
CONFIG_PAGE_REPORTING=y
CONFIG_MIGRATION=y
CONFIG_CONTIG_ALLOC=y
CONFIG_PHYS_ADDR_T_64BIT=y
# CONFIG_KSM is not set
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_TRANSPARENT_HUGEPAGE=y
# CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
CONFIG_READ_ONLY_THP_FOR_FS=y
CONFIG_CMA=y
# CONFIG_CMA_DEBUG is not set
CONFIG_CMA_DEBUGFS=y
# CONFIG_CMA_SYSFS is not set
CONFIG_CMA_AREAS=7
CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
# CONFIG_IDLE_PAGE_TRACKING is not set
CONFIG_ARCH_HAS_CURRENT_STACK_POINTER=y
CONFIG_ZONE_DMA=y
CONFIG_VM_EVENT_COUNTERS=y
# CONFIG_PERCPU_STATS is not set
# CONFIG_GUP_TEST is not set
CONFIG_ARCH_HAS_PTE_SPECIAL=y
# CONFIG_ANON_VMA_NAME is not set
CONFIG_USERFAULTFD=y

#
# Data Access Monitoring
#
# CONFIG_DAMON is not set
# end of Data Access Monitoring
# end of Memory Management options

CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_DIAG=m
# CONFIG_UNIX is not set
CONFIG_IUCV=y
CONFIG_AFIUCV=y
# CONFIG_INET is not set
CONFIG_NETWORK_SECMARK=y
CONFIG_NET_PTP_CLASSIFY=y
CONFIG_NETWORK_PHY_TIMESTAMPING=y
# CONFIG_NETFILTER is not set
CONFIG_ATM=y
CONFIG_ATM_LANE=y
CONFIG_STP=m
CONFIG_BRIDGE=m
CONFIG_BRIDGE_MRP=y
# CONFIG_BRIDGE_CFM is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
CONFIG_LLC=m
# CONFIG_LLC2 is not set
# CONFIG_ATALK is not set
CONFIG_X25=y
CONFIG_LAPB=y
# CONFIG_PHONET is not set
CONFIG_IEEE802154=m
CONFIG_IEEE802154_NL802154_EXPERIMENTAL=y
CONFIG_IEEE802154_SOCKET=m
CONFIG_MAC802154=m
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
# CONFIG_NET_SCH_HTB is not set
# CONFIG_NET_SCH_HFSC is not set
CONFIG_NET_SCH_ATM=m
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_MULTIQ=y
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFB=y
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
CONFIG_NET_SCH_CBS=y
CONFIG_NET_SCH_ETF=m
CONFIG_NET_SCH_TAPRIO=m
CONFIG_NET_SCH_GRED=m
# CONFIG_NET_SCH_DSMARK is not set
# CONFIG_NET_SCH_NETEM is not set
CONFIG_NET_SCH_DRR=y
CONFIG_NET_SCH_MQPRIO=m
# CONFIG_NET_SCH_SKBPRIO is not set
# CONFIG_NET_SCH_CHOKE is not set
# CONFIG_NET_SCH_QFQ is not set
# CONFIG_NET_SCH_CODEL is not set
# CONFIG_NET_SCH_FQ_CODEL is not set
CONFIG_NET_SCH_CAKE=y
# CONFIG_NET_SCH_FQ is not set
# CONFIG_NET_SCH_HHF is not set
CONFIG_NET_SCH_PIE=y
CONFIG_NET_SCH_FQ_PIE=m
# CONFIG_NET_SCH_PLUG is not set
CONFIG_NET_SCH_ETS=y
# CONFIG_NET_SCH_DEFAULT is not set

#
# Classification
#
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=y
CONFIG_NET_CLS_TCINDEX=y
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=y
CONFIG_CLS_U32_PERF=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=y
CONFIG_NET_CLS_FLOW=m
# CONFIG_NET_CLS_BPF is not set
CONFIG_NET_CLS_FLOWER=m
CONFIG_NET_CLS_MATCHALL=m
# CONFIG_NET_EMATCH is not set
# CONFIG_NET_CLS_ACT is not set
CONFIG_NET_SCH_FIFO=y
# CONFIG_DCB is not set
# CONFIG_DNS_RESOLVER is not set
CONFIG_BATMAN_ADV=y
# CONFIG_BATMAN_ADV_BATMAN_V is not set
CONFIG_BATMAN_ADV_NC=y
CONFIG_BATMAN_ADV_DEBUG=y
# CONFIG_BATMAN_ADV_TRACING is not set
CONFIG_VSOCKETS=y
CONFIG_VSOCKETS_DIAG=m
CONFIG_VSOCKETS_LOOPBACK=m
CONFIG_VIRTIO_VSOCKETS=y
CONFIG_VIRTIO_VSOCKETS_COMMON=y
CONFIG_NETLINK_DIAG=m
CONFIG_MPLS=y
CONFIG_NET_MPLS_GSO=y
CONFIG_MPLS_ROUTING=m
# CONFIG_NET_NSH is not set
# CONFIG_HSR is not set
# CONFIG_QRTR is not set
CONFIG_PCPU_DEV_REFCNT=y
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_SOCK_RX_QUEUE_MAPPING=y
CONFIG_XPS=y
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y
CONFIG_NET_FLOW_LIMIT=y

#
# Network testing
#
# end of Network testing
# end of Networking options

# CONFIG_CAN is not set
# CONFIG_MCTP is not set
CONFIG_RFKILL=y
CONFIG_RFKILL_LEDS=y
CONFIG_RFKILL_INPUT=y
CONFIG_RFKILL_GPIO=y
# CONFIG_NET_9P is not set
CONFIG_CAIF=y
CONFIG_CAIF_DEBUG=y
CONFIG_CAIF_NETDEV=m
CONFIG_CAIF_USB=m
CONFIG_NFC=m
CONFIG_NFC_DIGITAL=m
# CONFIG_NFC_NCI is not set
CONFIG_NFC_HCI=m
# CONFIG_NFC_SHDLC is not set

#
# Near Field Communication (NFC) devices
#
CONFIG_NFC_SIM=m
CONFIG_NFC_PN533=m
# CONFIG_NFC_PN533_I2C is not set
CONFIG_NFC_PN532_UART=m
# end of Near Field Communication (NFC) devices

CONFIG_PSAMPLE=m
CONFIG_NET_IFE=m
# CONFIG_LWTUNNEL is not set
CONFIG_FAILOVER=y
# CONFIG_ETHTOOL_NETLINK is not set
# CONFIG_NETDEV_ADDR_LIST_TEST is not set

#
# Device Drivers
#
CONFIG_HAVE_PCI=y
# CONFIG_PCI is not set
CONFIG_PCCARD=y
CONFIG_PCMCIA=m
CONFIG_PCMCIA_LOAD_CIS=y

#
# PC-card bridges
#

#
# Generic Driver Options
#
# CONFIG_UEVENT_HELPER is not set
CONFIG_DEVTMPFS=y
# CONFIG_DEVTMPFS_MOUNT is not set
# CONFIG_DEVTMPFS_SAFE is not set
# CONFIG_STANDALONE is not set
# CONFIG_PREVENT_FIRMWARE_BUILD is not set

#
# Firmware loader
#
CONFIG_FW_LOADER=y
CONFIG_FW_LOADER_PAGED_BUF=y
CONFIG_FW_LOADER_SYSFS=y
CONFIG_EXTRA_FIRMWARE=""
CONFIG_FW_LOADER_USER_HELPER=y
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
CONFIG_FW_LOADER_COMPRESS=y
CONFIG_FW_LOADER_COMPRESS_XZ=y
# CONFIG_FW_LOADER_COMPRESS_ZSTD is not set
# CONFIG_FW_UPLOAD is not set
# end of Firmware loader

CONFIG_ALLOW_DEV_COREDUMP=y
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
CONFIG_DEBUG_TEST_DRIVER_REMOVE=y
CONFIG_TEST_ASYNC_DRIVER_PROBE=m
CONFIG_GENERIC_CPU_AUTOPROBE=y
CONFIG_GENERIC_CPU_VULNERABILITIES=y
CONFIG_REGMAP=y
CONFIG_REGMAP_I2C=y
CONFIG_REGMAP_SPMI=m
CONFIG_REGMAP_IRQ=y
CONFIG_REGMAP_I3C=y
# end of Generic Driver Options

#
# Bus devices
#
CONFIG_MHI_BUS=y
# CONFIG_MHI_BUS_DEBUG is not set
# CONFIG_MHI_BUS_EP is not set
# end of Bus devices

CONFIG_CONNECTOR=m

#
# Firmware Drivers
#

#
# ARM System Control and Management Interface Protocol
#
# end of ARM System Control and Management Interface Protocol

# CONFIG_GOOGLE_FIRMWARE is not set

#
# Tegra firmware driver
#
# end of Tegra firmware driver
# end of Firmware Drivers

# CONFIG_GNSS is not set
CONFIG_MTD=m
CONFIG_MTD_TESTS=m

#
# Partition parsers
#
CONFIG_MTD_AR7_PARTS=m
# CONFIG_MTD_CMDLINE_PARTS is not set
CONFIG_MTD_OF_PARTS=m
CONFIG_MTD_REDBOOT_PARTS=m
CONFIG_MTD_REDBOOT_DIRECTORY_BLOCK=-1
# CONFIG_MTD_REDBOOT_PARTS_UNALLOCATED is not set
# CONFIG_MTD_REDBOOT_PARTS_READONLY is not set
# end of Partition parsers

#
# User Modules And Translation Layers
#
CONFIG_MTD_BLKDEVS=m
CONFIG_MTD_BLOCK=m
# CONFIG_MTD_BLOCK_RO is not set

#
# Note that in some cases UBI block is preferred. See MTD_UBI_BLOCK.
#
CONFIG_FTL=m
# CONFIG_NFTL is not set
CONFIG_INFTL=m
CONFIG_RFD_FTL=m
CONFIG_SSFDC=m
CONFIG_SM_FTL=m
CONFIG_MTD_OOPS=m
# CONFIG_MTD_SWAP is not set
# CONFIG_MTD_PARTITIONED_MASTER is not set

#
# RAM/ROM/Flash chip drivers
#
# CONFIG_MTD_CFI is not set
# CONFIG_MTD_JEDECPROBE is not set
CONFIG_MTD_MAP_BANK_WIDTH_1=y
CONFIG_MTD_MAP_BANK_WIDTH_2=y
CONFIG_MTD_MAP_BANK_WIDTH_4=y
CONFIG_MTD_CFI_I1=y
CONFIG_MTD_CFI_I2=y
# CONFIG_MTD_RAM is not set
CONFIG_MTD_ROM=m
# CONFIG_MTD_ABSENT is not set
# end of RAM/ROM/Flash chip drivers

#
# NAND
#
CONFIG_MTD_NAND_CORE=m
CONFIG_MTD_RAW_NAND=m

#
# Raw/parallel NAND flash controllers
#

#
# Misc
#
# CONFIG_MTD_NAND_NANDSIM is not set

#
# ECC engine support
#
CONFIG_MTD_NAND_ECC=y
CONFIG_MTD_NAND_ECC_SW_HAMMING=y
# CONFIG_MTD_NAND_ECC_SW_HAMMING_SMC is not set
# CONFIG_MTD_NAND_ECC_SW_BCH is not set
# end of ECC engine support
# end of NAND

#
# LPDDR & LPDDR2 PCM memory drivers
#
CONFIG_MTD_LPDDR=m
CONFIG_MTD_QINFO_PROBE=m
# end of LPDDR & LPDDR2 PCM memory drivers

CONFIG_MTD_UBI=m
CONFIG_MTD_UBI_WL_THRESHOLD=4096
CONFIG_MTD_UBI_BEB_LIMIT=20
CONFIG_MTD_UBI_FASTMAP=y
# CONFIG_MTD_UBI_GLUEBI is not set
# CONFIG_MTD_UBI_BLOCK is not set
CONFIG_DTC=y
CONFIG_OF=y
CONFIG_OF_UNITTEST=y
CONFIG_OF_FLATTREE=y
CONFIG_OF_EARLY_FLATTREE=y
CONFIG_OF_KOBJ=y
CONFIG_OF_DYNAMIC=y
CONFIG_OF_IRQ=y
CONFIG_OF_RESERVED_MEM=y
CONFIG_OF_RESOLVE=y
# CONFIG_OF_OVERLAY is not set
CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_NULL_BLK is not set
CONFIG_CDROM=m
# CONFIG_ZRAM is not set
# CONFIG_BLK_DEV_LOOP is not set

#
# DRBD disabled because PROC_FS or INET not selected
#
# CONFIG_BLK_DEV_NBD is not set
CONFIG_BLK_DEV_RAM=m
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=4096
# CONFIG_CDROM_PKTCDVD is not set
CONFIG_ATA_OVER_ETH=m

#
# S/390 block device drivers
#
# CONFIG_DCSSBLK is not set
CONFIG_DASD=m
# CONFIG_DASD_PROFILE is not set
# CONFIG_DASD_ECKD is not set
CONFIG_DASD_FBA=m
# CONFIG_DASD_DIAG is not set
# CONFIG_DASD_EER is not set
# CONFIG_VIRTIO_BLK is not set

#
# NVME Support
#
# CONFIG_NVME_FC is not set
# CONFIG_NVME_TARGET is not set
# end of NVME Support

#
# Misc devices
#
CONFIG_AD525X_DPOT=y
CONFIG_AD525X_DPOT_I2C=m
# CONFIG_DUMMY_IRQ is not set
CONFIG_ICS932S401=y
CONFIG_ENCLOSURE_SERVICES=m
# CONFIG_APDS9802ALS is not set
# CONFIG_ISL29003 is not set
CONFIG_ISL29020=y
# CONFIG_SENSORS_TSL2550 is not set
CONFIG_SENSORS_BH1770=y
# CONFIG_SENSORS_APDS990X is not set
CONFIG_HMC6352=m
CONFIG_DS1682=y
# CONFIG_OPEN_DICE is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
CONFIG_EEPROM_AT24=m
CONFIG_EEPROM_LEGACY=m
CONFIG_EEPROM_MAX6875=y
CONFIG_EEPROM_93CX6=m
CONFIG_EEPROM_IDT_89HPESX=m
# CONFIG_EEPROM_EE1004 is not set
# end of EEPROM support

#
# Texas Instruments shared transport line discipline
#
CONFIG_TI_ST=m
# end of Texas Instruments shared transport line discipline

# CONFIG_SENSORS_LIS3_I2C is not set
CONFIG_ALTERA_STAPL=m
CONFIG_ECHO=m
# CONFIG_UACCE is not set
# CONFIG_PVPANIC is not set
# end of Misc devices

#
# SCSI device support
#
CONFIG_SCSI_MOD=m
CONFIG_RAID_ATTRS=m
CONFIG_SCSI_COMMON=m
CONFIG_SCSI=m
CONFIG_SCSI_DMA=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=m
CONFIG_CHR_DEV_ST=m
CONFIG_BLK_DEV_SR=m
CONFIG_CHR_DEV_SG=m
CONFIG_BLK_DEV_BSG=y
CONFIG_CHR_DEV_SCH=m
CONFIG_SCSI_ENCLOSURE=m
# CONFIG_SCSI_CONSTANTS is not set
# CONFIG_SCSI_LOGGING is not set
CONFIG_SCSI_SCAN_ASYNC=y

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=m
# CONFIG_SCSI_FC_ATTRS is not set
# CONFIG_SCSI_ISCSI_ATTRS is not set
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
# CONFIG_SCSI_SAS_HOST_SMP is not set
CONFIG_SCSI_SRP_ATTRS=m
# end of SCSI Transports

# CONFIG_SCSI_LOWLEVEL is not set
# CONFIG_SCSI_DH is not set
# end of SCSI device support

CONFIG_MD=y
CONFIG_BLK_DEV_MD=m
CONFIG_MD_LINEAR=m
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
CONFIG_MD_RAID10=m
CONFIG_MD_RAID456=m
CONFIG_MD_MULTIPATH=m
CONFIG_MD_FAULTY=m
CONFIG_BCACHE=m
# CONFIG_BCACHE_DEBUG is not set
# CONFIG_BCACHE_CLOSURES_DEBUG is not set
CONFIG_BCACHE_ASYNC_REGISTRATION=y
CONFIG_BLK_DEV_DM_BUILTIN=y
CONFIG_BLK_DEV_DM=y
CONFIG_DM_DEBUG=y
CONFIG_DM_BUFIO=y
CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING=y
# CONFIG_DM_DEBUG_BLOCK_STACK_TRACING is not set
CONFIG_DM_BIO_PRISON=y
CONFIG_DM_PERSISTENT_DATA=y
CONFIG_DM_UNSTRIPED=m
CONFIG_DM_CRYPT=y
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=y
# CONFIG_DM_CACHE is not set
# CONFIG_DM_WRITECACHE is not set
CONFIG_DM_EBS=m
CONFIG_DM_ERA=y
CONFIG_DM_CLONE=m
CONFIG_DM_MIRROR=m
CONFIG_DM_LOG_USERSPACE=m
CONFIG_DM_RAID=m
# CONFIG_DM_ZERO is not set
CONFIG_DM_MULTIPATH=y
CONFIG_DM_MULTIPATH_QL=m
CONFIG_DM_MULTIPATH_ST=y
CONFIG_DM_MULTIPATH_HST=m
# CONFIG_DM_MULTIPATH_IOA is not set
CONFIG_DM_DELAY=y
CONFIG_DM_DUST=m
CONFIG_DM_INIT=y
CONFIG_DM_UEVENT=y
# CONFIG_DM_FLAKEY is not set
CONFIG_DM_VERITY=m
CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG=y
# CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG_SECONDARY_KEYRING is not set
CONFIG_DM_VERITY_FEC=y
CONFIG_DM_SWITCH=m
CONFIG_DM_LOG_WRITES=m
CONFIG_DM_INTEGRITY=y
# CONFIG_DM_ZONED is not set
CONFIG_DM_AUDIT=y
CONFIG_TARGET_CORE=m
CONFIG_TCM_IBLOCK=m
CONFIG_TCM_FILEIO=m
# CONFIG_TCM_PSCSI is not set
# CONFIG_LOOPBACK_TARGET is not set
CONFIG_NETDEVICES=y
# CONFIG_NET_CORE is not set
CONFIG_ARCNET=m
CONFIG_ARCNET_1201=m
CONFIG_ARCNET_1051=m
CONFIG_ARCNET_RAW=m
# CONFIG_ARCNET_CAP is not set
CONFIG_ARCNET_COM90xx=m
CONFIG_ARCNET_COM90xxIO=m
CONFIG_ARCNET_RIM_I=m
CONFIG_ARCNET_COM20020=m
CONFIG_ARCNET_COM20020_CS=m
# CONFIG_ATM_DRIVERS is not set
# CONFIG_CAIF_DRIVERS is not set
# CONFIG_ETHERNET is not set
CONFIG_PHYLIB=y
CONFIG_SWPHY=y
CONFIG_LED_TRIGGER_PHY=y
CONFIG_FIXED_PHY=y

#
# MII PHY device drivers
#
# CONFIG_AMD_PHY is not set
CONFIG_ADIN_PHY=m
# CONFIG_ADIN1100_PHY is not set
CONFIG_AQUANTIA_PHY=m
CONFIG_AX88796B_PHY=m
# CONFIG_BROADCOM_PHY is not set
# CONFIG_BCM54140_PHY is not set
CONFIG_BCM7XXX_PHY=y
# CONFIG_BCM84881_PHY is not set
# CONFIG_BCM87XX_PHY is not set
CONFIG_BCM_NET_PHYLIB=y
# CONFIG_CICADA_PHY is not set
CONFIG_CORTINA_PHY=y
CONFIG_DAVICOM_PHY=y
# CONFIG_ICPLUS_PHY is not set
CONFIG_LXT_PHY=y
# CONFIG_INTEL_XWAY_PHY is not set
CONFIG_LSI_ET1011C_PHY=y
CONFIG_MARVELL_PHY=y
CONFIG_MARVELL_10G_PHY=m
# CONFIG_MARVELL_88X2222_PHY is not set
CONFIG_MAXLINEAR_GPHY=m
# CONFIG_MEDIATEK_GE_PHY is not set
CONFIG_MICREL_PHY=y
# CONFIG_MICROCHIP_PHY is not set
# CONFIG_MICROCHIP_T1_PHY is not set
CONFIG_MICROSEMI_PHY=m
# CONFIG_MOTORCOMM_PHY is not set
# CONFIG_NATIONAL_PHY is not set
# CONFIG_NXP_C45_TJA11XX_PHY is not set
CONFIG_AT803X_PHY=m
CONFIG_QSEMI_PHY=y
# CONFIG_REALTEK_PHY is not set
CONFIG_RENESAS_PHY=m
CONFIG_ROCKCHIP_PHY=y
CONFIG_SMSC_PHY=m
CONFIG_STE10XP=y
# CONFIG_TERANETICS_PHY is not set
CONFIG_DP83822_PHY=y
# CONFIG_DP83TC811_PHY is not set
CONFIG_DP83848_PHY=m
# CONFIG_DP83867_PHY is not set
CONFIG_DP83869_PHY=m
# CONFIG_DP83TD510_PHY is not set
CONFIG_VITESSE_PHY=m
CONFIG_XILINX_GMII2RGMII=m
CONFIG_MDIO_DEVICE=y
CONFIG_MDIO_BUS=y
CONFIG_FWNODE_MDIO=y
CONFIG_OF_MDIO=y
CONFIG_MDIO_DEVRES=y
CONFIG_MDIO_BITBANG=m
# CONFIG_MDIO_GPIO is not set

#
# MDIO Multiplexers
#
# CONFIG_MDIO_BUS_MUX_MULTIPLEXER is not set

#
# PCS device drivers
#
CONFIG_PCS_XPCS=m
# end of PCS device drivers

CONFIG_PPP=y
# CONFIG_PPP_BSDCOMP is not set
CONFIG_PPP_DEFLATE=m
CONFIG_PPP_FILTER=y
CONFIG_PPP_MPPE=m
# CONFIG_PPP_MULTILINK is not set
CONFIG_PPPOATM=m
CONFIG_PPPOE=m
# CONFIG_PPP_ASYNC is not set
CONFIG_PPP_SYNC_TTY=m
CONFIG_SLIP=y
CONFIG_SLHC=y
# CONFIG_SLIP_COMPRESSED is not set
# CONFIG_SLIP_SMART is not set
CONFIG_SLIP_MODE_SLIP6=y

#
# S/390 network device drivers
#
CONFIG_CTCM=m
# CONFIG_NETIUCV is not set
# CONFIG_SMSGIUCV is not set
CONFIG_CCWGROUP=m
# end of S/390 network device drivers

#
# Host-side USB support is needed for USB Network Adapter support
#
# CONFIG_WAN is not set
CONFIG_IEEE802154_DRIVERS=m
# CONFIG_IEEE802154_FAKELB is not set
CONFIG_IEEE802154_HWSIM=m

#
# Wireless WAN
#
# CONFIG_WWAN is not set
# end of Wireless WAN

CONFIG_NET_FAILOVER=y

#
# Input device support
#
CONFIG_INPUT=y
# CONFIG_INPUT_LEDS is not set
CONFIG_INPUT_FF_MEMLESS=y
CONFIG_INPUT_SPARSEKMAP=m
# CONFIG_INPUT_MATRIXKMAP is not set
CONFIG_INPUT_VIVALDIFMAP=m

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_INPUT_JOYDEV=y
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
# CONFIG_INPUT_KEYBOARD is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_BYD=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_SYNAPTICS_SMBUS=y
CONFIG_MOUSE_PS2_CYPRESS=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
# CONFIG_MOUSE_PS2_ELANTECH is not set
CONFIG_MOUSE_PS2_SENTELIC=y
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
CONFIG_MOUSE_PS2_FOCALTECH=y
CONFIG_MOUSE_PS2_SMBUS=y
CONFIG_MOUSE_SERIAL=y
CONFIG_MOUSE_CYAPA=m
CONFIG_MOUSE_ELAN_I2C=m
# CONFIG_MOUSE_ELAN_I2C_I2C is not set
CONFIG_MOUSE_ELAN_I2C_SMBUS=y
CONFIG_MOUSE_VSXXXAA=y
# CONFIG_MOUSE_GPIO is not set
CONFIG_MOUSE_SYNAPTICS_I2C=m
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
CONFIG_INPUT_TOUCHSCREEN=y
# CONFIG_TOUCHSCREEN_AD7879 is not set
CONFIG_TOUCHSCREEN_ADC=y
CONFIG_TOUCHSCREEN_AR1021_I2C=y
CONFIG_TOUCHSCREEN_ATMEL_MXT=m
# CONFIG_TOUCHSCREEN_AUO_PIXCIR is not set
CONFIG_TOUCHSCREEN_BU21013=y
CONFIG_TOUCHSCREEN_BU21029=y
CONFIG_TOUCHSCREEN_CHIPONE_ICN8318=m
# CONFIG_TOUCHSCREEN_CY8CTMA140 is not set
# CONFIG_TOUCHSCREEN_CY8CTMG110 is not set
# CONFIG_TOUCHSCREEN_CYTTSP_CORE is not set
CONFIG_TOUCHSCREEN_CYTTSP4_CORE=m
CONFIG_TOUCHSCREEN_CYTTSP4_I2C=m
# CONFIG_TOUCHSCREEN_DYNAPRO is not set
CONFIG_TOUCHSCREEN_HAMPSHIRE=m
# CONFIG_TOUCHSCREEN_EETI is not set
CONFIG_TOUCHSCREEN_EGALAX=m
CONFIG_TOUCHSCREEN_EGALAX_SERIAL=m
CONFIG_TOUCHSCREEN_EXC3000=y
CONFIG_TOUCHSCREEN_FUJITSU=y
CONFIG_TOUCHSCREEN_GOODIX=m
CONFIG_TOUCHSCREEN_HIDEEP=y
# CONFIG_TOUCHSCREEN_HYCON_HY46XX is not set
CONFIG_TOUCHSCREEN_ILI210X=y
# CONFIG_TOUCHSCREEN_ILITEK is not set
# CONFIG_TOUCHSCREEN_S6SY761 is not set
CONFIG_TOUCHSCREEN_GUNZE=m
CONFIG_TOUCHSCREEN_EKTF2127=y
CONFIG_TOUCHSCREEN_ELAN=y
CONFIG_TOUCHSCREEN_ELO=m
CONFIG_TOUCHSCREEN_WACOM_W8001=y
# CONFIG_TOUCHSCREEN_WACOM_I2C is not set
CONFIG_TOUCHSCREEN_MAX11801=y
# CONFIG_TOUCHSCREEN_MCS5000 is not set
CONFIG_TOUCHSCREEN_MMS114=y
CONFIG_TOUCHSCREEN_MELFAS_MIP4=m
# CONFIG_TOUCHSCREEN_MSG2638 is not set
CONFIG_TOUCHSCREEN_MTOUCH=m
# CONFIG_TOUCHSCREEN_IMAGIS is not set
# CONFIG_TOUCHSCREEN_INEXIO is not set
CONFIG_TOUCHSCREEN_MK712=m
CONFIG_TOUCHSCREEN_PENMOUNT=m
CONFIG_TOUCHSCREEN_EDT_FT5X06=y
# CONFIG_TOUCHSCREEN_TOUCHRIGHT is not set
# CONFIG_TOUCHSCREEN_TOUCHWIN is not set
CONFIG_TOUCHSCREEN_PIXCIR=m
CONFIG_TOUCHSCREEN_WDT87XX_I2C=y
# CONFIG_TOUCHSCREEN_TOUCHIT213 is not set
# CONFIG_TOUCHSCREEN_TSC_SERIO is not set
CONFIG_TOUCHSCREEN_TSC200X_CORE=m
CONFIG_TOUCHSCREEN_TSC2004=m
CONFIG_TOUCHSCREEN_TSC2007=m
# CONFIG_TOUCHSCREEN_TSC2007_IIO is not set
# CONFIG_TOUCHSCREEN_RM_TS is not set
# CONFIG_TOUCHSCREEN_SILEAD is not set
# CONFIG_TOUCHSCREEN_SIS_I2C is not set
# CONFIG_TOUCHSCREEN_ST1232 is not set
# CONFIG_TOUCHSCREEN_STMFTS is not set
# CONFIG_TOUCHSCREEN_SX8654 is not set
CONFIG_TOUCHSCREEN_TPS6507X=m
CONFIG_TOUCHSCREEN_ZET6223=m
# CONFIG_TOUCHSCREEN_ZFORCE is not set
CONFIG_TOUCHSCREEN_ROHM_BU21023=y
CONFIG_TOUCHSCREEN_IQS5XX=y
CONFIG_TOUCHSCREEN_ZINITIX=m
# CONFIG_INPUT_MISC is not set
CONFIG_RMI4_CORE=m
CONFIG_RMI4_I2C=m
CONFIG_RMI4_SMB=m
CONFIG_RMI4_F03=y
CONFIG_RMI4_F03_SERIO=m
CONFIG_RMI4_2D_SENSOR=y
CONFIG_RMI4_F11=y
# CONFIG_RMI4_F12 is not set
# CONFIG_RMI4_F30 is not set
# CONFIG_RMI4_F34 is not set
CONFIG_RMI4_F3A=y
# CONFIG_RMI4_F55 is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_SERPORT=y
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
CONFIG_SERIO_PS2MULT=m
# CONFIG_SERIO_GPIO_PS2 is not set
# CONFIG_USERIO is not set
# CONFIG_GAMEPORT is not set
# end of Hardware I/O ports
# end of Input device support

#
# Character devices
#
CONFIG_TTY=y
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_UNIX98_PTYS=y
# CONFIG_LEGACY_PTYS is not set
# CONFIG_LDISC_AUTOLOAD is not set
CONFIG_N_GSM=y
# CONFIG_NULL_TTY is not set
CONFIG_HVC_DRIVER=y
CONFIG_HVC_IUCV=y
# CONFIG_RPMSG_TTY is not set
CONFIG_SERIAL_DEV_BUS=y
CONFIG_SERIAL_DEV_CTRL_TTYPORT=y
CONFIG_VIRTIO_CONSOLE=y
CONFIG_IPMB_DEVICE_INTERFACE=m
CONFIG_HW_RANDOM=m
# CONFIG_HW_RANDOM_VIRTIO is not set
# CONFIG_HW_RANDOM_S390 is not set

#
# PCMCIA character devices
#
CONFIG_SYNCLINK_CS=m
CONFIG_CARDMAN_4000=m
CONFIG_CARDMAN_4040=m
# CONFIG_SCR24X is not set
CONFIG_IPWIRELESS=m
# end of PCMCIA character devices

# CONFIG_DEVMEM is not set
# CONFIG_HANGCHECK_TIMER is not set

#
# S/390 character device drivers
#
# CONFIG_TN3270 is not set
CONFIG_TN3215=y
# CONFIG_TN3215_CONSOLE is not set
# CONFIG_SCLP_TTY is not set
CONFIG_SCLP_VT220_TTY=y
CONFIG_SCLP_VT220_CONSOLE=y
# CONFIG_HMC_DRV is not set
CONFIG_SCLP_OFB=y
CONFIG_S390_UV_UAPI=m
# CONFIG_S390_TAPE is not set
CONFIG_VMLOGRDR=m
CONFIG_VMCP=y
CONFIG_VMCP_CMA_SIZE=4
# CONFIG_MONREADER is not set
CONFIG_MONWRITER=y
CONFIG_S390_VMUR=m
CONFIG_XILLYBUS_CLASS=y
CONFIG_XILLYBUS=y
# CONFIG_XILLYBUS_OF is not set
CONFIG_RANDOM_TRUST_BOOTLOADER=y
# end of Character devices

#
# I2C support
#
CONFIG_I2C=y
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_COMPAT=y
# CONFIG_I2C_CHARDEV is not set
CONFIG_I2C_MUX=y

#
# Multiplexer I2C Chip support
#
CONFIG_I2C_ARB_GPIO_CHALLENGE=y
CONFIG_I2C_MUX_GPIO=m
# CONFIG_I2C_MUX_GPMUX is not set
# CONFIG_I2C_MUX_LTC4306 is not set
CONFIG_I2C_MUX_PCA9541=y
CONFIG_I2C_MUX_PCA954x=m
# CONFIG_I2C_MUX_PINCTRL is not set
CONFIG_I2C_DEMUX_PINCTRL=m
CONFIG_I2C_MUX_MLXCPLD=m
# end of Multiplexer I2C Chip support

CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_SMBUS=y
CONFIG_I2C_STUB=m
CONFIG_I2C_SLAVE=y
CONFIG_I2C_SLAVE_EEPROM=y
CONFIG_I2C_SLAVE_TESTUNIT=y
CONFIG_I2C_DEBUG_CORE=y
# CONFIG_I2C_DEBUG_ALGO is not set
# end of I2C support

CONFIG_I3C=y
CONFIG_SPMI=m
CONFIG_HSI=m
CONFIG_HSI_BOARDINFO=y

#
# HSI controllers
#

#
# HSI clients
#
CONFIG_HSI_CHAR=m
CONFIG_PPS=y
# CONFIG_PPS_DEBUG is not set
# CONFIG_NTP_PPS is not set

#
# PPS clients support
#
CONFIG_PPS_CLIENT_KTIMER=y
# CONFIG_PPS_CLIENT_LDISC is not set
CONFIG_PPS_CLIENT_GPIO=y

#
# PPS generators support
#

#
# PTP clock support
#
# CONFIG_PTP_1588_CLOCK is not set
CONFIG_PTP_1588_CLOCK_OPTIONAL=y
# end of PTP clock support

CONFIG_PINCTRL=y
CONFIG_PINCONF=y
CONFIG_GENERIC_PINCONF=y
CONFIG_DEBUG_PINCTRL=y
CONFIG_PINCTRL_MCP23S08_I2C=y
CONFIG_PINCTRL_MCP23S08=y
# CONFIG_PINCTRL_SX150X is not set

#
# Renesas pinctrl drivers
#
# end of Renesas pinctrl drivers

CONFIG_GPIOLIB=y
CONFIG_GPIOLIB_FASTPATH_LIMIT=512
CONFIG_GPIOLIB_IRQCHIP=y
# CONFIG_DEBUG_GPIO is not set
CONFIG_GPIO_CDEV=y
CONFIG_GPIO_CDEV_V1=y

#
# I2C GPIO expanders
#
CONFIG_GPIO_ADP5588=m
# CONFIG_GPIO_MAX7300 is not set
CONFIG_GPIO_MAX732X=m
# CONFIG_GPIO_PCA953X is not set
CONFIG_GPIO_PCA9570=m
CONFIG_GPIO_PCF857X=y
CONFIG_GPIO_TPIC2810=m
# end of I2C GPIO expanders

#
# MFD GPIO expanders
#
# end of MFD GPIO expanders

#
# Virtual GPIO drivers
#
CONFIG_GPIO_AGGREGATOR=y
CONFIG_GPIO_MOCKUP=m
# CONFIG_GPIO_VIRTIO is not set
# CONFIG_GPIO_SIM is not set
# end of Virtual GPIO drivers

# CONFIG_POWER_RESET is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
# CONFIG_GENERIC_ADC_BATTERY is not set
# CONFIG_IP5XXX_POWER is not set
# CONFIG_TEST_POWER is not set
CONFIG_CHARGER_ADP5061=y
CONFIG_BATTERY_CW2015=m
CONFIG_BATTERY_DS2782=m
# CONFIG_BATTERY_SAMSUNG_SDI is not set
# CONFIG_BATTERY_SBS is not set
# CONFIG_CHARGER_SBS is not set
CONFIG_MANAGER_SBS=y
# CONFIG_BATTERY_BQ27XXX is not set
# CONFIG_BATTERY_MAX17040 is not set
CONFIG_BATTERY_MAX17042=m
CONFIG_CHARGER_MAX8903=m
# CONFIG_CHARGER_LP8727 is not set
CONFIG_CHARGER_GPIO=y
# CONFIG_CHARGER_MANAGER is not set
CONFIG_CHARGER_LT3651=y
# CONFIG_CHARGER_LTC4162L is not set
# CONFIG_CHARGER_DETECTOR_MAX14656 is not set
# CONFIG_CHARGER_MAX77976 is not set
# CONFIG_CHARGER_BQ2415X is not set
# CONFIG_CHARGER_BQ24190 is not set
CONFIG_CHARGER_BQ24257=m
# CONFIG_CHARGER_BQ24735 is not set
CONFIG_CHARGER_BQ2515X=m
CONFIG_CHARGER_BQ25890=y
# CONFIG_CHARGER_BQ25980 is not set
# CONFIG_CHARGER_BQ256XX is not set
CONFIG_CHARGER_SMB347=m
CONFIG_BATTERY_GAUGE_LTC2941=y
CONFIG_BATTERY_RT5033=y
# CONFIG_CHARGER_RT9455 is not set
CONFIG_CHARGER_UCS1002=m
CONFIG_CHARGER_BD99954=m
# CONFIG_BATTERY_UG3105 is not set
CONFIG_THERMAL=y
CONFIG_THERMAL_NETLINK=y
# CONFIG_THERMAL_STATISTICS is not set
CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS=0
CONFIG_THERMAL_OF=y
CONFIG_THERMAL_WRITABLE_TRIPS=y
# CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE is not set
CONFIG_THERMAL_DEFAULT_GOV_FAIR_SHARE=y
# CONFIG_THERMAL_DEFAULT_GOV_USER_SPACE is not set
CONFIG_THERMAL_GOV_FAIR_SHARE=y
# CONFIG_THERMAL_GOV_STEP_WISE is not set
CONFIG_THERMAL_GOV_BANG_BANG=y
CONFIG_THERMAL_GOV_USER_SPACE=y
CONFIG_CPU_THERMAL=y
# CONFIG_THERMAL_EMULATION is not set
# CONFIG_GENERIC_ADC_THERMAL is not set
# CONFIG_WATCHDOG is not set
CONFIG_REGULATOR=y
CONFIG_REGULATOR_DEBUG=y
CONFIG_REGULATOR_FIXED_VOLTAGE=m
# CONFIG_REGULATOR_VIRTUAL_CONSUMER is not set
CONFIG_REGULATOR_USERSPACE_CONSUMER=m
CONFIG_REGULATOR_88PG86X=y
CONFIG_REGULATOR_ACT8865=y
CONFIG_REGULATOR_AD5398=m
# CONFIG_REGULATOR_DA9121 is not set
CONFIG_REGULATOR_DA9210=y
CONFIG_REGULATOR_DA9211=m
# CONFIG_REGULATOR_FAN53555 is not set
CONFIG_REGULATOR_FAN53880=m
CONFIG_REGULATOR_GPIO=y
CONFIG_REGULATOR_ISL9305=y
CONFIG_REGULATOR_ISL6271A=y
# CONFIG_REGULATOR_LP3971 is not set
CONFIG_REGULATOR_LP3972=m
# CONFIG_REGULATOR_LP872X is not set
CONFIG_REGULATOR_LP8755=y
CONFIG_REGULATOR_LTC3589=m
CONFIG_REGULATOR_LTC3676=y
# CONFIG_REGULATOR_MAX1586 is not set
CONFIG_REGULATOR_MAX8649=m
# CONFIG_REGULATOR_MAX8660 is not set
# CONFIG_REGULATOR_MAX8893 is not set
CONFIG_REGULATOR_MAX8952=m
CONFIG_REGULATOR_MAX8973=m
# CONFIG_REGULATOR_MAX20086 is not set
# CONFIG_REGULATOR_MAX77826 is not set
CONFIG_REGULATOR_MCP16502=m
CONFIG_REGULATOR_MP5416=m
CONFIG_REGULATOR_MP8859=m
CONFIG_REGULATOR_MP886X=y
CONFIG_REGULATOR_MPQ7920=m
CONFIG_REGULATOR_MT6311=m
# CONFIG_REGULATOR_MT6315 is not set
CONFIG_REGULATOR_PCA9450=y
# CONFIG_REGULATOR_PF8X00 is not set
# CONFIG_REGULATOR_PFUZE100 is not set
# CONFIG_REGULATOR_PV88060 is not set
CONFIG_REGULATOR_PV88080=y
CONFIG_REGULATOR_PV88090=y
CONFIG_REGULATOR_QCOM_SPMI=m
CONFIG_REGULATOR_QCOM_USB_VBUS=m
CONFIG_REGULATOR_RT4801=m
# CONFIG_REGULATOR_RT5190A is not set
# CONFIG_REGULATOR_RT5759 is not set
# CONFIG_REGULATOR_RT6160 is not set
# CONFIG_REGULATOR_RT6245 is not set
# CONFIG_REGULATOR_RTQ2134 is not set
CONFIG_REGULATOR_RTMV20=m
# CONFIG_REGULATOR_RTQ6752 is not set
CONFIG_REGULATOR_SLG51000=y
# CONFIG_REGULATOR_SY7636A is not set
CONFIG_REGULATOR_SY8106A=y
CONFIG_REGULATOR_SY8824X=y
# CONFIG_REGULATOR_SY8827N is not set
CONFIG_REGULATOR_TPS51632=y
CONFIG_REGULATOR_TPS62360=y
# CONFIG_REGULATOR_TPS6286X is not set
# CONFIG_REGULATOR_TPS65023 is not set
# CONFIG_REGULATOR_TPS6507X is not set
# CONFIG_REGULATOR_TPS65132 is not set
CONFIG_REGULATOR_VCTRL=m
# CONFIG_REGULATOR_QCOM_LABIBB is not set
CONFIG_RC_CORE=m
CONFIG_LIRC=y
CONFIG_RC_MAP=m
# CONFIG_RC_DECODERS is not set
# CONFIG_RC_DEVICES is not set

#
# CEC support
#
# CONFIG_MEDIA_CEC_SUPPORT is not set
# end of CEC support

#
# Graphics support
#

#
# Console display driver support
#
CONFIG_DUMMY_CONSOLE=y
CONFIG_DUMMY_CONSOLE_COLUMNS=80
CONFIG_DUMMY_CONSOLE_ROWS=25
# end of Console display driver support
# end of Graphics support

#
# HID support
#
CONFIG_HID=m
# CONFIG_HID_BATTERY_STRENGTH is not set
# CONFIG_HIDRAW is not set
CONFIG_UHID=m
# CONFIG_HID_GENERIC is not set

#
# Special HID drivers
#
CONFIG_HID_A4TECH=m
CONFIG_HID_ACRUX=m
# CONFIG_HID_ACRUX_FF is not set
CONFIG_HID_APPLE=m
CONFIG_HID_AUREAL=m
# CONFIG_HID_BELKIN is not set
# CONFIG_HID_CHERRY is not set
CONFIG_HID_COUGAR=m
CONFIG_HID_MACALLY=m
CONFIG_HID_CMEDIA=m
CONFIG_HID_CYPRESS=m
CONFIG_HID_DRAGONRISE=m
# CONFIG_DRAGONRISE_FF is not set
CONFIG_HID_EMS_FF=m
# CONFIG_HID_ELECOM is not set
CONFIG_HID_EZKEY=m
# CONFIG_HID_GEMBIRD is not set
# CONFIG_HID_GFRM is not set
# CONFIG_HID_GLORIOUS is not set
CONFIG_HID_VIVALDI_COMMON=m
CONFIG_HID_VIVALDI=m
CONFIG_HID_KEYTOUCH=m
CONFIG_HID_KYE=m
# CONFIG_HID_WALTOP is not set
CONFIG_HID_VIEWSONIC=m
# CONFIG_HID_XIAOMI is not set
# CONFIG_HID_GYRATION is not set
# CONFIG_HID_ICADE is not set
CONFIG_HID_ITE=m
CONFIG_HID_JABRA=m
CONFIG_HID_TWINHAN=m
CONFIG_HID_KENSINGTON=m
CONFIG_HID_LCPOWER=m
CONFIG_HID_LED=m
CONFIG_HID_LENOVO=m
CONFIG_HID_MAGICMOUSE=m
# CONFIG_HID_MALTRON is not set
# CONFIG_HID_MAYFLASH is not set
# CONFIG_HID_REDRAGON is not set
CONFIG_HID_MICROSOFT=m
# CONFIG_HID_MONTEREY is not set
# CONFIG_HID_MULTITOUCH is not set
# CONFIG_HID_NINTENDO is not set
CONFIG_HID_NTI=m
CONFIG_HID_ORTEK=m
# CONFIG_HID_PANTHERLORD is not set
CONFIG_HID_PETALYNX=m
# CONFIG_HID_PICOLCD is not set
CONFIG_HID_PLANTRONICS=m
# CONFIG_HID_PLAYSTATION is not set
# CONFIG_HID_RAZER is not set
CONFIG_HID_PRIMAX=m
CONFIG_HID_SAITEK=m
# CONFIG_HID_SEMITEK is not set
CONFIG_HID_SPEEDLINK=m
# CONFIG_HID_STEAM is not set
# CONFIG_HID_STEELSERIES is not set
CONFIG_HID_SUNPLUS=m
# CONFIG_HID_RMI is not set
CONFIG_HID_GREENASIA=m
# CONFIG_GREENASIA_FF is not set
CONFIG_HID_SMARTJOYPLUS=m
CONFIG_SMARTJOYPLUS_FF=y
CONFIG_HID_TIVO=m
CONFIG_HID_TOPSEED=m
# CONFIG_HID_THINGM is not set
CONFIG_HID_UDRAW_PS3=m
CONFIG_HID_WIIMOTE=m
# CONFIG_HID_XINMO is not set
# CONFIG_HID_ZEROPLUS is not set
# CONFIG_HID_ZYDACRON is not set
# CONFIG_HID_ALPS is not set
# end of Special HID drivers

#
# I2C HID support
#
# CONFIG_I2C_HID_OF is not set
# CONFIG_I2C_HID_OF_GOODIX is not set
# end of I2C HID support
# end of HID support

CONFIG_USB_OHCI_LITTLE_ENDIAN=y
# CONFIG_SCSI_UFSHCD is not set
CONFIG_MEMSTICK=m
CONFIG_MEMSTICK_DEBUG=y

#
# MemoryStick drivers
#
# CONFIG_MEMSTICK_UNSAFE_RESUME is not set
CONFIG_MSPRO_BLOCK=m
CONFIG_MS_BLOCK=m

#
# MemoryStick Host Controller Drivers
#
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
CONFIG_LEDS_CLASS_FLASH=m
CONFIG_LEDS_CLASS_MULTICOLOR=m
# CONFIG_LEDS_BRIGHTNESS_HW_CHANGED is not set

#
# LED drivers
#
# CONFIG_LEDS_AN30259A is not set
# CONFIG_LEDS_AW2013 is not set
CONFIG_LEDS_LM3530=y
CONFIG_LEDS_LM3532=y
CONFIG_LEDS_LM3642=y
# CONFIG_LEDS_LM3692X is not set
CONFIG_LEDS_PCA9532=y
CONFIG_LEDS_PCA9532_GPIO=y
CONFIG_LEDS_GPIO=y
CONFIG_LEDS_LP3944=m
CONFIG_LEDS_LP3952=y
CONFIG_LEDS_LP50XX=m
CONFIG_LEDS_LP55XX_COMMON=m
# CONFIG_LEDS_LP5521 is not set
# CONFIG_LEDS_LP5523 is not set
# CONFIG_LEDS_LP5562 is not set
CONFIG_LEDS_LP8501=m
CONFIG_LEDS_LP8860=m
CONFIG_LEDS_PCA955X=y
CONFIG_LEDS_PCA955X_GPIO=y
CONFIG_LEDS_PCA963X=m
CONFIG_LEDS_REGULATOR=m
CONFIG_LEDS_BD2802=y
CONFIG_LEDS_LT3593=m
CONFIG_LEDS_TCA6507=m
# CONFIG_LEDS_TLC591XX is not set
# CONFIG_LEDS_LM355x is not set
CONFIG_LEDS_IS31FL319X=m
# CONFIG_LEDS_IS31FL32XX is not set

#
# LED driver for blink(1) USB RGB LED is under Special HID drivers (HID_THINGM)
#
CONFIG_LEDS_BLINKM=y
# CONFIG_LEDS_MLXREG is not set
CONFIG_LEDS_USER=y
CONFIG_LEDS_TI_LMU_COMMON=y
# CONFIG_LEDS_LM3697 is not set

#
# Flash and Torch LED drivers
#
CONFIG_LEDS_AAT1290=m
# CONFIG_LEDS_AS3645A is not set
# CONFIG_LEDS_KTD2692 is not set
CONFIG_LEDS_LM3601X=m
# CONFIG_LEDS_RT4505 is not set
# CONFIG_LEDS_RT8515 is not set
CONFIG_LEDS_SGM3140=m

#
# RGB LED drivers
#

#
# LED Triggers
#
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=m
# CONFIG_LEDS_TRIGGER_ONESHOT is not set
CONFIG_LEDS_TRIGGER_MTD=y
# CONFIG_LEDS_TRIGGER_HEARTBEAT is not set
# CONFIG_LEDS_TRIGGER_BACKLIGHT is not set
# CONFIG_LEDS_TRIGGER_CPU is not set
CONFIG_LEDS_TRIGGER_ACTIVITY=m
# CONFIG_LEDS_TRIGGER_GPIO is not set
CONFIG_LEDS_TRIGGER_DEFAULT_ON=y

#
# iptables trigger is under Netfilter config (LED target)
#
CONFIG_LEDS_TRIGGER_TRANSIENT=m
# CONFIG_LEDS_TRIGGER_CAMERA is not set
CONFIG_LEDS_TRIGGER_PANIC=y
CONFIG_LEDS_TRIGGER_NETDEV=m
CONFIG_LEDS_TRIGGER_PATTERN=y
CONFIG_LEDS_TRIGGER_AUDIO=y
# CONFIG_LEDS_TRIGGER_TTY is not set

#
# Simple LED drivers
#
CONFIG_ACCESSIBILITY=y

#
# Speakup console speech
#
CONFIG_SPEAKUP=m
# CONFIG_SPEAKUP_SYNTH_ACNTSA is not set
CONFIG_SPEAKUP_SYNTH_APOLLO=m
CONFIG_SPEAKUP_SYNTH_AUDPTR=m
CONFIG_SPEAKUP_SYNTH_BNS=m
# CONFIG_SPEAKUP_SYNTH_DECTLK is not set
CONFIG_SPEAKUP_SYNTH_DECEXT=m
# CONFIG_SPEAKUP_SYNTH_LTLK is not set
CONFIG_SPEAKUP_SYNTH_SOFT=m
CONFIG_SPEAKUP_SYNTH_SPKOUT=m
CONFIG_SPEAKUP_SYNTH_TXPRT=m
# CONFIG_SPEAKUP_SYNTH_DUMMY is not set
# end of Speakup console speech

CONFIG_DMADEVICES=y
CONFIG_DMADEVICES_DEBUG=y
CONFIG_DMADEVICES_VDEBUG=y

#
# DMA Devices
#
CONFIG_DMA_ENGINE=y
CONFIG_DMA_VIRTUAL_CHANNELS=y
CONFIG_DMA_OF=y
CONFIG_FSL_EDMA=m
CONFIG_INTEL_IDMA64=y
# CONFIG_QCOM_HIDMA is not set

#
# DMA Clients
#
# CONFIG_ASYNC_TX_DMA is not set
CONFIG_DMATEST=m
CONFIG_DMA_ENGINE_RAID=y

#
# DMABUF options
#
# CONFIG_SYNC_FILE is not set
# CONFIG_DMABUF_HEAPS is not set
# end of DMABUF options

# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set
CONFIG_VFIO=m
CONFIG_VFIO_IOMMU_TYPE1=m
# CONFIG_VFIO_NOIOMMU is not set
CONFIG_VFIO_MDEV=m
# CONFIG_VIRT_DRIVERS is not set
CONFIG_VIRTIO=y
CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS=y
# CONFIG_VIRTIO_MENU is not set
# CONFIG_VDPA is not set
CONFIG_VHOST_IOTLB=m
CONFIG_VHOST=m
CONFIG_VHOST_MENU=y
# CONFIG_VHOST_NET is not set
CONFIG_VHOST_SCSI=m
# CONFIG_VHOST_VSOCK is not set
# CONFIG_VHOST_CROSS_ENDIAN_LEGACY is not set

#
# Microsoft Hyper-V guest support
#
# end of Microsoft Hyper-V guest support

CONFIG_GREYBUS=y
# CONFIG_COMEDI is not set
CONFIG_STAGING=y

#
# IIO staging drivers
#

#
# Accelerometers
#
# end of Accelerometers

#
# Analog to digital converters
#
# end of Analog to digital converters

#
# Analog digital bi-direction converters
#
CONFIG_ADT7316=y
# CONFIG_ADT7316_I2C is not set
# end of Analog digital bi-direction converters

#
# Capacitance to digital converters
#
# CONFIG_AD7746 is not set
# end of Capacitance to digital converters

#
# Direct Digital Synthesis
#
# end of Direct Digital Synthesis

#
# Network Analyzer, Impedance Converters
#
# CONFIG_AD5933 is not set
# end of Network Analyzer, Impedance Converters

#
# Active energy metering IC
#
# CONFIG_ADE7854 is not set
# end of Active energy metering IC

#
# Resolver to digital converters
#
# end of Resolver to digital converters
# end of IIO staging drivers

CONFIG_STAGING_MEDIA=y
CONFIG_GREYBUS_BOOTROM=m
CONFIG_GREYBUS_HID=m
# CONFIG_GREYBUS_LIGHT is not set
CONFIG_GREYBUS_LOG=y
CONFIG_GREYBUS_LOOPBACK=m
# CONFIG_GREYBUS_POWER is not set
CONFIG_GREYBUS_RAW=m
CONFIG_GREYBUS_VIBRATOR=m
CONFIG_GREYBUS_BRIDGED_PHY=y
CONFIG_GREYBUS_GPIO=y
CONFIG_GREYBUS_I2C=m
CONFIG_GREYBUS_UART=y
CONFIG_FIELDBUS_DEV=m
CONFIG_HMS_ANYBUSS_BUS=m
CONFIG_HMS_PROFINET=m

#
# VME Device Drivers
#
CONFIG_HAVE_CLK=y
CONFIG_HAVE_CLK_PREPARE=y
CONFIG_COMMON_CLK=y
CONFIG_COMMON_CLK_MAX9485=m
# CONFIG_COMMON_CLK_SI5341 is not set
CONFIG_COMMON_CLK_SI5351=y
CONFIG_COMMON_CLK_SI514=m
# CONFIG_COMMON_CLK_SI544 is not set
# CONFIG_COMMON_CLK_SI570 is not set
CONFIG_COMMON_CLK_CDCE706=m
CONFIG_COMMON_CLK_CDCE925=m
# CONFIG_COMMON_CLK_CS2000_CP is not set
# CONFIG_COMMON_CLK_RS9_PCIE is not set
# CONFIG_COMMON_CLK_VC5 is not set
CONFIG_COMMON_CLK_FIXED_MMIO=y
# CONFIG_CLK_KUNIT_TEST is not set
# CONFIG_CLK_GATE_KUNIT_TEST is not set
# CONFIG_HWSPINLOCK is not set

#
# Clock Source drivers
#
# CONFIG_MICROCHIP_PIT64B is not set
# end of Clock Source drivers

CONFIG_MAILBOX=y
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y

#
# Generic IOMMU Pagetable Support
#
# end of Generic IOMMU Pagetable Support

CONFIG_IOMMU_DEBUGFS=y
CONFIG_IOMMU_DEFAULT_DMA_STRICT=y
# CONFIG_IOMMU_DEFAULT_DMA_LAZY is not set
# CONFIG_IOMMU_DEFAULT_PASSTHROUGH is not set
CONFIG_OF_IOMMU=y
CONFIG_S390_CCW_IOMMU=y

#
# Remoteproc drivers
#
# CONFIG_REMOTEPROC is not set
# end of Remoteproc drivers

#
# Rpmsg drivers
#
CONFIG_RPMSG=m
# CONFIG_RPMSG_CHAR is not set
# CONFIG_RPMSG_CTRL is not set
CONFIG_RPMSG_NS=m
CONFIG_RPMSG_VIRTIO=m
# end of Rpmsg drivers

CONFIG_SOUNDWIRE=m

#
# SoundWire Devices
#

#
# SOC (System On Chip) specific Drivers
#

#
# Amlogic SoC drivers
#
# end of Amlogic SoC drivers

#
# Broadcom SoC drivers
#
# end of Broadcom SoC drivers

#
# NXP/Freescale QorIQ SoC drivers
#
# end of NXP/Freescale QorIQ SoC drivers

#
# i.MX SoC drivers
#
# end of i.MX SoC drivers

#
# Enable LiteX SoC Builder specific drivers
#
# end of Enable LiteX SoC Builder specific drivers

#
# Qualcomm SoC drivers
#
# end of Qualcomm SoC drivers

CONFIG_SOC_TI=y

#
# Xilinx SoC drivers
#
# end of Xilinx SoC drivers
# end of SOC (System On Chip) specific Drivers

# CONFIG_PM_DEVFREQ is not set
CONFIG_EXTCON=y

#
# Extcon Device Drivers
#
# CONFIG_EXTCON_ADC_JACK is not set
CONFIG_EXTCON_FSA9480=y
CONFIG_EXTCON_GPIO=y
CONFIG_EXTCON_MAX3355=y
# CONFIG_EXTCON_PTN5150 is not set
CONFIG_EXTCON_RT8973A=y
CONFIG_EXTCON_SM5502=y
CONFIG_EXTCON_USB_GPIO=y
# CONFIG_EXTCON_USBC_TUSB320 is not set
CONFIG_MEMORY=y
CONFIG_IIO=y
CONFIG_IIO_BUFFER=y
CONFIG_IIO_BUFFER_CB=y
# CONFIG_IIO_BUFFER_DMA is not set
# CONFIG_IIO_BUFFER_DMAENGINE is not set
CONFIG_IIO_BUFFER_HW_CONSUMER=y
CONFIG_IIO_KFIFO_BUF=y
CONFIG_IIO_TRIGGERED_BUFFER=y
CONFIG_IIO_CONFIGFS=y
CONFIG_IIO_TRIGGER=y
CONFIG_IIO_CONSUMERS_PER_TRIGGER=2
CONFIG_IIO_SW_DEVICE=y
# CONFIG_IIO_SW_TRIGGER is not set
CONFIG_IIO_TRIGGERED_EVENT=m

#
# Accelerometers
#
# CONFIG_ADXL313_I2C is not set
CONFIG_ADXL345=m
CONFIG_ADXL345_I2C=m
# CONFIG_ADXL355_I2C is not set
# CONFIG_ADXL367_I2C is not set
CONFIG_ADXL372=m
CONFIG_ADXL372_I2C=m
CONFIG_BMA180=m
CONFIG_BMA400=y
CONFIG_BMA400_I2C=y
CONFIG_BMC150_ACCEL=m
CONFIG_BMC150_ACCEL_I2C=m
# CONFIG_DA280 is not set
CONFIG_DA311=y
CONFIG_DMARD06=y
CONFIG_DMARD09=m
# CONFIG_DMARD10 is not set
# CONFIG_FXLS8962AF_I2C is not set
CONFIG_IIO_ST_ACCEL_3AXIS=m
CONFIG_IIO_ST_ACCEL_I2C_3AXIS=m
# CONFIG_KXSD9 is not set
# CONFIG_KXCJK1013 is not set
# CONFIG_MC3230 is not set
CONFIG_MMA7455=y
CONFIG_MMA7455_I2C=y
# CONFIG_MMA7660 is not set
# CONFIG_MMA8452 is not set
CONFIG_MMA9551_CORE=y
CONFIG_MMA9551=y
CONFIG_MMA9553=y
CONFIG_MXC4005=m
CONFIG_MXC6255=y
CONFIG_STK8312=y
# CONFIG_STK8BA50 is not set
# end of Accelerometers

#
# Analog to digital converters
#
CONFIG_AD7091R5=y
CONFIG_AD7291=m
CONFIG_AD799X=m
CONFIG_ENVELOPE_DETECTOR=m
CONFIG_HX711=y
# CONFIG_INA2XX_ADC is not set
CONFIG_LTC2471=m
CONFIG_LTC2485=y
# CONFIG_LTC2497 is not set
# CONFIG_MAX1363 is not set
CONFIG_MAX9611=m
CONFIG_MCP3422=y
CONFIG_NAU7802=y
CONFIG_QCOM_VADC_COMMON=m
# CONFIG_QCOM_SPMI_IADC is not set
CONFIG_QCOM_SPMI_VADC=m
CONFIG_QCOM_SPMI_ADC5=m
CONFIG_SD_ADC_MODULATOR=m
CONFIG_TI_ADC081C=m
CONFIG_TI_ADS1015=y
# end of Analog to digital converters

#
# Analog to digital and digital to analog converters
#
# end of Analog to digital and digital to analog converters

#
# Analog Front Ends
#
CONFIG_IIO_RESCALE=m
# end of Analog Front Ends

#
# Amplifiers
#
CONFIG_HMC425=m
# end of Amplifiers

#
# Capacitance to digital converters
#
CONFIG_AD7150=m
# end of Capacitance to digital converters

#
# Chemical Sensors
#
# CONFIG_ATLAS_PH_SENSOR is not set
CONFIG_ATLAS_EZO_SENSOR=m
CONFIG_BME680=m
CONFIG_BME680_I2C=m
CONFIG_CCS811=m
CONFIG_IAQCORE=m
CONFIG_PMS7003=m
# CONFIG_SCD30_CORE is not set
# CONFIG_SCD4X is not set
CONFIG_SENSIRION_SGP30=m
# CONFIG_SENSIRION_SGP40 is not set
# CONFIG_SPS30_I2C is not set
# CONFIG_SPS30_SERIAL is not set
# CONFIG_SENSEAIR_SUNRISE_CO2 is not set
CONFIG_VZ89X=y
# end of Chemical Sensors

#
# Hid Sensor IIO Common
#
# end of Hid Sensor IIO Common

CONFIG_IIO_MS_SENSORS_I2C=y

#
# IIO SCMI Sensors
#
# end of IIO SCMI Sensors

#
# SSP Sensor Common
#
# end of SSP Sensor Common

CONFIG_IIO_ST_SENSORS_I2C=m
CONFIG_IIO_ST_SENSORS_CORE=m

#
# Digital to analog converters
#
# CONFIG_AD5064 is not set
# CONFIG_AD5380 is not set
CONFIG_AD5446=y
CONFIG_AD5592R_BASE=m
CONFIG_AD5593R=m
# CONFIG_AD5696_I2C is not set
CONFIG_DPOT_DAC=m
CONFIG_DS4424=m
CONFIG_M62332=y
CONFIG_MAX517=y
# CONFIG_MAX5821 is not set
CONFIG_MCP4725=y
CONFIG_TI_DAC5571=y
# end of Digital to analog converters

#
# IIO dummy driver
#
CONFIG_IIO_SIMPLE_DUMMY=m
# CONFIG_IIO_SIMPLE_DUMMY_EVENTS is not set
CONFIG_IIO_SIMPLE_DUMMY_BUFFER=y
# end of IIO dummy driver

#
# Filters
#
# end of Filters

#
# Frequency Synthesizers DDS/PLL
#

#
# Clock Generator/Distribution
#
# end of Clock Generator/Distribution

#
# Phase-Locked Loop (PLL) frequency synthesizers
#
# end of Phase-Locked Loop (PLL) frequency synthesizers
# end of Frequency Synthesizers DDS/PLL

#
# Digital gyroscope sensors
#
# CONFIG_BMG160 is not set
# CONFIG_FXAS21002C is not set
CONFIG_MPU3050=y
CONFIG_MPU3050_I2C=y
# CONFIG_IIO_ST_GYRO_3AXIS is not set
# CONFIG_ITG3200 is not set
# end of Digital gyroscope sensors

#
# Health Sensors
#

#
# Heart Rate Monitors
#
CONFIG_AFE4404=m
# CONFIG_MAX30100 is not set
CONFIG_MAX30102=y
# end of Heart Rate Monitors
# end of Health Sensors

#
# Humidity sensors
#
CONFIG_AM2315=y
CONFIG_DHT11=m
CONFIG_HDC100X=y
CONFIG_HDC2010=m
# CONFIG_HTS221 is not set
CONFIG_HTU21=y
# CONFIG_SI7005 is not set
CONFIG_SI7020=m
# end of Humidity sensors

#
# Inertial measurement units
#
CONFIG_BMI160=m
CONFIG_BMI160_I2C=m
CONFIG_FXOS8700=m
CONFIG_FXOS8700_I2C=m
# CONFIG_KMX61 is not set
CONFIG_INV_ICM42600=m
CONFIG_INV_ICM42600_I2C=m
CONFIG_INV_MPU6050_IIO=m
CONFIG_INV_MPU6050_I2C=m
CONFIG_IIO_ST_LSM6DSX=y
CONFIG_IIO_ST_LSM6DSX_I2C=y
CONFIG_IIO_ST_LSM6DSX_I3C=y
# CONFIG_IIO_ST_LSM9DS0 is not set
# end of Inertial measurement units

#
# Light sensors
#
CONFIG_ADJD_S311=m
# CONFIG_ADUX1020 is not set
CONFIG_AL3010=m
CONFIG_AL3320A=m
# CONFIG_APDS9300 is not set
CONFIG_APDS9960=y
CONFIG_AS73211=m
# CONFIG_BH1750 is not set
# CONFIG_BH1780 is not set
CONFIG_CM32181=m
# CONFIG_CM3232 is not set
CONFIG_CM3323=y
CONFIG_CM3605=m
CONFIG_CM36651=m
CONFIG_GP2AP002=m
CONFIG_GP2AP020A00F=y
CONFIG_SENSORS_ISL29018=m
CONFIG_SENSORS_ISL29028=y
CONFIG_ISL29125=m
# CONFIG_JSA1212 is not set
CONFIG_RPR0521=m
CONFIG_LTR501=m
# CONFIG_LV0104CS is not set
# CONFIG_MAX44000 is not set
# CONFIG_MAX44009 is not set
CONFIG_NOA1305=m
CONFIG_OPT3001=m
CONFIG_PA12203001=y
CONFIG_SI1133=m
CONFIG_SI1145=y
# CONFIG_STK3310 is not set
CONFIG_ST_UVIS25=m
CONFIG_ST_UVIS25_I2C=m
CONFIG_TCS3414=m
CONFIG_TCS3472=y
CONFIG_SENSORS_TSL2563=y
CONFIG_TSL2583=y
# CONFIG_TSL2591 is not set
# CONFIG_TSL2772 is not set
# CONFIG_TSL4531 is not set
CONFIG_US5182D=y
# CONFIG_VCNL4000 is not set
CONFIG_VCNL4035=m
CONFIG_VEML6030=y
CONFIG_VEML6070=m
# CONFIG_VL6180 is not set
CONFIG_ZOPT2201=y
# end of Light sensors

#
# Magnetometer sensors
#
CONFIG_AK8974=y
# CONFIG_AK8975 is not set
# CONFIG_AK09911 is not set
CONFIG_BMC150_MAGN=y
CONFIG_BMC150_MAGN_I2C=y
CONFIG_MAG3110=m
CONFIG_MMC35240=m
# CONFIG_IIO_ST_MAGN_3AXIS is not set
# CONFIG_SENSORS_HMC5843_I2C is not set
CONFIG_SENSORS_RM3100=m
CONFIG_SENSORS_RM3100_I2C=m
# CONFIG_YAMAHA_YAS530 is not set
# end of Magnetometer sensors

#
# Multiplexers
#
CONFIG_IIO_MUX=m
# end of Multiplexers

#
# Inclinometer sensors
#
# end of Inclinometer sensors

#
# Triggers - standalone
#
# CONFIG_IIO_INTERRUPT_TRIGGER is not set
CONFIG_IIO_SYSFS_TRIGGER=y
# end of Triggers - standalone

#
# Linear and angular position sensors
#
# end of Linear and angular position sensors

#
# Digital potentiometers
#
# CONFIG_AD5110 is not set
CONFIG_AD5272=y
CONFIG_DS1803=m
CONFIG_MAX5432=m
CONFIG_MCP4018=y
# CONFIG_MCP4531 is not set
CONFIG_TPL0102=y
# end of Digital potentiometers

#
# Digital potentiostats
#
CONFIG_LMP91000=m
# end of Digital potentiostats

#
# Pressure sensors
#
CONFIG_ABP060MG=y
CONFIG_BMP280=m
CONFIG_BMP280_I2C=m
# CONFIG_DLHL60D is not set
CONFIG_DPS310=y
CONFIG_HP03=y
CONFIG_ICP10100=y
# CONFIG_MPL115_I2C is not set
CONFIG_MPL3115=m
# CONFIG_MS5611 is not set
CONFIG_MS5637=y
CONFIG_IIO_ST_PRESS=m
CONFIG_IIO_ST_PRESS_I2C=m
# CONFIG_T5403 is not set
CONFIG_HP206C=m
# CONFIG_ZPA2326 is not set
# end of Pressure sensors

#
# Lightning sensors
#
# end of Lightning sensors

#
# Proximity and distance sensors
#
CONFIG_ISL29501=m
# CONFIG_LIDAR_LITE_V2 is not set
# CONFIG_MB1232 is not set
# CONFIG_PING is not set
CONFIG_RFD77402=m
CONFIG_SRF04=y
CONFIG_SX_COMMON=y
CONFIG_SX9310=y
# CONFIG_SX9324 is not set
# CONFIG_SX9360 is not set
# CONFIG_SX9500 is not set
CONFIG_SRF08=y
# CONFIG_VCNL3020 is not set
# CONFIG_VL53L0X_I2C is not set
# end of Proximity and distance sensors

#
# Resolver to digital converters
#
# end of Resolver to digital converters

#
# Temperature sensors
#
CONFIG_MLX90614=m
CONFIG_MLX90632=y
# CONFIG_TMP006 is not set
CONFIG_TMP007=m
# CONFIG_TMP117 is not set
# CONFIG_TSYS01 is not set
# CONFIG_TSYS02D is not set
# end of Temperature sensors

# CONFIG_PWM is not set

#
# IRQ chip support
#
CONFIG_IRQCHIP=y
# CONFIG_AL_FIC is not set
# end of IRQ chip support

# CONFIG_RESET_CONTROLLER is not set

#
# PHY Subsystem
#
CONFIG_GENERIC_PHY=y
# CONFIG_PHY_CAN_TRANSCEIVER is not set

#
# PHY drivers for Broadcom platforms
#
# end of PHY drivers for Broadcom platforms
# end of PHY Subsystem

CONFIG_POWERCAP=y
# CONFIG_DTPM is not set

#
# Performance monitor support
#
# end of Performance monitor support

# CONFIG_RAS is not set

#
# Android
#
CONFIG_ANDROID=y
CONFIG_ANDROID_BINDER_IPC=y
CONFIG_ANDROID_BINDERFS=y
CONFIG_ANDROID_BINDER_DEVICES="binder,hwbinder,vndbinder"
CONFIG_ANDROID_BINDER_IPC_SELFTEST=y
# end of Android

CONFIG_DAX=y
CONFIG_DEV_DAX=y
CONFIG_NVMEM=y
CONFIG_NVMEM_SYSFS=y
CONFIG_NVMEM_SPMI_SDAM=m

#
# HW tracing support
#
# CONFIG_STM is not set
# end of HW tracing support

CONFIG_FPGA=m
# CONFIG_ALTERA_PR_IP_CORE is not set
CONFIG_FPGA_BRIDGE=m
# CONFIG_FPGA_REGION is not set
CONFIG_FSI=m
# CONFIG_FSI_NEW_DEV_NODE is not set
# CONFIG_FSI_MASTER_GPIO is not set
CONFIG_FSI_MASTER_HUB=m
# CONFIG_FSI_SCOM is not set
CONFIG_MULTIPLEXER=m

#
# Multiplexer drivers
#
CONFIG_MUX_ADG792A=m
CONFIG_MUX_GPIO=m
# CONFIG_MUX_MMIO is not set
# end of Multiplexer drivers

# CONFIG_SIOX is not set
# CONFIG_SLIMBUS is not set
CONFIG_INTERCONNECT=y
# CONFIG_COUNTER is not set
# CONFIG_MOST is not set
# CONFIG_PECI is not set
# CONFIG_HTE is not set
# end of Device Drivers

#
# File systems
#
CONFIG_VALIDATE_FS_PARSER=y
CONFIG_FS_IOMAP=y
# CONFIG_EXT2_FS is not set
CONFIG_EXT3_FS=m
CONFIG_EXT3_FS_POSIX_ACL=y
# CONFIG_EXT3_FS_SECURITY is not set
CONFIG_EXT4_FS=y
# CONFIG_EXT4_USE_FOR_EXT2 is not set
CONFIG_EXT4_FS_POSIX_ACL=y
# CONFIG_EXT4_FS_SECURITY is not set
CONFIG_EXT4_DEBUG=y
# CONFIG_EXT4_KUNIT_TESTS is not set
CONFIG_JBD2=y
# CONFIG_JBD2_DEBUG is not set
CONFIG_FS_MBCACHE=y
CONFIG_REISERFS_FS=y
CONFIG_REISERFS_CHECK=y
# CONFIG_REISERFS_PROC_INFO is not set
# CONFIG_REISERFS_FS_XATTR is not set
# CONFIG_JFS_FS is not set
CONFIG_XFS_FS=y
# CONFIG_XFS_SUPPORT_V4 is not set
CONFIG_XFS_QUOTA=y
# CONFIG_XFS_POSIX_ACL is not set
# CONFIG_XFS_RT is not set
# CONFIG_XFS_ONLINE_SCRUB is not set
# CONFIG_XFS_WARN is not set
# CONFIG_XFS_DEBUG is not set
CONFIG_GFS2_FS=m
CONFIG_BTRFS_FS=y
CONFIG_BTRFS_FS_POSIX_ACL=y
# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y
CONFIG_BTRFS_DEBUG=y
CONFIG_BTRFS_ASSERT=y
CONFIG_BTRFS_FS_REF_VERIFY=y
CONFIG_NILFS2_FS=m
CONFIG_F2FS_FS=m
# CONFIG_F2FS_STAT_FS is not set
CONFIG_F2FS_FS_XATTR=y
# CONFIG_F2FS_FS_POSIX_ACL is not set
CONFIG_F2FS_FS_SECURITY=y
CONFIG_F2FS_CHECK_FS=y
# CONFIG_F2FS_FAULT_INJECTION is not set
CONFIG_F2FS_FS_COMPRESSION=y
# CONFIG_F2FS_FS_LZO is not set
CONFIG_F2FS_FS_LZ4=y
CONFIG_F2FS_FS_LZ4HC=y
# CONFIG_F2FS_FS_ZSTD is not set
CONFIG_F2FS_IOSTAT=y
CONFIG_ZONEFS_FS=y
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
# CONFIG_EXPORTFS_BLOCK_OPS is not set
CONFIG_FILE_LOCKING=y
# CONFIG_FS_ENCRYPTION is not set
CONFIG_FS_VERITY=y
# CONFIG_FS_VERITY_DEBUG is not set
CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
# CONFIG_INOTIFY_USER is not set
# CONFIG_FANOTIFY is not set
# CONFIG_QUOTA is not set
# CONFIG_QUOTA_NETLINK_INTERFACE is not set
CONFIG_QUOTACTL=y
# CONFIG_AUTOFS4_FS is not set
CONFIG_AUTOFS_FS=y
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
CONFIG_VIRTIO_FS=m
CONFIG_OVERLAY_FS=m
# CONFIG_OVERLAY_FS_REDIRECT_DIR is not set
CONFIG_OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW=y
# CONFIG_OVERLAY_FS_INDEX is not set
CONFIG_OVERLAY_FS_XINO_AUTO=y
# CONFIG_OVERLAY_FS_METACOPY is not set

#
# Caches
#
CONFIG_NETFS_SUPPORT=m
# CONFIG_NETFS_STATS is not set
CONFIG_FSCACHE=m
# CONFIG_FSCACHE_STATS is not set
CONFIG_FSCACHE_DEBUG=y
# CONFIG_CACHEFILES is not set
# end of Caches

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=m
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=m
# end of CD-ROM/DVD Filesystems

#
# DOS/FAT/EXFAT/NT Filesystems
#
CONFIG_FAT_FS=y
# CONFIG_MSDOS_FS is not set
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
CONFIG_FAT_DEFAULT_UTF8=y
# CONFIG_FAT_KUNIT_TEST is not set
CONFIG_EXFAT_FS=m
CONFIG_EXFAT_DEFAULT_IOCHARSET="utf8"
CONFIG_NTFS_FS=y
# CONFIG_NTFS_DEBUG is not set
CONFIG_NTFS_RW=y
# CONFIG_NTFS3_FS is not set
# end of DOS/FAT/EXFAT/NT Filesystems

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
# CONFIG_PROC_KCORE is not set
# CONFIG_PROC_VMCORE is not set
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
# CONFIG_PROC_CHILDREN is not set
CONFIG_KERNFS=y
CONFIG_SYSFS=y
# CONFIG_TMPFS is not set
CONFIG_ARCH_SUPPORTS_HUGETLBFS=y
# CONFIG_HUGETLBFS is not set
CONFIG_ARCH_HAS_GIGANTIC_PAGE=y
CONFIG_CONFIGFS_FS=y
# end of Pseudo filesystems

# CONFIG_MISC_FILESYSTEMS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
# CONFIG_NLS_CODEPAGE_437 is not set
CONFIG_NLS_CODEPAGE_737=y
CONFIG_NLS_CODEPAGE_775=m
CONFIG_NLS_CODEPAGE_850=y
CONFIG_NLS_CODEPAGE_852=y
# CONFIG_NLS_CODEPAGE_855 is not set
CONFIG_NLS_CODEPAGE_857=m
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
CONFIG_NLS_CODEPAGE_864=y
CONFIG_NLS_CODEPAGE_865=m
CONFIG_NLS_CODEPAGE_866=y
# CONFIG_NLS_CODEPAGE_869 is not set
CONFIG_NLS_CODEPAGE_936=y
CONFIG_NLS_CODEPAGE_950=y
CONFIG_NLS_CODEPAGE_932=m
CONFIG_NLS_CODEPAGE_949=m
CONFIG_NLS_CODEPAGE_874=y
# CONFIG_NLS_ISO8859_8 is not set
CONFIG_NLS_CODEPAGE_1250=m
CONFIG_NLS_CODEPAGE_1251=m
CONFIG_NLS_ASCII=m
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_ISO8859_2=y
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
CONFIG_NLS_ISO8859_6=m
CONFIG_NLS_ISO8859_7=m
CONFIG_NLS_ISO8859_9=m
# CONFIG_NLS_ISO8859_13 is not set
CONFIG_NLS_ISO8859_14=y
CONFIG_NLS_ISO8859_15=m
CONFIG_NLS_KOI8_R=y
CONFIG_NLS_KOI8_U=m
CONFIG_NLS_MAC_ROMAN=m
# CONFIG_NLS_MAC_CELTIC is not set
CONFIG_NLS_MAC_CENTEURO=m
CONFIG_NLS_MAC_CROATIAN=y
CONFIG_NLS_MAC_CYRILLIC=m
# CONFIG_NLS_MAC_GAELIC is not set
CONFIG_NLS_MAC_GREEK=m
CONFIG_NLS_MAC_ICELAND=y
CONFIG_NLS_MAC_INUIT=y
CONFIG_NLS_MAC_ROMANIAN=m
# CONFIG_NLS_MAC_TURKISH is not set
CONFIG_NLS_UTF8=y
CONFIG_UNICODE=y
CONFIG_UNICODE_NORMALIZATION_SELFTEST=m
CONFIG_IO_WQ=y
# end of File systems

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_KEYS_REQUEST_CACHE is not set
CONFIG_PERSISTENT_KEYRINGS=y
# CONFIG_TRUSTED_KEYS is not set
# CONFIG_ENCRYPTED_KEYS is not set
CONFIG_KEY_DH_OPERATIONS=y
# CONFIG_SECURITY_DMESG_RESTRICT is not set
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
CONFIG_SECURITY_PATH=y
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
CONFIG_HARDENED_USERCOPY=y
# CONFIG_FORTIFY_SOURCE is not set
CONFIG_STATIC_USERMODEHELPER=y
CONFIG_STATIC_USERMODEHELPER_PATH="/sbin/usermode-helper"
CONFIG_SECURITY_TOMOYO=y
CONFIG_SECURITY_TOMOYO_MAX_ACCEPT_ENTRY=2048
CONFIG_SECURITY_TOMOYO_MAX_AUDIT_LOG=1024
# CONFIG_SECURITY_TOMOYO_OMIT_USERSPACE_LOADER is not set
CONFIG_SECURITY_TOMOYO_POLICY_LOADER="/sbin/tomoyo-init"
CONFIG_SECURITY_TOMOYO_ACTIVATION_TRIGGER="/sbin/init"
# CONFIG_SECURITY_TOMOYO_INSECURE_BUILTIN_SETTING is not set
CONFIG_SECURITY_APPARMOR=y
CONFIG_SECURITY_APPARMOR_HASH=y
# CONFIG_SECURITY_APPARMOR_HASH_DEFAULT is not set
# CONFIG_SECURITY_APPARMOR_DEBUG is not set
CONFIG_SECURITY_LOADPIN=y
CONFIG_SECURITY_LOADPIN_ENFORCE=y
# CONFIG_SECURITY_YAMA is not set
# CONFIG_SECURITY_SAFESETID is not set
CONFIG_SECURITY_LOCKDOWN_LSM=y
CONFIG_SECURITY_LOCKDOWN_LSM_EARLY=y
CONFIG_LOCK_DOWN_KERNEL_FORCE_NONE=y
# CONFIG_LOCK_DOWN_KERNEL_FORCE_INTEGRITY is not set
# CONFIG_LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY is not set
# CONFIG_SECURITY_LANDLOCK is not set
# CONFIG_INTEGRITY is not set
# CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT is not set
CONFIG_DEFAULT_SECURITY_TOMOYO=y
# CONFIG_DEFAULT_SECURITY_APPARMOR is not set
# CONFIG_DEFAULT_SECURITY_DAC is not set
CONFIG_LSM="lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf"

#
# Kernel hardening options
#

#
# Memory initialization
#
CONFIG_CC_HAS_AUTO_VAR_INIT_PATTERN=y
CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO=y
# CONFIG_INIT_STACK_NONE is not set
CONFIG_INIT_STACK_ALL_PATTERN=y
# CONFIG_INIT_STACK_ALL_ZERO is not set
# CONFIG_INIT_ON_ALLOC_DEFAULT_ON is not set
CONFIG_INIT_ON_FREE_DEFAULT_ON=y
# end of Memory initialization

CONFIG_CC_HAS_RANDSTRUCT=y
CONFIG_RANDSTRUCT_NONE=y
# CONFIG_RANDSTRUCT_FULL is not set
# end of Kernel hardening options
# end of Security options

CONFIG_XOR_BLOCKS=y
CONFIG_ASYNC_CORE=y
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=y
CONFIG_ASYNC_PQ=m
CONFIG_ASYNC_RAID6_RECOV=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_FIPS=y
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_SKCIPHER=y
CONFIG_CRYPTO_SKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_RNG_DEFAULT=y
CONFIG_CRYPTO_AKCIPHER2=y
CONFIG_CRYPTO_AKCIPHER=y
CONFIG_CRYPTO_KPP2=y
CONFIG_CRYPTO_KPP=y
CONFIG_CRYPTO_ACOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
# CONFIG_CRYPTO_USER is not set
# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
# CONFIG_CRYPTO_MANAGER_EXTRA_TESTS is not set
CONFIG_CRYPTO_GF128MUL=m
CONFIG_CRYPTO_NULL=y
CONFIG_CRYPTO_NULL2=y
CONFIG_CRYPTO_PCRYPT=y
# CONFIG_CRYPTO_CRYPTD is not set
CONFIG_CRYPTO_AUTHENC=y
# CONFIG_CRYPTO_TEST is not set

#
# Public-key cryptography
#
CONFIG_CRYPTO_RSA=y
CONFIG_CRYPTO_DH=y
# CONFIG_CRYPTO_DH_RFC7919_GROUPS is not set
# CONFIG_CRYPTO_ECDH is not set
# CONFIG_CRYPTO_ECDSA is not set
# CONFIG_CRYPTO_ECRDSA is not set
CONFIG_CRYPTO_SM2=y
CONFIG_CRYPTO_CURVE25519=m

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=y
# CONFIG_CRYPTO_GCM is not set
CONFIG_CRYPTO_CHACHA20POLY1305=y
CONFIG_CRYPTO_AEGIS128=y
CONFIG_CRYPTO_SEQIV=y
# CONFIG_CRYPTO_ECHAINIV is not set

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CFB=y
CONFIG_CRYPTO_CTR=y
CONFIG_CRYPTO_CTS=y
CONFIG_CRYPTO_ECB=m
# CONFIG_CRYPTO_LRW is not set
CONFIG_CRYPTO_OFB=y
# CONFIG_CRYPTO_PCBC is not set
CONFIG_CRYPTO_XTS=m
CONFIG_CRYPTO_KEYWRAP=y
# CONFIG_CRYPTO_ADIANTUM is not set
CONFIG_CRYPTO_ESSIV=y

#
# Hash modes
#
CONFIG_CRYPTO_CMAC=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=m
CONFIG_CRYPTO_VMAC=m

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32=y
CONFIG_CRYPTO_XXHASH=y
CONFIG_CRYPTO_BLAKE2B=y
CONFIG_CRYPTO_BLAKE2S=m
CONFIG_CRYPTO_CRCT10DIF=m
CONFIG_CRYPTO_CRC64_ROCKSOFT=m
CONFIG_CRYPTO_GHASH=m
CONFIG_CRYPTO_POLY1305=y
CONFIG_CRYPTO_MD4=m
# CONFIG_CRYPTO_MD5 is not set
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_RMD160=y
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=y
# CONFIG_CRYPTO_SHA3 is not set
CONFIG_CRYPTO_SM3=y
# CONFIG_CRYPTO_SM3_GENERIC is not set
CONFIG_CRYPTO_STREEBOG=y
CONFIG_CRYPTO_WP512=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_AES_TI=m
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_BLOWFISH_COMMON=m
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAST_COMMON=y
CONFIG_CRYPTO_CAST5=y
CONFIG_CRYPTO_CAST6=y
CONFIG_CRYPTO_DES=y
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_CHACHA20=y
CONFIG_CRYPTO_SERPENT=m
# CONFIG_CRYPTO_SM4_GENERIC is not set
CONFIG_CRYPTO_TWOFISH=y
CONFIG_CRYPTO_TWOFISH_COMMON=y

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=y
# CONFIG_CRYPTO_LZO is not set
CONFIG_CRYPTO_842=m
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=y
CONFIG_CRYPTO_ZSTD=m

#
# Random Number Generation
#
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_MENU=y
CONFIG_CRYPTO_DRBG_HMAC=y
# CONFIG_CRYPTO_DRBG_HASH is not set
CONFIG_CRYPTO_DRBG_CTR=y
CONFIG_CRYPTO_DRBG=y
CONFIG_CRYPTO_JITTERENTROPY=y
CONFIG_CRYPTO_KDF800108_CTR=y
CONFIG_CRYPTO_USER_API=y
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_CRYPTO_USER_API_SKCIPHER=m
CONFIG_CRYPTO_USER_API_RNG=m
# CONFIG_CRYPTO_USER_API_RNG_CAVP is not set
# CONFIG_CRYPTO_USER_API_AEAD is not set
# CONFIG_CRYPTO_USER_API_ENABLE_OBSOLETE is not set
CONFIG_CRYPTO_HASH_INFO=y
# CONFIG_CRYPTO_HW is not set
CONFIG_ASYMMETRIC_KEY_TYPE=y
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
CONFIG_X509_CERTIFICATE_PARSER=y
CONFIG_PKCS8_PRIVATE_KEY_PARSER=m
CONFIG_PKCS7_MESSAGE_PARSER=y
# CONFIG_PKCS7_TEST_KEY is not set
CONFIG_SIGNED_PE_FILE_VERIFICATION=y

#
# Certificates for signature checking
#
CONFIG_MODULE_SIG_KEY="certs/signing_key.pem"
CONFIG_MODULE_SIG_KEY_TYPE_RSA=y
# CONFIG_MODULE_SIG_KEY_TYPE_ECDSA is not set
CONFIG_SYSTEM_TRUSTED_KEYRING=y
CONFIG_SYSTEM_TRUSTED_KEYS=""
# CONFIG_SYSTEM_EXTRA_CERTIFICATE is not set
CONFIG_SECONDARY_TRUSTED_KEYRING=y
CONFIG_SYSTEM_BLACKLIST_KEYRING=y
CONFIG_SYSTEM_BLACKLIST_HASH_LIST=""
CONFIG_SYSTEM_REVOCATION_LIST=y
CONFIG_SYSTEM_REVOCATION_KEYS=""
# CONFIG_SYSTEM_BLACKLIST_AUTH_UPDATE is not set
# end of Certificates for signature checking

CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_RAID6_PQ=y
# CONFIG_RAID6_PQ_BENCHMARK is not set
CONFIG_LINEAR_RANGES=y
CONFIG_PACKING=y
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_NET_UTILS=y
CONFIG_CORDIC=y
CONFIG_PRIME_NUMBERS=m
CONFIG_RATIONAL=y
CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y

#
# Crypto library routines
#
CONFIG_CRYPTO_LIB_AES=y
CONFIG_CRYPTO_LIB_ARC4=m
CONFIG_CRYPTO_LIB_BLAKE2S_GENERIC=y
CONFIG_CRYPTO_LIB_CHACHA_GENERIC=y
# CONFIG_CRYPTO_LIB_CHACHA is not set
CONFIG_CRYPTO_LIB_CURVE25519_GENERIC=m
# CONFIG_CRYPTO_LIB_CURVE25519 is not set
CONFIG_CRYPTO_LIB_DES=y
CONFIG_CRYPTO_LIB_POLY1305_RSIZE=1
CONFIG_CRYPTO_LIB_POLY1305_GENERIC=y
# CONFIG_CRYPTO_LIB_POLY1305 is not set
# CONFIG_CRYPTO_LIB_CHACHA20POLY1305 is not set
CONFIG_CRYPTO_LIB_SHA256=y
# end of Crypto library routines

CONFIG_CRC_CCITT=y
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=m
CONFIG_CRC64_ROCKSOFT=m
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
# CONFIG_CRC32_SLICEBY8 is not set
# CONFIG_CRC32_SLICEBY4 is not set
CONFIG_CRC32_SARWATE=y
# CONFIG_CRC32_BIT is not set
CONFIG_CRC64=y
CONFIG_CRC4=m
CONFIG_CRC7=y
CONFIG_LIBCRC32C=y
CONFIG_CRC8=y
CONFIG_XXHASH=y
CONFIG_RANDOM32_SELFTEST=y
CONFIG_842_COMPRESS=m
CONFIG_842_DECOMPRESS=m
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
# CONFIG_ZLIB_DFLTCC is not set
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4_COMPRESS=m
CONFIG_LZ4HC_COMPRESS=y
CONFIG_LZ4_DECOMPRESS=y
CONFIG_ZSTD_COMPRESS=y
CONFIG_ZSTD_DECOMPRESS=y
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
CONFIG_XZ_DEC_POWERPC=y
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
# CONFIG_XZ_DEC_MICROLZMA is not set
CONFIG_XZ_DEC_BCJ=y
CONFIG_XZ_DEC_TEST=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_REED_SOLOMON=m
CONFIG_REED_SOLOMON_DEC8=y
CONFIG_REED_SOLOMON_ENC16=y
CONFIG_REED_SOLOMON_DEC16=y
CONFIG_INTERVAL_TREE=y
CONFIG_XARRAY_MULTI=y
CONFIG_ASSOCIATIVE_ARRAY=y
CONFIG_HAS_DMA=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_DMA_DECLARE_COHERENT=y
CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED=y
CONFIG_SWIOTLB=y
# CONFIG_DMA_RESTRICTED_POOL is not set
CONFIG_DMA_CMA=y
# CONFIG_DMA_PERNUMA_CMA is not set

#
# Default contiguous memory area size:
#
CONFIG_CMA_SIZE_MBYTES=16
CONFIG_CMA_SIZE_PERCENTAGE=10
# CONFIG_CMA_SIZE_SEL_MBYTES is not set
# CONFIG_CMA_SIZE_SEL_PERCENTAGE is not set
# CONFIG_CMA_SIZE_SEL_MIN is not set
CONFIG_CMA_SIZE_SEL_MAX=y
CONFIG_CMA_ALIGNMENT=8
CONFIG_DMA_API_DEBUG=y
# CONFIG_DMA_API_DEBUG_SG is not set
# CONFIG_DMA_MAP_BENCHMARK is not set
CONFIG_SGL_ALLOC=y
CONFIG_CPU_RMAP=y
CONFIG_DQL=y
CONFIG_GLOB=y
CONFIG_GLOB_SELFTEST=m
CONFIG_NLATTR=y
CONFIG_CLZ_TAB=y
CONFIG_IRQ_POLL=y
CONFIG_MPILIB=y
CONFIG_LIBFDT=y
CONFIG_OID_REGISTRY=y
CONFIG_HAVE_GENERIC_VDSO=y
CONFIG_GENERIC_GETTIMEOFDAY=y
CONFIG_GENERIC_VDSO_TIME_NS=y
CONFIG_SG_POOL=y
CONFIG_ARCH_STACKWALK=y
CONFIG_STACKDEPOT=y
CONFIG_STACKDEPOT_ALWAYS_INIT=y
CONFIG_STACK_HASH_ORDER=20
CONFIG_SBITMAP=y
# end of Library routines

#
# Kernel hacking
#

#
# printk and dmesg options
#
# CONFIG_PRINTK_TIME is not set
CONFIG_PRINTK_CALLER=y
# CONFIG_STACKTRACE_BUILD_ID is not set
CONFIG_CONSOLE_LOGLEVEL_DEFAULT=7
CONFIG_CONSOLE_LOGLEVEL_QUIET=4
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
# CONFIG_DYNAMIC_DEBUG is not set
CONFIG_DYNAMIC_DEBUG_CORE=y
# CONFIG_SYMBOLIC_ERRNAME is not set
CONFIG_DEBUG_BUGVERBOSE=y
# end of printk and dmesg options

CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_MISC=y

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO_NONE=y
# CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT is not set
# CONFIG_DEBUG_INFO_DWARF4 is not set
# CONFIG_DEBUG_INFO_DWARF5 is not set
CONFIG_FRAME_WARN=8192
CONFIG_STRIP_ASM_SYMS=y
# CONFIG_HEADERS_INSTALL is not set
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
# end of Compile-time checks and compiler options

#
# Generic Kernel Debugging Instruments
#
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
# CONFIG_MAGIC_SYSRQ_SERIAL is not set
CONFIG_DEBUG_FS=y
# CONFIG_DEBUG_FS_ALLOW_ALL is not set
CONFIG_DEBUG_FS_DISALLOW_MOUNT=y
# CONFIG_DEBUG_FS_ALLOW_NONE is not set
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
CONFIG_UBSAN=y
# CONFIG_UBSAN_TRAP is not set
CONFIG_CC_HAS_UBSAN_BOUNDS=y
CONFIG_CC_HAS_UBSAN_ARRAY_BOUNDS=y
CONFIG_UBSAN_BOUNDS=y
CONFIG_UBSAN_ARRAY_BOUNDS=y
CONFIG_UBSAN_SHIFT=y
# CONFIG_UBSAN_DIV_ZERO is not set
# CONFIG_UBSAN_UNREACHABLE is not set
CONFIG_UBSAN_BOOL=y
CONFIG_UBSAN_ENUM=y
# CONFIG_UBSAN_ALIGNMENT is not set
# CONFIG_UBSAN_SANITIZE_ALL is not set
CONFIG_TEST_UBSAN=m
CONFIG_HAVE_ARCH_KCSAN=y
CONFIG_HAVE_KCSAN_COMPILER=y
# end of Generic Kernel Debugging Instruments

#
# Networking Debugging
#
# CONFIG_NET_DEV_REFCNT_TRACKER is not set
# CONFIG_NET_NS_REFCNT_TRACKER is not set
# CONFIG_DEBUG_NET is not set
# end of Networking Debugging

#
# Memory Debugging
#
CONFIG_PAGE_EXTENSION=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB_DEBUG_ON=y
# CONFIG_PAGE_OWNER is not set
CONFIG_PAGE_POISONING=y
# CONFIG_DEBUG_PAGE_REF is not set
# CONFIG_DEBUG_RODATA_TEST is not set
CONFIG_ARCH_HAS_DEBUG_WX=y
# CONFIG_DEBUG_WX is not set
CONFIG_GENERIC_PTDUMP=y
# CONFIG_PTDUMP_DEBUGFS is not set
# CONFIG_DEBUG_OBJECTS is not set
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=16000
# CONFIG_DEBUG_KMEMLEAK_TEST is not set
CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y
CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_SCHED_STACK_END_CHECK is not set
CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
CONFIG_DEBUG_VM=y
# CONFIG_DEBUG_VM_VMACACHE is not set
CONFIG_DEBUG_VM_RB=y
CONFIG_DEBUG_VM_PGFLAGS=y
# CONFIG_DEBUG_VM_PGTABLE is not set
CONFIG_DEBUG_MEMORY_INIT=y
# CONFIG_DEBUG_PER_CPU_MAPS is not set
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_KASAN_OUTLINE=y
# CONFIG_KASAN_INLINE is not set
# CONFIG_KASAN_STACK is not set
# CONFIG_KASAN_VMALLOC is not set
CONFIG_KASAN_KUNIT_TEST=m
# CONFIG_KASAN_MODULE_TEST is not set
CONFIG_HAVE_ARCH_KFENCE=y
# CONFIG_KFENCE is not set
# end of Memory Debugging

# CONFIG_DEBUG_SHIRQ is not set

#
# Debug Oops, Lockups and Hangs
#
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_ON_OOPS_VALUE=1
CONFIG_PANIC_TIMEOUT=0
# CONFIG_DETECT_HUNG_TASK is not set
CONFIG_WQ_WATCHDOG=y
# CONFIG_TEST_LOCKUP is not set
# end of Debug Oops, Lockups and Hangs

#
# Scheduler Debugging
#
CONFIG_SCHED_DEBUG=y
# CONFIG_SCHEDSTATS is not set
# end of Scheduler Debugging

CONFIG_DEBUG_TIMEKEEPING=y

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
CONFIG_PROVE_LOCKING=y
CONFIG_PROVE_RAW_LOCK_NESTING=y
# CONFIG_LOCK_STAT is not set
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
CONFIG_LOCKDEP_BITS=15
CONFIG_LOCKDEP_CHAINS_BITS=16
CONFIG_LOCKDEP_STACK_TRACE_BITS=19
CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14
CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
CONFIG_DEBUG_LOCKDEP=y
# CONFIG_DEBUG_ATOMIC_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_LOCK_TORTURE_TEST=y
CONFIG_WW_MUTEX_SELFTEST=m
CONFIG_SCF_TORTURE_TEST=m
# CONFIG_CSD_LOCK_WAIT_DEBUG is not set
# end of Lock Debugging (spinlocks, mutexes, etc...)

CONFIG_TRACE_IRQFLAGS=y
CONFIG_DEBUG_IRQFLAGS=y
CONFIG_STACKTRACE=y
# CONFIG_WARN_ALL_UNSEEDED_RANDOM is not set
# CONFIG_DEBUG_KOBJECT is not set

#
# Debug kernel data structures
#
# CONFIG_DEBUG_LIST is not set
CONFIG_DEBUG_PLIST=y
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
# CONFIG_BUG_ON_DATA_CORRUPTION is not set
# end of Debug kernel data structures

CONFIG_DEBUG_CREDENTIALS=y

#
# RCU Debugging
#
CONFIG_PROVE_RCU=y
CONFIG_TORTURE_TEST=y
CONFIG_RCU_SCALE_TEST=y
CONFIG_RCU_TORTURE_TEST=m
# CONFIG_RCU_REF_SCALE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=21
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20
CONFIG_RCU_TRACE=y
CONFIG_RCU_EQS_DEBUG=y
# end of RCU Debugging

# CONFIG_DEBUG_WQ_FORCE_RR_CPU is not set
# CONFIG_CPU_HOTPLUG_STATE_CONTROL is not set
# CONFIG_LATENCYTOP is not set
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_FENTRY=y
CONFIG_HAVE_NOP_MCOUNT=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_PREEMPTIRQ_TRACEPOINTS=y
CONFIG_TRACING=y
CONFIG_TRACING_SUPPORT=y
# CONFIG_FTRACE is not set
# CONFIG_SAMPLES is not set
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT=y
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT_MULTI=y
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
# CONFIG_STRICT_DEVMEM is not set

#
# s390 Debugging
#
CONFIG_EARLY_PRINTK=y
# CONFIG_DEBUG_ENTRY is not set
# CONFIG_CIO_INJECT is not set
# end of s390 Debugging

#
# Kernel Testing and Coverage
#
CONFIG_KUNIT=m
# CONFIG_KUNIT_DEBUGFS is not set
CONFIG_KUNIT_TEST=m
CONFIG_KUNIT_EXAMPLE_TEST=m
# CONFIG_KUNIT_ALL_TESTS is not set
# CONFIG_NOTIFIER_ERROR_INJECTION is not set
CONFIG_FUNCTION_ERROR_INJECTION=y
CONFIG_FAULT_INJECTION=y
CONFIG_FAILSLAB=y
CONFIG_FAIL_PAGE_ALLOC=y
CONFIG_FAULT_INJECTION_USERCOPY=y
# CONFIG_FAIL_MAKE_REQUEST is not set
CONFIG_FAIL_IO_TIMEOUT=y
CONFIG_FAIL_FUTEX=y
# CONFIG_FAULT_INJECTION_DEBUG_FS is not set
CONFIG_ARCH_HAS_KCOV=y
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
CONFIG_KCOV=y
CONFIG_KCOV_ENABLE_COMPARISONS=y
# CONFIG_KCOV_INSTRUMENT_ALL is not set
CONFIG_KCOV_IRQ_AREA_SIZE=0x40000
CONFIG_RUNTIME_TESTING_MENU=y
CONFIG_LKDTM=y
CONFIG_TEST_LIST_SORT=m
CONFIG_TEST_MIN_HEAP=m
CONFIG_TEST_SORT=m
# CONFIG_TEST_DIV64 is not set
CONFIG_KPROBES_SANITY_TEST=m
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_TEST_REF_TRACKER is not set
CONFIG_RBTREE_TEST=y
CONFIG_REED_SOLOMON_TEST=m
CONFIG_INTERVAL_TREE_TEST=m
CONFIG_PERCPU_TEST=m
CONFIG_ATOMIC64_SELFTEST=y
CONFIG_ASYNC_RAID6_TEST=m
CONFIG_TEST_HEXDUMP=m
CONFIG_STRING_SELFTEST=y
# CONFIG_TEST_STRING_HELPERS is not set
CONFIG_TEST_STRSCPY=y
CONFIG_TEST_KSTRTOX=y
CONFIG_TEST_PRINTF=y
# CONFIG_TEST_SCANF is not set
CONFIG_TEST_BITMAP=y
CONFIG_TEST_UUID=m
CONFIG_TEST_XARRAY=y
CONFIG_TEST_RHASHTABLE=y
# CONFIG_TEST_SIPHASH is not set
CONFIG_TEST_IDA=m
CONFIG_TEST_LKM=m
CONFIG_TEST_BITOPS=m
CONFIG_TEST_VMALLOC=m
CONFIG_TEST_USER_COPY=m
CONFIG_TEST_BPF=m
CONFIG_TEST_BLACKHOLE_DEV=m
CONFIG_FIND_BIT_BENCHMARK=y
CONFIG_TEST_FIRMWARE=y
CONFIG_TEST_SYSCTL=m
# CONFIG_BITFIELD_KUNIT is not set
# CONFIG_HASH_KUNIT_TEST is not set
# CONFIG_RESOURCE_KUNIT_TEST is not set
CONFIG_SYSCTL_KUNIT_TEST=m
CONFIG_LIST_KUNIT_TEST=m
CONFIG_LINEAR_RANGES_TEST=m
# CONFIG_CMDLINE_KUNIT_TEST is not set
CONFIG_BITS_TEST=m
# CONFIG_SLUB_KUNIT_TEST is not set
# CONFIG_RATIONAL_KUNIT_TEST is not set
# CONFIG_MEMCPY_KUNIT_TEST is not set
# CONFIG_OVERFLOW_KUNIT_TEST is not set
# CONFIG_STACKINIT_KUNIT_TEST is not set
CONFIG_TEST_UDELAY=y
CONFIG_TEST_STATIC_KEYS=m
CONFIG_TEST_MEMCAT_P=m
CONFIG_TEST_MEMINIT=y
CONFIG_TEST_FREE_PAGES=y
# end of Kernel Testing and Coverage
# end of Kernel hacking

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates
  2022-06-24 17:36 ` [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates James Houghton
  2022-06-24 18:51   ` Mina Almasry
@ 2022-06-27 12:08   ` manish.mishra
  2022-06-28 15:35     ` James Houghton
  2022-06-27 18:42   ` Mike Kravetz
  2 siblings, 1 reply; 128+ messages in thread
From: manish.mishra @ 2022-06-27 12:08 UTC (permalink / raw)
  To: James Houghton, Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Dr . David Alan Gilbert, linux-mm, linux-kernel


On 24/06/22 11:06 pm, James Houghton wrote:
> When using HugeTLB high-granularity mapping, we need to go through the
> supported hugepage sizes in decreasing order so that we pick the largest
> size that works. Consider the case where we're faulting in a 1G hugepage
> for the first time: we want hugetlb_fault/hugetlb_no_page to map it with
> a PUD. By going through the sizes in decreasing order, we will find that
> PUD_SIZE works before finding out that PMD_SIZE or PAGE_SIZE work too.
>
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>   mm/hugetlb.c | 40 +++++++++++++++++++++++++++++++++++++---
>   1 file changed, 37 insertions(+), 3 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a57e1be41401..5df838d86f32 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -33,6 +33,7 @@
>   #include <linux/migrate.h>
>   #include <linux/nospec.h>
>   #include <linux/delayacct.h>
> +#include <linux/sort.h>
>   
>   #include <asm/page.h>
>   #include <asm/pgalloc.h>
> @@ -48,6 +49,10 @@
>   
>   int hugetlb_max_hstate __read_mostly;
>   unsigned int default_hstate_idx;
> +/*
> + * After hugetlb_init_hstates is called, hstates will be sorted from largest
> + * to smallest.
> + */
>   struct hstate hstates[HUGE_MAX_HSTATE];
>   
>   #ifdef CONFIG_CMA
> @@ -3144,14 +3149,43 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
>   	kfree(node_alloc_noretry);
>   }
>   
> +static int compare_hstates_decreasing(const void *a, const void *b)
> +{
> +	const int shift_a = huge_page_shift((const struct hstate *)a);
> +	const int shift_b = huge_page_shift((const struct hstate *)b);
> +
> +	if (shift_a < shift_b)
> +		return 1;
> +	if (shift_a > shift_b)
> +		return -1;
> +	return 0;
> +}
> +
> +static void sort_hstates(void)
> +{
> +	unsigned long default_hstate_sz = huge_page_size(&default_hstate);
> +
> +	/* Sort from largest to smallest. */
> +	sort(hstates, hugetlb_max_hstate, sizeof(*hstates),
> +	     compare_hstates_decreasing, NULL);
> +
> +	/*
> +	 * We may have changed the location of the default hstate, so we need to
> +	 * update it.
> +	 */
> +	default_hstate_idx = hstate_index(size_to_hstate(default_hstate_sz));
> +}
> +
>   static void __init hugetlb_init_hstates(void)
>   {
>   	struct hstate *h, *h2;
>   
> -	for_each_hstate(h) {
> -		if (minimum_order > huge_page_order(h))
> -			minimum_order = huge_page_order(h);
> +	sort_hstates();
>   
> +	/* The last hstate is now the smallest. */
> +	minimum_order = huge_page_order(&hstates[hugetlb_max_hstate - 1]);
> +
> +	for_each_hstate(h) {
>   		/* oversize hugepages were init'ed in early boot */
>   		if (!hstate_is_gigantic(h))
>   			hugetlb_hstate_alloc_pages(h);

Now that the hstates are ordered, can the code that computes demote_order
be optimised too? I mean, it could simply take the order of the hstate at
the next index?




* Re: [RFC PATCH 01/26] hugetlb: make hstate accessor functions const
       [not found]   ` <e55f90f5-ba14-5d6e-8f8f-abf731b9095e@nutanix.com>
@ 2022-06-27 12:09     ` manish.mishra
  2022-06-28 17:08       ` James Houghton
  0 siblings, 1 reply; 128+ messages in thread
From: manish.mishra @ 2022-06-27 12:09 UTC (permalink / raw)
  To: James Houghton, Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Dr . David Alan Gilbert, linux-mm, linux-kernel



On 27/06/22 5:06 pm, manish.mishra wrote:
>
>
> On 24/06/22 11:06 pm, James Houghton wrote:
>> This is just a const-correctness change so that the new hugetlb_pte
>> changes can be const-correct too.
>>
>> Acked-by: David Rientjes<rientjes@google.com>
>>
>> Signed-off-by: James Houghton<jthoughton@google.com>
>> ---
>>   include/linux/hugetlb.h | 12 ++++++------
>>   1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>> index e4cff27d1198..498a4ae3d462 100644
>> --- a/include/linux/hugetlb.h
>> +++ b/include/linux/hugetlb.h
>> @@ -715,7 +715,7 @@ static inline struct hstate *hstate_vma(struct vm_area_struct *vma)
>>   	return hstate_file(vma->vm_file);
>>   }
>>   
>> -static inline unsigned long huge_page_size(struct hstate *h)
>> +static inline unsigned long huge_page_size(const struct hstate *h)
>>   {
>>   	return (unsigned long)PAGE_SIZE << h->order;
>>   }
>> @@ -729,27 +729,27 @@ static inline unsigned long huge_page_mask(struct hstate *h)
>>   	return h->mask;
>>   }
>>   
>> -static inline unsigned int huge_page_order(struct hstate *h)
>> +static inline unsigned int huge_page_order(const struct hstate *h)
>>   {
>>   	return h->order;
>>   }
>>   
>> -static inline unsigned huge_page_shift(struct hstate *h)
>> +static inline unsigned huge_page_shift(const struct hstate *h)
>>   {
>>   	return h->order + PAGE_SHIFT;
>>   }
>>   
>> -static inline bool hstate_is_gigantic(struct hstate *h)
>> +static inline bool hstate_is_gigantic(const struct hstate *h)
>>   {
>>   	return huge_page_order(h) >= MAX_ORDER;
>>   }
>>   
>> -static inline unsigned int pages_per_huge_page(struct hstate *h)
>> +static inline unsigned int pages_per_huge_page(const struct hstate *h)
>>   {
>>   	return 1 << h->order;
>>   }
>>   
>> -static inline unsigned int blocks_per_huge_page(struct hstate *h)
>> +static inline unsigned int blocks_per_huge_page(const struct hstate *h)
>>   {
>>   	return huge_page_size(h) / 512;
>>   }
>
> James, just wanted to check: why did you do this selectively, only for these
> functions, and not for something like hstate_index, which I also see used in
> your code?
>



* Re: [RFC PATCH 03/26] hugetlb: add make_huge_pte_with_shift
  2022-06-24 17:36 ` [RFC PATCH 03/26] hugetlb: add make_huge_pte_with_shift James Houghton
  2022-06-24 19:01   ` Mina Almasry
@ 2022-06-27 12:13   ` manish.mishra
  1 sibling, 0 replies; 128+ messages in thread
From: manish.mishra @ 2022-06-27 12:13 UTC (permalink / raw)
  To: James Houghton, Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Dr . David Alan Gilbert, linux-mm, linux-kernel


On 24/06/22 11:06 pm, James Houghton wrote:
> This allows us to make huge PTEs at shifts other than the hstate shift,
> which will be necessary for high-granularity mappings.
>
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>   mm/hugetlb.c | 33 ++++++++++++++++++++-------------
>   1 file changed, 20 insertions(+), 13 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5df838d86f32..0eec34edf3b2 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4686,23 +4686,30 @@ const struct vm_operations_struct hugetlb_vm_ops = {
>   	.pagesize = hugetlb_vm_op_pagesize,
>   };
Reviewed-by: manish.mishra@nutanix.com
>   
> +static pte_t make_huge_pte_with_shift(struct vm_area_struct *vma,
> +				      struct page *page, int writable,
> +				      int shift)
> +{
> +	bool huge = shift > PAGE_SHIFT;
> +	pte_t entry = huge ? mk_huge_pte(page, vma->vm_page_prot)
> +			   : mk_pte(page, vma->vm_page_prot);
> +
> +	if (writable)
> +		entry = huge ? huge_pte_mkwrite(entry) : pte_mkwrite(entry);
> +	else
> +		entry = huge ? huge_pte_wrprotect(entry) : pte_wrprotect(entry);
> +	pte_mkyoung(entry);
> +	if (huge)
> +		entry = arch_make_huge_pte(entry, shift, vma->vm_flags);
> +	return entry;
> +}
> +
>   static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
> -				int writable)
> +			   int writable)
>   {
> -	pte_t entry;
>   	unsigned int shift = huge_page_shift(hstate_vma(vma));
>   
> -	if (writable) {
> -		entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_huge_pte(page,
> -					 vma->vm_page_prot)));
> -	} else {
> -		entry = huge_pte_wrprotect(mk_huge_pte(page,
> -					   vma->vm_page_prot));
> -	}
> -	entry = pte_mkyoung(entry);
> -	entry = arch_make_huge_pte(entry, shift, vma->vm_flags);
> -
> -	return entry;
> +	return make_huge_pte_with_shift(vma, page, writable, shift);
>   }
>   
>   static void set_huge_ptep_writable(struct vm_area_struct *vma,


* Re: [RFC PATCH 04/26] hugetlb: make huge_pte_lockptr take an explicit shift argument.
  2022-06-24 17:36 ` [RFC PATCH 04/26] hugetlb: make huge_pte_lockptr take an explicit shift argument James Houghton
@ 2022-06-27 12:26   ` manish.mishra
  2022-06-27 20:51   ` Mike Kravetz
  1 sibling, 0 replies; 128+ messages in thread
From: manish.mishra @ 2022-06-27 12:26 UTC (permalink / raw)
  To: James Houghton, Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Dr . David Alan Gilbert, linux-mm, linux-kernel


On 24/06/22 11:06 pm, James Houghton wrote:
> This is needed to handle PTL locking with high-granularity mapping. We
> won't always be using the PMD-level PTL even if we're using the 2M
> hugepage hstate. It's possible that we're dealing with 4K PTEs, in which
> case, we need to lock the PTL for the 4K PTE.
>
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>   arch/powerpc/mm/pgtable.c |  3 ++-
>   include/linux/hugetlb.h   | 19 ++++++++++++++-----
>   mm/hugetlb.c              |  9 +++++----
>   mm/migrate.c              |  3 ++-
>   mm/page_vma_mapped.c      |  3 ++-
>   5 files changed, 25 insertions(+), 12 deletions(-)
>
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index e6166b71d36d..663d591a8f08 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -261,7 +261,8 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>   
>   		psize = hstate_get_psize(h);
>   #ifdef CONFIG_DEBUG_VM
> -		assert_spin_locked(huge_pte_lockptr(h, vma->vm_mm, ptep));
> +		assert_spin_locked(huge_pte_lockptr(huge_page_shift(h),
> +						    vma->vm_mm, ptep));
>   #endif
>   
>   #else
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 498a4ae3d462..5fe1db46d8c9 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -868,12 +868,11 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
>   	return modified_mask;
>   }
>   
> -static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
> +static inline spinlock_t *huge_pte_lockptr(unsigned int shift,
>   					   struct mm_struct *mm, pte_t *pte)
>   {
> -	if (huge_page_size(h) == PMD_SIZE)
> +	if (shift == PMD_SHIFT)
>   		return pmd_lockptr(mm, (pmd_t *) pte);
> -	VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);

I may have the wrong understanding here: is the per-PMD lock for reducing

contention? If that is the case, should we take a per-PMD lock for

PAGE_SIZE PTEs too, and page_table_lock only for anything higher than PMD?

>   	return &mm->page_table_lock;
>   }
>   
> @@ -1076,7 +1075,7 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
>   	return 0;
>   }
>   
> -static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
> +static inline spinlock_t *huge_pte_lockptr(unsigned int shift,
>   					   struct mm_struct *mm, pte_t *pte)
>   {
>   	return &mm->page_table_lock;
> @@ -1116,7 +1115,17 @@ static inline spinlock_t *huge_pte_lock(struct hstate *h,
>   {
>   	spinlock_t *ptl;
>   
> -	ptl = huge_pte_lockptr(h, mm, pte);
> +	ptl = huge_pte_lockptr(huge_page_shift(h), mm, pte);
> +	spin_lock(ptl);
> +	return ptl;
> +}
> +
> +static inline spinlock_t *huge_pte_lock_shift(unsigned int shift,
> +					      struct mm_struct *mm, pte_t *pte)
> +{
> +	spinlock_t *ptl;
> +
> +	ptl = huge_pte_lockptr(shift, mm, pte);
>   	spin_lock(ptl);
>   	return ptl;
>   }
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 0eec34edf3b2..d6d0d4c03def 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4817,7 +4817,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>   			continue;
>   
>   		dst_ptl = huge_pte_lock(h, dst, dst_pte);
> -		src_ptl = huge_pte_lockptr(h, src, src_pte);
> +		src_ptl = huge_pte_lockptr(huge_page_shift(h), src, src_pte);
>   		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
>   		entry = huge_ptep_get(src_pte);
>   		dst_entry = huge_ptep_get(dst_pte);
> @@ -4894,7 +4894,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>   
>   				/* Install the new huge page if src pte stable */
>   				dst_ptl = huge_pte_lock(h, dst, dst_pte);
> -				src_ptl = huge_pte_lockptr(h, src, src_pte);
> +				src_ptl = huge_pte_lockptr(huge_page_shift(h),
> +							   src, src_pte);
>   				spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
>   				entry = huge_ptep_get(src_pte);
>   				if (!pte_same(src_pte_old, entry)) {
> @@ -4948,7 +4949,7 @@ static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
>   	pte_t pte;
>   
>   	dst_ptl = huge_pte_lock(h, mm, dst_pte);
> -	src_ptl = huge_pte_lockptr(h, mm, src_pte);
> +	src_ptl = huge_pte_lockptr(huge_page_shift(h), mm, src_pte);
>   
>   	/*
>   	 * We don't have to worry about the ordering of src and dst ptlocks
> @@ -6024,7 +6025,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>   		page_in_pagecache = true;
>   	}
>   
> -	ptl = huge_pte_lockptr(h, dst_mm, dst_pte);
> +	ptl = huge_pte_lockptr(huge_page_shift(h), dst_mm, dst_pte);
>   	spin_lock(ptl);
>   
>   	/*
> diff --git a/mm/migrate.c b/mm/migrate.c
> index e51588e95f57..a8a960992373 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -318,7 +318,8 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>   void migration_entry_wait_huge(struct vm_area_struct *vma,
>   		struct mm_struct *mm, pte_t *pte)
>   {
> -	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), mm, pte);
> +	spinlock_t *ptl = huge_pte_lockptr(huge_page_shift(hstate_vma(vma)),
> +					   mm, pte);
>   	__migration_entry_wait(mm, pte, ptl);
>   }
>   
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index c10f839fc410..8921dd4e41b1 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -174,7 +174,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>   		if (!pvmw->pte)
>   			return false;
>   
> -		pvmw->ptl = huge_pte_lockptr(hstate, mm, pvmw->pte);
> +		pvmw->ptl = huge_pte_lockptr(huge_page_shift(hstate),
> +					     mm, pvmw->pte);
>   		spin_lock(pvmw->ptl);
>   		if (!check_pte(pvmw))
>   			return not_found(pvmw);


* Re: [RFC PATCH 05/26] hugetlb: add CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
  2022-06-24 17:36 ` [RFC PATCH 05/26] hugetlb: add CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING James Houghton
@ 2022-06-27 12:28   ` manish.mishra
  2022-06-28 20:03     ` Mina Almasry
  0 siblings, 1 reply; 128+ messages in thread
From: manish.mishra @ 2022-06-27 12:28 UTC (permalink / raw)
  To: James Houghton, Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Dr . David Alan Gilbert, linux-mm, linux-kernel


On 24/06/22 11:06 pm, James Houghton wrote:
> This adds the Kconfig to enable or disable high-granularity mapping. It
> is enabled by default for architectures that use
> ARCH_WANT_GENERAL_HUGETLB.
>
> There is also an arch-specific config ARCH_HAS_SPECIAL_HUGETLB_HGM which
> controls whether or not the architecture has been updated to support
> HGM if it doesn't use general HugeTLB.
>
> Signed-off-by: James Houghton <jthoughton@google.com>
Reviewed-by: manish.mishra@nutanix.com
> ---
>   fs/Kconfig | 7 +++++++
>   1 file changed, 7 insertions(+)
>
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 5976eb33535f..d76c7d812656 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -268,6 +268,13 @@ config HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON
>   	  to enable optimizing vmemmap pages of HugeTLB by default. It can then
>   	  be disabled on the command line via hugetlb_free_vmemmap=off.
>   
> +config ARCH_HAS_SPECIAL_HUGETLB_HGM
> +	bool
> +
> +config HUGETLB_HIGH_GRANULARITY_MAPPING
> +	def_bool ARCH_WANT_GENERAL_HUGETLB || ARCH_HAS_SPECIAL_HUGETLB_HGM
> +	depends on HUGETLB_PAGE
> +
>   config MEMFD_CREATE
>   	def_bool TMPFS || HUGETLBFS
>   


* Re: [RFC PATCH 06/26] mm: make free_p?d_range functions public
  2022-06-24 17:36 ` [RFC PATCH 06/26] mm: make free_p?d_range functions public James Houghton
@ 2022-06-27 12:31   ` manish.mishra
  2022-06-28 20:35   ` Mike Kravetz
  1 sibling, 0 replies; 128+ messages in thread
From: manish.mishra @ 2022-06-27 12:31 UTC (permalink / raw)
  To: James Houghton, Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Dr . David Alan Gilbert, linux-mm, linux-kernel


On 24/06/22 11:06 pm, James Houghton wrote:
> This makes them usable for HugeTLB page table freeing operations.
> After HugeTLB high-granularity mapping, the page table for a HugeTLB VMA
> can get more complex, and these functions handle freeing page tables
> generally.
>
> Signed-off-by: James Houghton <jthoughton@google.com>
reviewed-by: manish.mishra@nutanix.com
> ---
>   include/linux/mm.h | 7 +++++++
>   mm/memory.c        | 8 ++++----
>   2 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index bc8f326be0ce..07f5da512147 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1847,6 +1847,13 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
>   
>   struct mmu_notifier_range;
>   
> +void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd, unsigned long addr);
> +void free_pmd_range(struct mmu_gather *tlb, pud_t *pud, unsigned long addr,
> +		unsigned long end, unsigned long floor, unsigned long ceiling);
> +void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d, unsigned long addr,
> +		unsigned long end, unsigned long floor, unsigned long ceiling);
> +void free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd, unsigned long addr,
> +		unsigned long end, unsigned long floor, unsigned long ceiling);
>   void free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
>   		unsigned long end, unsigned long floor, unsigned long ceiling);
>   int
> diff --git a/mm/memory.c b/mm/memory.c
> index 7a089145cad4..bb3b9b5b94fb 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -227,7 +227,7 @@ static void check_sync_rss_stat(struct task_struct *task)
>    * Note: this doesn't free the actual pages themselves. That
>    * has been handled earlier when unmapping all the memory regions.
>    */
> -static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
> +void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
>   			   unsigned long addr)
>   {
>   	pgtable_t token = pmd_pgtable(*pmd);
> @@ -236,7 +236,7 @@ static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
>   	mm_dec_nr_ptes(tlb->mm);
>   }
>   
> -static inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
> +inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
>   				unsigned long addr, unsigned long end,
>   				unsigned long floor, unsigned long ceiling)
>   {
> @@ -270,7 +270,7 @@ static inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
>   	mm_dec_nr_pmds(tlb->mm);
>   }
>   
> -static inline void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d,
> +inline void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d,
>   				unsigned long addr, unsigned long end,
>   				unsigned long floor, unsigned long ceiling)
>   {
> @@ -304,7 +304,7 @@ static inline void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d,
>   	mm_dec_nr_puds(tlb->mm);
>   }
>   
> -static inline void free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd,
> +inline void free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd,
>   				unsigned long addr, unsigned long end,
>   				unsigned long floor, unsigned long ceiling)
>   {


* Re: [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries
  2022-06-24 17:36 ` [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries James Houghton
  2022-06-25  2:59   ` kernel test robot
@ 2022-06-27 12:47   ` manish.mishra
  2022-06-29 16:28     ` James Houghton
  2022-06-28 20:25   ` Mina Almasry
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 128+ messages in thread
From: manish.mishra @ 2022-06-27 12:47 UTC (permalink / raw)
  To: James Houghton, Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Dr . David Alan Gilbert, linux-mm, linux-kernel


On 24/06/22 11:06 pm, James Houghton wrote:
> After high-granularity mapping, page table entries for HugeTLB pages can
> be of any size/type. (For example, we can have a 1G page mapped with a
> mix of PMDs and PTEs.) This struct is to help keep track of a HugeTLB
> PTE after we have done a page table walk.
>
> Without this, we'd have to pass around the "size" of the PTE everywhere.
> We effectively did this before; it could be fetched from the hstate,
> which we pass around pretty much everywhere.
>
> This commit includes definitions for some basic helper functions that
> are used later. These helper functions wrap existing PTE
> inspection/modification functions, where the correct version is picked
> depending on if the HugeTLB PTE is actually "huge" or not. (Previously,
> all HugeTLB PTEs were "huge").
>
> For example, hugetlb_ptep_get wraps huge_ptep_get and ptep_get, where
> ptep_get is used when the HugeTLB PTE is PAGE_SIZE, and huge_ptep_get is
> used in all other cases.
>
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>   include/linux/hugetlb.h | 84 +++++++++++++++++++++++++++++++++++++++++
>   mm/hugetlb.c            | 57 ++++++++++++++++++++++++++++
>   2 files changed, 141 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 5fe1db46d8c9..1d4ec9dfdebf 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -46,6 +46,68 @@ enum {
>   	__NR_USED_SUBPAGE,
>   };
>   
> +struct hugetlb_pte {
> +	pte_t *ptep;
> +	unsigned int shift;
> +};
> +
> +static inline
> +void hugetlb_pte_init(struct hugetlb_pte *hpte)
> +{
> +	hpte->ptep = NULL;
I agree it does not matter, but would setting hpte->shift = 0 here be better anyway?
> +}
> +
> +static inline
> +void hugetlb_pte_populate(struct hugetlb_pte *hpte, pte_t *ptep,
> +			  unsigned int shift)
> +{
> +	BUG_ON(!ptep);
> +	hpte->ptep = ptep;
> +	hpte->shift = shift;
> +}
> +
> +static inline
> +unsigned long hugetlb_pte_size(const struct hugetlb_pte *hpte)
> +{
> +	BUG_ON(!hpte->ptep);
> +	return 1UL << hpte->shift;
> +}
> +
> +static inline
> +unsigned long hugetlb_pte_mask(const struct hugetlb_pte *hpte)
> +{
> +	BUG_ON(!hpte->ptep);
> +	return ~(hugetlb_pte_size(hpte) - 1);
> +}
> +
> +static inline
> +unsigned int hugetlb_pte_shift(const struct hugetlb_pte *hpte)
> +{
> +	BUG_ON(!hpte->ptep);
> +	return hpte->shift;
> +}
> +
> +static inline
> +bool hugetlb_pte_huge(const struct hugetlb_pte *hpte)
> +{
> +	return !IS_ENABLED(CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING) ||
> +		hugetlb_pte_shift(hpte) > PAGE_SHIFT;
> +}
> +
> +static inline
> +void hugetlb_pte_copy(struct hugetlb_pte *dest, const struct hugetlb_pte *src)
> +{
> +	dest->ptep = src->ptep;
> +	dest->shift = src->shift;
> +}
> +
> +bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte);
> +bool hugetlb_pte_none(const struct hugetlb_pte *hpte);
> +bool hugetlb_pte_none_mostly(const struct hugetlb_pte *hpte);
> +pte_t hugetlb_ptep_get(const struct hugetlb_pte *hpte);
> +void hugetlb_pte_clear(struct mm_struct *mm, const struct hugetlb_pte *hpte,
> +		       unsigned long address);
> +
>   struct hugepage_subpool {
>   	spinlock_t lock;
>   	long count;
> @@ -1130,6 +1192,28 @@ static inline spinlock_t *huge_pte_lock_shift(unsigned int shift,
>   	return ptl;
>   }
>   
> +static inline
> +spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
> +{
> +
> +	BUG_ON(!hpte->ptep);
> +	// Only use huge_pte_lockptr if we are at leaf-level. Otherwise use
> +	// the regular page table lock.
> +	if (hugetlb_pte_none(hpte) || hugetlb_pte_present_leaf(hpte))
> +		return huge_pte_lockptr(hugetlb_pte_shift(hpte),
> +				mm, hpte->ptep);
> +	return &mm->page_table_lock;
> +}
> +
> +static inline
> +spinlock_t *hugetlb_pte_lock(struct mm_struct *mm, struct hugetlb_pte *hpte)
> +{
> +	spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte);
> +
> +	spin_lock(ptl);
> +	return ptl;
> +}
> +
>   #if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_CMA)
>   extern void __init hugetlb_cma_reserve(int order);
>   extern void __init hugetlb_cma_check(void);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index d6d0d4c03def..1a1434e29740 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1120,6 +1120,63 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
>   	return false;
>   }
>   
> +bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte)
> +{
> +	pgd_t pgd;
> +	p4d_t p4d;
> +	pud_t pud;
> +	pmd_t pmd;
> +
> +	BUG_ON(!hpte->ptep);
> +	if (hugetlb_pte_size(hpte) >= PGDIR_SIZE) {
> +		pgd = *(pgd_t *)hpte->ptep;

Sorry, I did not understand why these conditions check

hugetlb_pte_size(hpte) >= PGDIR_SIZE. I mean, why a >= check

and not just an == check?

> +		return pgd_present(pgd) && pgd_leaf(pgd);
> +	} else if (hugetlb_pte_size(hpte) >= P4D_SIZE) {
> +		p4d = *(p4d_t *)hpte->ptep;
> +		return p4d_present(p4d) && p4d_leaf(p4d);
> +	} else if (hugetlb_pte_size(hpte) >= PUD_SIZE) {
> +		pud = *(pud_t *)hpte->ptep;
> +		return pud_present(pud) && pud_leaf(pud);
> +	} else if (hugetlb_pte_size(hpte) >= PMD_SIZE) {
> +		pmd = *(pmd_t *)hpte->ptep;
> +		return pmd_present(pmd) && pmd_leaf(pmd);
> +	} else if (hugetlb_pte_size(hpte) >= PAGE_SIZE)
> +		return pte_present(*hpte->ptep);
> +	BUG();
> +}
> +
> +bool hugetlb_pte_none(const struct hugetlb_pte *hpte)
> +{
> +	if (hugetlb_pte_huge(hpte))
> +		return huge_pte_none(huge_ptep_get(hpte->ptep));
> +	return pte_none(ptep_get(hpte->ptep));
> +}
> +
> +bool hugetlb_pte_none_mostly(const struct hugetlb_pte *hpte)
> +{
> +	if (hugetlb_pte_huge(hpte))
> +		return huge_pte_none_mostly(huge_ptep_get(hpte->ptep));
> +	return pte_none_mostly(ptep_get(hpte->ptep));
> +}
> +
> +pte_t hugetlb_ptep_get(const struct hugetlb_pte *hpte)
> +{
> +	if (hugetlb_pte_huge(hpte))
> +		return huge_ptep_get(hpte->ptep);
> +	return ptep_get(hpte->ptep);
> +}
> +
> +void hugetlb_pte_clear(struct mm_struct *mm, const struct hugetlb_pte *hpte,
> +		       unsigned long address)
> +{
> +	BUG_ON(!hpte->ptep);
> +	unsigned long sz = hugetlb_pte_size(hpte);
> +
> +	if (sz > PAGE_SIZE)
> +		return huge_pte_clear(mm, address, hpte->ptep, sz);

Just for consistency with the helpers above, how about something like:

	if (hugetlb_pte_huge(hpte))
		return huge_pte_clear(mm, address, hpte->ptep, sz);

> +	return pte_clear(mm, address, hpte->ptep);
> +}
> +
>   static void enqueue_huge_page(struct hstate *h, struct page *page)
>   {
>   	int nid = page_to_nid(page);


* Re: [RFC PATCH 08/26] hugetlb: add hugetlb_free_range to free PT structures
  2022-06-24 17:36 ` [RFC PATCH 08/26] hugetlb: add hugetlb_free_range to free PT structures James Houghton
@ 2022-06-27 12:52   ` manish.mishra
  2022-06-28 20:27   ` Mina Almasry
  1 sibling, 0 replies; 128+ messages in thread
From: manish.mishra @ 2022-06-27 12:52 UTC (permalink / raw)
  To: James Houghton, Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Dr . David Alan Gilbert, linux-mm, linux-kernel


On 24/06/22 11:06 pm, James Houghton wrote:
> This is a helper function for freeing the bits of the page table that
> map a particular HugeTLB PTE.
>
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>   include/linux/hugetlb.h |  2 ++
>   mm/hugetlb.c            | 17 +++++++++++++++++
>   2 files changed, 19 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 1d4ec9dfdebf..33ba48fac551 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -107,6 +107,8 @@ bool hugetlb_pte_none_mostly(const struct hugetlb_pte *hpte);
>   pte_t hugetlb_ptep_get(const struct hugetlb_pte *hpte);
>   void hugetlb_pte_clear(struct mm_struct *mm, const struct hugetlb_pte *hpte,
>   		       unsigned long address);
> +void hugetlb_free_range(struct mmu_gather *tlb, const struct hugetlb_pte *hpte,
> +			unsigned long start, unsigned long end);
>   
>   struct hugepage_subpool {
>   	spinlock_t lock;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 1a1434e29740..a2d2ffa76173 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1120,6 +1120,23 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
>   	return false;
>   }
>   
> +void hugetlb_free_range(struct mmu_gather *tlb, const struct hugetlb_pte *hpte,
> +			unsigned long start, unsigned long end)
> +{
> +	unsigned long floor = start & hugetlb_pte_mask(hpte);
> +	unsigned long ceiling = floor + hugetlb_pte_size(hpte);
> +
> +	if (hugetlb_pte_size(hpte) >= PGDIR_SIZE) {

Sorry, again I did not understand why it is a >= check and not just ==. Does it help

on non-x86 arches?

> +		free_p4d_range(tlb, (pgd_t *)hpte->ptep, start, end, floor, ceiling);
> +	} else if (hugetlb_pte_size(hpte) >= P4D_SIZE) {
> +		free_pud_range(tlb, (p4d_t *)hpte->ptep, start, end, floor, ceiling);
> +	} else if (hugetlb_pte_size(hpte) >= PUD_SIZE) {
> +		free_pmd_range(tlb, (pud_t *)hpte->ptep, start, end, floor, ceiling);
> +	} else if (hugetlb_pte_size(hpte) >= PMD_SIZE) {
> +		free_pte_range(tlb, (pmd_t *)hpte->ptep, start);
> +	}
> +}
> +
>   bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte)
>   {
>   	pgd_t pgd;


* Re: [RFC PATCH 09/26] hugetlb: add hugetlb_hgm_enabled
  2022-06-24 17:36 ` [RFC PATCH 09/26] hugetlb: add hugetlb_hgm_enabled James Houghton
@ 2022-06-27 12:55   ` manish.mishra
  2022-06-28 20:33   ` Mina Almasry
  2022-09-08 18:07   ` Peter Xu
  2 siblings, 0 replies; 128+ messages in thread
From: manish.mishra @ 2022-06-27 12:55 UTC (permalink / raw)
  To: James Houghton, Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Dr . David Alan Gilbert, linux-mm, linux-kernel


Reviewed-by: manish.mishra@nutanix.com

On 24/06/22 11:06 pm, James Houghton wrote:
> Currently, this is always true if the VMA is shared. In the future, it's
> possible that private mappings will get some or all HGM functionality.
>
> Signed-off-by: James Houghton<jthoughton@google.com>
> ---
>   include/linux/hugetlb.h | 10 ++++++++++
>   mm/hugetlb.c            |  8 ++++++++
>   2 files changed, 18 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 33ba48fac551..e7a6b944d0cc 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -1174,6 +1174,16 @@ static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>   }
>   #endif	/* CONFIG_HUGETLB_PAGE */
>   
> +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> +/* If HugeTLB high-granularity mappings are enabled for this VMA. */
> +bool hugetlb_hgm_enabled(struct vm_area_struct *vma);
> +#else
> +static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
> +{
> +	return false;
> +}
> +#endif
> +
>   static inline spinlock_t *huge_pte_lock(struct hstate *h,
>   					struct mm_struct *mm, pte_t *pte)
>   {
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a2d2ffa76173..8b10b941458d 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6983,6 +6983,14 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
>   
>   #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
>   
> +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> +bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
> +{
> +	/* All shared VMAs have HGM enabled. */
> +	return vma->vm_flags & VM_SHARED;
> +}
> +#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
> +
>   /*
>    * These functions are overwritable if your architecture needs its own
>    * behavior.



* Re: [RFC PATCH 10/26] hugetlb: add for_each_hgm_shift
  2022-06-24 17:36 ` [RFC PATCH 10/26] hugetlb: add for_each_hgm_shift James Houghton
@ 2022-06-27 13:01   ` manish.mishra
  2022-06-28 21:58   ` Mina Almasry
  1 sibling, 0 replies; 128+ messages in thread
From: manish.mishra @ 2022-06-27 13:01 UTC (permalink / raw)
  To: James Houghton, Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Dr . David Alan Gilbert, linux-mm, linux-kernel


On 24/06/22 11:06 pm, James Houghton wrote:
> This is a helper macro to loop through all the usable page sizes for a
> high-granularity-enabled HugeTLB VMA. Given the VMA's hstate, it will
> loop, in descending order, through the page sizes that HugeTLB supports
> for this architecture; it always includes PAGE_SIZE.
Reviewed-by: manish.mishra@nutanix.com
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>   mm/hugetlb.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 8b10b941458d..557b0afdb503 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6989,6 +6989,16 @@ bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
>   	/* All shared VMAs have HGM enabled. */
>   	return vma->vm_flags & VM_SHARED;
>   }
> +static unsigned int __shift_for_hstate(struct hstate *h)
> +{
> +	if (h >= &hstates[hugetlb_max_hstate])
> +		return PAGE_SHIFT;
> +	return huge_page_shift(h);
> +}
> +#define for_each_hgm_shift(hstate, tmp_h, shift) \
> +	for ((tmp_h) = hstate; (shift) = __shift_for_hstate(tmp_h), \
> +			       (tmp_h) <= &hstates[hugetlb_max_hstate]; \
> +			       (tmp_h)++)
>   #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
>   
>   /*


* Re: [RFC PATCH 11/26] hugetlb: add hugetlb_walk_to to do PT walks
  2022-06-24 17:36 ` [RFC PATCH 11/26] hugetlb: add hugetlb_walk_to to do PT walks James Houghton
@ 2022-06-27 13:07   ` manish.mishra
  2022-07-07 23:03     ` Mike Kravetz
  2022-09-08 18:20   ` Peter Xu
  1 sibling, 1 reply; 128+ messages in thread
From: manish.mishra @ 2022-06-27 13:07 UTC (permalink / raw)
  To: James Houghton, Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Dr . David Alan Gilbert, linux-mm, linux-kernel


On 24/06/22 11:06 pm, James Houghton wrote:
> This adds it for architectures that use GENERAL_HUGETLB, including x86.
>
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>   include/linux/hugetlb.h |  2 ++
>   mm/hugetlb.c            | 45 +++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 47 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index e7a6b944d0cc..605aa19d8572 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -258,6 +258,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
>   			unsigned long addr, unsigned long sz);
>   pte_t *huge_pte_offset(struct mm_struct *mm,
>   		       unsigned long addr, unsigned long sz);
> +int hugetlb_walk_to(struct mm_struct *mm, struct hugetlb_pte *hpte,
> +		    unsigned long addr, unsigned long sz, bool stop_at_none);
>   int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
>   				unsigned long *addr, pte_t *ptep);
>   void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 557b0afdb503..3ec2a921ee6f 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6981,6 +6981,51 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
>   	return (pte_t *)pmd;
>   }


No strong feeling, but this name looks a bit confusing to me, as the function does

not only walk over page tables but can also allocate them.

> +int hugetlb_walk_to(struct mm_struct *mm, struct hugetlb_pte *hpte,
> +		    unsigned long addr, unsigned long sz, bool stop_at_none)
> +{
> +	pte_t *ptep;
> +
> +	if (!hpte->ptep) {
> +		pgd_t *pgd = pgd_offset(mm, addr);
> +
> +		if (!pgd)
> +			return -ENOMEM;
> +		ptep = (pte_t *)p4d_alloc(mm, pgd, addr);
> +		if (!ptep)
> +			return -ENOMEM;
> +		hugetlb_pte_populate(hpte, ptep, P4D_SHIFT);
> +	}
> +
> +	while (hugetlb_pte_size(hpte) > sz &&
> +			!hugetlb_pte_present_leaf(hpte) &&
> +			!(stop_at_none && hugetlb_pte_none(hpte))) {

Should the ordering of these if-else conditions be reversed? I mean, it would
look more natural, and possibly need fewer condition checks, as we go from top
to bottom.

> +		if (hpte->shift == PMD_SHIFT) {
> +			ptep = pte_alloc_map(mm, (pmd_t *)hpte->ptep, addr);
> +			if (!ptep)
> +				return -ENOMEM;
> +			hpte->shift = PAGE_SHIFT;
> +			hpte->ptep = ptep;
> +		} else if (hpte->shift == PUD_SHIFT) {
> +			ptep = (pte_t *)pmd_alloc(mm, (pud_t *)hpte->ptep,
> +						  addr);
> +			if (!ptep)
> +				return -ENOMEM;
> +			hpte->shift = PMD_SHIFT;
> +			hpte->ptep = ptep;
> +		} else if (hpte->shift == P4D_SHIFT) {
> +			ptep = (pte_t *)pud_alloc(mm, (p4d_t *)hpte->ptep,
> +						  addr);
> +			if (!ptep)
> +				return -ENOMEM;
> +			hpte->shift = PUD_SHIFT;
> +			hpte->ptep = ptep;
> +		} else
> +			BUG();
> +	}
> +	return 0;
> +}
> +
>   #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
>   
>   #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 12/26] hugetlb: add HugeTLB splitting functionality
  2022-06-24 17:36 ` [RFC PATCH 12/26] hugetlb: add HugeTLB splitting functionality James Houghton
@ 2022-06-27 13:50   ` manish.mishra
  2022-06-29 16:10     ` James Houghton
  2022-06-29 14:33   ` manish.mishra
  1 sibling, 1 reply; 128+ messages in thread
From: manish.mishra @ 2022-06-27 13:50 UTC (permalink / raw)
  To: James Houghton, Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Dr . David Alan Gilbert, linux-mm, linux-kernel


On 24/06/22 11:06 pm, James Houghton wrote:
> The new function, hugetlb_split_to_shift, will optimally split the page
> table to map a particular address at a particular granularity.
>
> This is useful for punching a hole in the mapping and for mapping small
> sections of a HugeTLB page (via UFFDIO_CONTINUE, for example).
>
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>   mm/hugetlb.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 122 insertions(+)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3ec2a921ee6f..eaffe7b4f67c 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -102,6 +102,18 @@ struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
>   /* Forward declaration */
>   static int hugetlb_acct_memory(struct hstate *h, long delta);
>   
> +/*
> + * Find the subpage that corresponds to `addr` in `hpage`.
> + */
> +static struct page *hugetlb_find_subpage(struct hstate *h, struct page *hpage,
> +				 unsigned long addr)
> +{
> +	size_t idx = (addr & ~huge_page_mask(h))/PAGE_SIZE;
> +
> +	BUG_ON(idx >= pages_per_huge_page(h));
> +	return &hpage[idx];
> +}
> +
>   static inline bool subpool_is_free(struct hugepage_subpool *spool)
>   {
>   	if (spool->count)
> @@ -7044,6 +7056,116 @@ static unsigned int __shift_for_hstate(struct hstate *h)
>   	for ((tmp_h) = hstate; (shift) = __shift_for_hstate(tmp_h), \
>   			       (tmp_h) <= &hstates[hugetlb_max_hstate]; \
>   			       (tmp_h)++)
> +
> +/*
> + * Given a particular address, split the HugeTLB PTE that currently maps it
> + * so that, for the given address, the PTE that maps it is `desired_shift`.
> + * This function will always split the HugeTLB PTE optimally.
> + *
> + * For example, given a HugeTLB 1G page that is mapped from VA 0 to 1G. If we
> + * call this function with addr=0 and desired_shift=PAGE_SHIFT, will result in
> + * these changes to the page table:
> + * 1. The PUD will be split into 2M PMDs.
> + * 2. The first PMD will be split again into 4K PTEs.
> + */
> +static int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *vma,
> +			   const struct hugetlb_pte *hpte,
> +			   unsigned long addr, unsigned long desired_shift)
> +{
> +	unsigned long start, end, curr;
> +	unsigned long desired_sz = 1UL << desired_shift;
> +	struct hstate *h = hstate_vma(vma);
> +	int ret;
> +	struct hugetlb_pte new_hpte;
> +	struct mmu_notifier_range range;
> +	struct page *hpage = NULL;
> +	struct page *subpage;
> +	pte_t old_entry;
> +	struct mmu_gather tlb;
> +
> +	BUG_ON(!hpte->ptep);
> +	BUG_ON(hugetlb_pte_size(hpte) == desired_sz);
Can this be BUG_ON(hugetlb_pte_size(hpte) <= desired_sz)?
> +
> +	start = addr & hugetlb_pte_mask(hpte);
> +	end = start + hugetlb_pte_size(hpte);
> +
> +	i_mmap_assert_write_locked(vma->vm_file->f_mapping);

Since this just changes mappings, is holding the f_mapping lock required?
I mean, in the future, is there any plan (or way) to use some per-process
sub-lock instead?

> +
> +	BUG_ON(!hpte->ptep);
> +	/* This function only works if we are looking at a leaf-level PTE. */
> +	BUG_ON(!hugetlb_pte_none(hpte) && !hugetlb_pte_present_leaf(hpte));
> +
> +	/*
> +	 * Clear the PTE so that we will allocate the PT structures when
> +	 * walking the page table.
> +	 */
> +	old_entry = huge_ptep_get_and_clear(mm, start, hpte->ptep);
> +
> +	if (!huge_pte_none(old_entry))
> +		hpage = pte_page(old_entry);
> +
> +	BUG_ON(!IS_ALIGNED(start, desired_sz));
> +	BUG_ON(!IS_ALIGNED(end, desired_sz));
> +
> +	for (curr = start; curr < end;) {
> +		struct hstate *tmp_h;
> +		unsigned int shift;
> +
> +		for_each_hgm_shift(h, tmp_h, shift) {
> +			unsigned long sz = 1UL << shift;
> +
> +			if (!IS_ALIGNED(curr, sz) || curr + sz > end)
> +				continue;
> +			/*
> +			 * If we are including `addr`, we need to make sure
> +			 * splitting down to the correct size. Go to a smaller
> +			 * size if we are not.
> +			 */
> +			if (curr <= addr && curr + sz > addr &&
> +					shift > desired_shift)
> +				continue;
> +
> +			/*
> +			 * Continue the page table walk to the level we want,
> +			 * allocate PT structures as we go.
> +			 */

As I understand it, this for_each_hgm_shift loop is just there to find the
right shift size, so the code below this line could be moved out of the loop.
No strong feelings, but that seems more proper and may make the code easier
to understand.
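For reference, the size-selection rule in that loop can be modeled outside the kernel. The sketch below is a userspace toy with hypothetical names (not the kernel code): for each position it picks the largest supported size that is aligned, fits before the end, and is no larger than the desired size whenever the chunk would cover `addr`:

```c
#include <assert.h>
#include <stdint.h>

#define NSHIFTS 3
/* Supported mapping shifts on x86, largest first: 1G, 2M, 4K. */
static const unsigned int shifts[NSHIFTS] = { 30, 21, 12 };

/* Return the shift used for the chunk at curr, or 0 if none fits. */
static unsigned int pick_shift(uint64_t curr, uint64_t end,
			       uint64_t addr, unsigned int desired_shift)
{
	for (int i = 0; i < NSHIFTS; i++) {
		uint64_t sz = 1ULL << shifts[i];

		if (curr % sz || curr + sz > end)
			continue;	/* not aligned, or does not fit */
		if (curr <= addr && addr < curr + sz &&
		    shifts[i] > desired_shift)
			continue;	/* must split finer around addr */
		return shifts[i];
	}
	return 0;
}
```

Splitting a 1G region down to 4K at addr=0 with this rule yields 512 4K chunks (up to the first 2M boundary) followed by 511 2M chunks, matching the "optimal split" described in the commit message.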

> +			hugetlb_pte_copy(&new_hpte, hpte);
> +			ret = hugetlb_walk_to(mm, &new_hpte, curr, sz,
> +					      /*stop_at_none=*/false);
> +			if (ret)
> +				goto err;
> +			BUG_ON(hugetlb_pte_size(&new_hpte) != sz);
> +			if (hpage) {
> +				pte_t new_entry;
> +
> +				subpage = hugetlb_find_subpage(h, hpage, curr);
> +				new_entry = make_huge_pte_with_shift(vma, subpage,
> +								     huge_pte_write(old_entry),
> +								     shift);
> +				set_huge_pte_at(mm, curr, new_hpte.ptep, new_entry);
> +			}
> +			curr += sz;
> +			goto next;
> +		}
> +		/* We couldn't find a size that worked. */
> +		BUG();
> +next:
> +		continue;
> +	}
> +
> +	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
> +				start, end);
> +	mmu_notifier_invalidate_range_start(&range);

Sorry, I did not understand where the TLB flush is taken care of in the
success case. I see that set_huge_pte_at does not do it internally by itself.

> +	return 0;
> +err:
> +	tlb_gather_mmu(&tlb, mm);
> +	/* Free any newly allocated page table entries. */
> +	hugetlb_free_range(&tlb, hpte, start, curr);
> +	/* Restore the old entry. */
> +	set_huge_pte_at(mm, start, hpte->ptep, old_entry);
> +	tlb_finish_mmu(&tlb);
> +	return ret;
> +}
>   #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
>   
>   /*


* Re: [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping
  2022-06-24 18:41 ` Mina Almasry
@ 2022-06-27 16:27   ` James Houghton
  2022-06-28 14:17     ` Muchun Song
  2022-06-28 17:26     ` Mina Almasry
  0 siblings, 2 replies; 128+ messages in thread
From: James Houghton @ 2022-06-27 16:27 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Fri, Jun 24, 2022 at 11:41 AM Mina Almasry <almasrymina@google.com> wrote:
>
> On Fri, Jun 24, 2022 at 10:37 AM James Houghton <jthoughton@google.com> wrote:
> >
> > [trimmed...]
> > ---- Userspace API ----
> >
> > This patch series introduces a single way to take advantage of
> > high-granularity mapping: via UFFDIO_CONTINUE. UFFDIO_CONTINUE allows
> > userspace to resolve MINOR page faults on shared VMAs.
> >
> > To collapse a HugeTLB address range that has been mapped with several
> > UFFDIO_CONTINUE operations, userspace can issue MADV_COLLAPSE. We expect
> > userspace to know when all pages (that they care about) have been fetched.
> >
>
> Thanks James! Cover letter looks good. A few questions:
>
> Why not have the kernel collapse the hugepage once all the 4K pages
> have been fetched automatically? It would remove the need for a new
> userspace API, and AFACT there aren't really any cases where it is
> beneficial to have a hugepage sharded into 4K mappings when those
> mappings can be collapsed.

The reason that we don't automatically collapse mappings is that it
would add complexity, and it is less flexible. Consider
the case of 1G pages on x86: currently, userspace can collapse the
whole page when it's all ready, but they can also choose to collapse a
2M piece of it. On architectures with more supported hugepage sizes
(e.g., arm64), userspace has even more possibilities for when to
collapse. This likely further complicates a potential
automatic-collapse solution. Userspace may also want to collapse the
mapping for an entire hugepage without completely mapping the hugepage
first (this would also be possible by issuing UFFDIO_CONTINUE on all
the holes, though).

>
> > ---- HugeTLB Changes ----
> >
> > - Mapcount
> > The way mapcount is handled is different from the way that it was handled
> > before. If the PUD for a hugepage is not none, a hugepage's mapcount will
> > be increased. This scheme means that, for hugepages that aren't mapped at
> > high granularity, their mapcounts will remain the same as what they would
> > have been pre-HGM.
> >
>
> Sorry, I didn't quite follow this. It says mapcount is handled
> differently, but the same if the page is not mapped at high
> granularity. Can you elaborate on how the mapcount handling will be
> different when the page is mapped at high granularity?

I guess I didn't phrase this very well. For the sake of simplicity,
consider 1G pages on x86, typically mapped with leaf-level PUDs.
Previously, there were two possibilities for how a hugepage was
mapped, either it was (1) completely mapped (PUD is present and a
leaf), or (2) it wasn't mapped (PUD is none). Now we have a third
case, where the PUD is not none but also not a leaf (this usually
means that the page is partially mapped). We handle this case as if
the whole page was mapped. That is, if we partially map a hugepage
that was previously unmapped (making the PUD point to PMDs), we
increment its mapcount, and if we completely unmap a partially mapped
hugepage (making the PUD none), we decrement its mapcount. If we
collapse a non-leaf PUD to a leaf PUD, we don't change mapcount.

It is possible for a PUD to be present and not a leaf (mapcount has
been incremented) but for the page to still be unmapped: if the PMDs
(or PTEs) underneath are all none. This case is atypical, and as of
this RFC (without bestowing MADV_DONTNEED with HGM flexibility), I
think it would be very difficult to get this to happen.
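The mapcount rules described above can be expressed as a small state machine. The following is a userspace toy model with hypothetical names (not the kernel implementation) of the three PUD states and the transitions that change mapcount:

```c
#include <assert.h>

/* Possible states of the PUD that maps a 1G hugepage. */
enum pud_state { PUD_NONE, PUD_LEAF, PUD_NON_LEAF };

struct toy_page {
	enum pud_state pud;
	int mapcount;
};

/* Partially map a hugepage: a none PUD becomes a non-leaf PUD. */
static void partially_map(struct toy_page *p)
{
	if (p->pud == PUD_NONE)
		p->mapcount++;	/* none -> non-leaf: count page as mapped */
	p->pud = PUD_NON_LEAF;
}

/* Collapse a fully populated non-leaf PUD into a leaf PUD. */
static void collapse(struct toy_page *p)
{
	p->pud = PUD_LEAF;	/* non-leaf -> leaf: mapcount unchanged */
}

/* Unmap everything under the PUD, making it none. */
static void unmap_all(struct toy_page *p)
{
	if (p->pud != PUD_NONE)
		p->mapcount--;	/* present -> none: drop the mapping */
	p->pud = PUD_NONE;
}
```

In this model, mapcount only changes on the none <-> not-none transitions, which is why a partially mapped hugepage is counted the same as a fully mapped one.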

>
> > - Page table walking and manipulation
> > A new function, hugetlb_walk_to, handles walking HugeTLB page tables for
> > high-granularity mappings. Eventually, it's possible to merge
> > hugetlb_walk_to with huge_pte_offset and huge_pte_alloc.
> >
> > We keep track of HugeTLB page table entries with a new struct, hugetlb_pte.
> > This is because we generally need to know the "size" of a PTE (previously
> > always just huge_page_size(hstate)).
> >
> > For every page table manipulation function that has a huge version (e.g.
> > huge_ptep_get and ptep_get), there is a wrapper for it (e.g.
> > hugetlb_ptep_get).  The correct version is used depending on if a HugeTLB
> > PTE really is "huge".
> >
> > - Synchronization
> > For existing bits of HugeTLB, synchronization is unchanged. For splitting
> > and collapsing HugeTLB PTEs, we require that the i_mmap_rw_sem is held for
> > writing, and for doing high-granularity page table walks, we require it to
> > be held for reading.
> >
> > ---- Limitations & Future Changes ----
> >
> > This patch series only implements high-granularity mapping for VM_SHARED
> > VMAs.  I intend to implement enough HGM to support 4K unmapping for memory
> > failure recovery for both shared and private mappings.
> >
> > The memory failure use case poses its own challenges that can be
> > addressed, but I will do so in a separate RFC.
> >
> > Performance has not been heavily scrutinized with this patch series. There
> > are places where lock contention can significantly reduce performance. This
> > will be addressed later.
> >
> > The patch series, as it stands right now, is compatible with the VMEMMAP
> > page struct optimization[3], as we do not need to modify data contained
> > in the subpage page structs.
> >
> > Other omissions:
> >  - Compatibility with userfaultfd write-protect (will be included in v1).
> >  - Support for mremap() (will be included in v1). This looks a lot like
> >    the support we have for fork().
> >  - Documentation changes (will be included in v1).
> >  - Completely ignores PMD sharing and hugepage migration (will be included
> >    in v1).
> >  - Implementations for architectures that don't use GENERAL_HUGETLB other
> >    than arm64.
> >
> > ---- Patch Breakdown ----
> >
> > Patch 1     - Preliminary changes
> > Patch 2-10  - HugeTLB HGM core changes
> > Patch 11-13 - HugeTLB HGM page table walking functionality
> > Patch 14-19 - HugeTLB HGM compatibility with other bits
> > Patch 20-23 - Userfaultfd and collapse changes
> > Patch 24-26 - arm64 support and selftests
> >
> > [1] This used to be called HugeTLB double mapping, a bad and confusing
> >     name. "High-granularity mapping" is not a great name either. I am open
> >     to better names.
>
> I would drop 1 extra word and do "granular mapping", as in the mapping
> is more granular than what it normally is (2MB/1G, etc).

Noted. :)


* Re: [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping
  2022-06-24 18:29 ` [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping Matthew Wilcox
@ 2022-06-27 16:36   ` James Houghton
  2022-06-27 17:56     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 128+ messages in thread
From: James Houghton @ 2022-06-27 16:36 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang,
	Manish Mishra, Dr . David Alan Gilbert, linux-mm, linux-kernel

On Fri, Jun 24, 2022 at 11:29 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Jun 24, 2022 at 05:36:30PM +0000, James Houghton wrote:
> > [1] This used to be called HugeTLB double mapping, a bad and confusing
> >     name. "High-granularity mapping" is not a great name either. I am open
> >     to better names.
>
> Oh good, I was grinding my teeth every time I read it ;-)
>
> How does "Fine granularity" work for you?
> "sub-page mapping" might work too.

"Granularity", as I've come to realize, is hard to say, so I think I
prefer sub-page mapping. :) So to recap the suggestions I have so far:

1. Sub-page mapping
2. Granular mapping
3. Flexible mapping

I'll pick one of these (or maybe some other one that works better) for
the next version of this series.


* Re: [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping
  2022-06-24 18:47 ` Matthew Wilcox
@ 2022-06-27 16:48   ` James Houghton
  0 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-06-27 16:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang,
	Manish Mishra, Dr . David Alan Gilbert, linux-mm, linux-kernel

On Fri, Jun 24, 2022 at 11:47 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Jun 24, 2022 at 05:36:30PM +0000, James Houghton wrote:
> > - Page table walking and manipulation
> > A new function, hugetlb_walk_to, handles walking HugeTLB page tables for
> > high-granularity mappings. Eventually, it's possible to merge
> > hugetlb_walk_to with huge_pte_offset and huge_pte_alloc.
> >
> > We keep track of HugeTLB page table entries with a new struct, hugetlb_pte.
> > This is because we generally need to know the "size" of a PTE (previously
> > always just huge_page_size(hstate)).
> >
> > For every page table manipulation function that has a huge version (e.g.
> > huge_ptep_get and ptep_get), there is a wrapper for it (e.g.
> > hugetlb_ptep_get).  The correct version is used depending on if a HugeTLB
> > PTE really is "huge".
>
> I'm disappointed to hear that page table walking is going to become even
> more special.  I'd much prefer it if hugetlb walking were exactly the
> same as THP walking.  This seems like a good time to do at least some
> of that work.
>
> Was there a reason you chose the "more complexity" direction?

I chose this direction because it seemed to be the most
straightforward to get to a working prototype and then to an RFC. I
agree with your sentiment -- I'll see what I can do to reconcile THP
walking with HugeTLB(+HGM) walking.


* Re: [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping
  2022-06-27 16:36   ` James Houghton
@ 2022-06-27 17:56     ` Dr. David Alan Gilbert
  2022-06-27 20:31       ` James Houghton
  0 siblings, 1 reply; 128+ messages in thread
From: Dr. David Alan Gilbert @ 2022-06-27 17:56 UTC (permalink / raw)
  To: James Houghton
  Cc: Matthew Wilcox, Mike Kravetz, Muchun Song, Peter Xu,
	David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, linux-mm, linux-kernel

* James Houghton (jthoughton@google.com) wrote:
> On Fri, Jun 24, 2022 at 11:29 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Fri, Jun 24, 2022 at 05:36:30PM +0000, James Houghton wrote:
> > > [1] This used to be called HugeTLB double mapping, a bad and confusing
> > >     name. "High-granularity mapping" is not a great name either. I am open
> > >     to better names.
> >
> > Oh good, I was grinding my teeth every time I read it ;-)
> >
> > How does "Fine granularity" work for you?
> > "sub-page mapping" might work too.
> 
> "Granularity", as I've come to realize, is hard to say, so I think I
> prefer sub-page mapping. :) So to recap the suggestions I have so far:
> 
> 1. Sub-page mapping
> 2. Granular mapping
> 3. Flexible mapping
> 
> I'll pick one of these (or maybe some other one that works better) for
> the next version of this series.

<shrug> Just a name; SPM might work (although may confuse those
architectures which had subprotection for normal pages), and at least
we can mispronounce it.

In 14/26 your commit message says:

  1. Faults can be passed to handle_userfault. (Userspace will want to
     use UFFD_FEATURE_REAL_ADDRESS to get the real address to know which
     region they should be call UFFDIO_CONTINUE on later.)

can you explain what that new UFFD_FEATURE does?

Dave

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates
  2022-06-24 17:36 ` [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates James Houghton
  2022-06-24 18:51   ` Mina Almasry
  2022-06-27 12:08   ` manish.mishra
@ 2022-06-27 18:42   ` Mike Kravetz
  2022-06-28 15:40     ` James Houghton
  2 siblings, 1 reply; 128+ messages in thread
From: Mike Kravetz @ 2022-06-27 18:42 UTC (permalink / raw)
  To: James Houghton
  Cc: Muchun Song, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On 06/24/22 17:36, James Houghton wrote:
> When using HugeTLB high-granularity mapping, we need to go through the
> supported hugepage sizes in decreasing order so that we pick the largest
> size that works. Consider the case where we're faulting in a 1G hugepage
> for the first time: we want hugetlb_fault/hugetlb_no_page to map it with
> a PUD. By going through the sizes in decreasing order, we will find that
> PUD_SIZE works before finding out that PMD_SIZE or PAGE_SIZE work too.
> 
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>  mm/hugetlb.c | 40 +++++++++++++++++++++++++++++++++++++---
>  1 file changed, 37 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a57e1be41401..5df838d86f32 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -33,6 +33,7 @@
>  #include <linux/migrate.h>
>  #include <linux/nospec.h>
>  #include <linux/delayacct.h>
> +#include <linux/sort.h>
>  
>  #include <asm/page.h>
>  #include <asm/pgalloc.h>
> @@ -48,6 +49,10 @@
>  
>  int hugetlb_max_hstate __read_mostly;
>  unsigned int default_hstate_idx;
> +/*
> + * After hugetlb_init_hstates is called, hstates will be sorted from largest
> + * to smallest.
> + */
>  struct hstate hstates[HUGE_MAX_HSTATE];
>  
>  #ifdef CONFIG_CMA
> @@ -3144,14 +3149,43 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
>  	kfree(node_alloc_noretry);
>  }
>  
> +static int compare_hstates_decreasing(const void *a, const void *b)
> +{
> +	const int shift_a = huge_page_shift((const struct hstate *)a);
> +	const int shift_b = huge_page_shift((const struct hstate *)b);
> +
> +	if (shift_a < shift_b)
> +		return 1;
> +	if (shift_a > shift_b)
> +		return -1;
> +	return 0;
> +}
> +
> +static void sort_hstates(void)
> +{
> +	unsigned long default_hstate_sz = huge_page_size(&default_hstate);
> +
> +	/* Sort from largest to smallest. */
> +	sort(hstates, hugetlb_max_hstate, sizeof(*hstates),
> +	     compare_hstates_decreasing, NULL);
> +
> +	/*
> +	 * We may have changed the location of the default hstate, so we need to
> +	 * update it.
> +	 */
> +	default_hstate_idx = hstate_index(size_to_hstate(default_hstate_sz));
> +}
> +
>  static void __init hugetlb_init_hstates(void)
>  {
>  	struct hstate *h, *h2;
>  
> -	for_each_hstate(h) {
> -		if (minimum_order > huge_page_order(h))
> -			minimum_order = huge_page_order(h);
> +	sort_hstates();
>  
> +	/* The last hstate is now the smallest. */
> +	minimum_order = huge_page_order(&hstates[hugetlb_max_hstate - 1]);
> +
> +	for_each_hstate(h) {
>  		/* oversize hugepages were init'ed in early boot */
>  		if (!hstate_is_gigantic(h))
>  			hugetlb_hstate_alloc_pages(h);

This may/will cause problems for gigantic hugetlb pages allocated at boot
time.  See alloc_bootmem_huge_page() where a pointer to the associated hstate
is encoded within the allocated hugetlb page.  These pages are added to
hugetlb pools by the routine gather_bootmem_prealloc() which uses the saved
hstate to add prep the gigantic page and add to the correct pool.  Currently,
gather_bootmem_prealloc is called after hugetlb_init_hstates.  So, changing
hstate order will cause errors.

I do not see any reason why we could not call gather_bootmem_prealloc before
hugetlb_init_hstates to avoid this issue.
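The hazard can be shown with a userspace toy (hypothetical names, not the kernel code): a boot-allocated page records a pointer into the hstates array, and sorting the array afterwards leaves that pointer referring to the wrong slot.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_hstate { unsigned int order; };

/* Boot-time gigantic page: remembers a pointer to its hstate. */
struct boot_page { struct toy_hstate *hstate; };

/* Sort hstates from largest to smallest order. */
static int cmp_order_desc(const void *a, const void *b)
{
	const struct toy_hstate *ha = a, *hb = b;

	return (int)hb->order - (int)ha->order;
}
```

After sorting, the saved pointer still addresses the original array slot, whose contents have changed, which is exactly why gather_bootmem_prealloc must run before (or be made robust against) the reordering.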
-- 
Mike Kravetz


* Re: [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping
  2022-06-27 17:56     ` Dr. David Alan Gilbert
@ 2022-06-27 20:31       ` James Houghton
  2022-06-28  0:04         ` Nadav Amit
  2022-06-28  8:20         ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 128+ messages in thread
From: James Houghton @ 2022-06-27 20:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Matthew Wilcox, Mike Kravetz, Muchun Song, Peter Xu,
	David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, linux-mm, linux-kernel, Nadav Amit

On Mon, Jun 27, 2022 at 10:56 AM Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
>
> * James Houghton (jthoughton@google.com) wrote:
> > On Fri, Jun 24, 2022 at 11:29 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Fri, Jun 24, 2022 at 05:36:30PM +0000, James Houghton wrote:
> > > > [1] This used to be called HugeTLB double mapping, a bad and confusing
> > > >     name. "High-granularity mapping" is not a great name either. I am open
> > > >     to better names.
> > >
> > > Oh good, I was grinding my teeth every time I read it ;-)
> > >
> > > How does "Fine granularity" work for you?
> > > "sub-page mapping" might work too.
> >
> > "Granularity", as I've come to realize, is hard to say, so I think I
> > prefer sub-page mapping. :) So to recap the suggestions I have so far:
> >
> > 1. Sub-page mapping
> > 2. Granular mapping
> > 3. Flexible mapping
> >
> > I'll pick one of these (or maybe some other one that works better) for
> > the next version of this series.
>
> <shrug> Just a name; SPM might work (although may confuse those
> architectures which had subprotection for normal pages), and at least
> we can mispronounce it.
>
> In 14/26 your commit message says:
>
>   1. Faults can be passed to handle_userfault. (Userspace will want to
>      use UFFD_FEATURE_REAL_ADDRESS to get the real address to know which
>      region they should be call UFFDIO_CONTINUE on later.)
>
> can you explain what that new UFFD_FEATURE does?

+cc Nadav Amit <namit@vmware.com> to check me here.

Sorry, this should be UFFD_FEATURE_EXACT_ADDRESS. It isn't a new
feature, and it actually isn't needed (I will correct the commit
message). Why it isn't needed is a little bit complicated, though. Let
me explain:

Before UFFD_FEATURE_EXACT_ADDRESS was introduced, the address that
userfaultfd gave userspace for HugeTLB pages was rounded down to be
hstate-size-aligned. This would have had to change, because userspace,
to take advantage of HGM, needs to know which 4K piece to install.

However, after UFFD_FEATURE_EXACT_ADDRESS was introduced[1], the
address was rounded down to be PAGE_SIZE-aligned instead, even if the
flag wasn't used. I think this was an unintended change. If the flag
is used, then the address isn't rounded at all -- that was the
intended purpose of this flag. Hope that makes sense.
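To make the rounding concrete, here is a small sketch (toy macros, userspace C, not kernel code) of the two rounding behaviors for a 1G hstate; with UFFD_FEATURE_EXACT_ADDRESS set, no rounding is applied at all:

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT	12
#define TOY_PAGE_SIZE	(1ULL << TOY_PAGE_SHIFT)
#define TOY_PUD_SIZE	(1ULL << 30)	/* 1G hstate on x86 */

/* Old behavior: fault address rounded down to the hstate size. */
static uint64_t round_to_hstate(uint64_t addr)
{
	return addr & ~(TOY_PUD_SIZE - 1);
}

/*
 * Behavior after commit 824ddc601adc when the flag is NOT set:
 * rounded down to PAGE_SIZE instead.
 */
static uint64_t round_to_page(uint64_t addr)
{
	return addr & ~(TOY_PAGE_SIZE - 1);
}
```

With hstate-size rounding, userspace cannot tell which 4K piece faulted; PAGE_SIZE rounding (or the exact address) preserves that information, which is what HGM's UFFDIO_CONTINUE callers need.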

The new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS_HGM, informs
userspace that high-granularity CONTINUEs are available.

[1] commit 824ddc601adc ("userfaultfd: provide unmasked address on page-fault")


>
> Dave
>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>


* Re: [RFC PATCH 04/26] hugetlb: make huge_pte_lockptr take an explicit shift argument.
  2022-06-24 17:36 ` [RFC PATCH 04/26] hugetlb: make huge_pte_lockptr take an explicit shift argument James Houghton
  2022-06-27 12:26   ` manish.mishra
@ 2022-06-27 20:51   ` Mike Kravetz
  2022-06-28 15:29     ` James Houghton
  2022-06-29  6:09     ` Muchun Song
  1 sibling, 2 replies; 128+ messages in thread
From: Mike Kravetz @ 2022-06-27 20:51 UTC (permalink / raw)
  To: James Houghton
  Cc: Muchun Song, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On 06/24/22 17:36, James Houghton wrote:
> This is needed to handle PTL locking with high-granularity mapping. We
> won't always be using the PMD-level PTL even if we're using the 2M
> hugepage hstate. It's possible that we're dealing with 4K PTEs, in which
> case, we need to lock the PTL for the 4K PTE.

I'm not really sure why this would be required.
Why not use the PMD level lock for 4K PTEs?  Seems that would scale better
with less contention than using the more coarse mm lock.  

-- 
Mike Kravetz


* Re: [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping
  2022-06-27 20:31       ` James Houghton
@ 2022-06-28  0:04         ` Nadav Amit
  2022-06-30 19:21           ` Peter Xu
  2022-06-28  8:20         ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 128+ messages in thread
From: Nadav Amit @ 2022-06-28  0:04 UTC (permalink / raw)
  To: James Houghton
  Cc: Dr. David Alan Gilbert, Matthew Wilcox, Mike Kravetz,
	Muchun Song, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra, linux-mm,
	linux-kernel



> On Jun 27, 2022, at 1:31 PM, James Houghton <jthoughton@google.com> wrote:
> 
> On Mon, Jun 27, 2022 at 10:56 AM Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
>> 
>> * James Houghton (jthoughton@google.com) wrote:
>>> On Fri, Jun 24, 2022 at 11:29 AM Matthew Wilcox <willy@infradead.org> wrote:
>>>> 
>>>> On Fri, Jun 24, 2022 at 05:36:30PM +0000, James Houghton wrote:
>>>>> [1] This used to be called HugeTLB double mapping, a bad and confusing
>>>>> name. "High-granularity mapping" is not a great name either. I am open
>>>>> to better names.
>>>> 
>>>> Oh good, I was grinding my teeth every time I read it ;-)
>>>> 
>>>> How does "Fine granularity" work for you?
>>>> "sub-page mapping" might work too.
>>> 
>>> "Granularity", as I've come to realize, is hard to say, so I think I
>>> prefer sub-page mapping. :) So to recap the suggestions I have so far:
>>> 
>>> 1. Sub-page mapping
>>> 2. Granular mapping
>>> 3. Flexible mapping
>>> 
>>> I'll pick one of these (or maybe some other one that works better) for
>>> the next version of this series.
>> 
>> <shrug> Just a name; SPM might work (although may confuse those
>> architectures which had subprotection for normal pages), and at least
>> we can mispronounce it.
>> 
>> In 14/26 your commit message says:
>> 
>> 1. Faults can be passed to handle_userfault. (Userspace will want to
>> use UFFD_FEATURE_REAL_ADDRESS to get the real address to know which
>> region they should be call UFFDIO_CONTINUE on later.)
>> 
>> can you explain what that new UFFD_FEATURE does?
> 
> +cc Nadav Amit <namit@vmware.com> to check me here.
> 
> Sorry, this should be UFFD_FEATURE_EXACT_ADDRESS. It isn't a new
> feature, and it actually isn't needed (I will correct the commit
> message). Why it isn't needed is a little bit complicated, though. Let
> me explain:
> 
> Before UFFD_FEATURE_EXACT_ADDRESS was introduced, the address that
> userfaultfd gave userspace for HugeTLB pages was rounded down to be
> hstate-size-aligned. This would have had to change, because userspace,
> to take advantage of HGM, needs to know which 4K piece to install.
> 
> However, after UFFD_FEATURE_EXACT_ADDRESS was introduced[1], the
> address was rounded down to be PAGE_SIZE-aligned instead, even if the
> flag wasn't used. I think this was an unintended change. If the flag
> is used, then the address isn't rounded at all -- that was the
> intended purpose of this flag. Hope that makes sense.
> 
> The new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS_HGM, informs
> userspace that high-granularity CONTINUEs are available.
> 
> [1] commit 824ddc601adc ("userfaultfd: provide unmasked address on page-fault")

Indeed, this change of behavior (not aligning to huge pages when the flag is
not set) was unintentional. If you want to fix it in a separate patch so it
can be backported, that may be a good idea.

For the record, there was a short period of time in 2016 when the exact
fault address was delivered even when UFFD_FEATURE_EXACT_ADDRESS was not
provided. We had some arguments about whether this was a regression...

BTW: I should have thought of the use-case of knowing the exact address
in huge-pages. It would have shortened my discussions with Andrea on
whether this feature (UFFD_FEATURE_EXACT_ADDRESS) is needed. :)


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping
  2022-06-27 20:31       ` James Houghton
  2022-06-28  0:04         ` Nadav Amit
@ 2022-06-28  8:20         ` Dr. David Alan Gilbert
  2022-06-30 16:09           ` Peter Xu
  1 sibling, 1 reply; 128+ messages in thread
From: Dr. David Alan Gilbert @ 2022-06-28  8:20 UTC (permalink / raw)
  To: James Houghton
  Cc: Matthew Wilcox, Mike Kravetz, Muchun Song, Peter Xu,
	David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, linux-mm, linux-kernel, Nadav Amit

* James Houghton (jthoughton@google.com) wrote:
> On Mon, Jun 27, 2022 at 10:56 AM Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
> >
> > * James Houghton (jthoughton@google.com) wrote:
> > > On Fri, Jun 24, 2022 at 11:29 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Fri, Jun 24, 2022 at 05:36:30PM +0000, James Houghton wrote:
> > > > > [1] This used to be called HugeTLB double mapping, a bad and confusing
> > > > >     name. "High-granularity mapping" is not a great name either. I am open
> > > > >     to better names.
> > > >
> > > > Oh good, I was grinding my teeth every time I read it ;-)
> > > >
> > > > How does "Fine granularity" work for you?
> > > > "sub-page mapping" might work too.
> > >
> > > "Granularity", as I've come to realize, is hard to say, so I think I
> > > prefer sub-page mapping. :) So to recap the suggestions I have so far:
> > >
> > > 1. Sub-page mapping
> > > 2. Granular mapping
> > > 3. Flexible mapping
> > >
> > > I'll pick one of these (or maybe some other one that works better) for
> > > the next version of this series.
> >
> > <shrug> Just a name; SPM might work (although may confuse those
> > architectures which had subprotection for normal pages), and at least
> > we can mispronounce it.
> >
> > In 14/26 your commit message says:
> >
> >   1. Faults can be passed to handle_userfault. (Userspace will want to
> >      use UFFD_FEATURE_REAL_ADDRESS to get the real address to know which
> >      region they should be call UFFDIO_CONTINUE on later.)
> >
> > can you explain what that new UFFD_FEATURE does?
> 
> +cc Nadav Amit <namit@vmware.com> to check me here.
> 
> Sorry, this should be UFFD_FEATURE_EXACT_ADDRESS. It isn't a new
> feature, and it actually isn't needed (I will correct the commit
> message). Why it isn't needed is a little bit complicated, though. Let
> me explain:
> 
> Before UFFD_FEATURE_EXACT_ADDRESS was introduced, the address that
> userfaultfd gave userspace for HugeTLB pages was rounded down to be
> hstate-size-aligned. This would have had to change, because userspace,
> to take advantage of HGM, needs to know which 4K piece to install.
> 
> However, after UFFD_FEATURE_EXACT_ADDRESS was introduced[1], the
> address was rounded down to be PAGE_SIZE-aligned instead, even if the
> flag wasn't used. I think this was an unintended change. If the flag
> is used, then the address isn't rounded at all -- that was the
> intended purpose of this flag. Hope that makes sense.

Oh, that's 'fun'; right, but the need for the less-rounded address
makes sense.

One other thing I thought of: you provide the modified 'CONTINUE'
behaviour, which works for postcopy as long as you use two mappings in
userspace (one protected by userfaultfd, and one which you do the
writes to), and then issue the CONTINUE into the protected mapping.
That's fine, but it's not currently how we have our postcopy code
wired up in QEMU; we have one mapping and use UFFDIO_COPY to place the
page. Requiring the two mappings is fine, but it's probably worth
pointing out the need for it somewhere.

Dave

> The new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS_HGM, informs
> userspace that high-granularity CONTINUEs are available.
> 
> [1] commit 824ddc601adc ("userfaultfd: provide unmasked address on page-fault")
> 
> 
> >
> > Dave
> >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping
  2022-06-27 16:27   ` James Houghton
@ 2022-06-28 14:17     ` Muchun Song
  2022-06-28 17:26     ` Mina Almasry
  1 sibling, 0 replies; 128+ messages in thread
From: Muchun Song @ 2022-06-28 14:17 UTC (permalink / raw)
  To: James Houghton
  Cc: Mina Almasry, Mike Kravetz, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Mon, Jun 27, 2022 at 09:27:38AM -0700, James Houghton wrote:
> On Fri, Jun 24, 2022 at 11:41 AM Mina Almasry <almasrymina@google.com> wrote:
> >
> > On Fri, Jun 24, 2022 at 10:37 AM James Houghton <jthoughton@google.com> wrote:
> > >
> > > [trimmed...]
> > > ---- Userspace API ----
> > >
> > > This patch series introduces a single way to take advantage of
> > > high-granularity mapping: via UFFDIO_CONTINUE. UFFDIO_CONTINUE allows
> > > userspace to resolve MINOR page faults on shared VMAs.
> > >
> > > To collapse a HugeTLB address range that has been mapped with several
> > > UFFDIO_CONTINUE operations, userspace can issue MADV_COLLAPSE. We expect
> > > userspace to know when all pages (that they care about) have been fetched.
> > >
> >
> > Thanks James! Cover letter looks good. A few questions:
> >
> > Why not have the kernel collapse the hugepage once all the 4K pages
> > have been fetched automatically? It would remove the need for a new
> > userspace API, and AFAICT there aren't really any cases where it is
> > beneficial to have a hugepage sharded into 4K mappings when those
> > mappings can be collapsed.
> 
> The reason that we don't automatically collapse mappings is because it
> would take additional complexity, and it is less flexible. Consider
> the case of 1G pages on x86: currently, userspace can collapse the
> whole page when it's all ready, but they can also choose to collapse a
> 2M piece of it. On architectures with more supported hugepage sizes
> (e.g., arm64), userspace has even more possibilities for when to
> collapse. This likely further complicates a potential
> automatic-collapse solution. Userspace may also want to collapse the
> mapping for an entire hugepage without completely mapping the hugepage
> first (this would also be possible by issuing UFFDIO_CONTINUE on all
> the holes, though).
> 
> >
> > > ---- HugeTLB Changes ----
> > >
> > > - Mapcount
> > > The way mapcount is handled is different from the way that it was handled
> > > before. If the PUD for a hugepage is not none, a hugepage's mapcount will
> > > be increased. This scheme means that, for hugepages that aren't mapped at
> > > high granularity, their mapcounts will remain the same as what they would
> > > have been pre-HGM.
> > >
> >
> > Sorry, I didn't quite follow this. It says mapcount is handled

+1

> > differently, but the same if the page is not mapped at high
> > granularity. Can you elaborate on how the mapcount handling will be
> > different when the page is mapped at high granularity?
> 
> I guess I didn't phrase this very well. For the sake of simplicity,
> consider 1G pages on x86, typically mapped with leaf-level PUDs.
> Previously, there were two possibilities for how a hugepage was
> mapped, either it was (1) completely mapped (PUD is present and a
> leaf), or (2) it wasn't mapped (PUD is none). Now we have a third
> case, where the PUD is not none but also not a leaf (this usually
> means that the page is partially mapped). We handle this case as if
> the whole page was mapped. That is, if we partially map a hugepage
> that was previously unmapped (making the PUD point to PMDs), we
> increment its mapcount, and if we completely unmap a partially mapped
> hugepage (making the PUD none), we decrement its mapcount. If we
> collapse a non-leaf PUD to a leaf PUD, we don't change mapcount.
> 
> It is possible for a PUD to be present and not a leaf (mapcount has
> been incremented) but for the page to still be unmapped: if the PMDs
> (or PTEs) underneath are all none. This case is atypical, and as of
> this RFC (without bestowing MADV_DONTNEED with HGM flexibility), I
> think it would be very difficult to get this to happen.
> 

It is a good explanation. I think it would be better to include it in
the cover letter.

Thanks.

> >
> > > - Page table walking and manipulation
> > > A new function, hugetlb_walk_to, handles walking HugeTLB page tables for
> > > high-granularity mappings. Eventually, it's possible to merge
> > > hugetlb_walk_to with huge_pte_offset and huge_pte_alloc.
> > >
> > > We keep track of HugeTLB page table entries with a new struct, hugetlb_pte.
> > > This is because we generally need to know the "size" of a PTE (previously
> > > always just huge_page_size(hstate)).
> > >
> > > For every page table manipulation function that has a huge version (e.g.
> > > huge_ptep_get and ptep_get), there is a wrapper for it (e.g.
> > > hugetlb_ptep_get).  The correct version is used depending on if a HugeTLB
> > > PTE really is "huge".
> > >
> > > - Synchronization
> > > For existing bits of HugeTLB, synchronization is unchanged. For splitting
> > > and collapsing HugeTLB PTEs, we require that the i_mmap_rw_sem is held for
> > > writing, and for doing high-granularity page table walks, we require it to
> > > be held for reading.
> > >
> > > ---- Limitations & Future Changes ----
> > >
> > > This patch series only implements high-granularity mapping for VM_SHARED
> > > VMAs.  I intend to implement enough HGM to support 4K unmapping for memory
> > > failure recovery for both shared and private mappings.
> > >
> > > The memory failure use case poses its own challenges that can be
> > > addressed, but I will do so in a separate RFC.
> > >
> > > Performance has not been heavily scrutinized with this patch series. There
> > > are places where lock contention can significantly reduce performance. This
> > > will be addressed later.
> > >
> > > The patch series, as it stands right now, is compatible with the VMEMMAP
> > > page struct optimization[3], as we do not need to modify data contained
> > > in the subpage page structs.
> > >
> > > Other omissions:
> > >  - Compatibility with userfaultfd write-protect (will be included in v1).
> > >  - Support for mremap() (will be included in v1). This looks a lot like
> > >    the support we have for fork().
> > >  - Documentation changes (will be included in v1).
> > >  - Completely ignores PMD sharing and hugepage migration (will be included
> > >    in v1).
> > >  - Implementations for architectures that don't use GENERAL_HUGETLB other
> > >    than arm64.
> > >
> > > ---- Patch Breakdown ----
> > >
> > > Patch 1     - Preliminary changes
> > > Patch 2-10  - HugeTLB HGM core changes
> > > Patch 11-13 - HugeTLB HGM page table walking functionality
> > > Patch 14-19 - HugeTLB HGM compatibility with other bits
> > > Patch 20-23 - Userfaultfd and collapse changes
> > > Patch 24-26 - arm64 support and selftests
> > >
> > > [1] This used to be called HugeTLB double mapping, a bad and confusing
> > >     name. "High-granularity mapping" is not a great name either. I am open
> > >     to better names.
> >
> > I would drop 1 extra word and do "granular mapping", as in the mapping
> > is more granular than what it normally is (2MB/1G, etc).
> 
> Noted. :)
> 


* Re: [RFC PATCH 04/26] hugetlb: make huge_pte_lockptr take an explicit shift argument.
  2022-06-27 20:51   ` Mike Kravetz
@ 2022-06-28 15:29     ` James Houghton
  2022-06-29  6:09     ` Muchun Song
  1 sibling, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-06-28 15:29 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Muchun Song, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Mon, Jun 27, 2022 at 1:52 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 06/24/22 17:36, James Houghton wrote:
> > This is needed to handle PTL locking with high-granularity mapping. We
> > won't always be using the PMD-level PTL even if we're using the 2M
> > hugepage hstate. It's possible that we're dealing with 4K PTEs, in which
> > case, we need to lock the PTL for the 4K PTE.
>
> I'm not really sure why this would be required.
> Why not use the PMD level lock for 4K PTEs?  Seems that would scale better
> with less contention than using the more coarse mm lock.

I should be using the PMD level lock for 4K PTEs, yeah. I'll work this
into the next version of the series. Thanks both.

>
> --
> Mike Kravetz


* Re: [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates
  2022-06-27 12:08   ` manish.mishra
@ 2022-06-28 15:35     ` James Houghton
  0 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-06-28 15:35 UTC (permalink / raw)
  To: manish.mishra
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Mon, Jun 27, 2022 at 5:09 AM manish.mishra <manish.mishra@nutanix.com> wrote:
>
>
> On 24/06/22 11:06 pm, James Houghton wrote:
> > When using HugeTLB high-granularity mapping, we need to go through the
> > supported hugepage sizes in decreasing order so that we pick the largest
> > size that works. Consider the case where we're faulting in a 1G hugepage
> > for the first time: we want hugetlb_fault/hugetlb_no_page to map it with
> > a PUD. By going through the sizes in decreasing order, we will find that
> > PUD_SIZE works before finding out that PMD_SIZE or PAGE_SIZE work too.
> >
> > Signed-off-by: James Houghton <jthoughton@google.com>
> > ---
> >   mm/hugetlb.c | 40 +++++++++++++++++++++++++++++++++++++---
> >   1 file changed, 37 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index a57e1be41401..5df838d86f32 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -33,6 +33,7 @@
> >   #include <linux/migrate.h>
> >   #include <linux/nospec.h>
> >   #include <linux/delayacct.h>
> > +#include <linux/sort.h>
> >
> >   #include <asm/page.h>
> >   #include <asm/pgalloc.h>
> > @@ -48,6 +49,10 @@
> >
> >   int hugetlb_max_hstate __read_mostly;
> >   unsigned int default_hstate_idx;
> > +/*
> > + * After hugetlb_init_hstates is called, hstates will be sorted from largest
> > + * to smallest.
> > + */
> >   struct hstate hstates[HUGE_MAX_HSTATE];
> >
> >   #ifdef CONFIG_CMA
> > @@ -3144,14 +3149,43 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
> >       kfree(node_alloc_noretry);
> >   }
> >
> > +static int compare_hstates_decreasing(const void *a, const void *b)
> > +{
> > +     const int shift_a = huge_page_shift((const struct hstate *)a);
> > +     const int shift_b = huge_page_shift((const struct hstate *)b);
> > +
> > +     if (shift_a < shift_b)
> > +             return 1;
> > +     if (shift_a > shift_b)
> > +             return -1;
> > +     return 0;
> > +}
> > +
> > +static void sort_hstates(void)
> > +{
> > +     unsigned long default_hstate_sz = huge_page_size(&default_hstate);
> > +
> > +     /* Sort from largest to smallest. */
> > +     sort(hstates, hugetlb_max_hstate, sizeof(*hstates),
> > +          compare_hstates_decreasing, NULL);
> > +
> > +     /*
> > +      * We may have changed the location of the default hstate, so we need to
> > +      * update it.
> > +      */
> > +     default_hstate_idx = hstate_index(size_to_hstate(default_hstate_sz));
> > +}
> > +
> >   static void __init hugetlb_init_hstates(void)
> >   {
> >       struct hstate *h, *h2;
> >
> > -     for_each_hstate(h) {
> > -             if (minimum_order > huge_page_order(h))
> > -                     minimum_order = huge_page_order(h);
> > +     sort_hstates();
> >
> > +     /* The last hstate is now the smallest. */
> > +     minimum_order = huge_page_order(&hstates[hugetlb_max_hstate - 1]);
> > +
> > +     for_each_hstate(h) {
> >               /* oversize hugepages were init'ed in early boot */
> >               if (!hstate_is_gigantic(h))
> >                       hugetlb_hstate_alloc_pages(h);
>
> Now that the hstates are ordered, can the code that calculates
> demote_order be optimised too? I mean, it could just be the order of
> the hstate at the next index?
>

Indeed -- thanks for catching that. I'll make this optimization for
the next version of this series.

>


* Re: [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates
  2022-06-27 18:42   ` Mike Kravetz
@ 2022-06-28 15:40     ` James Houghton
  2022-06-29  6:39       ` Muchun Song
  0 siblings, 1 reply; 128+ messages in thread
From: James Houghton @ 2022-06-28 15:40 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Muchun Song, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Mon, Jun 27, 2022 at 11:42 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 06/24/22 17:36, James Houghton wrote:
> > When using HugeTLB high-granularity mapping, we need to go through the
> > supported hugepage sizes in decreasing order so that we pick the largest
> > size that works. Consider the case where we're faulting in a 1G hugepage
> > for the first time: we want hugetlb_fault/hugetlb_no_page to map it with
> > a PUD. By going through the sizes in decreasing order, we will find that
> > PUD_SIZE works before finding out that PMD_SIZE or PAGE_SIZE work too.
> >
> > Signed-off-by: James Houghton <jthoughton@google.com>
> > ---
> >  mm/hugetlb.c | 40 +++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 37 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index a57e1be41401..5df838d86f32 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -33,6 +33,7 @@
> >  #include <linux/migrate.h>
> >  #include <linux/nospec.h>
> >  #include <linux/delayacct.h>
> > +#include <linux/sort.h>
> >
> >  #include <asm/page.h>
> >  #include <asm/pgalloc.h>
> > @@ -48,6 +49,10 @@
> >
> >  int hugetlb_max_hstate __read_mostly;
> >  unsigned int default_hstate_idx;
> > +/*
> > + * After hugetlb_init_hstates is called, hstates will be sorted from largest
> > + * to smallest.
> > + */
> >  struct hstate hstates[HUGE_MAX_HSTATE];
> >
> >  #ifdef CONFIG_CMA
> > @@ -3144,14 +3149,43 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
> >       kfree(node_alloc_noretry);
> >  }
> >
> > +static int compare_hstates_decreasing(const void *a, const void *b)
> > +{
> > +     const int shift_a = huge_page_shift((const struct hstate *)a);
> > +     const int shift_b = huge_page_shift((const struct hstate *)b);
> > +
> > +     if (shift_a < shift_b)
> > +             return 1;
> > +     if (shift_a > shift_b)
> > +             return -1;
> > +     return 0;
> > +}
> > +
> > +static void sort_hstates(void)
> > +{
> > +     unsigned long default_hstate_sz = huge_page_size(&default_hstate);
> > +
> > +     /* Sort from largest to smallest. */
> > +     sort(hstates, hugetlb_max_hstate, sizeof(*hstates),
> > +          compare_hstates_decreasing, NULL);
> > +
> > +     /*
> > +      * We may have changed the location of the default hstate, so we need to
> > +      * update it.
> > +      */
> > +     default_hstate_idx = hstate_index(size_to_hstate(default_hstate_sz));
> > +}
> > +
> >  static void __init hugetlb_init_hstates(void)
> >  {
> >       struct hstate *h, *h2;
> >
> > -     for_each_hstate(h) {
> > -             if (minimum_order > huge_page_order(h))
> > -                     minimum_order = huge_page_order(h);
> > +     sort_hstates();
> >
> > +     /* The last hstate is now the smallest. */
> > +     minimum_order = huge_page_order(&hstates[hugetlb_max_hstate - 1]);
> > +
> > +     for_each_hstate(h) {
> >               /* oversize hugepages were init'ed in early boot */
> >               if (!hstate_is_gigantic(h))
> >                       hugetlb_hstate_alloc_pages(h);
>
> This may/will cause problems for gigantic hugetlb pages allocated at boot
> time.  See alloc_bootmem_huge_page() where a pointer to the associated hstate
> is encoded within the allocated hugetlb page.  These pages are added to
> hugetlb pools by the routine gather_bootmem_prealloc() which uses the saved
> hstate to add prep the gigantic page and add to the correct pool.  Currently,
> gather_bootmem_prealloc is called after hugetlb_init_hstates.  So, changing
> hstate order will cause errors.
>
> I do not see any reason why we could not call gather_bootmem_prealloc before
> hugetlb_init_hstates to avoid this issue.

Thanks for catching this, Mike. Your suggestion certainly seems to
work, but it also seems kind of error prone. I'll have to look at the
code more closely, but maybe it would be better if I just maintained a
separate `struct hstate *sorted_hstate_ptrs[]`, where the original
locations of the hstates remain unchanged, so as not to break
gather_bootmem_prealloc/other things.

> --
> Mike Kravetz


* Re: [RFC PATCH 01/26] hugetlb: make hstate accessor functions const
  2022-06-27 12:09     ` manish.mishra
@ 2022-06-28 17:08       ` James Houghton
  0 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-06-28 17:08 UTC (permalink / raw)
  To: manish.mishra
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Mon, Jun 27, 2022 at 5:09 AM manish.mishra <manish.mishra@nutanix.com> wrote:
>
>
> On 27/06/22 5:06 pm, manish.mishra wrote:
>
>
> On 24/06/22 11:06 pm, James Houghton wrote:
>
> This is just a const-correctness change so that the new hugetlb_pte
> changes can be const-correct too.
>
> Acked-by: David Rientjes <rientjes@google.com>
>
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>  include/linux/hugetlb.h | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index e4cff27d1198..498a4ae3d462 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -715,7 +715,7 @@ static inline struct hstate *hstate_vma(struct vm_area_struct *vma)
>   return hstate_file(vma->vm_file);
>  }
>
> -static inline unsigned long huge_page_size(struct hstate *h)
> +static inline unsigned long huge_page_size(const struct hstate *h)
>  {
>   return (unsigned long)PAGE_SIZE << h->order;
>  }
> @@ -729,27 +729,27 @@ static inline unsigned long huge_page_mask(struct hstate *h)
>   return h->mask;
>  }
>
> -static inline unsigned int huge_page_order(struct hstate *h)
> +static inline unsigned int huge_page_order(const struct hstate *h)
>  {
>   return h->order;
>  }
>
> -static inline unsigned huge_page_shift(struct hstate *h)
> +static inline unsigned huge_page_shift(const struct hstate *h)
>  {
>   return h->order + PAGE_SHIFT;
>  }
>
> -static inline bool hstate_is_gigantic(struct hstate *h)
> +static inline bool hstate_is_gigantic(const struct hstate *h)
>  {
>   return huge_page_order(h) >= MAX_ORDER;
>  }
>
> -static inline unsigned int pages_per_huge_page(struct hstate *h)
> +static inline unsigned int pages_per_huge_page(const struct hstate *h)
>  {
>   return 1 << h->order;
>  }
>
> -static inline unsigned int blocks_per_huge_page(struct hstate *h)
> +static inline unsigned int blocks_per_huge_page(const struct hstate *h)
>  {
>   return huge_page_size(h) / 512;
>  }
>
> James, I just wanted to check why you did it selectively, only for
> these functions; why not for something like hstate_index, which I
> also see used in your code.

I'll look into which other functions can be made const. We need
huge_page_shift() to take `const struct hstate *h` so that the hstates
can be sorted, and it then made sense to make the surrounding, related
functions const as well. I could also just leave it at
huge_page_shift().

The commit message here is wrong -- the hugetlb_pte const-correctness
is a separate issue that doesn't depend on the constness of hstates.
I'll fix that -- sorry about that.


* Re: [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping
  2022-06-27 16:27   ` James Houghton
  2022-06-28 14:17     ` Muchun Song
@ 2022-06-28 17:26     ` Mina Almasry
  2022-06-28 17:56       ` Dr. David Alan Gilbert
  2022-06-29 20:39       ` Axel Rasmussen
  1 sibling, 2 replies; 128+ messages in thread
From: Mina Almasry @ 2022-06-28 17:26 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Mon, Jun 27, 2022 at 9:27 AM James Houghton <jthoughton@google.com> wrote:
>
> On Fri, Jun 24, 2022 at 11:41 AM Mina Almasry <almasrymina@google.com> wrote:
> >
> > On Fri, Jun 24, 2022 at 10:37 AM James Houghton <jthoughton@google.com> wrote:
> > >
> > > [trimmed...]
> > > ---- Userspace API ----
> > >
> > > This patch series introduces a single way to take advantage of
> > > high-granularity mapping: via UFFDIO_CONTINUE. UFFDIO_CONTINUE allows
> > > userspace to resolve MINOR page faults on shared VMAs.
> > >
> > > To collapse a HugeTLB address range that has been mapped with several
> > > UFFDIO_CONTINUE operations, userspace can issue MADV_COLLAPSE. We expect
> > > userspace to know when all pages (that they care about) have been fetched.
> > >
> >
> > Thanks James! Cover letter looks good. A few questions:
> >
> > Why not have the kernel collapse the hugepage once all the 4K pages
> > have been fetched automatically? It would remove the need for a new
> > userspace API, and AFAICT there aren't really any cases where it is
> > beneficial to have a hugepage sharded into 4K mappings when those
> > mappings can be collapsed.
>
> The reason that we don't automatically collapse mappings is because it
> would take additional complexity, and it is less flexible. Consider
> the case of 1G pages on x86: currently, userspace can collapse the
> whole page when it's all ready, but they can also choose to collapse a
> 2M piece of it. On architectures with more supported hugepage sizes
> (e.g., arm64), userspace has even more possibilities for when to
> collapse. This likely further complicates a potential
> automatic-collapse solution. Userspace may also want to collapse the
> mapping for an entire hugepage without completely mapping the hugepage
> first (this would also be possible by issuing UFFDIO_CONTINUE on all
> the holes, though).
>

To be honest I don't think I'm a fan of this. I don't think this
saves complexity, but rather pushes it to userspace. I.e. userspace
now must track which regions are faulted in and which are not, to
call MADV_COLLAPSE at the right time. Also, if the userspace
gets it wrong it may accidentally not call MADV_COLLAPSE (and not get
any hugepages) or call MADV_COLLAPSE too early and have to deal with a
storm of maybe hundreds of minor faults at once which may take too
long to resolve and may impact guest stability, yes?

For these reasons I think automatic collapsing is something that will
eventually be implemented by us or someone else, and at that point
MADV_COLLAPSE for hugetlb memory will become obsolete; i.e. this patch
is adding a userspace API that will probably need to be maintained in
perpetuity but is actually likely to go obsolete "soon".
For this reason I had hoped that automatic collapsing would come with
V1.

I wonder if we can have a very simple first try at automatic
collapsing for V1? I.e., can we support collapsing to the hstate size
and only that? So 4K pages can only be either collapsed to 2MB or 1G
on x86 depending on the hstate size. I think this may not be too
difficult to implement: we can have a counter similar to mapcount that
tracks how many of the subpages are mapped (subpage_mapcount). Once
all the subpages are mapped (the counter reaches a certain value),
trigger collapsing similar to hstate size MADV_COLLAPSE.

I gather that no one else reviewing this has raised this issue thus
far so it might not be a big deal and I will continue to review the
RFC, but I had hoped for automatic collapsing myself for the reasons
above.

> >
> > > ---- HugeTLB Changes ----
> > >
> > > - Mapcount
> > > The way mapcount is handled is different from the way that it was handled
> > > before. If the PUD for a hugepage is not none, a hugepage's mapcount will
> > > be increased. This scheme means that, for hugepages that aren't mapped at
> > > high granularity, their mapcounts will remain the same as what they would
> > > have been pre-HGM.
> > >
> >
> > Sorry, I didn't quite follow this. It says mapcount is handled
> > differently, but the same if the page is not mapped at high
> > granularity. Can you elaborate on how the mapcount handling will be
> > different when the page is mapped at high granularity?
>
> I guess I didn't phrase this very well. For the sake of simplicity,
> consider 1G pages on x86, typically mapped with leaf-level PUDs.
> Previously, there were two possibilities for how a hugepage was
> mapped, either it was (1) completely mapped (PUD is present and a
> leaf), or (2) it wasn't mapped (PUD is none). Now we have a third
> case, where the PUD is not none but also not a leaf (this usually
> means that the page is partially mapped). We handle this case as if
> the whole page was mapped. That is, if we partially map a hugepage
> that was previously unmapped (making the PUD point to PMDs), we
> increment its mapcount, and if we completely unmap a partially mapped
> hugepage (making the PUD none), we decrement its mapcount. If we
> collapse a non-leaf PUD to a leaf PUD, we don't change mapcount.
>
> It is possible for a PUD to be present and not a leaf (mapcount has
> been incremented) but for the page to still be unmapped: if the PMDs
> (or PTEs) underneath are all none. This case is atypical, and as of
> this RFC (without bestowing MADV_DONTNEED with HGM flexibility), I
> think it would be very difficult to get this to happen.
>

Thank you for the detailed explanation. Please add it to the cover letter.

I wonder about the case "PUD present but all the PMDs are none": is
that a bug? I don't understand the usefulness of that. Not a comment on this
patch but rather a curiosity.

> >
> > > - Page table walking and manipulation
> > > A new function, hugetlb_walk_to, handles walking HugeTLB page tables for
> > > high-granularity mappings. Eventually, it's possible to merge
> > > hugetlb_walk_to with huge_pte_offset and huge_pte_alloc.
> > >
> > > We keep track of HugeTLB page table entries with a new struct, hugetlb_pte.
> > > This is because we generally need to know the "size" of a PTE (previously
> > > always just huge_page_size(hstate)).
> > >
> > > For every page table manipulation function that has a huge version (e.g.
> > > huge_ptep_get and ptep_get), there is a wrapper for it (e.g.
> > > hugetlb_ptep_get).  The correct version is used depending on if a HugeTLB
> > > PTE really is "huge".
> > >
> > > - Synchronization
> > > For existing bits of HugeTLB, synchronization is unchanged. For splitting
> > > and collapsing HugeTLB PTEs, we require that the i_mmap_rw_sem is held for
> > > writing, and for doing high-granularity page table walks, we require it to
> > > be held for reading.
> > >
> > > ---- Limitations & Future Changes ----
> > >
> > > This patch series only implements high-granularity mapping for VM_SHARED
> > > VMAs.  I intend to implement enough HGM to support 4K unmapping for memory
> > > failure recovery for both shared and private mappings.
> > >
> > > The memory failure use case poses its own challenges that can be
> > > addressed, but I will do so in a separate RFC.
> > >
> > > Performance has not been heavily scrutinized with this patch series. There
> > > are places where lock contention can significantly reduce performance. This
> > > will be addressed later.
> > >
> > > The patch series, as it stands right now, is compatible with the VMEMMAP
> > > page struct optimization[3], as we do not need to modify data contained
> > > in the subpage page structs.
> > >
> > > Other omissions:
> > >  - Compatibility with userfaultfd write-protect (will be included in v1).
> > >  - Support for mremap() (will be included in v1). This looks a lot like
> > >    the support we have for fork().
> > >  - Documentation changes (will be included in v1).
> > >  - Completely ignores PMD sharing and hugepage migration (will be included
> > >    in v1).
> > >  - Implementations for architectures that don't use GENERAL_HUGETLB other
> > >    than arm64.
> > >
> > > ---- Patch Breakdown ----
> > >
> > > Patch 1     - Preliminary changes
> > > Patch 2-10  - HugeTLB HGM core changes
> > > Patch 11-13 - HugeTLB HGM page table walking functionality
> > > Patch 14-19 - HugeTLB HGM compatibility with other bits
> > > Patch 20-23 - Userfaultfd and collapse changes
> > > Patch 24-26 - arm64 support and selftests
> > >
> > > [1] This used to be called HugeTLB double mapping, a bad and confusing
> > >     name. "High-granularity mapping" is not a great name either. I am open
> > >     to better names.
> >
> > I would drop 1 extra word and do "granular mapping", as in the mapping
> > is more granular than what it normally is (2MB/1G, etc).
>
> Noted. :)

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping
  2022-06-28 17:26     ` Mina Almasry
@ 2022-06-28 17:56       ` Dr. David Alan Gilbert
  2022-06-29 18:31         ` James Houghton
  2022-06-29 20:39       ` Axel Rasmussen
  1 sibling, 1 reply; 128+ messages in thread
From: Dr. David Alan Gilbert @ 2022-06-28 17:56 UTC (permalink / raw)
  To: Mina Almasry
  Cc: James Houghton, Mike Kravetz, Muchun Song, Peter Xu,
	David Hildenbrand, David Rientjes, Axel Rasmussen, Jue Wang,
	Manish Mishra, linux-mm, linux-kernel

* Mina Almasry (almasrymina@google.com) wrote:
> On Mon, Jun 27, 2022 at 9:27 AM James Houghton <jthoughton@google.com> wrote:
> >
> > On Fri, Jun 24, 2022 at 11:41 AM Mina Almasry <almasrymina@google.com> wrote:
> > >
> > > On Fri, Jun 24, 2022 at 10:37 AM James Houghton <jthoughton@google.com> wrote:
> > > >
> > > > [trimmed...]
> > > > ---- Userspace API ----
> > > >
> > > > This patch series introduces a single way to take advantage of
> > > > high-granularity mapping: via UFFDIO_CONTINUE. UFFDIO_CONTINUE allows
> > > > userspace to resolve MINOR page faults on shared VMAs.
> > > >
> > > > To collapse a HugeTLB address range that has been mapped with several
> > > > UFFDIO_CONTINUE operations, userspace can issue MADV_COLLAPSE. We expect
> > > > userspace to know when all pages (that they care about) have been fetched.
> > > >
> > >
> > > Thanks James! Cover letter looks good. A few questions:
> > >
> > > Why not have the kernel collapse the hugepage once all the 4K pages
> > > have been fetched automatically? It would remove the need for a new
> > > userspace API, and AFAICT there aren't really any cases where it is
> > > beneficial to have a hugepage sharded into 4K mappings when those
> > > mappings can be collapsed.
> >
> > The reason that we don't automatically collapse mappings is because it
> > would take additional complexity, and it is less flexible. Consider
> > the case of 1G pages on x86: currently, userspace can collapse the
> > whole page when it's all ready, but they can also choose to collapse a
> > 2M piece of it. On architectures with more supported hugepage sizes
> > (e.g., arm64), userspace has even more possibilities for when to
> > collapse. This likely further complicates a potential
> > automatic-collapse solution. Userspace may also want to collapse the
> > mapping for an entire hugepage without completely mapping the hugepage
> > first (this would also be possible by issuing UFFDIO_CONTINUE on all
> > the holes, though).
> >
> 
> To be honest I don't think I'm a fan of this. I don't think this
> saves complexity, but rather pushes it to userspace. I.e. the
> userspace now must track which regions are faulted in and which are
> not to call MADV_COLLAPSE at the right time. Also, if the userspace
> gets it wrong it may accidentally not call MADV_COLLAPSE (and not get
> any hugepages) or call MADV_COLLAPSE too early and have to deal with a
> storm of maybe hundreds of minor faults at once which may take too
> long to resolve and may impact guest stability, yes?

I think it depends on whether the userspace is already holding bitmaps
and data structures to let it know when the right time to call collapse
is; if it already has to do all that bookkeeping for its own postcopy
or whatever process, then getting userspace to call it is easy.
(I don't know the answer to whether it does!)

Dave

> For these reasons I think automatic collapsing is something that will
> eventually be implemented by us or someone else, and at that point
> MADV_COLLAPSE for hugetlb memory will become obsolete; i.e. this patch
> is adding a userspace API that will probably need to be maintained in
> perpetuity but actually is likely going to go obsolete "soon".
> For this reason I had hoped that automatic collapsing would come with
> V1.
> 
> I wonder if we can have a very simple first try at automatic
> collapsing for V1? I.e., can we support collapsing to the hstate size
> and only that? So 4K pages can only be either collapsed to 2MB or 1G
> on x86 depending on the hstate size. I think this may be not too
> difficult to implement: we can have a counter similar to mapcount that
> tracks how many of the subpages are mapped (subpage_mapcount). Once
> all the subpages are mapped (the counter reaches a certain value),
> trigger collapsing similar to hstate size MADV_COLLAPSE.
> 
> I gather that no one else reviewing this has raised this issue thus
> far so it might not be a big deal and I will continue to review the
> RFC, but I had hoped for automatic collapsing myself for the reasons
> above.
> 
> > >
> > > > ---- HugeTLB Changes ----
> > > >
> > > > - Mapcount
> > > > The way mapcount is handled is different from the way that it was handled
> > > > before. If the PUD for a hugepage is not none, a hugepage's mapcount will
> > > > be increased. This scheme means that, for hugepages that aren't mapped at
> > > > high granularity, their mapcounts will remain the same as what they would
> > > > have been pre-HGM.
> > > >
> > >
> > > Sorry, I didn't quite follow this. It says mapcount is handled
> > > differently, but the same if the page is not mapped at high
> > > granularity. Can you elaborate on how the mapcount handling will be
> > > different when the page is mapped at high granularity?
> >
> > I guess I didn't phrase this very well. For the sake of simplicity,
> > consider 1G pages on x86, typically mapped with leaf-level PUDs.
> > Previously, there were two possibilities for how a hugepage was
> > mapped, either it was (1) completely mapped (PUD is present and a
> > leaf), or (2) it wasn't mapped (PUD is none). Now we have a third
> > case, where the PUD is not none but also not a leaf (this usually
> > means that the page is partially mapped). We handle this case as if
> > the whole page was mapped. That is, if we partially map a hugepage
> > that was previously unmapped (making the PUD point to PMDs), we
> > increment its mapcount, and if we completely unmap a partially mapped
> > hugepage (making the PUD none), we decrement its mapcount. If we
> > collapse a non-leaf PUD to a leaf PUD, we don't change mapcount.
> >
> > It is possible for a PUD to be present and not a leaf (mapcount has
> > been incremented) but for the page to still be unmapped: if the PMDs
> > (or PTEs) underneath are all none. This case is atypical, and as of
> > this RFC (without bestowing MADV_DONTNEED with HGM flexibility), I
> > think it would be very difficult to get this to happen.
> >
> 
> Thank you for the detailed explanation. Please add it to the cover letter.
> 
> I wonder about the case "PUD present but all the PMDs are none": is that a
> bug? I don't understand the usefulness of that. Not a comment on this
> patch but rather a curiosity.
> 
> > >
> > > > - Page table walking and manipulation
> > > > A new function, hugetlb_walk_to, handles walking HugeTLB page tables for
> > > > high-granularity mappings. Eventually, it's possible to merge
> > > > hugetlb_walk_to with huge_pte_offset and huge_pte_alloc.
> > > >
> > > > We keep track of HugeTLB page table entries with a new struct, hugetlb_pte.
> > > > This is because we generally need to know the "size" of a PTE (previously
> > > > always just huge_page_size(hstate)).
> > > >
> > > > For every page table manipulation function that has a huge version (e.g.
> > > > huge_ptep_get and ptep_get), there is a wrapper for it (e.g.
> > > > hugetlb_ptep_get).  The correct version is used depending on if a HugeTLB
> > > > PTE really is "huge".
> > > >
> > > > - Synchronization
> > > > For existing bits of HugeTLB, synchronization is unchanged. For splitting
> > > > and collapsing HugeTLB PTEs, we require that the i_mmap_rw_sem is held for
> > > > writing, and for doing high-granularity page table walks, we require it to
> > > > be held for reading.
> > > >
> > > > ---- Limitations & Future Changes ----
> > > >
> > > > This patch series only implements high-granularity mapping for VM_SHARED
> > > > VMAs.  I intend to implement enough HGM to support 4K unmapping for memory
> > > > failure recovery for both shared and private mappings.
> > > >
> > > > The memory failure use case poses its own challenges that can be
> > > > addressed, but I will do so in a separate RFC.
> > > >
> > > > Performance has not been heavily scrutinized with this patch series. There
> > > > are places where lock contention can significantly reduce performance. This
> > > > will be addressed later.
> > > >
> > > > The patch series, as it stands right now, is compatible with the VMEMMAP
> > > > page struct optimization[3], as we do not need to modify data contained
> > > > in the subpage page structs.
> > > >
> > > > Other omissions:
> > > >  - Compatibility with userfaultfd write-protect (will be included in v1).
> > > >  - Support for mremap() (will be included in v1). This looks a lot like
> > > >    the support we have for fork().
> > > >  - Documentation changes (will be included in v1).
> > > >  - Completely ignores PMD sharing and hugepage migration (will be included
> > > >    in v1).
> > > >  - Implementations for architectures that don't use GENERAL_HUGETLB other
> > > >    than arm64.
> > > >
> > > > ---- Patch Breakdown ----
> > > >
> > > > Patch 1     - Preliminary changes
> > > > Patch 2-10  - HugeTLB HGM core changes
> > > > Patch 11-13 - HugeTLB HGM page table walking functionality
> > > > Patch 14-19 - HugeTLB HGM compatibility with other bits
> > > > Patch 20-23 - Userfaultfd and collapse changes
> > > > Patch 24-26 - arm64 support and selftests
> > > >
> > > > [1] This used to be called HugeTLB double mapping, a bad and confusing
> > > >     name. "High-granularity mapping" is not a great name either. I am open
> > > >     to better names.
> > >
> > > I would drop 1 extra word and do "granular mapping", as in the mapping
> > > is more granular than what it normally is (2MB/1G, etc).
> >
> > Noted. :)
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [RFC PATCH 05/26] hugetlb: add CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
  2022-06-27 12:28   ` manish.mishra
@ 2022-06-28 20:03     ` Mina Almasry
  0 siblings, 0 replies; 128+ messages in thread
From: Mina Almasry @ 2022-06-28 20:03 UTC (permalink / raw)
  To: manish.mishra
  Cc: James Houghton, Mike Kravetz, Muchun Song, Peter Xu,
	David Hildenbrand, David Rientjes, Axel Rasmussen, Jue Wang,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Mon, Jun 27, 2022 at 5:29 AM manish.mishra <manish.mishra@nutanix.com> wrote:
>
>
> On 24/06/22 11:06 pm, James Houghton wrote:
> > This adds the Kconfig to enable or disable high-granularity mapping. It
> > is enabled by default for architectures that use
> > ARCH_WANT_GENERAL_HUGETLB.
> >
> > There is also an arch-specific config ARCH_HAS_SPECIAL_HUGETLB_HGM which
> > controls whether or not the architecture has been updated to support
> > HGM if it doesn't use general HugeTLB.
> >
> > Signed-off-by: James Houghton <jthoughton@google.com>
> Reviewed-by: manish.mishra@nutanix.com

Mostly minor nits,

Reviewed-by: Mina Almasry <almasrymina@google.com>

> > ---
> >   fs/Kconfig | 7 +++++++
> >   1 file changed, 7 insertions(+)
> >
> > diff --git a/fs/Kconfig b/fs/Kconfig
> > index 5976eb33535f..d76c7d812656 100644
> > --- a/fs/Kconfig
> > +++ b/fs/Kconfig
> > @@ -268,6 +268,13 @@ config HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON
> >         to enable optimizing vmemmap pages of HugeTLB by default. It can then
> >         be disabled on the command line via hugetlb_free_vmemmap=off.
> >
> > +config ARCH_HAS_SPECIAL_HUGETLB_HGM

Nit: would have preferred just ARCH_HAS_HUGETLB_HGM, as ARCH implies
arch-specific.

> > +     bool
> > +
> > +config HUGETLB_HIGH_GRANULARITY_MAPPING
> > +     def_bool ARCH_WANT_GENERAL_HUGETLB || ARCH_HAS_SPECIAL_HUGETLB_HGM

Nit: would have preferred to go with either HGM _or_
HIGH_GRANULARITY_MAPPING (or whatever new name comes up), rather than
both, for consistency's sake.

> > +     depends on HUGETLB_PAGE
> > +
> >   config MEMFD_CREATE
> >       def_bool TMPFS || HUGETLBFS
> >


* Re: [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries
  2022-06-24 17:36 ` [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries James Houghton
  2022-06-25  2:59   ` kernel test robot
  2022-06-27 12:47   ` manish.mishra
@ 2022-06-28 20:25   ` Mina Almasry
  2022-06-29 16:42     ` James Houghton
  2022-06-28 20:44   ` Mike Kravetz
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 128+ messages in thread
From: Mina Almasry @ 2022-06-28 20:25 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Fri, Jun 24, 2022 at 10:37 AM James Houghton <jthoughton@google.com> wrote:
>
> After high-granularity mapping, page table entries for HugeTLB pages can
> be of any size/type. (For example, we can have a 1G page mapped with a
> mix of PMDs and PTEs.) This struct is to help keep track of a HugeTLB
> PTE after we have done a page table walk.
>
> Without this, we'd have to pass around the "size" of the PTE everywhere.
> We effectively did this before; it could be fetched from the hstate,
> which we pass around pretty much everywhere.
>
> This commit includes definitions for some basic helper functions that
> are used later. These helper functions wrap existing PTE
> inspection/modification functions, where the correct version is picked
> depending on if the HugeTLB PTE is actually "huge" or not. (Previously,
> all HugeTLB PTEs were "huge").
>
> For example, hugetlb_ptep_get wraps huge_ptep_get and ptep_get, where
> ptep_get is used when the HugeTLB PTE is PAGE_SIZE, and huge_ptep_get is
> used in all other cases.
>
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>  include/linux/hugetlb.h | 84 +++++++++++++++++++++++++++++++++++++++++
>  mm/hugetlb.c            | 57 ++++++++++++++++++++++++++++
>  2 files changed, 141 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 5fe1db46d8c9..1d4ec9dfdebf 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -46,6 +46,68 @@ enum {
>         __NR_USED_SUBPAGE,
>  };
>
> +struct hugetlb_pte {
> +       pte_t *ptep;
> +       unsigned int shift;
> +};
> +
> +static inline
> +void hugetlb_pte_init(struct hugetlb_pte *hpte)
> +{
> +       hpte->ptep = NULL;

shift = 0; ?

> +}
> +
> +static inline
> +void hugetlb_pte_populate(struct hugetlb_pte *hpte, pte_t *ptep,
> +                         unsigned int shift)
> +{
> +       BUG_ON(!ptep);
> +       hpte->ptep = ptep;
> +       hpte->shift = shift;
> +}
> +
> +static inline
> +unsigned long hugetlb_pte_size(const struct hugetlb_pte *hpte)
> +{
> +       BUG_ON(!hpte->ptep);
> +       return 1UL << hpte->shift;
> +}
> +

This helper is quite redundant in my opinion.

> +static inline
> +unsigned long hugetlb_pte_mask(const struct hugetlb_pte *hpte)
> +{
> +       BUG_ON(!hpte->ptep);
> +       return ~(hugetlb_pte_size(hpte) - 1);
> +}
> +
> +static inline
> +unsigned int hugetlb_pte_shift(const struct hugetlb_pte *hpte)
> +{
> +       BUG_ON(!hpte->ptep);
> +       return hpte->shift;
> +}
> +

This one jumps out as quite redundant too.

> +static inline
> +bool hugetlb_pte_huge(const struct hugetlb_pte *hpte)
> +{
> +       return !IS_ENABLED(CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING) ||
> +               hugetlb_pte_shift(hpte) > PAGE_SHIFT;
> +}
> +

I'm guessing the !IS_ENABLED() check is because only the HGM code
would store a non-huge pte in a hugetlb_pte struct. I think it's a bit
fragile because anyone can add code in the future that uses
hugetlb_pte in unexpected ways, but I will concede that it is correct
as written.

> +static inline
> +void hugetlb_pte_copy(struct hugetlb_pte *dest, const struct hugetlb_pte *src)
> +{
> +       dest->ptep = src->ptep;
> +       dest->shift = src->shift;
> +}
> +
> +bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte);
> +bool hugetlb_pte_none(const struct hugetlb_pte *hpte);
> +bool hugetlb_pte_none_mostly(const struct hugetlb_pte *hpte);
> +pte_t hugetlb_ptep_get(const struct hugetlb_pte *hpte);
> +void hugetlb_pte_clear(struct mm_struct *mm, const struct hugetlb_pte *hpte,
> +                      unsigned long address);
> +
>  struct hugepage_subpool {
>         spinlock_t lock;
>         long count;
> @@ -1130,6 +1192,28 @@ static inline spinlock_t *huge_pte_lock_shift(unsigned int shift,
>         return ptl;
>  }
>
> +static inline

Maybe for organization, move all the static functions you're adding
above the hugetlb_pte_* declarations you're adding?

> +spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
> +{
> +
> +       BUG_ON(!hpte->ptep);
> +       // Only use huge_pte_lockptr if we are at leaf-level. Otherwise use
> +       // the regular page table lock.

Does checkpatch.pl not complain about // style comments? I think those
are not allowed, no?

> +       if (hugetlb_pte_none(hpte) || hugetlb_pte_present_leaf(hpte))
> +               return huge_pte_lockptr(hugetlb_pte_shift(hpte),
> +                               mm, hpte->ptep);
> +       return &mm->page_table_lock;
> +}
> +
> +static inline
> +spinlock_t *hugetlb_pte_lock(struct mm_struct *mm, struct hugetlb_pte *hpte)
> +{
> +       spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte);
> +
> +       spin_lock(ptl);
> +       return ptl;
> +}
> +
>  #if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_CMA)
>  extern void __init hugetlb_cma_reserve(int order);
>  extern void __init hugetlb_cma_check(void);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index d6d0d4c03def..1a1434e29740 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1120,6 +1120,63 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
>         return false;
>  }
>
> +bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte)
> +{
> +       pgd_t pgd;
> +       p4d_t p4d;
> +       pud_t pud;
> +       pmd_t pmd;
> +
> +       BUG_ON(!hpte->ptep);
> +       if (hugetlb_pte_size(hpte) >= PGDIR_SIZE) {
> +               pgd = *(pgd_t *)hpte->ptep;
> +               return pgd_present(pgd) && pgd_leaf(pgd);
> +       } else if (hugetlb_pte_size(hpte) >= P4D_SIZE) {
> +               p4d = *(p4d_t *)hpte->ptep;
> +               return p4d_present(p4d) && p4d_leaf(p4d);
> +       } else if (hugetlb_pte_size(hpte) >= PUD_SIZE) {
> +               pud = *(pud_t *)hpte->ptep;
> +               return pud_present(pud) && pud_leaf(pud);
> +       } else if (hugetlb_pte_size(hpte) >= PMD_SIZE) {
> +               pmd = *(pmd_t *)hpte->ptep;
> +               return pmd_present(pmd) && pmd_leaf(pmd);
> +       } else if (hugetlb_pte_size(hpte) >= PAGE_SIZE)
> +               return pte_present(*hpte->ptep);

The use of >= is a bit curious to me. Shouldn't these be ==?

Also probably doesn't matter but I was thinking to use *_SHIFTs
instead of *_SIZE so you don't have to calculate the size 5 times in
this routine, or calculate hugetlb_pte_size() once for some less code
duplication and re-use?

> +       BUG();
> +}
> +
> +bool hugetlb_pte_none(const struct hugetlb_pte *hpte)
> +{
> +       if (hugetlb_pte_huge(hpte))
> +               return huge_pte_none(huge_ptep_get(hpte->ptep));
> +       return pte_none(ptep_get(hpte->ptep));
> +}
> +
> +bool hugetlb_pte_none_mostly(const struct hugetlb_pte *hpte)
> +{
> +       if (hugetlb_pte_huge(hpte))
> +               return huge_pte_none_mostly(huge_ptep_get(hpte->ptep));
> +       return pte_none_mostly(ptep_get(hpte->ptep));
> +}
> +
> +pte_t hugetlb_ptep_get(const struct hugetlb_pte *hpte)
> +{
> +       if (hugetlb_pte_huge(hpte))
> +               return huge_ptep_get(hpte->ptep);
> +       return ptep_get(hpte->ptep);
> +}
> +
> +void hugetlb_pte_clear(struct mm_struct *mm, const struct hugetlb_pte *hpte,
> +                      unsigned long address)
> +{
> +       BUG_ON(!hpte->ptep);
> +       unsigned long sz = hugetlb_pte_size(hpte);
> +
> +       if (sz > PAGE_SIZE)
> +               return huge_pte_clear(mm, address, hpte->ptep, sz);
> +       return pte_clear(mm, address, hpte->ptep);
> +}
> +
>  static void enqueue_huge_page(struct hstate *h, struct page *page)
>  {
>         int nid = page_to_nid(page);
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>


* Re: [RFC PATCH 08/26] hugetlb: add hugetlb_free_range to free PT structures
  2022-06-24 17:36 ` [RFC PATCH 08/26] hugetlb: add hugetlb_free_range to free PT structures James Houghton
  2022-06-27 12:52   ` manish.mishra
@ 2022-06-28 20:27   ` Mina Almasry
  1 sibling, 0 replies; 128+ messages in thread
From: Mina Almasry @ 2022-06-28 20:27 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Fri, Jun 24, 2022 at 10:37 AM James Houghton <jthoughton@google.com> wrote:
>
> This is a helper function for freeing the bits of the page table that
> map a particular HugeTLB PTE.
>
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>  include/linux/hugetlb.h |  2 ++
>  mm/hugetlb.c            | 17 +++++++++++++++++
>  2 files changed, 19 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 1d4ec9dfdebf..33ba48fac551 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -107,6 +107,8 @@ bool hugetlb_pte_none_mostly(const struct hugetlb_pte *hpte);
>  pte_t hugetlb_ptep_get(const struct hugetlb_pte *hpte);
>  void hugetlb_pte_clear(struct mm_struct *mm, const struct hugetlb_pte *hpte,
>                        unsigned long address);
> +void hugetlb_free_range(struct mmu_gather *tlb, const struct hugetlb_pte *hpte,
> +                       unsigned long start, unsigned long end);
>
>  struct hugepage_subpool {
>         spinlock_t lock;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 1a1434e29740..a2d2ffa76173 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1120,6 +1120,23 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
>         return false;
>  }
>
> +void hugetlb_free_range(struct mmu_gather *tlb, const struct hugetlb_pte *hpte,
> +                       unsigned long start, unsigned long end)
> +{
> +       unsigned long floor = start & hugetlb_pte_mask(hpte);
> +       unsigned long ceiling = floor + hugetlb_pte_size(hpte);
> +
> +       if (hugetlb_pte_size(hpte) >= PGDIR_SIZE) {
> +               free_p4d_range(tlb, (pgd_t *)hpte->ptep, start, end, floor, ceiling);
> +       } else if (hugetlb_pte_size(hpte) >= P4D_SIZE) {
> +               free_pud_range(tlb, (p4d_t *)hpte->ptep, start, end, floor, ceiling);
> +       } else if (hugetlb_pte_size(hpte) >= PUD_SIZE) {
> +               free_pmd_range(tlb, (pud_t *)hpte->ptep, start, end, floor, ceiling);
> +       } else if (hugetlb_pte_size(hpte) >= PMD_SIZE) {
> +               free_pte_range(tlb, (pmd_t *)hpte->ptep, start);
> +       }

Same as the previous patch: I wonder about >=, and if possible
calculate hugetlb_pte_size() once, or use *_SHIFT comparison.

> +}
> +
>  bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte)
>  {
>         pgd_t pgd;
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>


* Re: [RFC PATCH 09/26] hugetlb: add hugetlb_hgm_enabled
  2022-06-24 17:36 ` [RFC PATCH 09/26] hugetlb: add hugetlb_hgm_enabled James Houghton
  2022-06-27 12:55   ` manish.mishra
@ 2022-06-28 20:33   ` Mina Almasry
  2022-09-08 18:07   ` Peter Xu
  2 siblings, 0 replies; 128+ messages in thread
From: Mina Almasry @ 2022-06-28 20:33 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Fri, Jun 24, 2022 at 10:37 AM James Houghton <jthoughton@google.com> wrote:
>
> Currently, this is always true if the VMA is shared. In the future, it's
> possible that private mappings will get some or all HGM functionality.
>
> Signed-off-by: James Houghton <jthoughton@google.com>

Reviewed-by: Mina Almasry <almasrymina@google.com>

> ---
>  include/linux/hugetlb.h | 10 ++++++++++
>  mm/hugetlb.c            |  8 ++++++++
>  2 files changed, 18 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 33ba48fac551..e7a6b944d0cc 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -1174,6 +1174,16 @@ static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>  }
>  #endif /* CONFIG_HUGETLB_PAGE */
>
> +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> +/* If HugeTLB high-granularity mappings are enabled for this VMA. */
> +bool hugetlb_hgm_enabled(struct vm_area_struct *vma);
> +#else
> +static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
> +{
> +       return false;
> +}
> +#endif
> +
>  static inline spinlock_t *huge_pte_lock(struct hstate *h,
>                                         struct mm_struct *mm, pte_t *pte)
>  {
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a2d2ffa76173..8b10b941458d 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6983,6 +6983,14 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
>
>  #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
>
> +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> +bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
> +{
> +       /* All shared VMAs have HGM enabled. */

Personally I find the comment redundant; the next line does just that.

What about VM_MAYSHARE? Should those also have HGM enabled?

Is it possible to get some docs with this series for V1? This would be
something to highlight in the docs.

> +       return vma->vm_flags & VM_SHARED;
> +}
> +#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
> +
>  /*
>   * These functions are overwritable if your architecture needs its own
>   * behavior.
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>


* Re: [RFC PATCH 06/26] mm: make free_p?d_range functions public
  2022-06-24 17:36 ` [RFC PATCH 06/26] mm: make free_p?d_range functions public James Houghton
  2022-06-27 12:31   ` manish.mishra
@ 2022-06-28 20:35   ` Mike Kravetz
  2022-07-12 20:52     ` James Houghton
  1 sibling, 1 reply; 128+ messages in thread
From: Mike Kravetz @ 2022-06-28 20:35 UTC (permalink / raw)
  To: James Houghton
  Cc: Muchun Song, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On 06/24/22 17:36, James Houghton wrote:
> This makes them usable for HugeTLB page table freeing operations.
> After HugeTLB high-granularity mapping, the page table for a HugeTLB VMA
> can get more complex, and these functions handle freeing page tables
> generally.
> 

Hmmmm?

free_pgd_range is not generally called directly for hugetlb mappings.
There is a wrapper hugetlb_free_pgd_range which can have architecture
specific implementations.  It makes me wonder if these lower level
routines can be directly used on hugetlb mappings.  My 'guess' is that any
such details will be hidden in the callers.  Suspect this will become clear
in later patches.
-- 
Mike Kravetz

> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>  include/linux/mm.h | 7 +++++++
>  mm/memory.c        | 8 ++++----
>  2 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index bc8f326be0ce..07f5da512147 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1847,6 +1847,13 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
>  
>  struct mmu_notifier_range;
>  
> +void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd, unsigned long addr);
> +void free_pmd_range(struct mmu_gather *tlb, pud_t *pud, unsigned long addr,
> +		unsigned long end, unsigned long floor, unsigned long ceiling);
> +void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d, unsigned long addr,
> +		unsigned long end, unsigned long floor, unsigned long ceiling);
> +void free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd, unsigned long addr,
> +		unsigned long end, unsigned long floor, unsigned long ceiling);
>  void free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
>  		unsigned long end, unsigned long floor, unsigned long ceiling);
>  int
> diff --git a/mm/memory.c b/mm/memory.c
> index 7a089145cad4..bb3b9b5b94fb 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -227,7 +227,7 @@ static void check_sync_rss_stat(struct task_struct *task)
>   * Note: this doesn't free the actual pages themselves. That
>   * has been handled earlier when unmapping all the memory regions.
>   */
> -static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
> +void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
>  			   unsigned long addr)
>  {
>  	pgtable_t token = pmd_pgtable(*pmd);
> @@ -236,7 +236,7 @@ static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
>  	mm_dec_nr_ptes(tlb->mm);
>  }
>  
> -static inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
> +inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
>  				unsigned long addr, unsigned long end,
>  				unsigned long floor, unsigned long ceiling)
>  {
> @@ -270,7 +270,7 @@ static inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
>  	mm_dec_nr_pmds(tlb->mm);
>  }
>  
> -static inline void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d,
> +inline void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d,
>  				unsigned long addr, unsigned long end,
>  				unsigned long floor, unsigned long ceiling)
>  {
> @@ -304,7 +304,7 @@ static inline void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d,
>  	mm_dec_nr_puds(tlb->mm);
>  }
>  
> -static inline void free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd,
> +inline void free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd,
>  				unsigned long addr, unsigned long end,
>  				unsigned long floor, unsigned long ceiling)
>  {
> -- 
> 2.37.0.rc0.161.g10f37bed90-goog
> 

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries
  2022-06-24 17:36 ` [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries James Houghton
                     ` (2 preceding siblings ...)
  2022-06-28 20:25   ` Mina Almasry
@ 2022-06-28 20:44   ` Mike Kravetz
  2022-06-29 16:24     ` James Houghton
  2022-07-11 23:32   ` Mike Kravetz
  2022-09-08 17:38   ` Peter Xu
  5 siblings, 1 reply; 128+ messages in thread
From: Mike Kravetz @ 2022-06-28 20:44 UTC (permalink / raw)
  To: James Houghton
  Cc: Muchun Song, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On 06/24/22 17:36, James Houghton wrote:
> After high-granularity mapping, page table entries for HugeTLB pages can
> be of any size/type. (For example, we can have a 1G page mapped with a
> mix of PMDs and PTEs.) This struct is to help keep track of a HugeTLB
> PTE after we have done a page table walk.
> 
> Without this, we'd have to pass around the "size" of the PTE everywhere.
> We effectively did this before; it could be fetched from the hstate,
> which we pass around pretty much everywhere.
> 
> This commit includes definitions for some basic helper functions that
> are used later. These helper functions wrap existing PTE
> inspection/modification functions, where the correct version is picked
> depending on if the HugeTLB PTE is actually "huge" or not. (Previously,
> all HugeTLB PTEs were "huge").
> 
> For example, hugetlb_ptep_get wraps huge_ptep_get and ptep_get, where
> ptep_get is used when the HugeTLB PTE is PAGE_SIZE, and huge_ptep_get is
> used in all other cases.
> 
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>  include/linux/hugetlb.h | 84 +++++++++++++++++++++++++++++++++++++++++
>  mm/hugetlb.c            | 57 ++++++++++++++++++++++++++++
>  2 files changed, 141 insertions(+)

There is nothing 'wrong' with this patch, but it does make me wonder.
After introducing hugetlb_pte, is all code dealing with hugetlb mappings
going to be using hugetlb_ptes?  It would be quite confusing if there is
a mix of hugetlb_ptes and non-hugetlb_ptes.  This will be revealed later
in the series, but a comment about the future direction would be helpful
here.
-- 
Mike Kravetz

> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 5fe1db46d8c9..1d4ec9dfdebf 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -46,6 +46,68 @@ enum {
>  	__NR_USED_SUBPAGE,
>  };
>  
> +struct hugetlb_pte {
> +	pte_t *ptep;
> +	unsigned int shift;
> +};
> +
> +static inline
> +void hugetlb_pte_init(struct hugetlb_pte *hpte)
> +{
> +	hpte->ptep = NULL;
> +}
> +
> +static inline
> +void hugetlb_pte_populate(struct hugetlb_pte *hpte, pte_t *ptep,
> +			  unsigned int shift)
> +{
> +	BUG_ON(!ptep);
> +	hpte->ptep = ptep;
> +	hpte->shift = shift;
> +}
> +
> +static inline
> +unsigned long hugetlb_pte_size(const struct hugetlb_pte *hpte)
> +{
> +	BUG_ON(!hpte->ptep);
> +	return 1UL << hpte->shift;
> +}
> +
> +static inline
> +unsigned long hugetlb_pte_mask(const struct hugetlb_pte *hpte)
> +{
> +	BUG_ON(!hpte->ptep);
> +	return ~(hugetlb_pte_size(hpte) - 1);
> +}
> +
> +static inline
> +unsigned int hugetlb_pte_shift(const struct hugetlb_pte *hpte)
> +{
> +	BUG_ON(!hpte->ptep);
> +	return hpte->shift;
> +}
> +
> +static inline
> +bool hugetlb_pte_huge(const struct hugetlb_pte *hpte)
> +{
> +	return !IS_ENABLED(CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING) ||
> +		hugetlb_pte_shift(hpte) > PAGE_SHIFT;
> +}
> +
> +static inline
> +void hugetlb_pte_copy(struct hugetlb_pte *dest, const struct hugetlb_pte *src)
> +{
> +	dest->ptep = src->ptep;
> +	dest->shift = src->shift;
> +}
> +
> +bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte);
> +bool hugetlb_pte_none(const struct hugetlb_pte *hpte);
> +bool hugetlb_pte_none_mostly(const struct hugetlb_pte *hpte);
> +pte_t hugetlb_ptep_get(const struct hugetlb_pte *hpte);
> +void hugetlb_pte_clear(struct mm_struct *mm, const struct hugetlb_pte *hpte,
> +		       unsigned long address);
> +
>  struct hugepage_subpool {
>  	spinlock_t lock;
>  	long count;
> @@ -1130,6 +1192,28 @@ static inline spinlock_t *huge_pte_lock_shift(unsigned int shift,
>  	return ptl;
>  }
>  
> +static inline
> +spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
> +{
> +
> +	BUG_ON(!hpte->ptep);
> +	// Only use huge_pte_lockptr if we are at leaf-level. Otherwise use
> +	// the regular page table lock.
> +	if (hugetlb_pte_none(hpte) || hugetlb_pte_present_leaf(hpte))
> +		return huge_pte_lockptr(hugetlb_pte_shift(hpte),
> +				mm, hpte->ptep);
> +	return &mm->page_table_lock;
> +}
> +
> +static inline
> +spinlock_t *hugetlb_pte_lock(struct mm_struct *mm, struct hugetlb_pte *hpte)
> +{
> +	spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte);
> +
> +	spin_lock(ptl);
> +	return ptl;
> +}
> +
>  #if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_CMA)
>  extern void __init hugetlb_cma_reserve(int order);
>  extern void __init hugetlb_cma_check(void);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index d6d0d4c03def..1a1434e29740 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1120,6 +1120,63 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
>  	return false;
>  }
>  
> +bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte)
> +{
> +	pgd_t pgd;
> +	p4d_t p4d;
> +	pud_t pud;
> +	pmd_t pmd;
> +
> +	BUG_ON(!hpte->ptep);
> +	if (hugetlb_pte_size(hpte) >= PGDIR_SIZE) {
> +		pgd = *(pgd_t *)hpte->ptep;
> +		return pgd_present(pgd) && pgd_leaf(pgd);
> +	} else if (hugetlb_pte_size(hpte) >= P4D_SIZE) {
> +		p4d = *(p4d_t *)hpte->ptep;
> +		return p4d_present(p4d) && p4d_leaf(p4d);
> +	} else if (hugetlb_pte_size(hpte) >= PUD_SIZE) {
> +		pud = *(pud_t *)hpte->ptep;
> +		return pud_present(pud) && pud_leaf(pud);
> +	} else if (hugetlb_pte_size(hpte) >= PMD_SIZE) {
> +		pmd = *(pmd_t *)hpte->ptep;
> +		return pmd_present(pmd) && pmd_leaf(pmd);
> +	} else if (hugetlb_pte_size(hpte) >= PAGE_SIZE)
> +		return pte_present(*hpte->ptep);
> +	BUG();
> +}
> +
> +bool hugetlb_pte_none(const struct hugetlb_pte *hpte)
> +{
> +	if (hugetlb_pte_huge(hpte))
> +		return huge_pte_none(huge_ptep_get(hpte->ptep));
> +	return pte_none(ptep_get(hpte->ptep));
> +}
> +
> +bool hugetlb_pte_none_mostly(const struct hugetlb_pte *hpte)
> +{
> +	if (hugetlb_pte_huge(hpte))
> +		return huge_pte_none_mostly(huge_ptep_get(hpte->ptep));
> +	return pte_none_mostly(ptep_get(hpte->ptep));
> +}
> +
> +pte_t hugetlb_ptep_get(const struct hugetlb_pte *hpte)
> +{
> +	if (hugetlb_pte_huge(hpte))
> +		return huge_ptep_get(hpte->ptep);
> +	return ptep_get(hpte->ptep);
> +}
> +
> +void hugetlb_pte_clear(struct mm_struct *mm, const struct hugetlb_pte *hpte,
> +		       unsigned long address)
> +{
> +	BUG_ON(!hpte->ptep);
> +	unsigned long sz = hugetlb_pte_size(hpte);
> +
> +	if (sz > PAGE_SIZE)
> +		return huge_pte_clear(mm, address, hpte->ptep, sz);
> +	return pte_clear(mm, address, hpte->ptep);
> +}
> +
>  static void enqueue_huge_page(struct hstate *h, struct page *page)
>  {
>  	int nid = page_to_nid(page);
> -- 
> 2.37.0.rc0.161.g10f37bed90-goog
> 

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 10/26] hugetlb: add for_each_hgm_shift
  2022-06-24 17:36 ` [RFC PATCH 10/26] hugetlb: add for_each_hgm_shift James Houghton
  2022-06-27 13:01   ` manish.mishra
@ 2022-06-28 21:58   ` Mina Almasry
  2022-07-07 21:39     ` Mike Kravetz
  2022-07-08 15:52     ` James Houghton
  1 sibling, 2 replies; 128+ messages in thread
From: Mina Almasry @ 2022-06-28 21:58 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Fri, Jun 24, 2022 at 10:37 AM James Houghton <jthoughton@google.com> wrote:
>
> This is a helper macro to loop through all the usable page sizes for a
> high-granularity-enabled HugeTLB VMA. Given the VMA's hstate, it will
> loop, in descending order, through the page sizes that HugeTLB supports
> for this architecture; it always includes PAGE_SIZE.
>
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>  mm/hugetlb.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 8b10b941458d..557b0afdb503 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6989,6 +6989,16 @@ bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
>         /* All shared VMAs have HGM enabled. */
>         return vma->vm_flags & VM_SHARED;
>  }
> +static unsigned int __shift_for_hstate(struct hstate *h)
> +{
> +       if (h >= &hstates[hugetlb_max_hstate])
> +               return PAGE_SHIFT;

h >= &hstates[hugetlb_max_hstate] means that h is out of bounds, no? Am
I missing something here?

So is this intending to do:

if (h == &hstates[hugetlb_max_hstate])
    return PAGE_SHIFT;

? If so, could we write it as so?

I'm also wondering why __shift_for_hstate(&hstates[hugetlb_max_hstate])
== PAGE_SHIFT. Isn't the last hstate the smallest hstate, which should
be 2MB on x86? Shouldn't this return PMD_SHIFT in that case?

> +       return huge_page_shift(h);
> +}
> +#define for_each_hgm_shift(hstate, tmp_h, shift) \
> +       for ((tmp_h) = hstate; (shift) = __shift_for_hstate(tmp_h), \
> +                              (tmp_h) <= &hstates[hugetlb_max_hstate]; \
> +                              (tmp_h)++)
>  #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
>
>  /*
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 04/26] hugetlb: make huge_pte_lockptr take an explicit shift argument.
  2022-06-27 20:51   ` Mike Kravetz
  2022-06-28 15:29     ` James Houghton
@ 2022-06-29  6:09     ` Muchun Song
  2022-06-29 21:03       ` Mike Kravetz
  1 sibling, 1 reply; 128+ messages in thread
From: Muchun Song @ 2022-06-29  6:09 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: James Houghton, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Mon, Jun 27, 2022 at 01:51:53PM -0700, Mike Kravetz wrote:
> On 06/24/22 17:36, James Houghton wrote:
> > This is needed to handle PTL locking with high-granularity mapping. We
> > won't always be using the PMD-level PTL even if we're using the 2M
> > hugepage hstate. It's possible that we're dealing with 4K PTEs, in which
> > case, we need to lock the PTL for the 4K PTE.
> 
> I'm not really sure why this would be required.
> Why not use the PMD level lock for 4K PTEs?  Seems that would scale better
> with less contention than using the more coarse mm lock.  
>

Your words make me think of another question unrelated to this patch.
We __know__ that arm64 supports contiguous PTE HugeTLB. huge_pte_lockptr()
did not consider this case; those HugeTLB pages contend on the mm lock.
Seems we should optimize this case. Something like:

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 0d790fa3f297..68a1e071bfc0 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -893,7 +893,7 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
 static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
                                           struct mm_struct *mm, pte_t *pte)
 {
-       if (huge_page_size(h) == PMD_SIZE)
+       if (huge_page_size(h) <= PMD_SIZE)
                return pmd_lockptr(mm, (pmd_t *) pte);
        VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
        return &mm->page_table_lock;

I did not check whether anything elsewhere needs to be changed as well.
Just a preliminary thought.

Thanks.
 
> -- 
> Mike Kravetz
> 

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 01/26] hugetlb: make hstate accessor functions const
  2022-06-24 17:36 ` [RFC PATCH 01/26] hugetlb: make hstate accessor functions const James Houghton
  2022-06-24 18:43   ` Mina Almasry
       [not found]   ` <e55f90f5-ba14-5d6e-8f8f-abf731b9095e@nutanix.com>
@ 2022-06-29  6:18   ` Muchun Song
  2 siblings, 0 replies; 128+ messages in thread
From: Muchun Song @ 2022-06-29  6:18 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Fri, Jun 24, 2022 at 05:36:31PM +0000, James Houghton wrote:
> This is just a const-correctness change so that the new hugetlb_pte
> changes can be const-correct too.
> 
> Acked-by: David Rientjes <rientjes@google.com>
> 
> Signed-off-by: James Houghton <jthoughton@google.com>

This is a good start. I also want to make those helpers take a
const-typed parameter. It seems you have forgotten to update the
variants for !CONFIG_HUGETLB_PAGE.

Thanks.

> ---
>  include/linux/hugetlb.h | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index e4cff27d1198..498a4ae3d462 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -715,7 +715,7 @@ static inline struct hstate *hstate_vma(struct vm_area_struct *vma)
>  	return hstate_file(vma->vm_file);
>  }
>  
> -static inline unsigned long huge_page_size(struct hstate *h)
> +static inline unsigned long huge_page_size(const struct hstate *h)
>  {
>  	return (unsigned long)PAGE_SIZE << h->order;
>  }
> @@ -729,27 +729,27 @@ static inline unsigned long huge_page_mask(struct hstate *h)
>  	return h->mask;
>  }
>  
> -static inline unsigned int huge_page_order(struct hstate *h)
> +static inline unsigned int huge_page_order(const struct hstate *h)
>  {
>  	return h->order;
>  }
>  
> -static inline unsigned huge_page_shift(struct hstate *h)
> +static inline unsigned huge_page_shift(const struct hstate *h)
>  {
>  	return h->order + PAGE_SHIFT;
>  }
>  
> -static inline bool hstate_is_gigantic(struct hstate *h)
> +static inline bool hstate_is_gigantic(const struct hstate *h)
>  {
>  	return huge_page_order(h) >= MAX_ORDER;
>  }
>  
> -static inline unsigned int pages_per_huge_page(struct hstate *h)
> +static inline unsigned int pages_per_huge_page(const struct hstate *h)
>  {
>  	return 1 << h->order;
>  }
>  
> -static inline unsigned int blocks_per_huge_page(struct hstate *h)
> +static inline unsigned int blocks_per_huge_page(const struct hstate *h)
>  {
>  	return huge_page_size(h) / 512;
>  }
> -- 
> 2.37.0.rc0.161.g10f37bed90-goog
> 
> 

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates
  2022-06-28 15:40     ` James Houghton
@ 2022-06-29  6:39       ` Muchun Song
  2022-06-29 21:06         ` Mike Kravetz
  0 siblings, 1 reply; 128+ messages in thread
From: Muchun Song @ 2022-06-29  6:39 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Tue, Jun 28, 2022 at 08:40:27AM -0700, James Houghton wrote:
> On Mon, Jun 27, 2022 at 11:42 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >
> > On 06/24/22 17:36, James Houghton wrote:
> > > When using HugeTLB high-granularity mapping, we need to go through the
> > > supported hugepage sizes in decreasing order so that we pick the largest
> > > size that works. Consider the case where we're faulting in a 1G hugepage
> > > for the first time: we want hugetlb_fault/hugetlb_no_page to map it with
> > > a PUD. By going through the sizes in decreasing order, we will find that
> > > PUD_SIZE works before finding out that PMD_SIZE or PAGE_SIZE work too.
> > >
> > > Signed-off-by: James Houghton <jthoughton@google.com>
> > > ---
> > >  mm/hugetlb.c | 40 +++++++++++++++++++++++++++++++++++++---
> > >  1 file changed, 37 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index a57e1be41401..5df838d86f32 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -33,6 +33,7 @@
> > >  #include <linux/migrate.h>
> > >  #include <linux/nospec.h>
> > >  #include <linux/delayacct.h>
> > > +#include <linux/sort.h>
> > >
> > >  #include <asm/page.h>
> > >  #include <asm/pgalloc.h>
> > > @@ -48,6 +49,10 @@
> > >
> > >  int hugetlb_max_hstate __read_mostly;
> > >  unsigned int default_hstate_idx;
> > > +/*
> > > + * After hugetlb_init_hstates is called, hstates will be sorted from largest
> > > + * to smallest.
> > > + */
> > >  struct hstate hstates[HUGE_MAX_HSTATE];
> > >
> > >  #ifdef CONFIG_CMA
> > > @@ -3144,14 +3149,43 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
> > >       kfree(node_alloc_noretry);
> > >  }
> > >
> > > +static int compare_hstates_decreasing(const void *a, const void *b)
> > > +{
> > > +     const int shift_a = huge_page_shift((const struct hstate *)a);
> > > +     const int shift_b = huge_page_shift((const struct hstate *)b);
> > > +
> > > +     if (shift_a < shift_b)
> > > +             return 1;
> > > +     if (shift_a > shift_b)
> > > +             return -1;
> > > +     return 0;
> > > +}
> > > +
> > > +static void sort_hstates(void)
> > > +{
> > > +     unsigned long default_hstate_sz = huge_page_size(&default_hstate);
> > > +
> > > +     /* Sort from largest to smallest. */
> > > +     sort(hstates, hugetlb_max_hstate, sizeof(*hstates),
> > > +          compare_hstates_decreasing, NULL);
> > > +
> > > +     /*
> > > +      * We may have changed the location of the default hstate, so we need to
> > > +      * update it.
> > > +      */
> > > +     default_hstate_idx = hstate_index(size_to_hstate(default_hstate_sz));
> > > +}
> > > +
> > >  static void __init hugetlb_init_hstates(void)
> > >  {
> > >       struct hstate *h, *h2;
> > >
> > > -     for_each_hstate(h) {
> > > -             if (minimum_order > huge_page_order(h))
> > > -                     minimum_order = huge_page_order(h);
> > > +     sort_hstates();
> > >
> > > +     /* The last hstate is now the smallest. */
> > > +     minimum_order = huge_page_order(&hstates[hugetlb_max_hstate - 1]);
> > > +
> > > +     for_each_hstate(h) {
> > >               /* oversize hugepages were init'ed in early boot */
> > >               if (!hstate_is_gigantic(h))
> > >                       hugetlb_hstate_alloc_pages(h);
> >
> > This may/will cause problems for gigantic hugetlb pages allocated at boot
> > time.  See alloc_bootmem_huge_page() where a pointer to the associated hstate
> > is encoded within the allocated hugetlb page.  These pages are added to
> > hugetlb pools by the routine gather_bootmem_prealloc() which uses the saved
> > hstate to add prep the gigantic page and add to the correct pool.  Currently,
> > gather_bootmem_prealloc is called after hugetlb_init_hstates.  So, changing
> > hstate order will cause errors.
> >
> > I do not see any reason why we could not call gather_bootmem_prealloc before
> > hugetlb_init_hstates to avoid this issue.
> 
> Thanks for catching this, Mike. Your suggestion certainly seems to
> work, but it also seems kind of error prone. I'll have to look at the
> code more closely, but maybe it would be better if I just maintained a
> separate `struct hstate *sorted_hstate_ptrs[]`, where the original

I don't think this is a good idea if you really rely on the
initialization order in this patch.  An easier solution is to change
huge_bootmem_page->hstate to huge_bootmem_page->hugepagesz; then we
can use size_to_hstate(huge_bootmem_page->hugepagesz) in
gather_bootmem_prealloc().

Thanks.

> locations of the hstates remain unchanged, as to not break
> gather_bootmem_prealloc/other things.
> 
> > --
> > Mike Kravetz
> 

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 13/26] hugetlb: add huge_pte_alloc_high_granularity
  2022-06-24 17:36 ` [RFC PATCH 13/26] hugetlb: add huge_pte_alloc_high_granularity James Houghton
@ 2022-06-29 14:11   ` manish.mishra
  0 siblings, 0 replies; 128+ messages in thread
From: manish.mishra @ 2022-06-29 14:11 UTC (permalink / raw)
  To: James Houghton, Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Dr . David Alan Gilbert, linux-mm, linux-kernel


On 24/06/22 11:06 pm, James Houghton wrote:
> This function is to be used to do a HugeTLB page table walk where we may
> need to split a leaf-level huge PTE into a new page table level.
>
> Consider the case where we want to install 4K inside an empty 1G page:
> 1. We walk to the PUD and notice that it is pte_none.
> 2. We split the PUD by calling `hugetlb_split_to_shift`, creating a
>     standard PUD that points to PMDs that are all pte_none.
> 3. We continue the PT walk to find the PMD. We split it just like we
>     split the PUD.
> 4. We find the PTE and give it back to the caller.
>
> To avoid concurrent splitting operations on the same page table entry,
> we require that the mapping rwsem is held for writing while collapsing
> and for reading when doing a high-granularity PT walk.
>
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>   include/linux/hugetlb.h | 23 ++++++++++++++
>   mm/hugetlb.c            | 67 +++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 90 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 605aa19d8572..321f5745d87f 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -1176,14 +1176,37 @@ static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>   }
>   #endif	/* CONFIG_HUGETLB_PAGE */
>   
> +enum split_mode {
> +	HUGETLB_SPLIT_NEVER   = 0,
> +	HUGETLB_SPLIT_NONE    = 1 << 0,
> +	HUGETLB_SPLIT_PRESENT = 1 << 1,
huge
> +	HUGETLB_SPLIT_ALWAYS  = HUGETLB_SPLIT_NONE | HUGETLB_SPLIT_PRESENT,
> +};
>   #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
>   /* If HugeTLB high-granularity mappings are enabled for this VMA. */
>   bool hugetlb_hgm_enabled(struct vm_area_struct *vma);
> +int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
> +				    struct mm_struct *mm,
> +				    struct vm_area_struct *vma,
> +				    unsigned long addr,
> +				    unsigned int desired_sz,
> +				    enum split_mode mode,
> +				    bool write_locked);
>   #else
>   static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
>   {
>   	return false;
>   }
> +static inline int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
> +					   struct mm_struct *mm,
> +					   struct vm_area_struct *vma,
> +					   unsigned long addr,
> +					   unsigned int desired_sz,
> +					   enum split_mode mode,
> +					   bool write_locked)
> +{
> +	return -EINVAL;
> +}
>   #endif
>   
>   static inline spinlock_t *huge_pte_lock(struct hstate *h,
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index eaffe7b4f67c..6e0c5fbfe32c 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -7166,6 +7166,73 @@ static int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *v
>   	tlb_finish_mmu(&tlb);
>   	return ret;
>   }
> +
> +/*
> + * Similar to huge_pte_alloc except that this can be used to create or walk
> + * high-granularity mappings. It will automatically split existing HugeTLB PTEs
> + * if required by @mode. The resulting HugeTLB PTE will be returned in @hpte.
> + *
> + * There are four options for @mode:
> + *  - HUGETLB_SPLIT_NEVER   - Never split.
> + *  - HUGETLB_SPLIT_NONE    - Split empty PTEs.
> + *  - HUGETLB_SPLIT_PRESENT - Split present PTEs.
> + *  - HUGETLB_SPLIT_ALWAYS  - Split both empty and present PTEs.
> + */
> +int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
> +				    struct mm_struct *mm,
> +				    struct vm_area_struct *vma,
> +				    unsigned long addr,
> +				    unsigned int desired_shift,
> +				    enum split_mode mode,
> +				    bool write_locked)
> +{
> +	struct address_space *mapping = vma->vm_file->f_mapping;
> +	bool has_write_lock = write_locked;
> +	unsigned long desired_sz = 1UL << desired_shift;
> +	int ret;
> +
> +	BUG_ON(!hpte);
> +
> +	if (has_write_lock)
> +		i_mmap_assert_write_locked(mapping);
> +	else
> +		i_mmap_assert_locked(mapping);

> +
> +retry:
> +	ret = 0;
> +	hugetlb_pte_init(hpte);
> +
> +	ret = hugetlb_walk_to(mm, hpte, addr, desired_sz,
> +			      !(mode & HUGETLB_SPLIT_NONE));

Can hugetlb_walk_to change mappings when called with the
HUGETLB_SPLIT_NONE mode?

If so, we should ensure we are holding the write lock here.

> +	if (ret || hugetlb_pte_size(hpte) == desired_sz)
> +		goto out;
> +
> +	if (
> +		((mode & HUGETLB_SPLIT_NONE) && hugetlb_pte_none(hpte)) ||
> +		((mode & HUGETLB_SPLIT_PRESENT) &&
> +		  hugetlb_pte_present_leaf(hpte))
> +	   ) {
> +		if (!has_write_lock) {
> +			i_mmap_unlock_read(mapping);
Should a lock upgrade be used here?
> +			i_mmap_lock_write(mapping);
> +			has_write_lock = true;
> +			goto retry;
> +		}
> +		ret = hugetlb_split_to_shift(mm, vma, hpte, addr,
> +					     desired_shift);
> +	}
> +
> +out:
> +	if (has_write_lock && !write_locked) {
> +		/* Drop the write lock. */
> +		i_mmap_unlock_write(mapping);
> +		i_mmap_lock_read(mapping);
Same here: should a lock downgrade be used?
> +		has_write_lock = false;
> +		goto retry;
> +	}
> +
> +	return ret;
> +}
>   #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
>   
>   /*

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 12/26] hugetlb: add HugeTLB splitting functionality
  2022-06-24 17:36 ` [RFC PATCH 12/26] hugetlb: add HugeTLB splitting functionality James Houghton
  2022-06-27 13:50   ` manish.mishra
@ 2022-06-29 14:33   ` manish.mishra
  2022-06-29 16:20     ` James Houghton
  1 sibling, 1 reply; 128+ messages in thread
From: manish.mishra @ 2022-06-29 14:33 UTC (permalink / raw)
  To: James Houghton, Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Dr . David Alan Gilbert, linux-mm, linux-kernel


On 24/06/22 11:06 pm, James Houghton wrote:
> The new function, hugetlb_split_to_shift, will optimally split the page
> table to map a particular address at a particular granularity.
>
> This is useful for punching a hole in the mapping and for mapping small
> sections of a HugeTLB page (via UFFDIO_CONTINUE, for example).
>
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>   mm/hugetlb.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 122 insertions(+)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3ec2a921ee6f..eaffe7b4f67c 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -102,6 +102,18 @@ struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
>   /* Forward declaration */
>   static int hugetlb_acct_memory(struct hstate *h, long delta);
>   
> +/*
> + * Find the subpage that corresponds to `addr` in `hpage`.
> + */
> +static struct page *hugetlb_find_subpage(struct hstate *h, struct page *hpage,
> +				 unsigned long addr)
> +{
> +	size_t idx = (addr & ~huge_page_mask(h))/PAGE_SIZE;
> +
> +	BUG_ON(idx >= pages_per_huge_page(h));
> +	return &hpage[idx];
> +}
> +
>   static inline bool subpool_is_free(struct hugepage_subpool *spool)
>   {
>   	if (spool->count)
> @@ -7044,6 +7056,116 @@ static unsigned int __shift_for_hstate(struct hstate *h)
>   	for ((tmp_h) = hstate; (shift) = __shift_for_hstate(tmp_h), \
>   			       (tmp_h) <= &hstates[hugetlb_max_hstate]; \
>   			       (tmp_h)++)
> +
> +/*
> + * Given a particular address, split the HugeTLB PTE that currently maps it
> + * so that, for the given address, the PTE that maps it is `desired_shift`.
> + * This function will always split the HugeTLB PTE optimally.
> + *
> + * For example, given a HugeTLB 1G page that is mapped from VA 0 to 1G. If we
> + * call this function with addr=0 and desired_shift=PAGE_SHIFT, will result in
> + * these changes to the page table:
> + * 1. The PUD will be split into 2M PMDs.
> + * 2. The first PMD will be split again into 4K PTEs.
> + */
> +static int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *vma,
> +			   const struct hugetlb_pte *hpte,
> +			   unsigned long addr, unsigned long desired_shift)
> +{
> +	unsigned long start, end, curr;
> +	unsigned long desired_sz = 1UL << desired_shift;
> +	struct hstate *h = hstate_vma(vma);
> +	int ret;
> +	struct hugetlb_pte new_hpte;
> +	struct mmu_notifier_range range;
> +	struct page *hpage = NULL;
> +	struct page *subpage;
> +	pte_t old_entry;
> +	struct mmu_gather tlb;
> +
> +	BUG_ON(!hpte->ptep);
> +	BUG_ON(hugetlb_pte_size(hpte) == desired_sz);
> +
> +	start = addr & hugetlb_pte_mask(hpte);
> +	end = start + hugetlb_pte_size(hpte);
> +
> +	i_mmap_assert_write_locked(vma->vm_file->f_mapping);
> +
> +	BUG_ON(!hpte->ptep);
> +	/* This function only works if we are looking at a leaf-level PTE. */
> +	BUG_ON(!hugetlb_pte_none(hpte) && !hugetlb_pte_present_leaf(hpte));
> +
> +	/*
> +	 * Clear the PTE so that we will allocate the PT structures when
> +	 * walking the page table.
> +	 */
> +	old_entry = huge_ptep_get_and_clear(mm, start, hpte->ptep);

Sorry, I missed it last time: what if an HGM mapping is present here and
the current hpte is at a higher level? Where will we clear and free the
child page-table pages?

I see it does not happen in huge_ptep_get_and_clear.

> +
> +	if (!huge_pte_none(old_entry))
> +		hpage = pte_page(old_entry);
> +
> +	BUG_ON(!IS_ALIGNED(start, desired_sz));
> +	BUG_ON(!IS_ALIGNED(end, desired_sz));
> +
> +	for (curr = start; curr < end;) {
> +		struct hstate *tmp_h;
> +		unsigned int shift;
> +
> +		for_each_hgm_shift(h, tmp_h, shift) {
> +			unsigned long sz = 1UL << shift;
> +
> +			if (!IS_ALIGNED(curr, sz) || curr + sz > end)
> +				continue;
> +			/*
> +			 * If we are including `addr`, we need to make sure
> +			 * splitting down to the correct size. Go to a smaller
> +			 * size if we are not.
> +			 */
> +			if (curr <= addr && curr + sz > addr &&
> +					shift > desired_shift)
> +				continue;
> +
> +			/*
> +			 * Continue the page table walk to the level we want,
> +			 * allocate PT structures as we go.
> +			 */
> +			hugetlb_pte_copy(&new_hpte, hpte);
> +			ret = hugetlb_walk_to(mm, &new_hpte, curr, sz,
> +					      /*stop_at_none=*/false);
> +			if (ret)
> +				goto err;
> +			BUG_ON(hugetlb_pte_size(&new_hpte) != sz);
> +			if (hpage) {
> +				pte_t new_entry;
> +
> +				subpage = hugetlb_find_subpage(h, hpage, curr);
> +				new_entry = make_huge_pte_with_shift(vma, subpage,
> +								     huge_pte_write(old_entry),
> +								     shift);
> +				set_huge_pte_at(mm, curr, new_hpte.ptep, new_entry);
> +			}
> +			curr += sz;
> +			goto next;
> +		}
> +		/* We couldn't find a size that worked. */
> +		BUG();
> +next:
> +		continue;
> +	}
> +
> +	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
> +				start, end);
> +	mmu_notifier_invalidate_range_start(&range);
> +	return 0;
> +err:
> +	tlb_gather_mmu(&tlb, mm);
> +	/* Free any newly allocated page table entries. */
> +	hugetlb_free_range(&tlb, hpte, start, curr);
> +	/* Restore the old entry. */
> +	set_huge_pte_at(mm, start, hpte->ptep, old_entry);
> +	tlb_finish_mmu(&tlb);
> +	return ret;
> +}
>   #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
>   
>   /*

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 14/26] hugetlb: add HGM support for hugetlb_fault and hugetlb_no_page
  2022-06-24 17:36 ` [RFC PATCH 14/26] hugetlb: add HGM support for hugetlb_fault and hugetlb_no_page James Houghton
@ 2022-06-29 14:40   ` manish.mishra
  2022-06-29 15:56     ` James Houghton
  0 siblings, 1 reply; 128+ messages in thread
From: manish.mishra @ 2022-06-29 14:40 UTC (permalink / raw)
  To: James Houghton, Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Dr . David Alan Gilbert, linux-mm, linux-kernel


On 24/06/22 11:06 pm, James Houghton wrote:
> This CL is the first main functional HugeTLB change. Together, these
> changes allow the HugeTLB fault path to handle faults on HGM-enabled
> VMAs. The two main behaviors that can be done now:
>    1. Faults can be passed to handle_userfault. (Userspace will want to
>       use UFFD_FEATURE_REAL_ADDRESS to get the real address to know which
>       region they should call UFFDIO_CONTINUE on later.)
>    2. Faults on pages that have been partially mapped (and userfaultfd is
>       not being used) will get mapped at the largest possible size.
>       For example, if a 1G page has been partially mapped at 2M, and we
>       fault on an unmapped 2M section, hugetlb_no_page will create a 2M
>       PMD to map the faulting address.
>
> This commit does not handle hugetlb_wp right now, and it doesn't handle
> HugeTLB page migration and swap entries.
>
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>   include/linux/hugetlb.h |  12 ++++
>   mm/hugetlb.c            | 121 +++++++++++++++++++++++++++++++---------
>   2 files changed, 106 insertions(+), 27 deletions(-)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 321f5745d87f..ac4ac8fbd901 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -1185,6 +1185,9 @@ enum split_mode {
>   #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
>   /* If HugeTLB high-granularity mappings are enabled for this VMA. */
>   bool hugetlb_hgm_enabled(struct vm_area_struct *vma);
> +int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
> +			      struct vm_area_struct *vma, unsigned long start,
> +			      unsigned long end);
>   int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
>   				    struct mm_struct *mm,
>   				    struct vm_area_struct *vma,
> @@ -1197,6 +1200,15 @@ static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
>   {
>   	return false;
>   }
> +
> +static inline
> +int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
> +			      struct vm_area_struct *vma, unsigned long start,
> +			      unsigned long end)
> +{
> +		BUG();
> +}
> +
>   static inline int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
>   					   struct mm_struct *mm,
>   					   struct vm_area_struct *vma,
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6e0c5fbfe32c..da30621656b8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5605,18 +5605,24 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
>   static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>   			struct vm_area_struct *vma,
>   			struct address_space *mapping, pgoff_t idx,
> -			unsigned long address, pte_t *ptep,
> +			unsigned long address, struct hugetlb_pte *hpte,
>   			pte_t old_pte, unsigned int flags)
>   {
>   	struct hstate *h = hstate_vma(vma);
>   	vm_fault_t ret = VM_FAULT_SIGBUS;
>   	int anon_rmap = 0;
>   	unsigned long size;
> -	struct page *page;
> +	struct page *page, *subpage;
>   	pte_t new_pte;
>   	spinlock_t *ptl;
>   	unsigned long haddr = address & huge_page_mask(h);
> +	unsigned long haddr_hgm = address & hugetlb_pte_mask(hpte);
>   	bool new_page, new_pagecache_page = false;
> +	/*
> +	 * This page is getting mapped for the first time, in which case we
> +	 * want to increment its mapcount.
> +	 */
> +	bool new_mapping = hpte->shift == huge_page_shift(h);
>   
>   	/*
>   	 * Currently, we are forced to kill the process in the event the
> @@ -5665,9 +5671,9 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>   			 * here.  Before returning error, get ptl and make
>   			 * sure there really is no pte entry.
>   			 */
> -			ptl = huge_pte_lock(h, mm, ptep);
> +			ptl = hugetlb_pte_lock(mm, hpte);
>   			ret = 0;
> -			if (huge_pte_none(huge_ptep_get(ptep)))
> +			if (hugetlb_pte_none(hpte))
>   				ret = vmf_error(PTR_ERR(page));
>   			spin_unlock(ptl);
>   			goto out;
> @@ -5731,18 +5737,25 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>   		vma_end_reservation(h, vma, haddr);
>   	}
>   
> -	ptl = huge_pte_lock(h, mm, ptep);
> +	ptl = hugetlb_pte_lock(mm, hpte);
>   	ret = 0;
>   	/* If pte changed from under us, retry */
> -	if (!pte_same(huge_ptep_get(ptep), old_pte))
> +	if (!pte_same(hugetlb_ptep_get(hpte), old_pte))
>   		goto backout;
>   
> -	if (anon_rmap) {
> -		ClearHPageRestoreReserve(page);
> -		hugepage_add_new_anon_rmap(page, vma, haddr);
> -	} else
> -		page_dup_file_rmap(page, true);
> -	new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
> +	if (new_mapping) {
> +		/* Only increment this page's mapcount if we are mapping it
> +		 * for the first time.
> +		 */
> +		if (anon_rmap) {
> +			ClearHPageRestoreReserve(page);
> +			hugepage_add_new_anon_rmap(page, vma, haddr);
> +		} else
> +			page_dup_file_rmap(page, true);
> +	}
> +
> +	subpage = hugetlb_find_subpage(h, page, haddr_hgm);

               sorry, did not understand: why make_huge_pte here, when we may be mapping just PAGE_SIZE too?

> +	new_pte = make_huge_pte(vma, subpage, ((vma->vm_flags & VM_WRITE)
>   				&& (vma->vm_flags & VM_SHARED)));
>   	/*
>   	 * If this pte was previously wr-protected, keep it wr-protected even
> @@ -5750,12 +5763,13 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>   	 */
>   	if (unlikely(pte_marker_uffd_wp(old_pte)))
>   		new_pte = huge_pte_wrprotect(huge_pte_mkuffd_wp(new_pte));
> -	set_huge_pte_at(mm, haddr, ptep, new_pte);
> +	set_huge_pte_at(mm, haddr_hgm, hpte->ptep, new_pte);
>   
> -	hugetlb_count_add(pages_per_huge_page(h), mm);
> +	hugetlb_count_add(hugetlb_pte_size(hpte) / PAGE_SIZE, mm);
>   	if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
> +		BUG_ON(hugetlb_pte_size(hpte) != huge_page_size(h));
>   		/* Optimization, do the COW without a second fault */
> -		ret = hugetlb_wp(mm, vma, address, ptep, flags, page, ptl);
> +		ret = hugetlb_wp(mm, vma, address, hpte->ptep, flags, page, ptl);
>   	}
>   
>   	spin_unlock(ptl);
> @@ -5816,11 +5830,15 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>   	u32 hash;
>   	pgoff_t idx;
>   	struct page *page = NULL;
> +	struct page *subpage = NULL;
>   	struct page *pagecache_page = NULL;
>   	struct hstate *h = hstate_vma(vma);
>   	struct address_space *mapping;
>   	int need_wait_lock = 0;
>   	unsigned long haddr = address & huge_page_mask(h);
> +	unsigned long haddr_hgm;
> +	bool hgm_enabled = hugetlb_hgm_enabled(vma);
> +	struct hugetlb_pte hpte;
>   
>   	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
>   	if (ptep) {
> @@ -5866,11 +5884,22 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>   	hash = hugetlb_fault_mutex_hash(mapping, idx);
>   	mutex_lock(&hugetlb_fault_mutex_table[hash]);
>   
> -	entry = huge_ptep_get(ptep);
> +	hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h));
> +
> +	if (hgm_enabled) {
> +		ret = hugetlb_walk_to(mm, &hpte, address,
> +				      PAGE_SIZE, /*stop_at_none=*/true);
> +		if (ret) {
> +			ret = vmf_error(ret);
> +			goto out_mutex;
> +		}
> +	}
> +
> +	entry = hugetlb_ptep_get(&hpte);
>   	/* PTE markers should be handled the same way as none pte */
> -	if (huge_pte_none_mostly(entry)) {
> -		ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
> -				      entry, flags);
> +	if (hugetlb_pte_none_mostly(&hpte)) {
> +		ret = hugetlb_no_page(mm, vma, mapping, idx, address, &hpte,
> +				entry, flags);
>   		goto out_mutex;
>   	}
>   
> @@ -5908,14 +5937,17 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>   								vma, haddr);
>   	}
>   
> -	ptl = huge_pte_lock(h, mm, ptep);
> +	ptl = hugetlb_pte_lock(mm, &hpte);
>   
>   	/* Check for a racing update before calling hugetlb_wp() */
> -	if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
> +	if (unlikely(!pte_same(entry, hugetlb_ptep_get(&hpte))))
>   		goto out_ptl;
>   
> +	/* haddr_hgm is the base address of the region that hpte maps. */
> +	haddr_hgm = address & hugetlb_pte_mask(&hpte);
> +
>   	/* Handle userfault-wp first, before trying to lock more pages */
> -	if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(ptep)) &&
> +	if (userfaultfd_wp(vma) && huge_pte_uffd_wp(hugetlb_ptep_get(&hpte)) &&
>   	    (flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) {
>   		struct vm_fault vmf = {
>   			.vma = vma,
> @@ -5939,7 +5971,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>   	 * pagecache_page, so here we need take the former one
>   	 * when page != pagecache_page or !pagecache_page.
>   	 */
> -	page = pte_page(entry);
> +	subpage = pte_page(entry);
> +	page = compound_head(subpage);
>   	if (page != pagecache_page)
>   		if (!trylock_page(page)) {
>   			need_wait_lock = 1;
> @@ -5950,7 +5983,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>   
>   	if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
>   		if (!huge_pte_write(entry)) {
> -			ret = hugetlb_wp(mm, vma, address, ptep, flags,
> +			BUG_ON(hugetlb_pte_size(&hpte) != huge_page_size(h));

Is this with respect to the fact that userfaultfd_wp is not supported with HGM mappings currently? Not
sure yet how it is controlled; maybe the next patches will have more details.

> +			ret = hugetlb_wp(mm, vma, address, hpte.ptep, flags,
>   					 pagecache_page, ptl);
>   			goto out_put_page;
>   		} else if (likely(flags & FAULT_FLAG_WRITE)) {
> @@ -5958,9 +5992,9 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>   		}
>   	}
>   	entry = pte_mkyoung(entry);
> -	if (huge_ptep_set_access_flags(vma, haddr, ptep, entry,
> +	if (huge_ptep_set_access_flags(vma, haddr_hgm, hpte.ptep, entry,
>   						flags & FAULT_FLAG_WRITE))
> -		update_mmu_cache(vma, haddr, ptep);
> +		update_mmu_cache(vma, haddr_hgm, hpte.ptep);
>   out_put_page:
>   	if (page != pagecache_page)
>   		unlock_page(page);
> @@ -6951,7 +6985,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
>   				pte = (pte_t *)pmd_alloc(mm, pud, addr);
>   		}
>   	}
> -	BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte));
> +	if (!hugetlb_hgm_enabled(vma))
> +		BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte));
>   
>   	return pte;
>   }
> @@ -7057,6 +7092,38 @@ static unsigned int __shift_for_hstate(struct hstate *h)
>   			       (tmp_h) <= &hstates[hugetlb_max_hstate]; \
>   			       (tmp_h)++)
>   
> +/*
> + * Allocate a HugeTLB PTE that maps as much of [start, end) as possible with a
> + * single page table entry. The allocated HugeTLB PTE is returned in hpte.
> + */

Will it be used for madvise_collapse? If so, would it make sense to keep it in a different patch,
as this one's title mentions just the fault-handling routines?

> +int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
> +			      struct vm_area_struct *vma, unsigned long start,
> +			      unsigned long end)
> +{
> +	struct hstate *h = hstate_vma(vma), *tmp_h;
> +	unsigned int shift;
> +	int ret;
> +
> +	for_each_hgm_shift(h, tmp_h, shift) {
> +		unsigned long sz = 1UL << shift;
> +
> +		if (!IS_ALIGNED(start, sz) || start + sz > end)
> +			continue;
> +		ret = huge_pte_alloc_high_granularity(hpte, mm, vma, start,
> +						      shift, HUGETLB_SPLIT_NONE,
> +						      /*write_locked=*/false);
> +		if (ret)
> +			return ret;
> +
> +		if (hpte->shift > shift)
> +			return -EEXIST;
> +
> +		BUG_ON(hpte->shift != shift);
> +		return 0;
> +	}
> +	return -EINVAL;
> +}
> +
>   /*
>    * Given a particular address, split the HugeTLB PTE that currently maps it
>    * so that, for the given address, the PTE that maps it is `desired_shift`.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 14/26] hugetlb: add HGM support for hugetlb_fault and hugetlb_no_page
  2022-06-29 14:40   ` manish.mishra
@ 2022-06-29 15:56     ` James Houghton
  0 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-06-29 15:56 UTC (permalink / raw)
  To: manish.mishra
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Wed, Jun 29, 2022 at 7:41 AM manish.mishra <manish.mishra@nutanix.com> wrote:
>
>
> On 24/06/22 11:06 pm, James Houghton wrote:
> > This CL is the first main functional HugeTLB change. Together, these
> > changes allow the HugeTLB fault path to handle faults on HGM-enabled
> > VMAs. The two main behaviors that can be done now:
> >    1. Faults can be passed to handle_userfault. (Userspace will want to
> >       use UFFD_FEATURE_REAL_ADDRESS to get the real address to know which
> >       region they should call UFFDIO_CONTINUE on later.)
> >    2. Faults on pages that have been partially mapped (and userfaultfd is
> >       not being used) will get mapped at the largest possible size.
> >       For example, if a 1G page has been partially mapped at 2M, and we
> >       fault on an unmapped 2M section, hugetlb_no_page will create a 2M
> >       PMD to map the faulting address.
> >
> > This commit does not handle hugetlb_wp right now, and it doesn't handle
> > HugeTLB page migration and swap entries.
> >
> > Signed-off-by: James Houghton <jthoughton@google.com>
> > ---
> >   include/linux/hugetlb.h |  12 ++++
> >   mm/hugetlb.c            | 121 +++++++++++++++++++++++++++++++---------
> >   2 files changed, 106 insertions(+), 27 deletions(-)
> >
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index 321f5745d87f..ac4ac8fbd901 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -1185,6 +1185,9 @@ enum split_mode {
> >   #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> >   /* If HugeTLB high-granularity mappings are enabled for this VMA. */
> >   bool hugetlb_hgm_enabled(struct vm_area_struct *vma);
> > +int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
> > +                           struct vm_area_struct *vma, unsigned long start,
> > +                           unsigned long end);
> >   int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
> >                                   struct mm_struct *mm,
> >                                   struct vm_area_struct *vma,
> > @@ -1197,6 +1200,15 @@ static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
> >   {
> >       return false;
> >   }
> > +
> > +static inline
> > +int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
> > +                           struct vm_area_struct *vma, unsigned long start,
> > +                           unsigned long end)
> > +{
> > +             BUG();
> > +}
> > +
> >   static inline int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
> >                                          struct mm_struct *mm,
> >                                          struct vm_area_struct *vma,
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 6e0c5fbfe32c..da30621656b8 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -5605,18 +5605,24 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
> >   static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> >                       struct vm_area_struct *vma,
> >                       struct address_space *mapping, pgoff_t idx,
> > -                     unsigned long address, pte_t *ptep,
> > +                     unsigned long address, struct hugetlb_pte *hpte,
> >                       pte_t old_pte, unsigned int flags)
> >   {
> >       struct hstate *h = hstate_vma(vma);
> >       vm_fault_t ret = VM_FAULT_SIGBUS;
> >       int anon_rmap = 0;
> >       unsigned long size;
> > -     struct page *page;
> > +     struct page *page, *subpage;
> >       pte_t new_pte;
> >       spinlock_t *ptl;
> >       unsigned long haddr = address & huge_page_mask(h);
> > +     unsigned long haddr_hgm = address & hugetlb_pte_mask(hpte);
> >       bool new_page, new_pagecache_page = false;
> > +     /*
> > +      * This page is getting mapped for the first time, in which case we
> > +      * want to increment its mapcount.
> > +      */
> > +     bool new_mapping = hpte->shift == huge_page_shift(h);
> >
> >       /*
> >        * Currently, we are forced to kill the process in the event the
> > @@ -5665,9 +5671,9 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> >                        * here.  Before returning error, get ptl and make
> >                        * sure there really is no pte entry.
> >                        */
> > -                     ptl = huge_pte_lock(h, mm, ptep);
> > +                     ptl = hugetlb_pte_lock(mm, hpte);
> >                       ret = 0;
> > -                     if (huge_pte_none(huge_ptep_get(ptep)))
> > +                     if (hugetlb_pte_none(hpte))
> >                               ret = vmf_error(PTR_ERR(page));
> >                       spin_unlock(ptl);
> >                       goto out;
> > @@ -5731,18 +5737,25 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> >               vma_end_reservation(h, vma, haddr);
> >       }
> >
> > -     ptl = huge_pte_lock(h, mm, ptep);
> > +     ptl = hugetlb_pte_lock(mm, hpte);
> >       ret = 0;
> >       /* If pte changed from under us, retry */
> > -     if (!pte_same(huge_ptep_get(ptep), old_pte))
> > +     if (!pte_same(hugetlb_ptep_get(hpte), old_pte))
> >               goto backout;
> >
> > -     if (anon_rmap) {
> > -             ClearHPageRestoreReserve(page);
> > -             hugepage_add_new_anon_rmap(page, vma, haddr);
> > -     } else
> > -             page_dup_file_rmap(page, true);
> > -     new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
> > +     if (new_mapping) {
> > +             /* Only increment this page's mapcount if we are mapping it
> > +              * for the first time.
> > +              */
> > +             if (anon_rmap) {
> > +                     ClearHPageRestoreReserve(page);
> > +                     hugepage_add_new_anon_rmap(page, vma, haddr);
> > +             } else
> > +                     page_dup_file_rmap(page, true);
> > +     }
> > +
> > +     subpage = hugetlb_find_subpage(h, page, haddr_hgm);
>
>                sorry, did not understand: why make_huge_pte here, when we may be mapping just PAGE_SIZE too?
>

This should be make_huge_pte_with_shift(), with shift =
hugetlb_pte_shift(hpte). Thanks.

> > +     new_pte = make_huge_pte(vma, subpage, ((vma->vm_flags & VM_WRITE)
> >                               && (vma->vm_flags & VM_SHARED)));
> >       /*
> >        * If this pte was previously wr-protected, keep it wr-protected even
> > @@ -5750,12 +5763,13 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> >        */
> >       if (unlikely(pte_marker_uffd_wp(old_pte)))
> >               new_pte = huge_pte_wrprotect(huge_pte_mkuffd_wp(new_pte));
> > -     set_huge_pte_at(mm, haddr, ptep, new_pte);
> > +     set_huge_pte_at(mm, haddr_hgm, hpte->ptep, new_pte);
> >
> > -     hugetlb_count_add(pages_per_huge_page(h), mm);
> > +     hugetlb_count_add(hugetlb_pte_size(hpte) / PAGE_SIZE, mm);
> >       if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
> > +             BUG_ON(hugetlb_pte_size(hpte) != huge_page_size(h));
> >               /* Optimization, do the COW without a second fault */
> > -             ret = hugetlb_wp(mm, vma, address, ptep, flags, page, ptl);
> > +             ret = hugetlb_wp(mm, vma, address, hpte->ptep, flags, page, ptl);
> >       }
> >
> >       spin_unlock(ptl);
> > @@ -5816,11 +5830,15 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >       u32 hash;
> >       pgoff_t idx;
> >       struct page *page = NULL;
> > +     struct page *subpage = NULL;
> >       struct page *pagecache_page = NULL;
> >       struct hstate *h = hstate_vma(vma);
> >       struct address_space *mapping;
> >       int need_wait_lock = 0;
> >       unsigned long haddr = address & huge_page_mask(h);
> > +     unsigned long haddr_hgm;
> > +     bool hgm_enabled = hugetlb_hgm_enabled(vma);
> > +     struct hugetlb_pte hpte;
> >
> >       ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
> >       if (ptep) {
> > @@ -5866,11 +5884,22 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >       hash = hugetlb_fault_mutex_hash(mapping, idx);
> >       mutex_lock(&hugetlb_fault_mutex_table[hash]);
> >
> > -     entry = huge_ptep_get(ptep);
> > +     hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h));
> > +
> > +     if (hgm_enabled) {
> > +             ret = hugetlb_walk_to(mm, &hpte, address,
> > +                                   PAGE_SIZE, /*stop_at_none=*/true);
> > +             if (ret) {
> > +                     ret = vmf_error(ret);
> > +                     goto out_mutex;
> > +             }
> > +     }
> > +
> > +     entry = hugetlb_ptep_get(&hpte);
> >       /* PTE markers should be handled the same way as none pte */
> > -     if (huge_pte_none_mostly(entry)) {
> > -             ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
> > -                                   entry, flags);
> > +     if (hugetlb_pte_none_mostly(&hpte)) {
> > +             ret = hugetlb_no_page(mm, vma, mapping, idx, address, &hpte,
> > +                             entry, flags);
> >               goto out_mutex;
> >       }
> >
> > @@ -5908,14 +5937,17 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >                                                               vma, haddr);
> >       }
> >
> > -     ptl = huge_pte_lock(h, mm, ptep);
> > +     ptl = hugetlb_pte_lock(mm, &hpte);
> >
> >       /* Check for a racing update before calling hugetlb_wp() */
> > -     if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
> > +     if (unlikely(!pte_same(entry, hugetlb_ptep_get(&hpte))))
> >               goto out_ptl;
> >
> > +     /* haddr_hgm is the base address of the region that hpte maps. */
> > +     haddr_hgm = address & hugetlb_pte_mask(&hpte);
> > +
> >       /* Handle userfault-wp first, before trying to lock more pages */
> > -     if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(ptep)) &&
> > +     if (userfaultfd_wp(vma) && huge_pte_uffd_wp(hugetlb_ptep_get(&hpte)) &&
> >           (flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) {
> >               struct vm_fault vmf = {
> >                       .vma = vma,
> > @@ -5939,7 +5971,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >        * pagecache_page, so here we need take the former one
> >        * when page != pagecache_page or !pagecache_page.
> >        */
> > -     page = pte_page(entry);
> > +     subpage = pte_page(entry);
> > +     page = compound_head(subpage);
> >       if (page != pagecache_page)
> >               if (!trylock_page(page)) {
> >                       need_wait_lock = 1;
> > @@ -5950,7 +5983,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >
> >       if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
> >               if (!huge_pte_write(entry)) {
> > -                     ret = hugetlb_wp(mm, vma, address, ptep, flags,
> > +                     BUG_ON(hugetlb_pte_size(&hpte) != huge_page_size(h));
>
> Is this with respect to the fact that userfaultfd_wp is not supported with HGM mappings currently? Not
> sure yet how it is controlled; maybe the next patches will have more details.

Yeah this BUG_ON is just because I haven't implemented support for
userfaultfd_wp yet (userfaultfd_wp for HugeTLB was added pretty
recently, while I was working on this patch series). I'll improve WP
support for the next version.

>
> > +                     ret = hugetlb_wp(mm, vma, address, hpte.ptep, flags,
> >                                        pagecache_page, ptl);
> >                       goto out_put_page;
> >               } else if (likely(flags & FAULT_FLAG_WRITE)) {
> > @@ -5958,9 +5992,9 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >               }
> >       }
> >       entry = pte_mkyoung(entry);
> > -     if (huge_ptep_set_access_flags(vma, haddr, ptep, entry,
> > +     if (huge_ptep_set_access_flags(vma, haddr_hgm, hpte.ptep, entry,
> >                                               flags & FAULT_FLAG_WRITE))
> > -             update_mmu_cache(vma, haddr, ptep);
> > +             update_mmu_cache(vma, haddr_hgm, hpte.ptep);
> >   out_put_page:
> >       if (page != pagecache_page)
> >               unlock_page(page);
> > @@ -6951,7 +6985,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
> >                               pte = (pte_t *)pmd_alloc(mm, pud, addr);
> >               }
> >       }
> > -     BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte));
> > +     if (!hugetlb_hgm_enabled(vma))
> > +             BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte));
> >
> >       return pte;
> >   }
> > @@ -7057,6 +7092,38 @@ static unsigned int __shift_for_hstate(struct hstate *h)
> >                              (tmp_h) <= &hstates[hugetlb_max_hstate]; \
> >                              (tmp_h)++)
> >
> > +/*
> > + * Allocate a HugeTLB PTE that maps as much of [start, end) as possible with a
> > + * single page table entry. The allocated HugeTLB PTE is returned in hpte.
> > + */
>
> Will it be used for madvise_collapse? If so, would it make sense to keep it in a different patch,
> as this one's title mentions just the fault-handling routines?

This is used by userfaultfd/UFFDIO_CONTINUE -- I will move this diff
to the patch that uses it (certainly shouldn't be in this patch).

>
> > +int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
> > +                           struct vm_area_struct *vma, unsigned long start,
> > +                           unsigned long end)
> > +{
> > +     struct hstate *h = hstate_vma(vma), *tmp_h;
> > +     unsigned int shift;
> > +     int ret;
> > +
> > +     for_each_hgm_shift(h, tmp_h, shift) {
> > +             unsigned long sz = 1UL << shift;
> > +
> > +             if (!IS_ALIGNED(start, sz) || start + sz > end)
> > +                     continue;
> > +             ret = huge_pte_alloc_high_granularity(hpte, mm, vma, start,
> > +                                                   shift, HUGETLB_SPLIT_NONE,
> > +                                                   /*write_locked=*/false);
> > +             if (ret)
> > +                     return ret;
> > +
> > +             if (hpte->shift > shift)
> > +                     return -EEXIST;
> > +
> > +             BUG_ON(hpte->shift != shift);
> > +             return 0;
> > +     }
> > +     return -EINVAL;
> > +}
> > +
> >   /*
> >    * Given a particular address, split the HugeTLB PTE that currently maps it
> >    * so that, for the given address, the PTE that maps it is `desired_shift`.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 12/26] hugetlb: add HugeTLB splitting functionality
  2022-06-27 13:50   ` manish.mishra
@ 2022-06-29 16:10     ` James Houghton
  0 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-06-29 16:10 UTC (permalink / raw)
  To: manish.mishra
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Mon, Jun 27, 2022 at 6:51 AM manish.mishra <manish.mishra@nutanix.com> wrote:
>
>
> On 24/06/22 11:06 pm, James Houghton wrote:
> > The new function, hugetlb_split_to_shift, will optimally split the page
> > table to map a particular address at a particular granularity.
> >
> > This is useful for punching a hole in the mapping and for mapping small
> > sections of a HugeTLB page (via UFFDIO_CONTINUE, for example).
> >
> > Signed-off-by: James Houghton <jthoughton@google.com>
> > ---
> >   mm/hugetlb.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 122 insertions(+)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 3ec2a921ee6f..eaffe7b4f67c 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -102,6 +102,18 @@ struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
> >   /* Forward declaration */
> >   static int hugetlb_acct_memory(struct hstate *h, long delta);
> >
> > +/*
> > + * Find the subpage that corresponds to `addr` in `hpage`.
> > + */
> > +static struct page *hugetlb_find_subpage(struct hstate *h, struct page *hpage,
> > +                              unsigned long addr)
> > +{
> > +     size_t idx = (addr & ~huge_page_mask(h))/PAGE_SIZE;
> > +
> > +     BUG_ON(idx >= pages_per_huge_page(h));
> > +     return &hpage[idx];
> > +}
> > +
> >   static inline bool subpool_is_free(struct hugepage_subpool *spool)
> >   {
> >       if (spool->count)
> > @@ -7044,6 +7056,116 @@ static unsigned int __shift_for_hstate(struct hstate *h)
> >       for ((tmp_h) = hstate; (shift) = __shift_for_hstate(tmp_h), \
> >                              (tmp_h) <= &hstates[hugetlb_max_hstate]; \
> >                              (tmp_h)++)
> > +
> > +/*
> > + * Given a particular address, split the HugeTLB PTE that currently maps it
> > + * so that, for the given address, the PTE that maps it is `desired_shift`.
> > + * This function will always split the HugeTLB PTE optimally.
> > + *
> > + * For example, given a HugeTLB 1G page that is mapped from VA 0 to 1G. If we
> > + * call this function with addr=0 and desired_shift=PAGE_SHIFT, will result in
> > + * these changes to the page table:
> > + * 1. The PUD will be split into 2M PMDs.
> > + * 2. The first PMD will be split again into 4K PTEs.
> > + */
> > +static int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *vma,
> > +                        const struct hugetlb_pte *hpte,
> > +                        unsigned long addr, unsigned long desired_shift)
> > +{
> > +     unsigned long start, end, curr;
> > +     unsigned long desired_sz = 1UL << desired_shift;
> > +     struct hstate *h = hstate_vma(vma);
> > +     int ret;
> > +     struct hugetlb_pte new_hpte;
> > +     struct mmu_notifier_range range;
> > +     struct page *hpage = NULL;
> > +     struct page *subpage;
> > +     pte_t old_entry;
> > +     struct mmu_gather tlb;
> > +
> > +     BUG_ON(!hpte->ptep);
> > +     BUG_ON(hugetlb_pte_size(hpte) == desired_sz);
> can it be BUG_ON(hugetlb_pte_size(hpte) <= desired_sz)

Sure -- I think that's better.

> > +
> > +     start = addr & hugetlb_pte_mask(hpte);
> > +     end = start + hugetlb_pte_size(hpte);
> > +
> > +     i_mmap_assert_write_locked(vma->vm_file->f_mapping);
>
> As this is just changing mappings, is holding the f_mapping lock required? I mean, in the future,
> is there any plan or way to use some per-process sub-lock?

We don't need to hold a per-mapping lock here, you're right; a per-VMA
lock will do just fine. I'll replace this with a per-VMA lock for the
next version.

>
> > +
> > +     BUG_ON(!hpte->ptep);
> > +     /* This function only works if we are looking at a leaf-level PTE. */
> > +     BUG_ON(!hugetlb_pte_none(hpte) && !hugetlb_pte_present_leaf(hpte));
> > +
> > +     /*
> > +      * Clear the PTE so that we will allocate the PT structures when
> > +      * walking the page table.
> > +      */
> > +     old_entry = huge_ptep_get_and_clear(mm, start, hpte->ptep);
> > +
> > +     if (!huge_pte_none(old_entry))
> > +             hpage = pte_page(old_entry);
> > +
> > +     BUG_ON(!IS_ALIGNED(start, desired_sz));
> > +     BUG_ON(!IS_ALIGNED(end, desired_sz));
> > +
> > +     for (curr = start; curr < end;) {
> > +             struct hstate *tmp_h;
> > +             unsigned int shift;
> > +
> > +             for_each_hgm_shift(h, tmp_h, shift) {
> > +                     unsigned long sz = 1UL << shift;
> > +
> > +                     if (!IS_ALIGNED(curr, sz) || curr + sz > end)
> > +                             continue;
> > +                     /*
> > +                      * If we are including `addr`, we need to make sure we
> > +                      * split down to the correct size. Go to a smaller
> > +                      * size if we are not.
> > +                      */
> > +                     if (curr <= addr && curr + sz > addr &&
> > +                                     shift > desired_shift)
> > +                             continue;
> > +
> > +                     /*
> > +                      * Continue the page table walk to the level we want,
> > +                      * allocate PT structures as we go.
> > +                      */
>
> As I understand it, this for_each_hgm_shift loop is just there to find the right shift,
>
> so the code below this line can be moved out of the loop. No strong feeling, but it
>
> looks more proper and may make the code easier to understand.

Agreed. I'll clean this up.
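For reference, the shift-selection logic being discussed can be modeled in userspace. This is a hypothetical sketch (not kernel code), with the walk pulled out of the inner loop as suggested; the shift values 30/21/12 are an x86-64 assumption standing in for for_each_hgm_shift:

```c
#include <assert.h>
#include <stddef.h>

/* Model of the selection step of hugetlb_split_to_shift(): for each
 * position curr in [start, end), pick the largest supported shift whose
 * region is aligned, fits in the range, and does not cover `addr` at a
 * granularity larger than desired_shift. */
static const unsigned int hgm_shifts[] = { 30, 21, 12 };

static unsigned int pick_shift(unsigned long curr, unsigned long end,
			       unsigned long addr, unsigned int desired_shift)
{
	for (size_t i = 0; i < sizeof(hgm_shifts) / sizeof(hgm_shifts[0]); i++) {
		unsigned int shift = hgm_shifts[i];
		unsigned long sz = 1UL << shift;

		if ((curr & (sz - 1)) || curr + sz > end)
			continue;	/* not aligned, or does not fit */
		if (curr <= addr && curr + sz > addr && shift > desired_shift)
			continue;	/* covers addr: must go smaller */
		return shift;
	}
	return 0;	/* corresponds to the BUG() in the real code */
}

/* Count the leaf entries produced when splitting [start, end) so that
 * `addr` ends up mapped at desired_shift granularity. The "walk and map"
 * work would go here, outside the shift-search loop. */
static unsigned long count_entries(unsigned long start, unsigned long end,
				   unsigned long addr, unsigned int desired_shift)
{
	unsigned long curr = start, n = 0;

	while (curr < end) {
		unsigned int shift = pick_shift(curr, end, addr, desired_shift);

		assert(shift != 0);
		curr += 1UL << shift;
		n++;
	}
	return n;
}
```

Splitting [0, 1G) for addr=0 at 4K granularity yields 512 PTEs for the first 2M plus 511 PMDs for the remainder, matching the two-step example in the function's comment.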

>
> > +                     hugetlb_pte_copy(&new_hpte, hpte);
> > +                     ret = hugetlb_walk_to(mm, &new_hpte, curr, sz,
> > +                                           /*stop_at_none=*/false);
> > +                     if (ret)
> > +                             goto err;
> > +                     BUG_ON(hugetlb_pte_size(&new_hpte) != sz);
> > +                     if (hpage) {
> > +                             pte_t new_entry;
> > +
> > +                             subpage = hugetlb_find_subpage(h, hpage, curr);
> > +                             new_entry = make_huge_pte_with_shift(vma, subpage,
> > +                                                                  huge_pte_write(old_entry),
> > +                                                                  shift);
> > +                             set_huge_pte_at(mm, curr, new_hpte.ptep, new_entry);
> > +                     }
> > +                     curr += sz;
> > +                     goto next;
> > +             }
> > +             /* We couldn't find a size that worked. */
> > +             BUG();
> > +next:
> > +             continue;
> > +     }
> > +
> > +     mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
> > +                             start, end);
> > +     mmu_notifier_invalidate_range_start(&range);
>
> Sorry, I did not understand where the TLB flush will be taken care of in the success case?
>
> I see set_huge_pte_at does not do it internally by itself.

A TLB flush isn't necessary in the success case -- pages that were
mapped before will continue to be mapped the same way, so the TLB
entries will still be valid. If we're splitting a none P*D, then
there's nothing to flush. If we're splitting a present P*D, then the
flush will come if/when we clear any of the page table entries below
the P*D.

>
> > +     return 0;
> > +err:
> > +     tlb_gather_mmu(&tlb, mm);
> > +     /* Free any newly allocated page table entries. */
> > +     hugetlb_free_range(&tlb, hpte, start, curr);
> > +     /* Restore the old entry. */
> > +     set_huge_pte_at(mm, start, hpte->ptep, old_entry);
> > +     tlb_finish_mmu(&tlb);
> > +     return ret;
> > +}
> >   #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
> >
> >   /*

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 12/26] hugetlb: add HugeTLB splitting functionality
  2022-06-29 14:33   ` manish.mishra
@ 2022-06-29 16:20     ` James Houghton
  0 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-06-29 16:20 UTC (permalink / raw)
  To: manish.mishra
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Wed, Jun 29, 2022 at 7:33 AM manish.mishra <manish.mishra@nutanix.com> wrote:
>
>
> On 24/06/22 11:06 pm, James Houghton wrote:
> > The new function, hugetlb_split_to_shift, will optimally split the page
> > table to map a particular address at a particular granularity.
> >
> > This is useful for punching a hole in the mapping and for mapping small
> > sections of a HugeTLB page (via UFFDIO_CONTINUE, for example).
> >
> > Signed-off-by: James Houghton <jthoughton@google.com>
> > ---
> >   mm/hugetlb.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 122 insertions(+)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 3ec2a921ee6f..eaffe7b4f67c 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -102,6 +102,18 @@ struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
> >   /* Forward declaration */
> >   static int hugetlb_acct_memory(struct hstate *h, long delta);
> >
> > +/*
> > + * Find the subpage that corresponds to `addr` in `hpage`.
> > + */
> > +static struct page *hugetlb_find_subpage(struct hstate *h, struct page *hpage,
> > +                              unsigned long addr)
> > +{
> > +     size_t idx = (addr & ~huge_page_mask(h))/PAGE_SIZE;
> > +
> > +     BUG_ON(idx >= pages_per_huge_page(h));
> > +     return &hpage[idx];
> > +}
> > +
> >   static inline bool subpool_is_free(struct hugepage_subpool *spool)
> >   {
> >       if (spool->count)
> > @@ -7044,6 +7056,116 @@ static unsigned int __shift_for_hstate(struct hstate *h)
> >       for ((tmp_h) = hstate; (shift) = __shift_for_hstate(tmp_h), \
> >                              (tmp_h) <= &hstates[hugetlb_max_hstate]; \
> >                              (tmp_h)++)
> > +
> > +/*
> > + * Given a particular address, split the HugeTLB PTE that currently maps it
> > + * so that, for the given address, the PTE that maps it is `desired_shift`.
> > + * This function will always split the HugeTLB PTE optimally.
> > + *
> > + * For example, given a HugeTLB 1G page mapped from VA 0 to 1G, calling this
> > + * function with addr=0 and desired_shift=PAGE_SHIFT will result in these
> > + * changes to the page table:
> > + * 1. The PUD will be split into 2M PMDs.
> > + * 2. The first PMD will be split again into 4K PTEs.
> > + */
> > +static int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *vma,
> > +                        const struct hugetlb_pte *hpte,
> > +                        unsigned long addr, unsigned long desired_shift)
> > +{
> > +     unsigned long start, end, curr;
> > +     unsigned long desired_sz = 1UL << desired_shift;
> > +     struct hstate *h = hstate_vma(vma);
> > +     int ret;
> > +     struct hugetlb_pte new_hpte;
> > +     struct mmu_notifier_range range;
> > +     struct page *hpage = NULL;
> > +     struct page *subpage;
> > +     pte_t old_entry;
> > +     struct mmu_gather tlb;
> > +
> > +     BUG_ON(!hpte->ptep);
> > +     BUG_ON(hugetlb_pte_size(hpte) == desired_sz);
> > +
> > +     start = addr & hugetlb_pte_mask(hpte);
> > +     end = start + hugetlb_pte_size(hpte);
> > +
> > +     i_mmap_assert_write_locked(vma->vm_file->f_mapping);
> > +
> > +     BUG_ON(!hpte->ptep);
> > +     /* This function only works if we are looking at a leaf-level PTE. */
> > +     BUG_ON(!hugetlb_pte_none(hpte) && !hugetlb_pte_present_leaf(hpte));
> > +
> > +     /*
> > +      * Clear the PTE so that we will allocate the PT structures when
> > +      * walking the page table.
> > +      */
> > +     old_entry = huge_ptep_get_and_clear(mm, start, hpte->ptep);
>
> Sorry, I missed it last time: what if an HGM mapping is present here and the current hpte is
>
> at a higher level? Where will we clear and free the child page-table pages?
>
> I see it does not happen in huge_ptep_get_and_clear.

This shouldn't happen because earlier we have
BUG_ON(!hugetlb_pte_none(hpte) && !hugetlb_pte_present_leaf(hpte));

i.e., hpte must either be none or present and leaf-level.

>
> > +
> > +     if (!huge_pte_none(old_entry))
> > +             hpage = pte_page(old_entry);
> > +
> > +     BUG_ON(!IS_ALIGNED(start, desired_sz));
> > +     BUG_ON(!IS_ALIGNED(end, desired_sz));
> > +
> > +     for (curr = start; curr < end;) {
> > +             struct hstate *tmp_h;
> > +             unsigned int shift;
> > +
> > +             for_each_hgm_shift(h, tmp_h, shift) {
> > +                     unsigned long sz = 1UL << shift;
> > +
> > +                     if (!IS_ALIGNED(curr, sz) || curr + sz > end)
> > +                             continue;
> > +                     /*
> > +                      * If we are including `addr`, we need to make sure we
> > +                      * split down to the correct size. Go to a smaller
> > +                      * size if we are not.
> > +                      */
> > +                     if (curr <= addr && curr + sz > addr &&
> > +                                     shift > desired_shift)
> > +                             continue;
> > +
> > +                     /*
> > +                      * Continue the page table walk to the level we want,
> > +                      * allocate PT structures as we go.
> > +                      */
> > +                     hugetlb_pte_copy(&new_hpte, hpte);
> > +                     ret = hugetlb_walk_to(mm, &new_hpte, curr, sz,
> > +                                           /*stop_at_none=*/false);
> > +                     if (ret)
> > +                             goto err;
> > +                     BUG_ON(hugetlb_pte_size(&new_hpte) != sz);
> > +                     if (hpage) {
> > +                             pte_t new_entry;
> > +
> > +                             subpage = hugetlb_find_subpage(h, hpage, curr);
> > +                             new_entry = make_huge_pte_with_shift(vma, subpage,
> > +                                                                  huge_pte_write(old_entry),
> > +                                                                  shift);
> > +                             set_huge_pte_at(mm, curr, new_hpte.ptep, new_entry);
> > +                     }
> > +                     curr += sz;
> > +                     goto next;
> > +             }
> > +             /* We couldn't find a size that worked. */
> > +             BUG();
> > +next:
> > +             continue;
> > +     }
> > +
> > +     mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
> > +                             start, end);
> > +     mmu_notifier_invalidate_range_start(&range);
> > +     return 0;
> > +err:
> > +     tlb_gather_mmu(&tlb, mm);
> > +     /* Free any newly allocated page table entries. */
> > +     hugetlb_free_range(&tlb, hpte, start, curr);
> > +     /* Restore the old entry. */
> > +     set_huge_pte_at(mm, start, hpte->ptep, old_entry);
> > +     tlb_finish_mmu(&tlb);
> > +     return ret;
> > +}
> >   #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
> >
> >   /*


* Re: [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries
  2022-06-28 20:44   ` Mike Kravetz
@ 2022-06-29 16:24     ` James Houghton
  0 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-06-29 16:24 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Muchun Song, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Tue, Jun 28, 2022 at 1:45 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 06/24/22 17:36, James Houghton wrote:
> > After high-granularity mapping, page table entries for HugeTLB pages can
> > be of any size/type. (For example, we can have a 1G page mapped with a
> > mix of PMDs and PTEs.) This struct is to help keep track of a HugeTLB
> > PTE after we have done a page table walk.
> >
> > Without this, we'd have to pass around the "size" of the PTE everywhere.
> > We effectively did this before; it could be fetched from the hstate,
> > which we pass around pretty much everywhere.
> >
> > This commit includes definitions for some basic helper functions that
> > are used later. These helper functions wrap existing PTE
> > inspection/modification functions, where the correct version is picked
> > depending on if the HugeTLB PTE is actually "huge" or not. (Previously,
> > all HugeTLB PTEs were "huge").
> >
> > For example, hugetlb_ptep_get wraps huge_ptep_get and ptep_get, where
> > ptep_get is used when the HugeTLB PTE is PAGE_SIZE, and huge_ptep_get is
> > used in all other cases.
> >
> > Signed-off-by: James Houghton <jthoughton@google.com>
> > ---
> >  include/linux/hugetlb.h | 84 +++++++++++++++++++++++++++++++++++++++++
> >  mm/hugetlb.c            | 57 ++++++++++++++++++++++++++++
> >  2 files changed, 141 insertions(+)
>
> There is nothing 'wrong' with this patch, but it does make me wonder.
> After introducing hugetlb_pte, is all code dealing with hugetlb mappings
> going to be using hugetlb_ptes?  It would be quite confusing if there is
> a mix of hugetlb_ptes and non-hugetlb_ptes.  This will be revealed later
> in the series, but a comment about future direction would be helpful
> here.

That is indeed the direction I am trying to go -- I'll make sure to
comment on this in this patch. I am planning to replace all other
non-hugetlb_pte uses with hugetlb_pte in the next version of this
series (I see it as necessary to get HGM merged).

> --
> Mike Kravetz
>
> >
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index 5fe1db46d8c9..1d4ec9dfdebf 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -46,6 +46,68 @@ enum {
> >       __NR_USED_SUBPAGE,
> >  };
> >
> > +struct hugetlb_pte {
> > +     pte_t *ptep;
> > +     unsigned int shift;
> > +};
> > +
> > +static inline
> > +void hugetlb_pte_init(struct hugetlb_pte *hpte)
> > +{
> > +     hpte->ptep = NULL;
> > +}
> > +
> > +static inline
> > +void hugetlb_pte_populate(struct hugetlb_pte *hpte, pte_t *ptep,
> > +                       unsigned int shift)
> > +{
> > +     BUG_ON(!ptep);
> > +     hpte->ptep = ptep;
> > +     hpte->shift = shift;
> > +}
> > +
> > +static inline
> > +unsigned long hugetlb_pte_size(const struct hugetlb_pte *hpte)
> > +{
> > +     BUG_ON(!hpte->ptep);
> > +     return 1UL << hpte->shift;
> > +}
> > +
> > +static inline
> > +unsigned long hugetlb_pte_mask(const struct hugetlb_pte *hpte)
> > +{
> > +     BUG_ON(!hpte->ptep);
> > +     return ~(hugetlb_pte_size(hpte) - 1);
> > +}
> > +
> > +static inline
> > +unsigned int hugetlb_pte_shift(const struct hugetlb_pte *hpte)
> > +{
> > +     BUG_ON(!hpte->ptep);
> > +     return hpte->shift;
> > +}
> > +
> > +static inline
> > +bool hugetlb_pte_huge(const struct hugetlb_pte *hpte)
> > +{
> > +     return !IS_ENABLED(CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING) ||
> > +             hugetlb_pte_shift(hpte) > PAGE_SHIFT;
> > +}
> > +
> > +static inline
> > +void hugetlb_pte_copy(struct hugetlb_pte *dest, const struct hugetlb_pte *src)
> > +{
> > +     dest->ptep = src->ptep;
> > +     dest->shift = src->shift;
> > +}
> > +
> > +bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte);
> > +bool hugetlb_pte_none(const struct hugetlb_pte *hpte);
> > +bool hugetlb_pte_none_mostly(const struct hugetlb_pte *hpte);
> > +pte_t hugetlb_ptep_get(const struct hugetlb_pte *hpte);
> > +void hugetlb_pte_clear(struct mm_struct *mm, const struct hugetlb_pte *hpte,
> > +                    unsigned long address);
> > +
> >  struct hugepage_subpool {
> >       spinlock_t lock;
> >       long count;
> > @@ -1130,6 +1192,28 @@ static inline spinlock_t *huge_pte_lock_shift(unsigned int shift,
> >       return ptl;
> >  }
> >
> > +static inline
> > +spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
> > +{
> > +
> > +     BUG_ON(!hpte->ptep);
> > +     // Only use huge_pte_lockptr if we are at leaf-level. Otherwise use
> > +     // the regular page table lock.
> > +     if (hugetlb_pte_none(hpte) || hugetlb_pte_present_leaf(hpte))
> > +             return huge_pte_lockptr(hugetlb_pte_shift(hpte),
> > +                             mm, hpte->ptep);
> > +     return &mm->page_table_lock;
> > +}
> > +
> > +static inline
> > +spinlock_t *hugetlb_pte_lock(struct mm_struct *mm, struct hugetlb_pte *hpte)
> > +{
> > +     spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte);
> > +
> > +     spin_lock(ptl);
> > +     return ptl;
> > +}
> > +
> >  #if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_CMA)
> >  extern void __init hugetlb_cma_reserve(int order);
> >  extern void __init hugetlb_cma_check(void);
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index d6d0d4c03def..1a1434e29740 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1120,6 +1120,63 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
> >       return false;
> >  }
> >
> > +bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte)
> > +{
> > +     pgd_t pgd;
> > +     p4d_t p4d;
> > +     pud_t pud;
> > +     pmd_t pmd;
> > +
> > +     BUG_ON(!hpte->ptep);
> > +     if (hugetlb_pte_size(hpte) >= PGDIR_SIZE) {
> > +             pgd = *(pgd_t *)hpte->ptep;
> > +             return pgd_present(pgd) && pgd_leaf(pgd);
> > +     } else if (hugetlb_pte_size(hpte) >= P4D_SIZE) {
> > +             p4d = *(p4d_t *)hpte->ptep;
> > +             return p4d_present(p4d) && p4d_leaf(p4d);
> > +     } else if (hugetlb_pte_size(hpte) >= PUD_SIZE) {
> > +             pud = *(pud_t *)hpte->ptep;
> > +             return pud_present(pud) && pud_leaf(pud);
> > +     } else if (hugetlb_pte_size(hpte) >= PMD_SIZE) {
> > +             pmd = *(pmd_t *)hpte->ptep;
> > +             return pmd_present(pmd) && pmd_leaf(pmd);
> > +     } else if (hugetlb_pte_size(hpte) >= PAGE_SIZE)
> > +             return pte_present(*hpte->ptep);
> > +     BUG();
> > +}
> > +
> > +bool hugetlb_pte_none(const struct hugetlb_pte *hpte)
> > +{
> > +     if (hugetlb_pte_huge(hpte))
> > +             return huge_pte_none(huge_ptep_get(hpte->ptep));
> > +     return pte_none(ptep_get(hpte->ptep));
> > +}
> > +
> > +bool hugetlb_pte_none_mostly(const struct hugetlb_pte *hpte)
> > +{
> > +     if (hugetlb_pte_huge(hpte))
> > +             return huge_pte_none_mostly(huge_ptep_get(hpte->ptep));
> > +     return pte_none_mostly(ptep_get(hpte->ptep));
> > +}
> > +
> > +pte_t hugetlb_ptep_get(const struct hugetlb_pte *hpte)
> > +{
> > +     if (hugetlb_pte_huge(hpte))
> > +             return huge_ptep_get(hpte->ptep);
> > +     return ptep_get(hpte->ptep);
> > +}
> > +
> > +void hugetlb_pte_clear(struct mm_struct *mm, const struct hugetlb_pte *hpte,
> > +                    unsigned long address)
> > +{
> > +     BUG_ON(!hpte->ptep);
> > +     unsigned long sz = hugetlb_pte_size(hpte);
> > +
> > +     if (sz > PAGE_SIZE)
> > +             return huge_pte_clear(mm, address, hpte->ptep, sz);
> > +     return pte_clear(mm, address, hpte->ptep);
> > +}
> > +
> >  static void enqueue_huge_page(struct hstate *h, struct page *page)
> >  {
> >       int nid = page_to_nid(page);
> > --
> > 2.37.0.rc0.161.g10f37bed90-goog
> >


* Re: [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries
  2022-06-27 12:47   ` manish.mishra
@ 2022-06-29 16:28     ` James Houghton
  0 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-06-29 16:28 UTC (permalink / raw)
  To: manish.mishra
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Mon, Jun 27, 2022 at 5:47 AM manish.mishra <manish.mishra@nutanix.com> wrote:
>
>
> On 24/06/22 11:06 pm, James Houghton wrote:
> > After high-granularity mapping, page table entries for HugeTLB pages can
> > be of any size/type. (For example, we can have a 1G page mapped with a
> > mix of PMDs and PTEs.) This struct is to help keep track of a HugeTLB
> > PTE after we have done a page table walk.
> >
> > Without this, we'd have to pass around the "size" of the PTE everywhere.
> > We effectively did this before; it could be fetched from the hstate,
> > which we pass around pretty much everywhere.
> >
> > This commit includes definitions for some basic helper functions that
> > are used later. These helper functions wrap existing PTE
> > inspection/modification functions, where the correct version is picked
> > depending on if the HugeTLB PTE is actually "huge" or not. (Previously,
> > all HugeTLB PTEs were "huge").
> >
> > For example, hugetlb_ptep_get wraps huge_ptep_get and ptep_get, where
> > ptep_get is used when the HugeTLB PTE is PAGE_SIZE, and huge_ptep_get is
> > used in all other cases.
> >
> > Signed-off-by: James Houghton <jthoughton@google.com>
> > ---
> >   include/linux/hugetlb.h | 84 +++++++++++++++++++++++++++++++++++++++++
> >   mm/hugetlb.c            | 57 ++++++++++++++++++++++++++++
> >   2 files changed, 141 insertions(+)
> >
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index 5fe1db46d8c9..1d4ec9dfdebf 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -46,6 +46,68 @@ enum {
> >       __NR_USED_SUBPAGE,
> >   };
> >
> > +struct hugetlb_pte {
> > +     pte_t *ptep;
> > +     unsigned int shift;
> > +};
> > +
> > +static inline
> > +void hugetlb_pte_init(struct hugetlb_pte *hpte)
> > +{
> > +     hpte->ptep = NULL;
> I agree it does not matter, but would setting hpte->shift = 0 as well be better?
> > +}
> > +
> > +static inline
> > +void hugetlb_pte_populate(struct hugetlb_pte *hpte, pte_t *ptep,
> > +                       unsigned int shift)
> > +{
> > +     BUG_ON(!ptep);
> > +     hpte->ptep = ptep;
> > +     hpte->shift = shift;
> > +}
> > +
> > +static inline
> > +unsigned long hugetlb_pte_size(const struct hugetlb_pte *hpte)
> > +{
> > +     BUG_ON(!hpte->ptep);
> > +     return 1UL << hpte->shift;
> > +}
> > +
> > +static inline
> > +unsigned long hugetlb_pte_mask(const struct hugetlb_pte *hpte)
> > +{
> > +     BUG_ON(!hpte->ptep);
> > +     return ~(hugetlb_pte_size(hpte) - 1);
> > +}
> > +
> > +static inline
> > +unsigned int hugetlb_pte_shift(const struct hugetlb_pte *hpte)
> > +{
> > +     BUG_ON(!hpte->ptep);
> > +     return hpte->shift;
> > +}
> > +
> > +static inline
> > +bool hugetlb_pte_huge(const struct hugetlb_pte *hpte)
> > +{
> > +     return !IS_ENABLED(CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING) ||
> > +             hugetlb_pte_shift(hpte) > PAGE_SHIFT;
> > +}
> > +
> > +static inline
> > +void hugetlb_pte_copy(struct hugetlb_pte *dest, const struct hugetlb_pte *src)
> > +{
> > +     dest->ptep = src->ptep;
> > +     dest->shift = src->shift;
> > +}
> > +
> > +bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte);
> > +bool hugetlb_pte_none(const struct hugetlb_pte *hpte);
> > +bool hugetlb_pte_none_mostly(const struct hugetlb_pte *hpte);
> > +pte_t hugetlb_ptep_get(const struct hugetlb_pte *hpte);
> > +void hugetlb_pte_clear(struct mm_struct *mm, const struct hugetlb_pte *hpte,
> > +                    unsigned long address);
> > +
> >   struct hugepage_subpool {
> >       spinlock_t lock;
> >       long count;
> > @@ -1130,6 +1192,28 @@ static inline spinlock_t *huge_pte_lock_shift(unsigned int shift,
> >       return ptl;
> >   }
> >
> > +static inline
> > +spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
> > +{
> > +
> > +     BUG_ON(!hpte->ptep);
> > +     // Only use huge_pte_lockptr if we are at leaf-level. Otherwise use
> > +     // the regular page table lock.
> > +     if (hugetlb_pte_none(hpte) || hugetlb_pte_present_leaf(hpte))
> > +             return huge_pte_lockptr(hugetlb_pte_shift(hpte),
> > +                             mm, hpte->ptep);
> > +     return &mm->page_table_lock;
> > +}
> > +
> > +static inline
> > +spinlock_t *hugetlb_pte_lock(struct mm_struct *mm, struct hugetlb_pte *hpte)
> > +{
> > +     spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte);
> > +
> > +     spin_lock(ptl);
> > +     return ptl;
> > +}
> > +
> >   #if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_CMA)
> >   extern void __init hugetlb_cma_reserve(int order);
> >   extern void __init hugetlb_cma_check(void);
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index d6d0d4c03def..1a1434e29740 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1120,6 +1120,63 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
> >       return false;
> >   }
> >
> > +bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte)
> > +{
> > +     pgd_t pgd;
> > +     p4d_t p4d;
> > +     pud_t pud;
> > +     pmd_t pmd;
> > +
> > +     BUG_ON(!hpte->ptep);
> > +     if (hugetlb_pte_size(hpte) >= PGDIR_SIZE) {
> > +             pgd = *(pgd_t *)hpte->ptep;
>
> Sorry, I did not understand why these conditions check
>
> hugetlb_pte_size(hpte) >= PGDIR_SIZE. I mean, why a >= check
>
> and not just an == check?

I did >= PGDIR_SIZE just because it was consistent with the rest of
the sizes, but, indeed, > PGDIR_SIZE makes little sense, so I'll
replace it with ==.
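A userspace sketch of the size-to-level dispatch may make the distinction concrete. This is a hypothetical model, not the patch's code; the shift values (PGDIR=48, P4D=39, PUD=30, PMD=21, PAGE=12) assume x86-64 5-level paging. Note the lower checks stay `>=` so that intermediate leaf sizes (e.g. arm64 contiguous-PTE/PMD hugepages, which fall between levels) dispatch to the level that actually holds the entry; only the top-level check can safely become `==`:

```c
#include <assert.h>

/* Model of hugetlb_pte_present_leaf()'s level selection: `==` at the
 * top (nothing is larger than a PGDIR entry), `>=` below so sizes
 * between two levels map to the level whose entry type backs them. */
enum pt_level { LVL_PGD, LVL_P4D, LVL_PUD, LVL_PMD, LVL_PTE, LVL_BAD };

static enum pt_level level_for_size(unsigned long sz)
{
	if (sz == 1UL << 48)		/* == here, per the discussion above */
		return LVL_PGD;
	else if (sz >= 1UL << 39)
		return LVL_P4D;
	else if (sz >= 1UL << 30)
		return LVL_PUD;
	else if (sz >= 1UL << 21)
		return LVL_PMD;
	else if (sz >= 1UL << 12)
		return LVL_PTE;
	return LVL_BAD;			/* the BUG() case: not a valid size */
}
```

With `>=`, a 64K contiguous-PTE hugepage (shift 16) still lands at the PTE level, and a 32M contiguous-PMD hugepage (shift 25) at the PMD level.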

>
> > +             return pgd_present(pgd) && pgd_leaf(pgd);
> > +     } else if (hugetlb_pte_size(hpte) >= P4D_SIZE) {
> > +             p4d = *(p4d_t *)hpte->ptep;
> > +             return p4d_present(p4d) && p4d_leaf(p4d);
> > +     } else if (hugetlb_pte_size(hpte) >= PUD_SIZE) {
> > +             pud = *(pud_t *)hpte->ptep;
> > +             return pud_present(pud) && pud_leaf(pud);
> > +     } else if (hugetlb_pte_size(hpte) >= PMD_SIZE) {
> > +             pmd = *(pmd_t *)hpte->ptep;
> > +             return pmd_present(pmd) && pmd_leaf(pmd);
> > +     } else if (hugetlb_pte_size(hpte) >= PAGE_SIZE)
> > +             return pte_present(*hpte->ptep);
> > +     BUG();
> > +}
> > +
> > +bool hugetlb_pte_none(const struct hugetlb_pte *hpte)
> > +{
> > +     if (hugetlb_pte_huge(hpte))
> > +             return huge_pte_none(huge_ptep_get(hpte->ptep));
> > +     return pte_none(ptep_get(hpte->ptep));
> > +}
> > +
> > +bool hugetlb_pte_none_mostly(const struct hugetlb_pte *hpte)
> > +{
> > +     if (hugetlb_pte_huge(hpte))
> > +             return huge_pte_none_mostly(huge_ptep_get(hpte->ptep));
> > +     return pte_none_mostly(ptep_get(hpte->ptep));
> > +}
> > +
> > +pte_t hugetlb_ptep_get(const struct hugetlb_pte *hpte)
> > +{
> > +     if (hugetlb_pte_huge(hpte))
> > +             return huge_ptep_get(hpte->ptep);
> > +     return ptep_get(hpte->ptep);
> > +}
> > +
> > +void hugetlb_pte_clear(struct mm_struct *mm, const struct hugetlb_pte *hpte,
> > +                    unsigned long address)
> > +{
> > +     BUG_ON(!hpte->ptep);
> > +     unsigned long sz = hugetlb_pte_size(hpte);
> > +
> > +     if (sz > PAGE_SIZE)
> > +             return huge_pte_clear(mm, address, hpte->ptep, sz);
>
> Just for consistency, something like the above?
>
> if (hugetlb_pte_huge(hpte))
> +               return huge_pte_clear(mm, address, hpte->ptep, sz);

Will do, yes. (I added hugetlb_pte_huge quite late, and I guess I
missed updating this spot. :))

>
> > +     return pte_clear(mm, address, hpte->ptep);
> > +}
> > +
> >   static void enqueue_huge_page(struct hstate *h, struct page *page)
> >   {
> >       int nid = page_to_nid(page);


* Re: [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries
  2022-06-28 20:25   ` Mina Almasry
@ 2022-06-29 16:42     ` James Houghton
  0 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-06-29 16:42 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Tue, Jun 28, 2022 at 1:25 PM Mina Almasry <almasrymina@google.com> wrote:
>
> On Fri, Jun 24, 2022 at 10:37 AM James Houghton <jthoughton@google.com> wrote:
> >
> > After high-granularity mapping, page table entries for HugeTLB pages can
> > be of any size/type. (For example, we can have a 1G page mapped with a
> > mix of PMDs and PTEs.) This struct is to help keep track of a HugeTLB
> > PTE after we have done a page table walk.
> >
> > Without this, we'd have to pass around the "size" of the PTE everywhere.
> > We effectively did this before; it could be fetched from the hstate,
> > which we pass around pretty much everywhere.
> >
> > This commit includes definitions for some basic helper functions that
> > are used later. These helper functions wrap existing PTE
> > inspection/modification functions, where the correct version is picked
> > depending on if the HugeTLB PTE is actually "huge" or not. (Previously,
> > all HugeTLB PTEs were "huge").
> >
> > For example, hugetlb_ptep_get wraps huge_ptep_get and ptep_get, where
> > ptep_get is used when the HugeTLB PTE is PAGE_SIZE, and huge_ptep_get is
> > used in all other cases.
> >
> > Signed-off-by: James Houghton <jthoughton@google.com>
> > ---
> >  include/linux/hugetlb.h | 84 +++++++++++++++++++++++++++++++++++++++++
> >  mm/hugetlb.c            | 57 ++++++++++++++++++++++++++++
> >  2 files changed, 141 insertions(+)
> >
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index 5fe1db46d8c9..1d4ec9dfdebf 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -46,6 +46,68 @@ enum {
> >         __NR_USED_SUBPAGE,
> >  };
> >
> > +struct hugetlb_pte {
> > +       pte_t *ptep;
> > +       unsigned int shift;
> > +};
> > +
> > +static inline
> > +void hugetlb_pte_init(struct hugetlb_pte *hpte)
> > +{
> > +       hpte->ptep = NULL;
>
> shift = 0; ?

I don't think this is necessary (but, admittedly, it is quite harmless
to add). ptep = NULL means that the hugetlb_pte isn't valid, and shift
could be anything. Originally I had a separate `bool valid`, but
ptep=NULL was exactly the same as valid=false.

>
> > +}
> > +
> > +static inline
> > +void hugetlb_pte_populate(struct hugetlb_pte *hpte, pte_t *ptep,
> > +                         unsigned int shift)
> > +{
> > +       BUG_ON(!ptep);
> > +       hpte->ptep = ptep;
> > +       hpte->shift = shift;
> > +}
> > +
> > +static inline
> > +unsigned long hugetlb_pte_size(const struct hugetlb_pte *hpte)
> > +{
> > +       BUG_ON(!hpte->ptep);
> > +       return 1UL << hpte->shift;
> > +}
> > +
>
> This helper is quite redundant in my opinion.

Putting 1UL << hugetlb_pte_shift(hpte) everywhere is kind of annoying. :)

>
> > +static inline
> > +unsigned long hugetlb_pte_mask(const struct hugetlb_pte *hpte)
> > +{
> > +       BUG_ON(!hpte->ptep);
> > +       return ~(hugetlb_pte_size(hpte) - 1);
> > +}
> > +
> > +static inline
> > +unsigned int hugetlb_pte_shift(const struct hugetlb_pte *hpte)
> > +{
> > +       BUG_ON(!hpte->ptep);
> > +       return hpte->shift;
> > +}
> > +
>
> This one jumps out as quite redundant too.

To make sure we aren't using an invalid hugetlb_pte, I want to remove
all places where I directly access hpte->shift -- really they should
all go through hugetlb_pte_shift.

>
> > +static inline
> > +bool hugetlb_pte_huge(const struct hugetlb_pte *hpte)
> > +{
> > +       return !IS_ENABLED(CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING) ||
> > +               hugetlb_pte_shift(hpte) > PAGE_SHIFT;
> > +}
> > +
>
> I'm guessing the !IS_ENABLED() check is because only the HGM code
> would store a non-huge pte in a hugetlb_pte struct. I think it's a bit
> fragile because anyone can add code in the future that uses
> hugetlb_pte in unexpected ways, but I will concede that it is correct
> as written.

I added this so that, if HGM isn't enabled, the compiler would have an
easier time optimizing things. I don't really have strong feelings
about keeping/removing it.
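To illustrate the constant-folding point, here is a hypothetical userspace model (CONFIG_HGM_MODEL is a stand-in for CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING, and 12 for PAGE_SHIFT):

```c
#include <assert.h>
#include <stdbool.h>

/* With the config off, the first operand is the constant 1, so the
 * helper folds to `true` and the PAGE_SIZE branches in the wrappers
 * become dead code the compiler can drop. */
#define CONFIG_HGM_MODEL 0	/* assumption: HGM compiled out */

static bool pte_huge_model(unsigned int shift)
{
	return !CONFIG_HGM_MODEL || shift > 12 /* PAGE_SHIFT */;
}
```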

>
> > +static inline
> > +void hugetlb_pte_copy(struct hugetlb_pte *dest, const struct hugetlb_pte *src)
> > +{
> > +       dest->ptep = src->ptep;
> > +       dest->shift = src->shift;
> > +}
> > +
> > +bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte);
> > +bool hugetlb_pte_none(const struct hugetlb_pte *hpte);
> > +bool hugetlb_pte_none_mostly(const struct hugetlb_pte *hpte);
> > +pte_t hugetlb_ptep_get(const struct hugetlb_pte *hpte);
> > +void hugetlb_pte_clear(struct mm_struct *mm, const struct hugetlb_pte *hpte,
> > +                      unsigned long address);
> > +
> >  struct hugepage_subpool {
> >         spinlock_t lock;
> >         long count;
> > @@ -1130,6 +1192,28 @@ static inline spinlock_t *huge_pte_lock_shift(unsigned int shift,
> >         return ptl;
> >  }
> >
> > +static inline
>
> Maybe for organization, move all the static functions you're adding
> above the hugetlb_pte_* declarations you're adding?

Will do.

>
> > +spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
> > +{
> > +
> > +       BUG_ON(!hpte->ptep);
> > +       // Only use huge_pte_lockptr if we are at leaf-level. Otherwise use
> > +       // the regular page table lock.
>
> Does checkpatch.pl not complain about // style comments? I think those
> are not allowed, no?

It didn't :( I thought I went through and removed them all -- I guess
I missed some.

>
> > +       if (hugetlb_pte_none(hpte) || hugetlb_pte_present_leaf(hpte))
> > +               return huge_pte_lockptr(hugetlb_pte_shift(hpte),
> > +                               mm, hpte->ptep);
> > +       return &mm->page_table_lock;
> > +}
> > +
> > +static inline
> > +spinlock_t *hugetlb_pte_lock(struct mm_struct *mm, struct hugetlb_pte *hpte)
> > +{
> > +       spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte);
> > +
> > +       spin_lock(ptl);
> > +       return ptl;
> > +}
> > +
> >  #if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_CMA)
> >  extern void __init hugetlb_cma_reserve(int order);
> >  extern void __init hugetlb_cma_check(void);
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index d6d0d4c03def..1a1434e29740 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1120,6 +1120,63 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
> >         return false;
> >  }
> >
> > +bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte)
> > +{
> > +       pgd_t pgd;
> > +       p4d_t p4d;
> > +       pud_t pud;
> > +       pmd_t pmd;
> > +
> > +       BUG_ON(!hpte->ptep);
> > +       if (hugetlb_pte_size(hpte) >= PGDIR_SIZE) {
> > +               pgd = *(pgd_t *)hpte->ptep;
> > +               return pgd_present(pgd) && pgd_leaf(pgd);
> > +       } else if (hugetlb_pte_size(hpte) >= P4D_SIZE) {
> > +               p4d = *(p4d_t *)hpte->ptep;
> > +               return p4d_present(p4d) && p4d_leaf(p4d);
> > +       } else if (hugetlb_pte_size(hpte) >= PUD_SIZE) {
> > +               pud = *(pud_t *)hpte->ptep;
> > +               return pud_present(pud) && pud_leaf(pud);
> > +       } else if (hugetlb_pte_size(hpte) >= PMD_SIZE) {
> > +               pmd = *(pmd_t *)hpte->ptep;
> > +               return pmd_present(pmd) && pmd_leaf(pmd);
> > +       } else if (hugetlb_pte_size(hpte) >= PAGE_SIZE)
> > +               return pte_present(*hpte->ptep);
>
> The use of >= is a bit curious to me. Shouldn't these be ==?

These (except PGDIR_SIZE) should be >=. This is because some
architectures support multiple huge PTE sizes at the same page table
level. For example, on arm64, you can have 2M PMDs, and you can also
have 32M PMDs[1].

[1]: https://www.kernel.org/doc/html/latest/arm64/hugetlbpage.html

>
> Also probably doesn't matter but I was thinking to use *_SHIFTs
> instead of *_SIZE so you don't have to calculate the size 5 times in
> this routine, or calculate hugetlb_pte_size() once for some less code
> duplication and re-use?

I'll change this to use the shift, and I'll move the computation so
it's only done once (it is probably helpful for readability too). (I
imagine the compiler only actually computes the size once here.)

>
> > +       BUG();
> > +}
> > +
> > +bool hugetlb_pte_none(const struct hugetlb_pte *hpte)
> > +{
> > +       if (hugetlb_pte_huge(hpte))
> > +               return huge_pte_none(huge_ptep_get(hpte->ptep));
> > +       return pte_none(ptep_get(hpte->ptep));
> > +}
> > +
> > +bool hugetlb_pte_none_mostly(const struct hugetlb_pte *hpte)
> > +{
> > +       if (hugetlb_pte_huge(hpte))
> > +               return huge_pte_none_mostly(huge_ptep_get(hpte->ptep));
> > +       return pte_none_mostly(ptep_get(hpte->ptep));
> > +}
> > +
> > +pte_t hugetlb_ptep_get(const struct hugetlb_pte *hpte)
> > +{
> > +       if (hugetlb_pte_huge(hpte))
> > +               return huge_ptep_get(hpte->ptep);
> > +       return ptep_get(hpte->ptep);
> > +}
> > +
> > +void hugetlb_pte_clear(struct mm_struct *mm, const struct hugetlb_pte *hpte,
> > +                      unsigned long address)
> > +{
> > +       BUG_ON(!hpte->ptep);
> > +       unsigned long sz = hugetlb_pte_size(hpte);
> > +
> > +       if (sz > PAGE_SIZE)
> > +               return huge_pte_clear(mm, address, hpte->ptep, sz);
> > +       return pte_clear(mm, address, hpte->ptep);
> > +}
> > +
> >  static void enqueue_huge_page(struct hstate *h, struct page *page)
> >  {
> >         int nid = page_to_nid(page);
> > --
> > 2.37.0.rc0.161.g10f37bed90-goog
> >

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping
  2022-06-28 17:56       ` Dr. David Alan Gilbert
@ 2022-06-29 18:31         ` James Houghton
  0 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-06-29 18:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Mina Almasry, Mike Kravetz, Muchun Song, Peter Xu,
	David Hildenbrand, David Rientjes, Axel Rasmussen, Jue Wang,
	Manish Mishra, linux-mm, linux-kernel

On Tue, Jun 28, 2022 at 10:56 AM Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
>
> * Mina Almasry (almasrymina@google.com) wrote:
> > On Mon, Jun 27, 2022 at 9:27 AM James Houghton <jthoughton@google.com> wrote:
> > >
> > > On Fri, Jun 24, 2022 at 11:41 AM Mina Almasry <almasrymina@google.com> wrote:
> > > >
> > > > On Fri, Jun 24, 2022 at 10:37 AM James Houghton <jthoughton@google.com> wrote:
> > > > >
> > > > > [trimmed...]
> > > > > ---- Userspace API ----
> > > > >
> > > > > This patch series introduces a single way to take advantage of
> > > > > high-granularity mapping: via UFFDIO_CONTINUE. UFFDIO_CONTINUE allows
> > > > > userspace to resolve MINOR page faults on shared VMAs.
> > > > >
> > > > > To collapse a HugeTLB address range that has been mapped with several
> > > > > UFFDIO_CONTINUE operations, userspace can issue MADV_COLLAPSE. We expect
> > > > > userspace to know when all pages (that they care about) have been fetched.
> > > > >
> > > >
> > > > Thanks James! Cover letter looks good. A few questions:
> > > >
> > > > Why not have the kernel collapse the hugepage once all the 4K pages
> > > > have been fetched automatically? It would remove the need for a new
> > > > userspace API, and AFAICT there aren't really any cases where it is
> > > > beneficial to have a hugepage sharded into 4K mappings when those
> > > > mappings can be collapsed.
> > >
> > > The reason that we don't automatically collapse mappings is because it
> > > would take additional complexity, and it is less flexible. Consider
> > > the case of 1G pages on x86: currently, userspace can collapse the
> > > whole page when it's all ready, but they can also choose to collapse a
> > > 2M piece of it. On architectures with more supported hugepage sizes
> > > (e.g., arm64), userspace has even more possibilities for when to
> > > collapse. This likely further complicates a potential
> > > automatic-collapse solution. Userspace may also want to collapse the
> > > mapping for an entire hugepage without completely mapping the hugepage
> > > first (this would also be possible by issuing UFFDIO_CONTINUE on all
> > > the holes, though).
> > >
> >
> > To be honest I don't think I'm a fan of this. I don't think this
> > saves complexity, but rather pushes it to the userspace. I.e. the
> > userspace now must track which regions are faulted in and which are
> > not to call MADV_COLLAPSE at the right time. Also, if the userspace
> > gets it wrong it may accidentally not call MADV_COLLAPSE (and not get
> > any hugepages) or call MADV_COLLAPSE too early and have to deal with a
> > storm of maybe hundreds of minor faults at once which may take too
> > long to resolve and may impact guest stability, yes?
>
> I think it depends on whether the userspace is already holding bitmaps
> and data structures to let it know when the right time to call collapse
> is; if it already has to do all that bookkeeping for its own post-copy
> or whatever process, then getting userspace to call it is easy.
> (I don't know the answer to whether it does have!)

Userspace generally has a lot of information about which pages have
been UFFDIO_CONTINUE'd, but they may not have the information (say,
some atomic count per hpage) to tell them exactly when to collapse.

I think it's worth discussing the tmpfs/THP case right now, too. Right
now, after userfaultfd post-copy, all THPs we have will all be
PTE-mapped. To deal with this, we need to use Zach's MADV_COLLAPSE to
collapse the mappings to PMD mappings (we don't want to wait for
khugepaged to happen upon them -- we want good performance ASAP :)).
In fact, IIUC, khugepaged actually won't collapse these *ever* right
now. I suppose we could enlighten tmpfs's UFFDIO_CONTINUE to
automatically collapse too (thus avoiding the need for MADV_COLLAPSE),
but that could be complicated/unwanted (if that is something we might
want, maybe we should have a separate discussion).

So, as it stands today, we intend to use MADV_COLLAPSE explicitly in
the tmpfs case as soon as it is supported, and so it follows that it's
ok to require userspace to do the same thing for HugeTLBFS-backed
memory.

>
> Dave
>
> > For these reasons I think automatic collapsing is something that will
> > eventually be implemented by us or someone else, and at that point
> > MADV_COLLAPSE for hugetlb memory will become obsolete; i.e. this patch
> > is adding a userspace API that will probably need to be maintained for
> > perpetuity but actually is likely going to be going obsolete "soon".
> > For this reason I had hoped that automatic collapsing would come with
> > V1.

Small, unimportant clarification: the API, as described here, won't be
*completely* meaningless if we end up implementing automatic
collapsing :) It still has the effect of not requiring other
UFFDIO_CONTINUE operations to be done for the collapsed region.

> >
> > I wonder if we can have a very simple first try at automatic
> > collapsing for V1? I.e., can we support collapsing to the hstate size
> > and only that? So 4K pages can only be either collapsed to 2MB or 1G
> > on x86 depending on the hstate size. I think this may be not too
> > difficult to implement: we can have a counter similar to mapcount that
> > tracks how many of the subpages are mapped (subpage_mapcount). Once
> > all the subpages are mapped (the counter reaches a certain value),
> > trigger collapsing similar to hstate size MADV_COLLAPSE.
> >

In my estimation, to implement automatic collapsing, for one VMA, we
will need a per-hstate count, where when the count reaches the maximum
number, we collapse automatically to the next most optimal size. So if
we finish filling in enough PTEs for a CONT_PTE, we will collapse to a
CONT_PTE. If we finish filling up CONT_PTEs to a PMD, then collapse to
a PMD.

If you are suggesting to only collapse to the hstate size at the end,
then we lose flexibility.

> > I gather that no one else reviewing this has raised this issue thus
> > far so it might not be a big deal and I will continue to review the
> > RFC, but I had hoped for automatic collapsing myself for the reasons
> > above.

Thanks for the thorough review, Mina. :)

> >
> > > >
> > > > > ---- HugeTLB Changes ----
> > > > >
> > > > > - Mapcount
> > > > > The way mapcount is handled is different from the way that it was handled
> > > > > before. If the PUD for a hugepage is not none, a hugepage's mapcount will
> > > > > be increased. This scheme means that, for hugepages that aren't mapped at
> > > > > high granularity, their mapcounts will remain the same as what they would
> > > > > have been pre-HGM.
> > > > >
> > > >
> > > > Sorry, I didn't quite follow this. It says mapcount is handled
> > > > differently, but the same if the page is not mapped at high
> > > > granularity. Can you elaborate on how the mapcount handling will be
> > > > different when the page is mapped at high granularity?
> > >
> > > I guess I didn't phrase this very well. For the sake of simplicity,
> > > consider 1G pages on x86, typically mapped with leaf-level PUDs.
> > > Previously, there were two possibilities for how a hugepage was
> > > mapped, either it was (1) completely mapped (PUD is present and a
> > > leaf), or (2) it wasn't mapped (PUD is none). Now we have a third
> > > case, where the PUD is not none but also not a leaf (this usually
> > > means that the page is partially mapped). We handle this case as if
> > > the whole page was mapped. That is, if we partially map a hugepage
> > > that was previously unmapped (making the PUD point to PMDs), we
> > > increment its mapcount, and if we completely unmap a partially mapped
> > > hugepage (making the PUD none), we decrement its mapcount. If we
> > > collapse a non-leaf PUD to a leaf PUD, we don't change mapcount.
> > >
> > > It is possible for a PUD to be present and not a leaf (mapcount has
> > > been incremented) but for the page to still be unmapped: if the PMDs
> > > (or PTEs) underneath are all none. This case is atypical, and as of
> > > this RFC (without bestowing MADV_DONTNEED with HGM flexibility), I
> > > think it would be very difficult to get this to happen.
> > >
> >
> > Thank you for the detailed explanation. Please add it to the cover letter.
> >
> > I wonder about the case "PUD present but all the PMDs are none": is that a
> > bug? I don't understand the usefulness of that. Not a comment on this
> > patch but rather a curiosity.
> >
> > > >
> > > > > - Page table walking and manipulation
> > > > > A new function, hugetlb_walk_to, handles walking HugeTLB page tables for
> > > > > high-granularity mappings. Eventually, it's possible to merge
> > > > > hugetlb_walk_to with huge_pte_offset and huge_pte_alloc.
> > > > >
> > > > > We keep track of HugeTLB page table entries with a new struct, hugetlb_pte.
> > > > > This is because we generally need to know the "size" of a PTE (previously
> > > > > always just huge_page_size(hstate)).
> > > > >
> > > > > For every page table manipulation function that has a huge version (e.g.
> > > > > huge_ptep_get and ptep_get), there is a wrapper for it (e.g.
> > > > > hugetlb_ptep_get).  The correct version is used depending on if a HugeTLB
> > > > > PTE really is "huge".
> > > > >
> > > > > - Synchronization
> > > > > For existing bits of HugeTLB, synchronization is unchanged. For splitting
> > > > > and collapsing HugeTLB PTEs, we require that the i_mmap_rw_sem is held for
> > > > > writing, and for doing high-granularity page table walks, we require it to
> > > > > be held for reading.
> > > > >
> > > > > ---- Limitations & Future Changes ----
> > > > >
> > > > > This patch series only implements high-granularity mapping for VM_SHARED
> > > > > VMAs.  I intend to implement enough HGM to support 4K unmapping for memory
> > > > > failure recovery for both shared and private mappings.
> > > > >
> > > > > The memory failure use case poses its own challenges that can be
> > > > > addressed, but I will do so in a separate RFC.
> > > > >
> > > > > Performance has not been heavily scrutinized with this patch series. There
> > > > > are places where lock contention can significantly reduce performance. This
> > > > > will be addressed later.
> > > > >
> > > > > The patch series, as it stands right now, is compatible with the VMEMMAP
> > > > > page struct optimization[3], as we do not need to modify data contained
> > > > > in the subpage page structs.
> > > > >
> > > > > Other omissions:
> > > > >  - Compatibility with userfaultfd write-protect (will be included in v1).
> > > > >  - Support for mremap() (will be included in v1). This looks a lot like
> > > > >    the support we have for fork().
> > > > >  - Documentation changes (will be included in v1).
> > > > >  - Completely ignores PMD sharing and hugepage migration (will be included
> > > > >    in v1).
> > > > >  - Implementations for architectures that don't use GENERAL_HUGETLB other
> > > > >    than arm64.
> > > > >
> > > > > ---- Patch Breakdown ----
> > > > >
> > > > > Patch 1     - Preliminary changes
> > > > > Patch 2-10  - HugeTLB HGM core changes
> > > > > Patch 11-13 - HugeTLB HGM page table walking functionality
> > > > > Patch 14-19 - HugeTLB HGM compatibility with other bits
> > > > > Patch 20-23 - Userfaultfd and collapse changes
> > > > > Patch 24-26 - arm64 support and selftests
> > > > >
> > > > > [1] This used to be called HugeTLB double mapping, a bad and confusing
> > > > >     name. "High-granularity mapping" is not a great name either. I am open
> > > > >     to better names.
> > > >
> > > > I would drop 1 extra word and do "granular mapping", as in the mapping
> > > > is more granular than what it normally is (2MB/1G, etc).
> > >
> > > Noted. :)
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>


* Re: [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping
  2022-06-28 17:26     ` Mina Almasry
  2022-06-28 17:56       ` Dr. David Alan Gilbert
@ 2022-06-29 20:39       ` Axel Rasmussen
  1 sibling, 0 replies; 128+ messages in thread
From: Axel Rasmussen @ 2022-06-29 20:39 UTC (permalink / raw)
  To: Mina Almasry
  Cc: James Houghton, Mike Kravetz, Muchun Song, Peter Xu,
	David Hildenbrand, David Rientjes, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, Linux MM, LKML

On Tue, Jun 28, 2022 at 10:27 AM Mina Almasry <almasrymina@google.com> wrote:
>
> On Mon, Jun 27, 2022 at 9:27 AM James Houghton <jthoughton@google.com> wrote:
> >
> > On Fri, Jun 24, 2022 at 11:41 AM Mina Almasry <almasrymina@google.com> wrote:
> > >
> > > On Fri, Jun 24, 2022 at 10:37 AM James Houghton <jthoughton@google.com> wrote:
> > > >
> > > > [trimmed...]
> > > > ---- Userspace API ----
> > > >
> > > > This patch series introduces a single way to take advantage of
> > > > high-granularity mapping: via UFFDIO_CONTINUE. UFFDIO_CONTINUE allows
> > > > userspace to resolve MINOR page faults on shared VMAs.
> > > >
> > > > To collapse a HugeTLB address range that has been mapped with several
> > > > UFFDIO_CONTINUE operations, userspace can issue MADV_COLLAPSE. We expect
> > > > userspace to know when all pages (that they care about) have been fetched.
> > > >
> > >
> > > Thanks James! Cover letter looks good. A few questions:
> > >
> > > Why not have the kernel collapse the hugepage once all the 4K pages
> > > have been fetched automatically? It would remove the need for a new
> > > userspace API, and AFAICT there aren't really any cases where it is
> > > beneficial to have a hugepage sharded into 4K mappings when those
> > > mappings can be collapsed.
> >
> > The reason that we don't automatically collapse mappings is because it
> > would take additional complexity, and it is less flexible. Consider
> > the case of 1G pages on x86: currently, userspace can collapse the
> > whole page when it's all ready, but they can also choose to collapse a
> > 2M piece of it. On architectures with more supported hugepage sizes
> > (e.g., arm64), userspace has even more possibilities for when to
> > collapse. This likely further complicates a potential
> > automatic-collapse solution. Userspace may also want to collapse the
> > mapping for an entire hugepage without completely mapping the hugepage
> > first (this would also be possible by issuing UFFDIO_CONTINUE on all
> > the holes, though).
> >
>
> To be honest I don't think I'm a fan of this. I don't think this
> saves complexity, but rather pushes it to the userspace. I.e. the
> userspace now must track which regions are faulted in and which are
> not to call MADV_COLLAPSE at the right time. Also, if the userspace
> gets it wrong it may accidentally not call MADV_COLLAPSE (and not get
> any hugepages) or call MADV_COLLAPSE too early and have to deal with a
> storm of maybe hundreds of minor faults at once which may take too
> long to resolve and may impact guest stability, yes?

I disagree, I think this is state userspace needs to maintain anyway,
even if we ignore the use case James' series is about.

One example: today, you can't UFFDIO_CONTINUE a region which is
already mapped - you'll get -EEXIST. So, userspace needs to be sure
not to double-continue an area. We could think about relaxing this,
but there's a tradeoff - being more permissive means it's "easier to
use", but, it also means we're less strict about catching potentially
buggy userspaces.

There's another case that I don't see any way to get rid of. The way
live migration at least for GCE works is, we have two things
installing new pages: the on-demand fetcher, which reacts to UFFD
events and resolves them. And then we have the background fetcher,
which goes along and fetches pages which haven't been touched /
requested yet (and which may never be, it's not uncommon for a guest
to have at least *some* pages which are very infrequently / never
touched). In order for the background fetcher to know what pages to
transfer over the network, or not, userspace has to remember which
ones it's already installed.

Another point is, consider the use case of UFFDIO_CONTINUE over
UFFDIO_COPY. When userspace gets a UFFD event for a page, the
assumption is that it's somewhat likely the page is already up to
date, because we already copied it over from the source machine before
we stopped the guest and restarted it running on the target machine
("precopy"). So, we want to maintain a dirty bitmap, which tells us
which pages are clean or not - when we get a UFFD event, we check the
bitmap, and only if the page is dirty do we actually go fetch it over
the network - otherwise we just UFFDIO_CONTINUE and we're done.

>
> For these reasons I think automatic collapsing is something that will
> eventually be implemented by us or someone else, and at that point
> MADV_COLLAPSE for hugetlb memory will become obsolete; i.e. this patch
> is adding a userspace API that will probably need to be maintained for
> perpetuity but actually is likely going to be going obsolete "soon".
> For this reason I had hoped that automatic collapsing would come with
> V1.
>
> I wonder if we can have a very simple first try at automatic
> collapsing for V1? I.e., can we support collapsing to the hstate size
> and only that? So 4K pages can only be either collapsed to 2MB or 1G
> on x86 depending on the hstate size. I think this may be not too
> difficult to implement: we can have a counter similar to mapcount that
> tracks how many of the subpages are mapped (subpage_mapcount). Once
> all the subpages are mapped (the counter reaches a certain value),
> trigger collapsing similar to hstate size MADV_COLLAPSE.

I'm not sure I agree this is likely.

Two problems:

One is, say you UFFDIO_CONTINUE a 4k PTE. If we wanted collapsing to
happen automatically, we'd need to answer the question: is this the
last 4k PTE in a 2M region, so now it can be collapsed? Today the only
way to know is to go check - walk the PTEs. This is expensive, and
it's something we'd have to do on each and every UFFDIO_CONTINUE
operation -- this sucks because we're incurring the cost on every
operation, even though most of them (only 1 / 512, say) the answer
will be "no it wasn't the last one, we can't collapse yet". For
on-demand paging, it's really critical that installing the page is as fast
as possible -- in an ideal world it would be exactly as fast as a
"normal" minor fault and the guest would not even be able to tell at
all that it was in the process of being migrated.

Now, as you pointed out, we can just store a mapcount somewhere which
keeps track of how many PTEs in each 2M region are installed or not.
So, then we can more quickly check in UFFDIO_CONTINUE. But, we have
the memory overhead and CPU time overhead of maintaining this
metadata. And, it's not like having the kernel do this means userspace
doesn't have to - like I described above, I think userspace would
*also* need to keep track of this same thing anyway, so now we're
doing it 2x.

Another problem I see is, it seems like collapsing automatically would
involve letting UFFD know a bit too much for my liking about hugetlbfs
internals. It seems to me more ideal to have it know as little as
possible about how hugetlbfs works internally.



Also, there are some benefits to letting userspace decide when / if to
collapse.

For example, userspace might decide it prefers to MADV_COLLAPSE
immediately, in the demand paging thread. Or, it might decide it's
okay to let it be collapsed a bit later, and leave that up to some
other background thread. It might MADV_COLLAPSE as soon as it sees a
complete 2M region, or maybe it wants to batch things up and waits
until it has a full 1G region to collapse. It might also do different
things for different regions, e.g. depending on if they were hot or
cold (demand paged vs. background fetched). I don't see any single
"right way" to do things here, I just see tradeoffs, which userspace
is in a good position to decide on.

>
> I gather that no one else reviewing this has raised this issue thus
> far so it might not be a big deal and I will continue to review the
> RFC, but I had hoped for automatic collapsing myself for the reasons
> above.
>
> > >
> > > > ---- HugeTLB Changes ----
> > > >
> > > > - Mapcount
> > > > The way mapcount is handled is different from the way that it was handled
> > > > before. If the PUD for a hugepage is not none, a hugepage's mapcount will
> > > > be increased. This scheme means that, for hugepages that aren't mapped at
> > > > high granularity, their mapcounts will remain the same as what they would
> > > > have been pre-HGM.
> > > >
> > >
> > > Sorry, I didn't quite follow this. It says mapcount is handled
> > > differently, but the same if the page is not mapped at high
> > > granularity. Can you elaborate on how the mapcount handling will be
> > > different when the page is mapped at high granularity?
> >
> > I guess I didn't phrase this very well. For the sake of simplicity,
> > consider 1G pages on x86, typically mapped with leaf-level PUDs.
> > Previously, there were two possibilities for how a hugepage was
> > mapped, either it was (1) completely mapped (PUD is present and a
> > leaf), or (2) it wasn't mapped (PUD is none). Now we have a third
> > case, where the PUD is not none but also not a leaf (this usually
> > means that the page is partially mapped). We handle this case as if
> > the whole page was mapped. That is, if we partially map a hugepage
> > that was previously unmapped (making the PUD point to PMDs), we
> > increment its mapcount, and if we completely unmap a partially mapped
> > hugepage (making the PUD none), we decrement its mapcount. If we
> > collapse a non-leaf PUD to a leaf PUD, we don't change mapcount.
> >
> > It is possible for a PUD to be present and not a leaf (mapcount has
> > been incremented) but for the page to still be unmapped: if the PMDs
> > (or PTEs) underneath are all none. This case is atypical, and as of
> > this RFC (without bestowing MADV_DONTNEED with HGM flexibility), I
> > think it would be very difficult to get this to happen.
> >
>
> Thank you for the detailed explanation. Please add it to the cover letter.
>
> I wonder about the case "PUD present but all the PMDs are none": is that a
> bug? I don't understand the usefulness of that. Not a comment on this
> patch but rather a curiosity.
>
> > >
> > > > - Page table walking and manipulation
> > > > A new function, hugetlb_walk_to, handles walking HugeTLB page tables for
> > > > high-granularity mappings. Eventually, it's possible to merge
> > > > hugetlb_walk_to with huge_pte_offset and huge_pte_alloc.
> > > >
> > > > We keep track of HugeTLB page table entries with a new struct, hugetlb_pte.
> > > > This is because we generally need to know the "size" of a PTE (previously
> > > > always just huge_page_size(hstate)).
> > > >
> > > > For every page table manipulation function that has a huge version (e.g.
> > > > huge_ptep_get and ptep_get), there is a wrapper for it (e.g.
> > > > hugetlb_ptep_get).  The correct version is used depending on if a HugeTLB
> > > > PTE really is "huge".
> > > >
> > > > - Synchronization
> > > > For existing bits of HugeTLB, synchronization is unchanged. For splitting
> > > > and collapsing HugeTLB PTEs, we require that the i_mmap_rw_sem is held for
> > > > writing, and for doing high-granularity page table walks, we require it to
> > > > be held for reading.
> > > >
> > > > ---- Limitations & Future Changes ----
> > > >
> > > > This patch series only implements high-granularity mapping for VM_SHARED
> > > > VMAs.  I intend to implement enough HGM to support 4K unmapping for memory
> > > > failure recovery for both shared and private mappings.
> > > >
> > > > The memory failure use case poses its own challenges that can be
> > > > addressed, but I will do so in a separate RFC.
> > > >
> > > > Performance has not been heavily scrutinized with this patch series. There
> > > > are places where lock contention can significantly reduce performance. This
> > > > will be addressed later.
> > > >
> > > > The patch series, as it stands right now, is compatible with the VMEMMAP
> > > > page struct optimization[3], as we do not need to modify data contained
> > > > in the subpage page structs.
> > > >
> > > > Other omissions:
> > > >  - Compatibility with userfaultfd write-protect (will be included in v1).
> > > >  - Support for mremap() (will be included in v1). This looks a lot like
> > > >    the support we have for fork().
> > > >  - Documentation changes (will be included in v1).
> > > >  - Completely ignores PMD sharing and hugepage migration (will be included
> > > >    in v1).
> > > >  - Implementations for architectures that don't use GENERAL_HUGETLB other
> > > >    than arm64.
> > > >
> > > > ---- Patch Breakdown ----
> > > >
> > > > Patch 1     - Preliminary changes
> > > > Patch 2-10  - HugeTLB HGM core changes
> > > > Patch 11-13 - HugeTLB HGM page table walking functionality
> > > > Patch 14-19 - HugeTLB HGM compatibility with other bits
> > > > Patch 20-23 - Userfaultfd and collapse changes
> > > > Patch 24-26 - arm64 support and selftests
> > > >
> > > > [1] This used to be called HugeTLB double mapping, a bad and confusing
> > > >     name. "High-granularity mapping" is not a great name either. I am open
> > > >     to better names.
> > >
> > > I would drop 1 extra word and do "granular mapping", as in the mapping
> > > is more granular than what it normally is (2MB/1G, etc).
> >
> > Noted. :)


* Re: [RFC PATCH 04/26] hugetlb: make huge_pte_lockptr take an explicit shift argument.
  2022-06-29  6:09     ` Muchun Song
@ 2022-06-29 21:03       ` Mike Kravetz
  2022-06-29 21:39         ` James Houghton
  0 siblings, 1 reply; 128+ messages in thread
From: Mike Kravetz @ 2022-06-29 21:03 UTC (permalink / raw)
  To: Muchun Song
  Cc: James Houghton, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On 06/29/22 14:09, Muchun Song wrote:
> On Mon, Jun 27, 2022 at 01:51:53PM -0700, Mike Kravetz wrote:
> > On 06/24/22 17:36, James Houghton wrote:
> > > This is needed to handle PTL locking with high-granularity mapping. We
> > > won't always be using the PMD-level PTL even if we're using the 2M
> > > hugepage hstate. It's possible that we're dealing with 4K PTEs, in which
> > > case, we need to lock the PTL for the 4K PTE.
> > 
> > I'm not really sure why this would be required.
> > Why not use the PMD level lock for 4K PTEs?  Seems that would scale better
> > with less contention than using the more coarse mm lock.  
> >
> 
> Your words make me think of another question unrelated to this patch.
> We __know__ that arm64 supports contiguous PTE HugeTLB. huge_pte_lockptr()
> did not consider this case, so those HugeTLB pages are contended
> with the mm lock. Seems we should optimize this case. Something like:
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 0d790fa3f297..68a1e071bfc0 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -893,7 +893,7 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
>  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
>                                            struct mm_struct *mm, pte_t *pte)
>  {
> -       if (huge_page_size(h) == PMD_SIZE)
> +       if (huge_page_size(h) <= PMD_SIZE)
>                 return pmd_lockptr(mm, (pmd_t *) pte);
>         VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
>         return &mm->page_table_lock;
> 
> I did not check if elsewhere needs to be changed as well. Just a primary
> thought.

That seems perfectly reasonable to me.

Also unrelated, but using the pmd lock is REQUIRED for pmd sharing.  The
mm lock is process specific and does not synchronize shared access.  I
found that out the hard way. :)

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates
  2022-06-29  6:39       ` Muchun Song
@ 2022-06-29 21:06         ` Mike Kravetz
  2022-06-29 21:13           ` James Houghton
  0 siblings, 1 reply; 128+ messages in thread
From: Mike Kravetz @ 2022-06-29 21:06 UTC (permalink / raw)
  To: Muchun Song
  Cc: James Houghton, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On 06/29/22 14:39, Muchun Song wrote:
> On Tue, Jun 28, 2022 at 08:40:27AM -0700, James Houghton wrote:
> > On Mon, Jun 27, 2022 at 11:42 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > >
> > > On 06/24/22 17:36, James Houghton wrote:
> > > > When using HugeTLB high-granularity mapping, we need to go through the
> > > > supported hugepage sizes in decreasing order so that we pick the largest
> > > > size that works. Consider the case where we're faulting in a 1G hugepage
> > > > for the first time: we want hugetlb_fault/hugetlb_no_page to map it with
> > > > a PUD. By going through the sizes in decreasing order, we will find that
> > > > PUD_SIZE works before finding out that PMD_SIZE or PAGE_SIZE work too.
> > > >
> > >
> > > This may/will cause problems for gigantic hugetlb pages allocated at boot
> > > time.  See alloc_bootmem_huge_page() where a pointer to the associated hstate
> > > is encoded within the allocated hugetlb page.  These pages are added to
> > > hugetlb pools by the routine gather_bootmem_prealloc() which uses the saved
> > > hstate to prep the gigantic page and add it to the correct pool.  Currently,
> > > gather_bootmem_prealloc is called after hugetlb_init_hstates.  So, changing
> > > hstate order will cause errors.
> > >
> > > I do not see any reason why we could not call gather_bootmem_prealloc before
> > > hugetlb_init_hstates to avoid this issue.
> > 
> > Thanks for catching this, Mike. Your suggestion certainly seems to
> > work, but it also seems kind of error prone. I'll have to look at the
> > code more closely, but maybe it would be better if I just maintained a
> > separate `struct hstate *sorted_hstate_ptrs[]`, where the original
> 
> I don't think this is a good idea if you really rely on the order of
> the initialization in this patch.  The easier solution is changing
> huge_bootmem_page->hstate to huge_bootmem_page->hugepagesz. Then we
> can use size_to_hstate(huge_bootmem_page->hugepagesz) in
> gather_bootmem_prealloc().
> 

That is a much better solution.  Thanks Muchun!
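In sketch form, the size-based lookup could look like this userspace model (struct and function names mirror the kernel's for readability, but this is an illustrative assumption, not the actual patch): because the boot-time record stores only the page size, reordering the hstates array can no longer leave a stale pointer behind.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the suggestion above (illustrative, not kernel
 * code): the boot-allocated record stores only the page size, and the
 * hstate is looked up by size at gather time, so sorting the hstates
 * array cannot invalidate it. */
struct hstate { unsigned long size; };

static struct hstate hstates[] = {	/* sorted in decreasing order */
	{ 1UL << 30 },			/* 1G */
	{ 2UL << 20 },			/* 2M */
};

static struct hstate *size_to_hstate(unsigned long size)
{
	for (size_t i = 0; i < sizeof(hstates) / sizeof(hstates[0]); i++)
		if (hstates[i].size == size)
			return &hstates[i];
	return NULL;
}

struct huge_bootmem_page { unsigned long hugepagesz; };

static struct hstate *gather_hstate(struct huge_bootmem_page *m)
{
	/* robust to any reordering of hstates[] */
	return size_to_hstate(m->hugepagesz);
}
```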

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates
  2022-06-29 21:06         ` Mike Kravetz
@ 2022-06-29 21:13           ` James Houghton
  0 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-06-29 21:13 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Muchun Song, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Wed, Jun 29, 2022 at 2:06 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 06/29/22 14:39, Muchun Song wrote:
> > On Tue, Jun 28, 2022 at 08:40:27AM -0700, James Houghton wrote:
> > > On Mon, Jun 27, 2022 at 11:42 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > > >
> > > > On 06/24/22 17:36, James Houghton wrote:
> > > > > When using HugeTLB high-granularity mapping, we need to go through the
> > > > > supported hugepage sizes in decreasing order so that we pick the largest
> > > > > size that works. Consider the case where we're faulting in a 1G hugepage
> > > > > for the first time: we want hugetlb_fault/hugetlb_no_page to map it with
> > > > > a PUD. By going through the sizes in decreasing order, we will find that
> > > > > PUD_SIZE works before finding out that PMD_SIZE or PAGE_SIZE work too.
> > > > >
> > > >
> > > > This may/will cause problems for gigantic hugetlb pages allocated at boot
> > > > time.  See alloc_bootmem_huge_page() where a pointer to the associated hstate
> > > > is encoded within the allocated hugetlb page.  These pages are added to
> > > > hugetlb pools by the routine gather_bootmem_prealloc() which uses the saved
> > > > hstate to prep the gigantic page and add it to the correct pool.  Currently,
> > > > gather_bootmem_prealloc is called after hugetlb_init_hstates.  So, changing
> > > > hstate order will cause errors.
> > > >
> > > > I do not see any reason why we could not call gather_bootmem_prealloc before
> > > > hugetlb_init_hstates to avoid this issue.
> > >
> > > Thanks for catching this, Mike. Your suggestion certainly seems to
> > > work, but it also seems kind of error prone. I'll have to look at the
> > > code more closely, but maybe it would be better if I just maintained a
> > > separate `struct hstate *sorted_hstate_ptrs[]`, where the original
> >
> > I don't think this is a good idea if you really rely on the order of
> > the initialization in this patch.  The easier solution is changing
> > huge_bootmem_page->hstate to huge_bootmem_page->hugepagesz. Then we
> > can use size_to_hstate(huge_bootmem_page->hugepagesz) in
> > gather_bootmem_prealloc().
> >
>
> That is a much better solution.  Thanks Muchun!

Indeed. Thank you, Muchun. :)

>
> --
> Mike Kravetz

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 04/26] hugetlb: make huge_pte_lockptr take an explicit shift argument.
  2022-06-29 21:03       ` Mike Kravetz
@ 2022-06-29 21:39         ` James Houghton
  2022-06-29 22:24           ` Mike Kravetz
  0 siblings, 1 reply; 128+ messages in thread
From: James Houghton @ 2022-06-29 21:39 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Muchun Song, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Wed, Jun 29, 2022 at 2:04 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 06/29/22 14:09, Muchun Song wrote:
> > On Mon, Jun 27, 2022 at 01:51:53PM -0700, Mike Kravetz wrote:
> > > On 06/24/22 17:36, James Houghton wrote:
> > > > This is needed to handle PTL locking with high-granularity mapping. We
> > > > won't always be using the PMD-level PTL even if we're using the 2M
> > > > hugepage hstate. It's possible that we're dealing with 4K PTEs, in which
> > > > case, we need to lock the PTL for the 4K PTE.
> > >
> > > I'm not really sure why this would be required.
> > > Why not use the PMD level lock for 4K PTEs?  Seems that would scale better
> > > with less contention than using the more coarse mm lock.
> > >
> >
> > Your words make me think of another question unrelated to this patch.
> > We __know__ that arm64 supports contiguous PTE HugeTLB. huge_pte_lockptr()
> > did not consider this case, so those HugeTLB pages are contended
> > with the mm lock. Seems we should optimize this case. Something like:
> >
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index 0d790fa3f297..68a1e071bfc0 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -893,7 +893,7 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
> >  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
> >                                            struct mm_struct *mm, pte_t *pte)
> >  {
> > -       if (huge_page_size(h) == PMD_SIZE)
> > +       if (huge_page_size(h) <= PMD_SIZE)
> >                 return pmd_lockptr(mm, (pmd_t *) pte);
> >         VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
> >         return &mm->page_table_lock;
> >
> > I did not check if elsewhere needs to be changed as well. Just a primary
> > thought.

I'm not sure if this works. If hugetlb_pte_size(hpte) is PAGE_SIZE,
then `hpte.ptep` will be a pte_t, not a pmd_t -- I assume that breaks
things. So I think, when doing a HugeTLB PT walk down to PAGE_SIZE, we
need to separately keep track of the location of the PMD so that we
can use it to get the PMD lock.
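The "keep track of the PMD separately" idea can be sketched as a userspace model; the struct and field names (hugetlb_pte, pmdp) are hypothetical, not from the series. Once a walk descends below PMD level, hpte.ptep points at a real pte_t and can no longer be cast to pmd_t * to find the PMD-level PTL, so the walker remembers the parent PMD instead:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model, not kernel code. */
typedef struct { unsigned long val; } pmd_t;
typedef struct { unsigned long val; } pte_t;

struct hugetlb_pte {
	pte_t *ptep;	/* the entry actually mapped (may be a 4K PTE) */
	pmd_t *pmdp;	/* parent PMD, remembered during the walk */
};

/* stand-in for pmd_lockptr(): one lock per PMD, keyed by its address */
static void *pmd_lockptr(pmd_t *pmd)
{
	return pmd;
}

static void *hugetlb_pte_lockptr(struct hugetlb_pte *hpte)
{
	/* use the PMD-level lock even when mapping at PAGE_SIZE */
	return pmd_lockptr(hpte->pmdp);
}
```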

>
> That seems perfectly reasonable to me.
>
> Also unrelated, but using the pmd lock is REQUIRED for pmd sharing.  The
> mm lock is process specific and does not synchronize shared access.  I
> found that out the hard way. :)
>
> --
> Mike Kravetz

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 04/26] hugetlb: make huge_pte_lockptr take an explicit shift argument.
  2022-06-29 21:39         ` James Houghton
@ 2022-06-29 22:24           ` Mike Kravetz
  2022-06-30  9:35             ` Muchun Song
  0 siblings, 1 reply; 128+ messages in thread
From: Mike Kravetz @ 2022-06-29 22:24 UTC (permalink / raw)
  To: James Houghton
  Cc: Muchun Song, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On 06/29/22 14:39, James Houghton wrote:
> On Wed, Jun 29, 2022 at 2:04 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >
> > On 06/29/22 14:09, Muchun Song wrote:
> > > On Mon, Jun 27, 2022 at 01:51:53PM -0700, Mike Kravetz wrote:
> > > > On 06/24/22 17:36, James Houghton wrote:
> > > > > This is needed to handle PTL locking with high-granularity mapping. We
> > > > > won't always be using the PMD-level PTL even if we're using the 2M
> > > > > hugepage hstate. It's possible that we're dealing with 4K PTEs, in which
> > > > > case, we need to lock the PTL for the 4K PTE.
> > > >
> > > > I'm not really sure why this would be required.
> > > > Why not use the PMD level lock for 4K PTEs?  Seems that would scale better
> > > > with less contention than using the more coarse mm lock.
> > > >
> > >
> > > Your words make me think of another question unrelated to this patch.
> > > We __know__ that arm64 supports contiguous PTE HugeTLB. huge_pte_lockptr()
> > > did not consider this case, so those HugeTLB pages are contended
> > > with the mm lock. Seems we should optimize this case. Something like:
> > >
> > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > > index 0d790fa3f297..68a1e071bfc0 100644
> > > --- a/include/linux/hugetlb.h
> > > +++ b/include/linux/hugetlb.h
> > > @@ -893,7 +893,7 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
> > >  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
> > >                                            struct mm_struct *mm, pte_t *pte)
> > >  {
> > > -       if (huge_page_size(h) == PMD_SIZE)
> > > +       if (huge_page_size(h) <= PMD_SIZE)
> > >                 return pmd_lockptr(mm, (pmd_t *) pte);
> > >         VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
> > >         return &mm->page_table_lock;
> > >
> > > I did not check if elsewhere needs to be changed as well. Just a primary
> > > thought.
> 
> I'm not sure if this works. If hugetlb_pte_size(hpte) is PAGE_SIZE,
> then `hpte.ptep` will be a pte_t, not a pmd_t -- I assume that breaks
> things. So I think, when doing a HugeTLB PT walk down to PAGE_SIZE, we
> need to separately keep track of the location of the PMD so that we
> can use it to get the PMD lock.

I assume Muchun was talking about changing this in current code (before
your changes) where huge_page_size(h) can not be PAGE_SIZE.

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 04/26] hugetlb: make huge_pte_lockptr take an explicit shift argument.
  2022-06-29 22:24           ` Mike Kravetz
@ 2022-06-30  9:35             ` Muchun Song
  2022-06-30 16:23               ` James Houghton
  0 siblings, 1 reply; 128+ messages in thread
From: Muchun Song @ 2022-06-30  9:35 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: James Houghton, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Wed, Jun 29, 2022 at 03:24:45PM -0700, Mike Kravetz wrote:
> On 06/29/22 14:39, James Houghton wrote:
> > On Wed, Jun 29, 2022 at 2:04 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > >
> > > On 06/29/22 14:09, Muchun Song wrote:
> > > > On Mon, Jun 27, 2022 at 01:51:53PM -0700, Mike Kravetz wrote:
> > > > > On 06/24/22 17:36, James Houghton wrote:
> > > > > > This is needed to handle PTL locking with high-granularity mapping. We
> > > > > > won't always be using the PMD-level PTL even if we're using the 2M
> > > > > > hugepage hstate. It's possible that we're dealing with 4K PTEs, in which
> > > > > > case, we need to lock the PTL for the 4K PTE.
> > > > >
> > > > > I'm not really sure why this would be required.
> > > > > Why not use the PMD level lock for 4K PTEs?  Seems that would scale better
> > > > > with less contention than using the more coarse mm lock.
> > > > >
> > > >
> > > > Your words make me think of another question unrelated to this patch.
> > > > We __know__ that arm64 supports contiguous PTE HugeTLB. huge_pte_lockptr()
> > > > did not consider this case, so those HugeTLB pages are contended
> > > > with the mm lock. Seems we should optimize this case. Something like:
> > > >
> > > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > > > index 0d790fa3f297..68a1e071bfc0 100644
> > > > --- a/include/linux/hugetlb.h
> > > > +++ b/include/linux/hugetlb.h
> > > > @@ -893,7 +893,7 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
> > > >  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
> > > >                                            struct mm_struct *mm, pte_t *pte)
> > > >  {
> > > > -       if (huge_page_size(h) == PMD_SIZE)
> > > > +       if (huge_page_size(h) <= PMD_SIZE)
> > > >                 return pmd_lockptr(mm, (pmd_t *) pte);
> > > >         VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
> > > >         return &mm->page_table_lock;
> > > >
> > > > I did not check if elsewhere needs to be changed as well. Just a primary
> > > > thought.
> > 
> > I'm not sure if this works. If hugetlb_pte_size(hpte) is PAGE_SIZE,
> > then `hpte.ptep` will be a pte_t, not a pmd_t -- I assume that breaks
> > things. So I think, when doing a HugeTLB PT walk down to PAGE_SIZE, we
> > need to separately keep track of the location of the PMD so that we
> > can use it to get the PMD lock.
> 
> I assume Muchun was talking about changing this in current code (before
> your changes) where huge_page_size(h) can not be PAGE_SIZE.
>

Yes, that's what I meant.

Thanks. 

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping
  2022-06-28  8:20         ` Dr. David Alan Gilbert
@ 2022-06-30 16:09           ` Peter Xu
  0 siblings, 0 replies; 128+ messages in thread
From: Peter Xu @ 2022-06-30 16:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: James Houghton, Matthew Wilcox, Mike Kravetz, Muchun Song,
	David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Manish Mishra, linux-mm, linux-kernel, Nadav Amit

On Tue, Jun 28, 2022 at 09:20:41AM +0100, Dr. David Alan Gilbert wrote:
> One other thing I thought of; you provide the modified 'CONTINUE'
> behaviour, which works for postcopy as long as you use two mappings in
> userspace; one protected by userfault, and one which you do the writes
> to, and then issue the CONTINUE into the protected mapping; that's fine,
> but it's not currently how we have our postcopy code wired up in qemu,
> we have one mapping and use UFFDIO_COPY to place the page.
> Requiring the two mappings is fine, but it's probably worth pointing out
> the need for it somewhere.

It'll be about CONTINUE, maybe not directly related to sub-page mapping,
but indeed that's something we may need to do.  It's also in my poc [1]
previously (I never got time to get back to it yet though..).

It's just that two mappings are not required.  E.g., one could use a fd on
the file and lseek()/write() to the file to update content rather than
using another mapping.  It might be just slower.
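The fd-based alternative rests on ordinary writes through the fd updating the page cache that the registered mapping exposes. Leaving the userfaultfd machinery aside, that mechanism can be demonstrated in plain C (a hedged sketch: tmpfile() stands in for the hugetlbfs file, and no userfaultfd registration is involved):

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Sketch of the fd-based update path (userfaultfd omitted): content
 * written through the fd lands in the page cache, so it becomes
 * visible through a MAP_SHARED mapping without a second writable
 * mapping ever existing. Returns 1 on success. */
static int fd_update_seen_by_mapping(void)
{
	FILE *f = tmpfile();	/* stand-in for the hugetlbfs file */
	char *map;
	int fd, ok;

	if (!f)
		return 0;
	fd = fileno(f);
	if (ftruncate(fd, 4096) != 0)
		return 0;

	map = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return 0;

	/* update through the fd, not through a writable mapping */
	ok = pwrite(fd, "hello", 5, 0) == 5 &&
	     memcmp(map, "hello", 5) == 0;

	munmap(map, 4096);
	fclose(f);
	return ok;
}
```

As Peter notes, this path may simply be slower than writing through a second mapping, which is why QEMU would likely still want the two-mapping arrangement.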

Or, IMHO an app can legally just delay faulting of some mapping using minor
mode and maybe the app doesn't even need to modify the page content before
CONTINUE for some reason, then it's even not needed to have either the
other mapping or the fd.  Fundamentally, MINOR mode and CONTINUE provide
another way to trap page faults when a page cache page already exists.  They
don't really define whether or how the data will be modified.

It's just that for QEMU unfortunately we may need to have that two mappings
just for this use case indeed..

[1] https://github.com/xzpeter/qemu/commit/41538a9a8ff5c981af879afe48e4ecca9a1aabc8

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 04/26] hugetlb: make huge_pte_lockptr take an explicit shift argument.
  2022-06-30  9:35             ` Muchun Song
@ 2022-06-30 16:23               ` James Houghton
  2022-06-30 17:40                 ` Mike Kravetz
  2022-07-01  3:32                 ` Muchun Song
  0 siblings, 2 replies; 128+ messages in thread
From: James Houghton @ 2022-06-30 16:23 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Thu, Jun 30, 2022 at 2:35 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Wed, Jun 29, 2022 at 03:24:45PM -0700, Mike Kravetz wrote:
> > On 06/29/22 14:39, James Houghton wrote:
> > > On Wed, Jun 29, 2022 at 2:04 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > > >
> > > > On 06/29/22 14:09, Muchun Song wrote:
> > > > > On Mon, Jun 27, 2022 at 01:51:53PM -0700, Mike Kravetz wrote:
> > > > > > On 06/24/22 17:36, James Houghton wrote:
> > > > > > > This is needed to handle PTL locking with high-granularity mapping. We
> > > > > > > won't always be using the PMD-level PTL even if we're using the 2M
> > > > > > > hugepage hstate. It's possible that we're dealing with 4K PTEs, in which
> > > > > > > case, we need to lock the PTL for the 4K PTE.
> > > > > >
> > > > > > I'm not really sure why this would be required.
> > > > > > Why not use the PMD level lock for 4K PTEs?  Seems that would scale better
> > > > > > with less contention than using the more coarse mm lock.
> > > > > >
> > > > >
> > > > > Your words make me think of another question unrelated to this patch.
> > > > > We __know__ that arm64 supports contiguous PTE HugeTLB. huge_pte_lockptr()
> > > > > did not consider this case, so those HugeTLB pages are contended
> > > > > with the mm lock. Seems we should optimize this case. Something like:
> > > > >
> > > > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > > > > index 0d790fa3f297..68a1e071bfc0 100644
> > > > > --- a/include/linux/hugetlb.h
> > > > > +++ b/include/linux/hugetlb.h
> > > > > @@ -893,7 +893,7 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
> > > > >  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
> > > > >                                            struct mm_struct *mm, pte_t *pte)
> > > > >  {
> > > > > -       if (huge_page_size(h) == PMD_SIZE)
> > > > > +       if (huge_page_size(h) <= PMD_SIZE)
> > > > >                 return pmd_lockptr(mm, (pmd_t *) pte);
> > > > >         VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
> > > > >         return &mm->page_table_lock;
> > > > >
> > > > > I did not check if elsewhere needs to be changed as well. Just a primary
> > > > > thought.
> > >
> > > I'm not sure if this works. If hugetlb_pte_size(hpte) is PAGE_SIZE,
> > > then `hpte.ptep` will be a pte_t, not a pmd_t -- I assume that breaks
> > > things. So I think, when doing a HugeTLB PT walk down to PAGE_SIZE, we
> > > need to separately keep track of the location of the PMD so that we
> > > can use it to get the PMD lock.
> >
> > I assume Muchun was talking about changing this in current code (before
> > your changes) where huge_page_size(h) can not be PAGE_SIZE.
> >
>
> Yes, that's what I meant.

Right -- but I think my point still stands. If `huge_page_size(h)` is
CONT_PTE_SIZE, then the `pte_t *` passed to `huge_pte_lockptr` will
*actually* point to a `pte_t` and not a `pmd_t` (I'm pretty sure the
distinction is important). So it seems like we need to separately keep
track of the real pmd_t that is being used in the CONT_PTE_SIZE case
(and therefore, when considering HGM, the PAGE_SIZE case).

However, we *can* make this optimization for CONT_PMD_SIZE (maybe this
is what you originally meant, Muchun?), so instead of
`huge_page_size(h) == PMD_SIZE`, we could do `huge_page_size(h) >=
PMD_SIZE && huge_page_size(h) < PUD_SIZE`.
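That refined size check might look as follows; this is a hedged userspace sketch using illustrative arm64 4K-granule sizes, not kernel code. CONT_PTE-sized pages are mapped by real pte_t's, so only hstates from PMD_SIZE up to (but excluding) PUD_SIZE can safely have their pte pointer treated as a pmd_t * for pmd_lockptr():

```c
#include <stdbool.h>

/* Illustrative arm64 4K-granule sizes, not taken from the patch. */
#define CONT_PTE_SIZE	(64UL << 10)	/* 64K: still real pte_t's  */
#define PMD_SIZE	(2UL << 20)	/* 2M                       */
#define CONT_PMD_SIZE	(32UL << 20)	/* 32M: contiguous pmd_t's  */
#define PUD_SIZE	(1UL << 30)	/* 1G                       */

/* true if the hstate's pte pointer really is a pmd_t *, so the
 * PMD-level split lock can be used */
static bool use_pmd_lock(unsigned long huge_page_size)
{
	return huge_page_size >= PMD_SIZE && huge_page_size < PUD_SIZE;
}
```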

>
> Thanks.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 04/26] hugetlb: make huge_pte_lockptr take an explicit shift argument.
  2022-06-30 16:23               ` James Houghton
@ 2022-06-30 17:40                 ` Mike Kravetz
  2022-07-01  3:32                 ` Muchun Song
  1 sibling, 0 replies; 128+ messages in thread
From: Mike Kravetz @ 2022-06-30 17:40 UTC (permalink / raw)
  To: James Houghton
  Cc: Muchun Song, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On 06/30/22 09:23, James Houghton wrote:
> On Thu, Jun 30, 2022 at 2:35 AM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > On Wed, Jun 29, 2022 at 03:24:45PM -0700, Mike Kravetz wrote:
> > > On 06/29/22 14:39, James Houghton wrote:
> > > > On Wed, Jun 29, 2022 at 2:04 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > > > >
> > > > > On 06/29/22 14:09, Muchun Song wrote:
> > > > > > On Mon, Jun 27, 2022 at 01:51:53PM -0700, Mike Kravetz wrote:
> > > > > > > On 06/24/22 17:36, James Houghton wrote:
> > > > > > > > This is needed to handle PTL locking with high-granularity mapping. We
> > > > > > > > won't always be using the PMD-level PTL even if we're using the 2M
> > > > > > > > hugepage hstate. It's possible that we're dealing with 4K PTEs, in which
> > > > > > > > case, we need to lock the PTL for the 4K PTE.
> > > > > > >
> > > > > > > I'm not really sure why this would be required.
> > > > > > > Why not use the PMD level lock for 4K PTEs?  Seems that would scale better
> > > > > > > with less contention than using the more coarse mm lock.
> > > > > > >
> > > > > >
> > > > > > Your words make me think of another question unrelated to this patch.
> > > > > > We __know__ that arm64 supports contiguous PTE HugeTLB. huge_pte_lockptr()
> > > > > > did not consider this case, so those HugeTLB pages are contended
> > > > > > with the mm lock. Seems we should optimize this case. Something like:
> > > > > >
> > > > > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > > > > > index 0d790fa3f297..68a1e071bfc0 100644
> > > > > > --- a/include/linux/hugetlb.h
> > > > > > +++ b/include/linux/hugetlb.h
> > > > > > @@ -893,7 +893,7 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
> > > > > >  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
> > > > > >                                            struct mm_struct *mm, pte_t *pte)
> > > > > >  {
> > > > > > -       if (huge_page_size(h) == PMD_SIZE)
> > > > > > +       if (huge_page_size(h) <= PMD_SIZE)
> > > > > >                 return pmd_lockptr(mm, (pmd_t *) pte);
> > > > > >         VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
> > > > > >         return &mm->page_table_lock;
> > > > > >
> > > > > > I did not check if elsewhere needs to be changed as well. Just a primary
> > > > > > thought.
> > > >
> > > > I'm not sure if this works. If hugetlb_pte_size(hpte) is PAGE_SIZE,
> > > > then `hpte.ptep` will be a pte_t, not a pmd_t -- I assume that breaks
> > > > things. So I think, when doing a HugeTLB PT walk down to PAGE_SIZE, we
> > > > need to separately keep track of the location of the PMD so that we
> > > > can use it to get the PMD lock.
> > >
> > > I assume Muchun was talking about changing this in current code (before
> > > your changes) where huge_page_size(h) can not be PAGE_SIZE.
> > >
> >
> > Yes, that's what I meant.
> 
> Right -- but I think my point still stands. If `huge_page_size(h)` is
> CONT_PTE_SIZE, then the `pte_t *` passed to `huge_pte_lockptr` will
> *actually* point to a `pte_t` and not a `pmd_t` (I'm pretty sure the
> distinction is important). So it seems like we need to separately keep
> track of the real pmd_t that is being used in the CONT_PTE_SIZE case
> (and therefore, when considering HGM, the PAGE_SIZE case).

Ah yes, that is correct.  We would be passing in a pte not pmd in this
case.

> 
> However, we *can* make this optimization for CONT_PMD_SIZE (maybe this
> is what you originally meant, Muchun?), so instead of
> `huge_page_size(h) == PMD_SIZE`, we could do `huge_page_size(h) >=
> PMD_SIZE && huge_page_size(h) < PUD_SIZE`.
> 

Another 'optimization' may exist in hugetlb address range scanning code.
We currently have something like:

for addr = start, addr < end, addr += huge_page_size
	pte = huge_pte_offset(addr)
	ptl = huge_pte_lock(pte)
	...
	...
	spin_unlock(ptl)

Seems like ptl will be the same for all entries on the same pmd page.
We 'may' be able to go from 512 lock/unlock cycles to 1.
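The single-lock-per-PMD-page idea can be modeled in userspace like this (the caching scheme and names are an assumption sketching the suggestion above, not code from the series): identify the PTL by the PMD page it belongs to and only re-take it when the scan crosses into a new PMD page.

```c
/* Userspace model: PTRS_PER_PMD entries share one PMD page, hence one
 * PTL. A naive scan takes the lock once per entry; caching the lock
 * identity takes it once per PMD page. */
#define PTRS_PER_PMD 512

/* stand-in for huge_pte_lockptr(): the lock is keyed by the index of
 * the PMD page covering addr */
static long lock_for(unsigned long addr, unsigned long sz)
{
	return (long)(addr / (sz * PTRS_PER_PMD));
}

/* returns how many lock/unlock cycles the scan performed */
static int scan_range(unsigned long start, unsigned long end,
		      unsigned long huge_page_size)
{
	int cycles = 0;
	long held = -1;

	for (unsigned long addr = start; addr < end; addr += huge_page_size) {
		long ptl = lock_for(addr, huge_page_size);

		if (ptl != held) {	/* re-lock only on a new PMD page */
			held = ptl;
			cycles++;
		}
		/* ... process the entry under ptl ... */
	}
	return cycles;
}
```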
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping
  2022-06-28  0:04         ` Nadav Amit
@ 2022-06-30 19:21           ` Peter Xu
  2022-07-01  5:54             ` Nadav Amit
  0 siblings, 1 reply; 128+ messages in thread
From: Peter Xu @ 2022-06-30 19:21 UTC (permalink / raw)
  To: Nadav Amit
  Cc: James Houghton, Dr. David Alan Gilbert, Matthew Wilcox,
	Mike Kravetz, Muchun Song, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra, linux-mm,
	linux-kernel

On Tue, Jun 28, 2022 at 12:04:28AM +0000, Nadav Amit wrote:
> > [1] commit 824ddc601adc ("userfaultfd: provide unmasked address on page-fault")
> 
> Indeed this change of behavior (not aligning to huge-pages when flags is
> not set) was unintentional. If you want to fix it in a separate patch so
> it would be backported, that may be a good idea.

The fix seems to be straightforward, though.  Nadav, wanna post a patch
yourself?

That seems to be an accident, and having sub-page mapping rely on the
accident is probably not desirable.  So irrespective of the separate
patch, I'd suggest we keep the requirement on enabling the exact
addr feature for sub-page mapping.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 04/26] hugetlb: make huge_pte_lockptr take an explicit shift argument.
  2022-06-30 16:23               ` James Houghton
  2022-06-30 17:40                 ` Mike Kravetz
@ 2022-07-01  3:32                 ` Muchun Song
  1 sibling, 0 replies; 128+ messages in thread
From: Muchun Song @ 2022-07-01  3:32 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel, zhengqi.arch



> On Jul 1, 2022, at 00:23, James Houghton <jthoughton@google.com> wrote:
> 
> On Thu, Jun 30, 2022 at 2:35 AM Muchun Song <songmuchun@bytedance.com> wrote:
>> 
>> On Wed, Jun 29, 2022 at 03:24:45PM -0700, Mike Kravetz wrote:
>>> On 06/29/22 14:39, James Houghton wrote:
>>>> On Wed, Jun 29, 2022 at 2:04 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>>>> 
>>>>> On 06/29/22 14:09, Muchun Song wrote:
>>>>>> On Mon, Jun 27, 2022 at 01:51:53PM -0700, Mike Kravetz wrote:
>>>>>>> On 06/24/22 17:36, James Houghton wrote:
>>>>>>>> This is needed to handle PTL locking with high-granularity mapping. We
>>>>>>>> won't always be using the PMD-level PTL even if we're using the 2M
>>>>>>>> hugepage hstate. It's possible that we're dealing with 4K PTEs, in which
>>>>>>>> case, we need to lock the PTL for the 4K PTE.
>>>>>>> 
>>>>>>> I'm not really sure why this would be required.
>>>>>>> Why not use the PMD level lock for 4K PTEs? Seems that would scale better
>>>>>>> with less contention than using the more coarse mm lock.
>>>>>>> 
>>>>>> 
>>>>>> Your words make me think of another question unrelated to this patch.
>>>>>> We __know__ that arm64 supports contiguous PTE HugeTLB. huge_pte_lockptr()
>>>>>> did not consider this case, so those HugeTLB pages are contended
>>>>>> with the mm lock. Seems we should optimize this case. Something like:
>>>>>> 
>>>>>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>>>>>> index 0d790fa3f297..68a1e071bfc0 100644
>>>>>> --- a/include/linux/hugetlb.h
>>>>>> +++ b/include/linux/hugetlb.h
>>>>>> @@ -893,7 +893,7 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
>>>>>> static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
>>>>>> struct mm_struct *mm, pte_t *pte)
>>>>>> {
>>>>>> - if (huge_page_size(h) == PMD_SIZE)
>>>>>> + if (huge_page_size(h) <= PMD_SIZE)
>>>>>> return pmd_lockptr(mm, (pmd_t *) pte);
>>>>>> VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
>>>>>> return &mm->page_table_lock;
>>>>>> 
>>>>>> I did not check if elsewhere needs to be changed as well. Just a primary
>>>>>> thought.
>>>> 
>>>> I'm not sure if this works. If hugetlb_pte_size(hpte) is PAGE_SIZE,
>>>> then `hpte.ptep` will be a pte_t, not a pmd_t -- I assume that breaks
>>>> things. So I think, when doing a HugeTLB PT walk down to PAGE_SIZE, we
>>>> need to separately keep track of the location of the PMD so that we
>>>> can use it to get the PMD lock.
>>> 
>>> I assume Muchun was talking about changing this in current code (before
>>> your changes) where huge_page_size(h) can not be PAGE_SIZE.
>>> 
>> 
>> Yes, that's what I meant.
> 
> Right -- but I think my point still stands. If `huge_page_size(h)` is
> CONT_PTE_SIZE, then the `pte_t *` passed to `huge_pte_lockptr` will
> *actually* point to a `pte_t` and not a `pmd_t` (I'm pretty sure the

Right. It is a pte in this case.

> distinction is important). So it seems like we need to separately keep
> track of the real pmd_t that is being used in the CONT_PTE_SIZE case

If we want to find the pmd_t from a pte_t, I think we can introduce a new field
in struct page just like the thread [1] does.

[1] https://lore.kernel.org/lkml/20211110105428.32458-7-zhengqi.arch@bytedance.com/

> (and therefore, when considering HGM, the PAGE_SIZE case).
> 
> However, we *can* make this optimization for CONT_PMD_SIZE (maybe this
> is what you originally meant, Muchun?), so instead of
> `huge_page_size(h) == PMD_SIZE`, we could do `huge_page_size(h) >=
> PMD_SIZE && huge_page_size(h) < PUD_SIZE`.

Right. It is a good start to optimize CONT_PMD_SIZE case.

Thanks.

> 
>> 
>> Thanks.


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping
  2022-06-30 19:21           ` Peter Xu
@ 2022-07-01  5:54             ` Nadav Amit
  0 siblings, 0 replies; 128+ messages in thread
From: Nadav Amit @ 2022-07-01  5:54 UTC (permalink / raw)
  To: Peter Xu
  Cc: James Houghton, Dr. David Alan Gilbert, Matthew Wilcox,
	Mike Kravetz, Muchun Song, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra, linux-mm,
	linux-kernel

On Jun 30, 2022, at 12:21 PM, Peter Xu <peterx@redhat.com> wrote:

> 
> On Tue, Jun 28, 2022 at 12:04:28AM +0000, Nadav Amit wrote:
>>> [1] commit 824ddc601adc ("userfaultfd: provide unmasked address on page-fault")
>> 
>> Indeed this change of behavior (not aligning to huge-pages when flags is
>> not set) was unintentional. If you want to fix it in a separate patch so
>> it would be backported, that may be a good idea.
> 
> The fix seems to be straightforward, though.  Nadav, wanna post a patch
> yourself?

Yes, even I can do it :)

Just busy right now, so I’ll try to do it over the weekend.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 10/26] hugetlb: add for_each_hgm_shift
  2022-06-28 21:58   ` Mina Almasry
@ 2022-07-07 21:39     ` Mike Kravetz
  2022-07-08 15:52     ` James Houghton
  1 sibling, 0 replies; 128+ messages in thread
From: Mike Kravetz @ 2022-07-07 21:39 UTC (permalink / raw)
  To: Mina Almasry
  Cc: James Houghton, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On 06/28/22 14:58, Mina Almasry wrote:
> On Fri, Jun 24, 2022 at 10:37 AM James Houghton <jthoughton@google.com> wrote:
> >
> > This is a helper macro to loop through all the usable page sizes for a
> > high-granularity-enabled HugeTLB VMA. Given the VMA's hstate, it will
> > loop, in descending order, through the page sizes that HugeTLB supports
> > for this architecture; it always includes PAGE_SIZE.
> >
> > Signed-off-by: James Houghton <jthoughton@google.com>
> > ---
> >  mm/hugetlb.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 8b10b941458d..557b0afdb503 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -6989,6 +6989,16 @@ bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
> >         /* All shared VMAs have HGM enabled. */
> >         return vma->vm_flags & VM_SHARED;
> >  }
> > +static unsigned int __shift_for_hstate(struct hstate *h)
> > +{
> > +       if (h >= &hstates[hugetlb_max_hstate])
> > +               return PAGE_SHIFT;
> 
> h >= &hstates[hugetlb_max_hstate] means that h is out of bounds, no? Am
> I missing something here?
> 
> So is this intending to do:
> 
> if (h == &hstates[hugetlb_max_hstate])
>     return PAGE_SHIFT;
> 
> ? If so, could we write it as so?
> 
> I'm also wondering why __shift_for_hstate(hstate[hugetlb_max_hstate])
> == PAGE_SHIFT? Isn't the last hstate the smallest hstate which should
> be 2MB on x86? Shouldn't this return PMD_SHIFT in that case?
> 

I too am missing how this is working for similar reasons.
-- 
Mike Kravetz

> > +       return huge_page_shift(h);
> > +}
> > +#define for_each_hgm_shift(hstate, tmp_h, shift) \
> > +       for ((tmp_h) = hstate; (shift) = __shift_for_hstate(tmp_h), \
> > +                              (tmp_h) <= &hstates[hugetlb_max_hstate]; \
> > +                              (tmp_h)++)
> >  #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
> >
> >  /*
> > --
> > 2.37.0.rc0.161.g10f37bed90-goog
> >

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 11/26] hugetlb: add hugetlb_walk_to to do PT walks
  2022-06-27 13:07   ` manish.mishra
@ 2022-07-07 23:03     ` Mike Kravetz
  0 siblings, 0 replies; 128+ messages in thread
From: Mike Kravetz @ 2022-07-07 23:03 UTC (permalink / raw)
  To: manish.mishra
  Cc: James Houghton, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On 06/27/22 18:37, manish.mishra wrote:
> 
> On 24/06/22 11:06 pm, James Houghton wrote:
> > This adds it for architectures that use GENERAL_HUGETLB, including x86.

I expect this will be used in arch independent code and there will need to
be at least a stub for all architectures?

> > 
> > Signed-off-by: James Houghton <jthoughton@google.com>
> > ---
> >   include/linux/hugetlb.h |  2 ++
> >   mm/hugetlb.c            | 45 +++++++++++++++++++++++++++++++++++++++++
> >   2 files changed, 47 insertions(+)
> > 
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index e7a6b944d0cc..605aa19d8572 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -258,6 +258,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
> >   			unsigned long addr, unsigned long sz);
> >   pte_t *huge_pte_offset(struct mm_struct *mm,
> >   		       unsigned long addr, unsigned long sz);
> > +int hugetlb_walk_to(struct mm_struct *mm, struct hugetlb_pte *hpte,
> > +		    unsigned long addr, unsigned long sz, bool stop_at_none);
> >   int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
> >   				unsigned long *addr, pte_t *ptep);
> >   void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 557b0afdb503..3ec2a921ee6f 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -6981,6 +6981,51 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
> >   	return (pte_t *)pmd;
> >   }
> 
> 
> No strong feeling, but this name looks confusing to me, as it does
> not only walk over page tables but can also allocate.
> 

Somewhat agree.  With this we have:
- huge_pte_offset to walk/lookup a pte
- huge_pte_alloc to allocate ptes
- hugetlb_walk_to which does some/all of both

Do not see anything obviously wrong with the routine, but future
direction would be to combine/clean up these routines with similar
purpose.
-- 
Mike Kravetz

> > +int hugetlb_walk_to(struct mm_struct *mm, struct hugetlb_pte *hpte,
> > +		    unsigned long addr, unsigned long sz, bool stop_at_none)
> > +{
> > +	pte_t *ptep;
> > +
> > +	if (!hpte->ptep) {
> > +		pgd_t *pgd = pgd_offset(mm, addr);
> > +
> > +		if (!pgd)
> > +			return -ENOMEM;
> > +		ptep = (pte_t *)p4d_alloc(mm, pgd, addr);
> > +		if (!ptep)
> > +			return -ENOMEM;
> > +		hugetlb_pte_populate(hpte, ptep, P4D_SHIFT);
> > +	}
> > +
> > +	while (hugetlb_pte_size(hpte) > sz &&
> > +			!hugetlb_pte_present_leaf(hpte) &&
> > +			!(stop_at_none && hugetlb_pte_none(hpte))) {
> 
> Should the ordering of these if-else conditions be reversed? I mean, it would
> look more natural and involve fewer condition checks as we go from top to bottom.
> 
> > +		if (hpte->shift == PMD_SHIFT) {
> > +			ptep = pte_alloc_map(mm, (pmd_t *)hpte->ptep, addr);
> > +			if (!ptep)
> > +				return -ENOMEM;
> > +			hpte->shift = PAGE_SHIFT;
> > +			hpte->ptep = ptep;
> > +		} else if (hpte->shift == PUD_SHIFT) {
> > +			ptep = (pte_t *)pmd_alloc(mm, (pud_t *)hpte->ptep,
> > +						  addr);
> > +			if (!ptep)
> > +				return -ENOMEM;
> > +			hpte->shift = PMD_SHIFT;
> > +			hpte->ptep = ptep;
> > +		} else if (hpte->shift == P4D_SHIFT) {
> > +			ptep = (pte_t *)pud_alloc(mm, (p4d_t *)hpte->ptep,
> > +						  addr);
> > +			if (!ptep)
> > +				return -ENOMEM;
> > +			hpte->shift = PUD_SHIFT;
> > +			hpte->ptep = ptep;
> > +		} else
> > +			BUG();
> > +	}
> > +	return 0;
> > +}
> > +
> >   #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
> >   #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 10/26] hugetlb: add for_each_hgm_shift
  2022-06-28 21:58   ` Mina Almasry
  2022-07-07 21:39     ` Mike Kravetz
@ 2022-07-08 15:52     ` James Houghton
  2022-07-09 21:55       ` Mina Almasry
  1 sibling, 1 reply; 128+ messages in thread
From: James Houghton @ 2022-07-08 15:52 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Tue, Jun 28, 2022 at 2:58 PM Mina Almasry <almasrymina@google.com> wrote:
>
> On Fri, Jun 24, 2022 at 10:37 AM James Houghton <jthoughton@google.com> wrote:
> >
> > This is a helper macro to loop through all the usable page sizes for a
> > high-granularity-enabled HugeTLB VMA. Given the VMA's hstate, it will
> > loop, in descending order, through the page sizes that HugeTLB supports
> > for this architecture; it always includes PAGE_SIZE.
> >
> > Signed-off-by: James Houghton <jthoughton@google.com>
> > ---
> >  mm/hugetlb.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 8b10b941458d..557b0afdb503 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -6989,6 +6989,16 @@ bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
> >         /* All shared VMAs have HGM enabled. */
> >         return vma->vm_flags & VM_SHARED;
> >  }
> > +static unsigned int __shift_for_hstate(struct hstate *h)
> > +{
> > +       if (h >= &hstates[hugetlb_max_hstate])
> > +               return PAGE_SHIFT;
>
> > h >= &hstates[hugetlb_max_hstate] means that h is out of bounds, no? Am
> > I missing something here?

Yeah, it goes out of bounds intentionally. Maybe I should have called
this out. We need for_each_hgm_shift to include PAGE_SHIFT, and there
is no hstate for it. So to handle it, we iterate past the end of the
hstate array, and when we are past the end, we return PAGE_SHIFT and
stop iterating further. This is admittedly kind of gross; if you have
other suggestions for a way to get a clean `for_each_hgm_shift` macro
like this, I'm all ears. :)

>
> So is this intending to do:
>
> > if (h == &hstates[hugetlb_max_hstate])
> >     return PAGE_SHIFT;
>
> ? If so, could we write it as so?

Yeah, this works. I'll write it this way instead. If that condition is
true, `h` is out of bounds (`hugetlb_max_hstate` is past the end, not
the index for the final element). I guess `hugetlb_max_hstate` is a
bit of a misnomer.

>
> I'm also wondering why __shift_for_hstate(hstate[hugetlb_max_hstate])
> == PAGE_SHIFT? Isn't the last hstate the smallest hstate which should
> be 2MB on x86? Shouldn't this return PMD_SHIFT in that case?

`huge_page_shift(&hstates[hugetlb_max_hstate - 1])` is PMD_SHIFT on x86.
Actually reading `hstates[hugetlb_max_hstate]` would be bad, which is
why `__shift_for_hstate` exists: to return PAGE_SHIFT when we would
otherwise attempt to compute
`huge_page_shift(&hstates[hugetlb_max_hstate])`.

>
> > +       return huge_page_shift(h);
> > +}
> > +#define for_each_hgm_shift(hstate, tmp_h, shift) \
> > +       for ((tmp_h) = hstate; (shift) = __shift_for_hstate(tmp_h), \
> > +                              (tmp_h) <= &hstates[hugetlb_max_hstate]; \

Note the <= here. If we wanted to always remain inbounds here, we'd
want < instead. But we don't have an hstate for PAGE_SIZE.

> > +                              (tmp_h)++)
> >  #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
> >
> >  /*
> > --
> > 2.37.0.rc0.161.g10f37bed90-goog
> >

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 10/26] hugetlb: add for_each_hgm_shift
  2022-07-08 15:52     ` James Houghton
@ 2022-07-09 21:55       ` Mina Almasry
  0 siblings, 0 replies; 128+ messages in thread
From: Mina Almasry @ 2022-07-09 21:55 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Fri, Jul 8, 2022 at 8:52 AM James Houghton <jthoughton@google.com> wrote:
>
> On Tue, Jun 28, 2022 at 2:58 PM Mina Almasry <almasrymina@google.com> wrote:
> >
> > On Fri, Jun 24, 2022 at 10:37 AM James Houghton <jthoughton@google.com> wrote:
> > >
> > > This is a helper macro to loop through all the usable page sizes for a
> > > high-granularity-enabled HugeTLB VMA. Given the VMA's hstate, it will
> > > loop, in descending order, through the page sizes that HugeTLB supports
> > > for this architecture; it always includes PAGE_SIZE.
> > >
> > > Signed-off-by: James Houghton <jthoughton@google.com>
> > > ---
> > >  mm/hugetlb.c | 10 ++++++++++
> > >  1 file changed, 10 insertions(+)
> > >
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index 8b10b941458d..557b0afdb503 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -6989,6 +6989,16 @@ bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
> > >         /* All shared VMAs have HGM enabled. */
> > >         return vma->vm_flags & VM_SHARED;
> > >  }
> > > +static unsigned int __shift_for_hstate(struct hstate *h)
> > > +{
> > > +       if (h >= &hstates[hugetlb_max_hstate])
> > > +               return PAGE_SHIFT;
> >
> > > h >= &hstates[hugetlb_max_hstate] means that h is out of bounds, no? Am
> > > I missing something here?
>
> Yeah, it goes out of bounds intentionally. Maybe I should have called
> this out. We need for_each_hgm_shift to include PAGE_SHIFT, and there
> is no hstate for it. So to handle it, we iterate past the end of the
> hstate array, and when we are past the end, we return PAGE_SHIFT and
> stop iterating further. This is admittedly kind of gross; if you have
> other suggestions for a way to get a clean `for_each_hgm_shift` macro
> like this, I'm all ears. :)
>
> >
> > So is this intending to do:
> >
> > > if (h == &hstates[hugetlb_max_hstate])
> > >     return PAGE_SHIFT;
> >
> > ? If so, could we write it as so?
>
> Yeah, this works. I'll write it this way instead. If that condition is
> true, `h` is out of bounds (`hugetlb_max_hstate` is past the end, not
> the index for the final element). I guess `hugetlb_max_hstate` is a
> bit of a misnomer.
>
> >
> > I'm also wondering why __shift_for_hstate(hstate[hugetlb_max_hstate])
> > == PAGE_SHIFT? Isn't the last hstate the smallest hstate which should
> > be 2MB on x86? Shouldn't this return PMD_SHIFT in that case?
>
> `huge_page_shift(&hstates[hugetlb_max_hstate - 1])` is PMD_SHIFT on x86.
> Actually reading `hstates[hugetlb_max_hstate]` would be bad, which is
> why `__shift_for_hstate` exists: to return PAGE_SHIFT when we would
> otherwise attempt to compute
> `huge_page_shift(&hstates[hugetlb_max_hstate])`.
>
> >
> > > +       return huge_page_shift(h);
> > > +}
> > > +#define for_each_hgm_shift(hstate, tmp_h, shift) \
> > > +       for ((tmp_h) = hstate; (shift) = __shift_for_hstate(tmp_h), \
> > > +                              (tmp_h) <= &hstates[hugetlb_max_hstate]; \
>
> Note the <= here. If we wanted to always remain inbounds here, we'd
> want < instead. But we don't have an hstate for PAGE_SIZE.
>

I see, thanks for the explanation. I can see 2 options here to make
the code more understandable:

option (a), don't go past the array. I.e. for_each_hgm_shift() will
loop over all the hugetlb-supported shifts on this arch, and the
calling code falls back to PAGE_SHIFT if the hugetlb page shifts don't
work for it. I admit that could lead to code dup in the calling code,
but I have not gotten to the patch that calls this yet.

option (b), simply add a comment and/or make it more obvious that
you're intentionally going out of bounds, and you want to loop over
PAGE_SHIFT at the end. Something like:

+ /* Returns huge_page_shift(h) if h is a pointer to an hstate in the
hstates[] array, PAGE_SHIFT otherwise. */
+static unsigned int __shift_for_hstate(struct hstate *h)
+{
+       if (h < &hstates[0] || h > &hstates[hugetlb_max_hstate - 1])
+               return PAGE_SHIFT;
+       return huge_page_shift(h);
+}
+
+ /* Loops over all the HGM shifts supported on this arch, from the
largest shift possible down to PAGE_SHIFT inclusive. */
+#define for_each_hgm_shift(hstate, tmp_h, shift) \
+       for ((tmp_h) = hstate; (shift) = __shift_for_hstate(tmp_h), \
+                              (tmp_h) <= &hstates[hugetlb_max_hstate]; \
+                              (tmp_h)++)
 #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */

> > > +                              (tmp_h)++)
> > >  #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
> > >
> > >  /*
> > > --
> > > 2.37.0.rc0.161.g10f37bed90-goog
> > >

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries
  2022-06-24 17:36 ` [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries James Houghton
                     ` (3 preceding siblings ...)
  2022-06-28 20:44   ` Mike Kravetz
@ 2022-07-11 23:32   ` Mike Kravetz
  2022-07-12  9:42     ` Dr. David Alan Gilbert
  2022-09-08 17:38   ` Peter Xu
  5 siblings, 1 reply; 128+ messages in thread
From: Mike Kravetz @ 2022-07-11 23:32 UTC (permalink / raw)
  To: James Houghton
  Cc: Muchun Song, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On 06/24/22 17:36, James Houghton wrote:
> After high-granularity mapping, page table entries for HugeTLB pages can
> be of any size/type. (For example, we can have a 1G page mapped with a
> mix of PMDs and PTEs.) This struct is to help keep track of a HugeTLB
> PTE after we have done a page table walk.

This has been rolling around in my head.

Will this first use case (live migration) actually make use of this
'mixed mapping' model where hugetlb pages could be mapped at the PUD,
PMD and PTE level all within the same vma?  I only understand the use
case from a high level.  But, it seems that we would want to only want
to migrate PTE (or PMD) sized pages and not necessarily a mix.

The only reason I ask is because the code might be much simpler if all
mappings within a vma were of the same size.  Of course, the
performance/latency of converting a large mapping may be prohibitively
expensive.

Looking to the future when supporting memory error handling/page poisoning
it seems like we would certainly want multiple size mappings.

Just a thought.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 19/26] hugetlb: add HGM support for copy_hugetlb_page_range
  2022-06-24 17:36 ` [RFC PATCH 19/26] hugetlb: add HGM support for copy_hugetlb_page_range James Houghton
@ 2022-07-11 23:41   ` Mike Kravetz
  2022-07-12 17:19     ` James Houghton
  0 siblings, 1 reply; 128+ messages in thread
From: Mike Kravetz @ 2022-07-11 23:41 UTC (permalink / raw)
  To: James Houghton
  Cc: Muchun Song, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On 06/24/22 17:36, James Houghton wrote:
> This allows fork() to work with high-granularity mappings. The page
> table structure is copied such that partially mapped regions will remain
> partially mapped in the same way for the new process.
> 
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>  mm/hugetlb.c | 74 +++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 59 insertions(+), 15 deletions(-)

FYI -
With https://lore.kernel.org/linux-mm/20220621235620.291305-5-mike.kravetz@oracle.com/
copy_hugetlb_page_range() should never be called for shared mappings.
Since HGM only works on shared mappings, code in this patch will never
be executed.

I have a TODO to remove shared mapping support from copy_hugetlb_page_range.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries
  2022-07-11 23:32   ` Mike Kravetz
@ 2022-07-12  9:42     ` Dr. David Alan Gilbert
  2022-07-12 17:51       ` Mike Kravetz
  2022-07-15 16:35       ` Peter Xu
  0 siblings, 2 replies; 128+ messages in thread
From: Dr. David Alan Gilbert @ 2022-07-12  9:42 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: James Houghton, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang,
	Manish Mishra, linux-mm, linux-kernel

* Mike Kravetz (mike.kravetz@oracle.com) wrote:
> On 06/24/22 17:36, James Houghton wrote:
> > After high-granularity mapping, page table entries for HugeTLB pages can
> > be of any size/type. (For example, we can have a 1G page mapped with a
> > mix of PMDs and PTEs.) This struct is to help keep track of a HugeTLB
> > PTE after we have done a page table walk.
> 
> This has been rolling around in my head.
> 
> Will this first use case (live migration) actually make use of this
> 'mixed mapping' model where hugetlb pages could be mapped at the PUD,
> PMD and PTE level all within the same vma?  I only understand the use
> case from a high level.  But, it seems that we would want to only want
> to migrate PTE (or PMD) sized pages and not necessarily a mix.

I suspect we would pick one size and use that size for all transfers
when in postcopy; not sure if there are any side cases though.

> The only reason I ask is because the code might be much simpler if all
> mappings within a vma were of the same size.  Of course, the
> performance/latency of converting a large mapping may be prohibitively
> expensive.

Imagine we're migrating a few TB VM, backed by 1GB hugepages, I'm guessing it
would be nice to clean up the PTE/PMDs for split 1GB pages as they're
completed rather than having thousands of them for the whole VM.
(I'm not sure if that is already doable)

Dave

> Looking to the future when supporting memory error handling/page poisoning
> it seems like we would certainly want multiple size mappings.
> 
> Just a thought.
> -- 
> Mike Kravetz
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 19/26] hugetlb: add HGM support for copy_hugetlb_page_range
  2022-07-11 23:41   ` Mike Kravetz
@ 2022-07-12 17:19     ` James Houghton
  2022-07-12 18:06       ` Mike Kravetz
  0 siblings, 1 reply; 128+ messages in thread
From: James Houghton @ 2022-07-12 17:19 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Muchun Song, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Mon, Jul 11, 2022 at 4:41 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 06/24/22 17:36, James Houghton wrote:
> > This allows fork() to work with high-granularity mappings. The page
> > table structure is copied such that partially mapped regions will remain
> > partially mapped in the same way for the new process.
> >
> > Signed-off-by: James Houghton <jthoughton@google.com>
> > ---
> >  mm/hugetlb.c | 74 +++++++++++++++++++++++++++++++++++++++++-----------
> >  1 file changed, 59 insertions(+), 15 deletions(-)
>
> FYI -
> With https://lore.kernel.org/linux-mm/20220621235620.291305-5-mike.kravetz@oracle.com/
> copy_hugetlb_page_range() should never be called for shared mappings.
> Since HGM only works on shared mappings, code in this patch will never
> be executed.
>
> I have a TODO to remove shared mapping support from copy_hugetlb_page_range.

Thanks Mike. If I understand things correctly, it seems like I don't
have to do anything to support fork() then; we just don't copy the
page table structure from the old VMA to the new one. That is, as
opposed to having the same bits of the old VMA being mapped in the new
one, the new VMA will have an empty page table. This would slightly
change userfaultfd's behavior on the new VMA, but that seems fine
to me.

- James

> --
> Mike Kravetz

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries
  2022-07-12  9:42     ` Dr. David Alan Gilbert
@ 2022-07-12 17:51       ` Mike Kravetz
  2022-07-15 16:35       ` Peter Xu
  1 sibling, 0 replies; 128+ messages in thread
From: Mike Kravetz @ 2022-07-12 17:51 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: James Houghton, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang,
	Manish Mishra, linux-mm, linux-kernel

On 07/12/22 10:42, Dr. David Alan Gilbert wrote:
> * Mike Kravetz (mike.kravetz@oracle.com) wrote:
> > On 06/24/22 17:36, James Houghton wrote:
> > > After high-granularity mapping, page table entries for HugeTLB pages can
> > > be of any size/type. (For example, we can have a 1G page mapped with a
> > > mix of PMDs and PTEs.) This struct is to help keep track of a HugeTLB
> > > PTE after we have done a page table walk.
> > 
> > This has been rolling around in my head.
> > 
> > Will this first use case (live migration) actually make use of this
> > 'mixed mapping' model where hugetlb pages could be mapped at the PUD,
> > PMD and PTE level all within the same vma?  I only understand the use
> > case from a high level.  But, it seems that we would want to only want
> > to migrate PTE (or PMD) sized pages and not necessarily a mix.
> 
> I suspect we would pick one size and use that size for all transfers
> when in postcopy; not sure if there are any side cases though.
> 
> > The only reason I ask is because the code might be much simpler if all
> > mappings within a vma were of the same size.  Of course, the
> > performance/latency of converting a large mapping may be prohibitively
> > expensive.
> 
> Imagine we're migrating a few TB VM, backed by 1GB hugepages, I'm guessing it
> would be nice to clean up the PTE/PMDs for split 1GB pages as they're
> completed rather than having thousands of them for the whole VM.
> (I'm not sure if that is already doable)

Seems that would be doable by calling MADV_COLLAPSE for 1GB pages as
they are completed.

Thanks for information on post copy.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 19/26] hugetlb: add HGM support for copy_hugetlb_page_range
  2022-07-12 17:19     ` James Houghton
@ 2022-07-12 18:06       ` Mike Kravetz
  2022-07-15 21:39         ` Axel Rasmussen
  0 siblings, 1 reply; 128+ messages in thread
From: Mike Kravetz @ 2022-07-12 18:06 UTC (permalink / raw)
  To: James Houghton, Axel Rasmussen
  Cc: Muchun Song, Peter Xu, David Hildenbrand, David Rientjes,
	Mina Almasry, Jue Wang, Manish Mishra, Dr . David Alan Gilbert,
	linux-mm, linux-kernel

On 07/12/22 10:19, James Houghton wrote:
> On Mon, Jul 11, 2022 at 4:41 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >
> > On 06/24/22 17:36, James Houghton wrote:
> > > This allows fork() to work with high-granularity mappings. The page
> > > table structure is copied such that partially mapped regions will remain
> > > partially mapped in the same way for the new process.
> > >
> > > Signed-off-by: James Houghton <jthoughton@google.com>
> > > ---
> > >  mm/hugetlb.c | 74 +++++++++++++++++++++++++++++++++++++++++-----------
> > >  1 file changed, 59 insertions(+), 15 deletions(-)
> >
> > FYI -
> > With https://lore.kernel.org/linux-mm/20220621235620.291305-5-mike.kravetz@oracle.com/
> > copy_hugetlb_page_range() should never be called for shared mappings.
> > Since HGM only works on shared mappings, code in this patch will never
> > be executed.
> >
> > I have a TODO to remove shared mapping support from copy_hugetlb_page_range.
> 
> Thanks Mike. If I understand things correctly, it seems like I don't
> have to do anything to support fork() then; we just don't copy the
> page table structure from the old VMA to the new one.

Yes, for now.  We will not copy the page tables for shared mappings.
When adding support for private mapping, we will need to handle the
HGM case.

>                                                       That is, as
> opposed to having the same bits of the old VMA being mapped in the new
> one, the new VMA will have an empty page table. This would slightly
> change userfaultfd's behavior on the new VMA, but that seems fine
> to me.

Right.  Since the 'mapping size information' is essentially carried in
the page tables, it will be lost if page tables are not copied.

Not sure if anyone would depend on that behavior.

Axel, this may also impact minor fault processing.  Any concerns?
Patch is sitting in Andrew's tree for next merge window.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 06/26] mm: make free_p?d_range functions public
  2022-06-28 20:35   ` Mike Kravetz
@ 2022-07-12 20:52     ` James Houghton
  0 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-07-12 20:52 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Muchun Song, Peter Xu, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Tue, Jun 28, 2022 at 1:35 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 06/24/22 17:36, James Houghton wrote:
> > This makes them usable for HugeTLB page table freeing operations.
> > After HugeTLB high-granularity mapping, the page table for a HugeTLB VMA
> > can get more complex, and these functions handle freeing page tables
> > generally.
> >
>
> Hmmmm?
>
> free_pgd_range is not generally called directly for hugetlb mappings.
> There is a wrapper hugetlb_free_pgd_range which can have architecture
> specific implementations.  It makes me wonder if these lower level
> routines can be directly used on hugetlb mappings.  My 'guess' is that any
> such details will be hidden in the callers.  Suspect this will become clear
> in later patches.

Thanks for pointing out hugetlb_free_pgd_range. I think I'll need to
change how freeing HugeTLB HGM PTEs is written, because as written, we
don't do any architecture-specific things. I think I have a good idea
for what I need to do, probably something like this: make
`hugetlb_free_range` overridable, and then provide an implementation
for it for all the architectures that
HAVE_ARCH_HUGETLB_FREE_PGD_RANGE. Making the regular `free_p?d_range`
functions public *does* help with implementing the regular/general
`hugetlb_free_range` function though, so I think this commit is still
useful.

- James

> --
> Mike Kravetz
>
> > Signed-off-by: James Houghton <jthoughton@google.com>
> > ---
> >  include/linux/mm.h | 7 +++++++
> >  mm/memory.c        | 8 ++++----
> >  2 files changed, 11 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index bc8f326be0ce..07f5da512147 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -1847,6 +1847,13 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
> >
> >  struct mmu_notifier_range;
> >
> > +void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd, unsigned long addr);
> > +void free_pmd_range(struct mmu_gather *tlb, pud_t *pud, unsigned long addr,
> > +             unsigned long end, unsigned long floor, unsigned long ceiling);
> > +void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d, unsigned long addr,
> > +             unsigned long end, unsigned long floor, unsigned long ceiling);
> > +void free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd, unsigned long addr,
> > +             unsigned long end, unsigned long floor, unsigned long ceiling);
> >  void free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
> >               unsigned long end, unsigned long floor, unsigned long ceiling);
> >  int
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 7a089145cad4..bb3b9b5b94fb 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -227,7 +227,7 @@ static void check_sync_rss_stat(struct task_struct *task)
> >   * Note: this doesn't free the actual pages themselves. That
> >   * has been handled earlier when unmapping all the memory regions.
> >   */
> > -static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
> > +void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
> >                          unsigned long addr)
> >  {
> >       pgtable_t token = pmd_pgtable(*pmd);
> > @@ -236,7 +236,7 @@ static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
> >       mm_dec_nr_ptes(tlb->mm);
> >  }
> >
> > -static inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
> > +inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
> >                               unsigned long addr, unsigned long end,
> >                               unsigned long floor, unsigned long ceiling)
> >  {
> > @@ -270,7 +270,7 @@ static inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
> >       mm_dec_nr_pmds(tlb->mm);
> >  }
> >
> > -static inline void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d,
> > +inline void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d,
> >                               unsigned long addr, unsigned long end,
> >                               unsigned long floor, unsigned long ceiling)
> >  {
> > @@ -304,7 +304,7 @@ static inline void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d,
> >       mm_dec_nr_puds(tlb->mm);
> >  }
> >
> > -static inline void free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd,
> > +inline void free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd,
> >                               unsigned long addr, unsigned long end,
> >                               unsigned long floor, unsigned long ceiling)
> >  {
> > --
> > 2.37.0.rc0.161.g10f37bed90-goog
> >

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 20/26] hugetlb: add support for high-granularity UFFDIO_CONTINUE
  2022-06-24 17:36 ` [RFC PATCH 20/26] hugetlb: add support for high-granularity UFFDIO_CONTINUE James Houghton
@ 2022-07-15 16:21   ` Peter Xu
  2022-07-15 16:58     ` James Houghton
  0 siblings, 1 reply; 128+ messages in thread
From: Peter Xu @ 2022-07-15 16:21 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Muchun Song, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Fri, Jun 24, 2022 at 05:36:50PM +0000, James Houghton wrote:
> The changes here are very similar to the changes made to
> hugetlb_no_page, where we do a high-granularity page table walk and
> do accounting slightly differently because we are mapping only a piece
> of a page.
> 
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>  fs/userfaultfd.c        |  3 +++
>  include/linux/hugetlb.h |  6 +++--
>  mm/hugetlb.c            | 54 +++++++++++++++++++++-----------------
>  mm/userfaultfd.c        | 57 +++++++++++++++++++++++++++++++----------
>  4 files changed, 82 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index e943370107d0..77c1b8a7d0b9 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -245,6 +245,9 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
>  	if (!ptep)
>  		goto out;
>  
> +	if (hugetlb_hgm_enabled(vma))
> +		goto out;
> +

This is weird.  It means we'll never wait for sub-page mapping enabled
vmas.  Why?

Not to mention hugetlb_hgm_enabled() currently is simply VM_SHARED, so it
means we'll stop waiting for all shared hugetlbfs uffd page faults..

I'd expect in the in-house postcopy tests you should see vcpu threads
spinning on the page faults until it's serviced.

IMO we still need to properly wait when the pgtable doesn't have the
faulted address covered.  For sub-page mapping it'll probably need to walk
into sub-page levels.

>  	ret = false;
>  	pte = huge_ptep_get(ptep);
>  
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index ac4ac8fbd901..c207b1ac6195 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -221,13 +221,15 @@ unsigned long hugetlb_total_pages(void);
>  vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  			unsigned long address, unsigned int flags);
>  #ifdef CONFIG_USERFAULTFD
> -int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
> +int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> +				struct hugetlb_pte *dst_hpte,
>  				struct vm_area_struct *dst_vma,
>  				unsigned long dst_addr,
>  				unsigned long src_addr,
>  				enum mcopy_atomic_mode mode,
>  				struct page **pagep,
> -				bool wp_copy);
> +				bool wp_copy,
> +				bool new_mapping);
>  #endif /* CONFIG_USERFAULTFD */
>  bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
>  						struct vm_area_struct *vma,
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 0ec2f231524e..09fa57599233 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5808,6 +5808,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>  		vma_end_reservation(h, vma, haddr);
>  	}
>  
> +	/* This lock will get pretty expensive at 4K. */
>  	ptl = hugetlb_pte_lock(mm, hpte);
>  	ret = 0;
>  	/* If pte changed from under us, retry */
> @@ -6098,24 +6099,26 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>   * modifications for huge pages.
>   */
>  int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> -			    pte_t *dst_pte,
> +			    struct hugetlb_pte *dst_hpte,
>  			    struct vm_area_struct *dst_vma,
>  			    unsigned long dst_addr,
>  			    unsigned long src_addr,
>  			    enum mcopy_atomic_mode mode,
>  			    struct page **pagep,
> -			    bool wp_copy)
> +			    bool wp_copy,
> +			    bool new_mapping)
>  {
>  	bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
>  	struct hstate *h = hstate_vma(dst_vma);
>  	struct address_space *mapping = dst_vma->vm_file->f_mapping;
> +	unsigned long haddr = dst_addr & huge_page_mask(h);
>  	pgoff_t idx = vma_hugecache_offset(h, dst_vma, dst_addr);
>  	unsigned long size;
>  	int vm_shared = dst_vma->vm_flags & VM_SHARED;
>  	pte_t _dst_pte;
>  	spinlock_t *ptl;
>  	int ret = -ENOMEM;
> -	struct page *page;
> +	struct page *page, *subpage;
>  	int writable;
>  	bool page_in_pagecache = false;
>  
> @@ -6130,12 +6133,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>  		 * a non-missing case. Return -EEXIST.
>  		 */
>  		if (vm_shared &&
> -		    hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) {
> +		    hugetlbfs_pagecache_present(h, dst_vma, haddr)) {
>  			ret = -EEXIST;
>  			goto out;
>  		}
>  
> -		page = alloc_huge_page(dst_vma, dst_addr, 0);
> +		page = alloc_huge_page(dst_vma, haddr, 0);
>  		if (IS_ERR(page)) {
>  			ret = -ENOMEM;
>  			goto out;
> @@ -6151,13 +6154,13 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>  			/* Free the allocated page which may have
>  			 * consumed a reservation.
>  			 */
> -			restore_reserve_on_error(h, dst_vma, dst_addr, page);
> +			restore_reserve_on_error(h, dst_vma, haddr, page);
>  			put_page(page);
>  
>  			/* Allocate a temporary page to hold the copied
>  			 * contents.
>  			 */
> -			page = alloc_huge_page_vma(h, dst_vma, dst_addr);
> +			page = alloc_huge_page_vma(h, dst_vma, haddr);
>  			if (!page) {
>  				ret = -ENOMEM;
>  				goto out;
> @@ -6171,14 +6174,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>  		}
>  	} else {
>  		if (vm_shared &&
> -		    hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) {
> +		    hugetlbfs_pagecache_present(h, dst_vma, haddr)) {
>  			put_page(*pagep);
>  			ret = -EEXIST;
>  			*pagep = NULL;
>  			goto out;
>  		}
>  
> -		page = alloc_huge_page(dst_vma, dst_addr, 0);
> +		page = alloc_huge_page(dst_vma, haddr, 0);
>  		if (IS_ERR(page)) {
>  			ret = -ENOMEM;
>  			*pagep = NULL;
> @@ -6216,8 +6219,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>  		page_in_pagecache = true;
>  	}
>  
> -	ptl = huge_pte_lockptr(huge_page_shift(h), dst_mm, dst_pte);
> -	spin_lock(ptl);
> +	ptl = hugetlb_pte_lock(dst_mm, dst_hpte);
>  
>  	/*
>  	 * Recheck the i_size after holding PT lock to make sure not
> @@ -6239,14 +6241,16 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>  	 * registered, we firstly wr-protect a none pte which has no page cache
>  	 * page backing it, then access the page.
>  	 */
> -	if (!huge_pte_none_mostly(huge_ptep_get(dst_pte)))
> +	if (!hugetlb_pte_none_mostly(dst_hpte))
>  		goto out_release_unlock;
>  
> -	if (vm_shared) {
> -		page_dup_file_rmap(page, true);
> -	} else {
> -		ClearHPageRestoreReserve(page);
> -		hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
> +	if (new_mapping) {

IIUC you wanted to avoid the mapcount accountings when it's the sub-page
that was going to be mapped.

Is it a must we get this only from the caller?  Can we know we're doing
sub-page mapping already here and make a decision with e.g. dst_hpte?

It looks weird to me to pass this in explicitly from the caller, especially
since at that point we don't really hold the pgtable lock, so I'm also
wondering about possible race conditions from stale new_mapping values.

> +		if (vm_shared) {
> +			page_dup_file_rmap(page, true);
> +		} else {
> +			ClearHPageRestoreReserve(page);
> +			hugepage_add_new_anon_rmap(page, dst_vma, haddr);
> +		}
>  	}
>  
>  	/*
> @@ -6258,7 +6262,11 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>  	else
>  		writable = dst_vma->vm_flags & VM_WRITE;
>  
> -	_dst_pte = make_huge_pte(dst_vma, page, writable);
> +	subpage = hugetlb_find_subpage(h, page, dst_addr);
> +	if (subpage != page)
> +		BUG_ON(!hugetlb_hgm_enabled(dst_vma));
> +
> +	_dst_pte = make_huge_pte(dst_vma, subpage, writable);
>  	/*
>  	 * Always mark UFFDIO_COPY page dirty; note that this may not be
>  	 * extremely important for hugetlbfs for now since swapping is not
> @@ -6271,14 +6279,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>  	if (wp_copy)
>  		_dst_pte = huge_pte_mkuffd_wp(_dst_pte);
>  
> -	set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
> +	set_huge_pte_at(dst_mm, dst_addr, dst_hpte->ptep, _dst_pte);
>  
> -	(void)huge_ptep_set_access_flags(dst_vma, dst_addr, dst_pte, _dst_pte,
> -					dst_vma->vm_flags & VM_WRITE);
> -	hugetlb_count_add(pages_per_huge_page(h), dst_mm);
> +	(void)huge_ptep_set_access_flags(dst_vma, dst_addr, dst_hpte->ptep,
> +			_dst_pte, dst_vma->vm_flags & VM_WRITE);
> +	hugetlb_count_add(hugetlb_pte_size(dst_hpte) / PAGE_SIZE, dst_mm);
>  
>  	/* No need to invalidate - it was non-present before */
> -	update_mmu_cache(dst_vma, dst_addr, dst_pte);
> +	update_mmu_cache(dst_vma, dst_addr, dst_hpte->ptep);
>  
>  	spin_unlock(ptl);
>  	if (!is_continue)
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 4f4892a5f767..ee40d98068bf 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -310,14 +310,16 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
>  {
>  	int vm_shared = dst_vma->vm_flags & VM_SHARED;
>  	ssize_t err;
> -	pte_t *dst_pte;
>  	unsigned long src_addr, dst_addr;
>  	long copied;
>  	struct page *page;
> -	unsigned long vma_hpagesize;
> +	unsigned long vma_hpagesize, vma_altpagesize;
>  	pgoff_t idx;
>  	u32 hash;
>  	struct address_space *mapping;
> +	bool use_hgm = hugetlb_hgm_enabled(dst_vma) &&
> +		mode == MCOPY_ATOMIC_CONTINUE;
> +	struct hstate *h = hstate_vma(dst_vma);
>  
>  	/*
>  	 * There is no default zero huge page for all huge page sizes as
> @@ -335,12 +337,16 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
>  	copied = 0;
>  	page = NULL;
>  	vma_hpagesize = vma_kernel_pagesize(dst_vma);
> +	if (use_hgm)
> +		vma_altpagesize = PAGE_SIZE;

Do we need to check the "len" to know whether we should use sub-page
mapping or the original hpage size?  E.g. any old UFFDIO_CONTINUE code will
still want the old behavior, I think.

> +	else
> +		vma_altpagesize = vma_hpagesize;
>  
>  	/*
>  	 * Validate alignment based on huge page size
>  	 */
>  	err = -EINVAL;
> -	if (dst_start & (vma_hpagesize - 1) || len & (vma_hpagesize - 1))
> +	if (dst_start & (vma_altpagesize - 1) || len & (vma_altpagesize - 1))
>  		goto out_unlock;
>  
>  retry:
> @@ -361,6 +367,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
>  		vm_shared = dst_vma->vm_flags & VM_SHARED;
>  	}
>  
> +	BUG_ON(!vm_shared && use_hgm);
> +
>  	/*
>  	 * If not shared, ensure the dst_vma has a anon_vma.
>  	 */
> @@ -371,11 +379,13 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
>  	}
>  
>  	while (src_addr < src_start + len) {
> +		struct hugetlb_pte hpte;
> +		bool new_mapping;
>  		BUG_ON(dst_addr >= dst_start + len);
>  
>  		/*
>  		 * Serialize via i_mmap_rwsem and hugetlb_fault_mutex.
> -		 * i_mmap_rwsem ensures the dst_pte remains valid even
> +		 * i_mmap_rwsem ensures the hpte.ptep remains valid even
>  		 * in the case of shared pmds.  fault mutex prevents
>  		 * races with other faulting threads.
>  		 */
> @@ -383,27 +393,47 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
>  		i_mmap_lock_read(mapping);
>  		idx = linear_page_index(dst_vma, dst_addr);
>  		hash = hugetlb_fault_mutex_hash(mapping, idx);
> +		/* This lock will get expensive at 4K. */
>  		mutex_lock(&hugetlb_fault_mutex_table[hash]);
>  
> -		err = -ENOMEM;
> -		dst_pte = huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize);
> -		if (!dst_pte) {
> +		err = 0;
> +
> +		pte_t *ptep = huge_pte_alloc(dst_mm, dst_vma, dst_addr,
> +					     vma_hpagesize);
> +		if (!ptep)
> +			err = -ENOMEM;
> +		else {
> +			hugetlb_pte_populate(&hpte, ptep,
> +					huge_page_shift(h));
> +			/*
> +			 * If the hstate-level PTE is not none, then a mapping
> +			 * was previously established.
> +			 * The per-hpage mutex prevents double-counting.
> +			 */
> +			new_mapping = hugetlb_pte_none(&hpte);
> +			if (use_hgm)
> +				err = hugetlb_alloc_largest_pte(&hpte, dst_mm, dst_vma,
> +								dst_addr,
> +								dst_start + len);
> +		}
> +
> +		if (err) {
>  			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
>  			i_mmap_unlock_read(mapping);
>  			goto out_unlock;
>  		}
>  
>  		if (mode != MCOPY_ATOMIC_CONTINUE &&
> -		    !huge_pte_none_mostly(huge_ptep_get(dst_pte))) {
> +		    !hugetlb_pte_none_mostly(&hpte)) {
>  			err = -EEXIST;
>  			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
>  			i_mmap_unlock_read(mapping);
>  			goto out_unlock;
>  		}
>  
> -		err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma,
> +		err = hugetlb_mcopy_atomic_pte(dst_mm, &hpte, dst_vma,
>  					       dst_addr, src_addr, mode, &page,
> -					       wp_copy);
> +					       wp_copy, new_mapping);
>  
>  		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
>  		i_mmap_unlock_read(mapping);
> @@ -413,6 +443,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
>  		if (unlikely(err == -ENOENT)) {
>  			mmap_read_unlock(dst_mm);
>  			BUG_ON(!page);
> +			BUG_ON(hpte.shift != huge_page_shift(h));
>  
>  			err = copy_huge_page_from_user(page,
>  						(const void __user *)src_addr,
> @@ -430,9 +461,9 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
>  			BUG_ON(page);
>  
>  		if (!err) {
> -			dst_addr += vma_hpagesize;
> -			src_addr += vma_hpagesize;
> -			copied += vma_hpagesize;
> +			dst_addr += hugetlb_pte_size(&hpte);
> +			src_addr += hugetlb_pte_size(&hpte);
> +			copied += hugetlb_pte_size(&hpte);
>  
>  			if (fatal_signal_pending(current))
>  				err = -EINTR;
> -- 
> 2.37.0.rc0.161.g10f37bed90-goog
> 

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries
  2022-07-12  9:42     ` Dr. David Alan Gilbert
  2022-07-12 17:51       ` Mike Kravetz
@ 2022-07-15 16:35       ` Peter Xu
  2022-07-15 21:52         ` Axel Rasmussen
  1 sibling, 1 reply; 128+ messages in thread
From: Peter Xu @ 2022-07-15 16:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Mike Kravetz, James Houghton, Muchun Song, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang,
	Manish Mishra, linux-mm, linux-kernel

On Tue, Jul 12, 2022 at 10:42:17AM +0100, Dr. David Alan Gilbert wrote:
> * Mike Kravetz (mike.kravetz@oracle.com) wrote:
> > On 06/24/22 17:36, James Houghton wrote:
> > > After high-granularity mapping, page table entries for HugeTLB pages can
> > > be of any size/type. (For example, we can have a 1G page mapped with a
> > > mix of PMDs and PTEs.) This struct is to help keep track of a HugeTLB
> > > PTE after we have done a page table walk.
> > 
> > This has been rolling around in my head.
> > 
> > Will this first use case (live migration) actually make use of this
> > 'mixed mapping' model where hugetlb pages could be mapped at the PUD,
> > PMD and PTE level all within the same vma?  I only understand the use
> > case from a high level.  But, it seems that we would want to only want
> > to migrate PTE (or PMD) sized pages and not necessarily a mix.
> 
> I suspect we would pick one size and use that size for all transfers
> when in postcopy; not sure if there are any side cases though.

Yes, I'm also curious whether the series could be much simplified if we had
a static way to do sub-page mappings, e.g., when sub-page mapping is enabled
we always map at PAGE_SIZE only; if not, we keep the old hpage-size mappings
only.

> > Looking to the future when supporting memory error handling/page poisoning
> > it seems like we would certainly want multiple size mappings.

If we treat page poisoning as a very rare event anyway, IMHO it'll even be
acceptable if we always split 1G pages into 4K ones and only rule out the
actually poisoned 4K phys page.  After all, IIUC the major goal is to reduce
the poisoned memory footprint.

It'll definitely be nicer if we can keep the 511 2M pages and 511 4K pages in
that case, so that the 511 2M pages perform slightly better, but that's
something extra to me.  It can always be built on top of a simpler version
of sub-page mapping which is PAGE_SIZE-based only.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 20/26] hugetlb: add support for high-granularity UFFDIO_CONTINUE
  2022-07-15 16:21   ` Peter Xu
@ 2022-07-15 16:58     ` James Houghton
  2022-07-15 17:20       ` Peter Xu
  0 siblings, 1 reply; 128+ messages in thread
From: James Houghton @ 2022-07-15 16:58 UTC (permalink / raw)
  To: Peter Xu
  Cc: Mike Kravetz, Muchun Song, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Fri, Jul 15, 2022 at 9:21 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Jun 24, 2022 at 05:36:50PM +0000, James Houghton wrote:
> > The changes here are very similar to the changes made to
> > hugetlb_no_page, where we do a high-granularity page table walk and
> > do accounting slightly differently because we are mapping only a piece
> > of a page.
> >
> > Signed-off-by: James Houghton <jthoughton@google.com>
> > ---
> >  fs/userfaultfd.c        |  3 +++
> >  include/linux/hugetlb.h |  6 +++--
> >  mm/hugetlb.c            | 54 +++++++++++++++++++++-----------------
> >  mm/userfaultfd.c        | 57 +++++++++++++++++++++++++++++++----------
> >  4 files changed, 82 insertions(+), 38 deletions(-)
> >
> > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > index e943370107d0..77c1b8a7d0b9 100644
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -245,6 +245,9 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
> >       if (!ptep)
> >               goto out;
> >
> > +     if (hugetlb_hgm_enabled(vma))
> > +             goto out;
> > +
>
> This is weird.  It means we'll never wait for sub-page mapping enabled
> vmas.  Why?
>

`ret` is true in this case, so we're actually *always* waiting.

> Not to mention hugetlb_hgm_enabled() currently is simply VM_SHARED, so it
> means we'll stop waiting for all shared hugetlbfs uffd page faults..
>
> I'd expect in the in-house postcopy tests you should see vcpu threads
> spinning on the page faults until it's serviced.
>
> IMO we still need to properly wait when the pgtable doesn't have the
> faulted address covered.  For sub-page mapping it'll probably need to walk
> into sub-page levels.

Ok, SGTM. I'll do that for the next version. I'm not sure of the
consequences of returning `true` here when we should be returning
`false`.

>
> >       ret = false;
> >       pte = huge_ptep_get(ptep);
> >
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index ac4ac8fbd901..c207b1ac6195 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -221,13 +221,15 @@ unsigned long hugetlb_total_pages(void);
> >  vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >                       unsigned long address, unsigned int flags);
> >  #ifdef CONFIG_USERFAULTFD
> > -int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
> > +int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> > +                             struct hugetlb_pte *dst_hpte,
> >                               struct vm_area_struct *dst_vma,
> >                               unsigned long dst_addr,
> >                               unsigned long src_addr,
> >                               enum mcopy_atomic_mode mode,
> >                               struct page **pagep,
> > -                             bool wp_copy);
> > +                             bool wp_copy,
> > +                             bool new_mapping);
> >  #endif /* CONFIG_USERFAULTFD */
> >  bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
> >                                               struct vm_area_struct *vma,
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 0ec2f231524e..09fa57599233 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -5808,6 +5808,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> >               vma_end_reservation(h, vma, haddr);
> >       }
> >
> > +     /* This lock will get pretty expensive at 4K. */
> >       ptl = hugetlb_pte_lock(mm, hpte);
> >       ret = 0;
> >       /* If pte changed from under us, retry */
> > @@ -6098,24 +6099,26 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >   * modifications for huge pages.
> >   */
> >  int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> > -                         pte_t *dst_pte,
> > +                         struct hugetlb_pte *dst_hpte,
> >                           struct vm_area_struct *dst_vma,
> >                           unsigned long dst_addr,
> >                           unsigned long src_addr,
> >                           enum mcopy_atomic_mode mode,
> >                           struct page **pagep,
> > -                         bool wp_copy)
> > +                         bool wp_copy,
> > +                         bool new_mapping)
> >  {
> >       bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
> >       struct hstate *h = hstate_vma(dst_vma);
> >       struct address_space *mapping = dst_vma->vm_file->f_mapping;
> > +     unsigned long haddr = dst_addr & huge_page_mask(h);
> >       pgoff_t idx = vma_hugecache_offset(h, dst_vma, dst_addr);
> >       unsigned long size;
> >       int vm_shared = dst_vma->vm_flags & VM_SHARED;
> >       pte_t _dst_pte;
> >       spinlock_t *ptl;
> >       int ret = -ENOMEM;
> > -     struct page *page;
> > +     struct page *page, *subpage;
> >       int writable;
> >       bool page_in_pagecache = false;
> >
> > @@ -6130,12 +6133,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> >                * a non-missing case. Return -EEXIST.
> >                */
> >               if (vm_shared &&
> > -                 hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) {
> > +                 hugetlbfs_pagecache_present(h, dst_vma, haddr)) {
> >                       ret = -EEXIST;
> >                       goto out;
> >               }
> >
> > -             page = alloc_huge_page(dst_vma, dst_addr, 0);
> > +             page = alloc_huge_page(dst_vma, haddr, 0);
> >               if (IS_ERR(page)) {
> >                       ret = -ENOMEM;
> >                       goto out;
> > @@ -6151,13 +6154,13 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> >                       /* Free the allocated page which may have
> >                        * consumed a reservation.
> >                        */
> > -                     restore_reserve_on_error(h, dst_vma, dst_addr, page);
> > +                     restore_reserve_on_error(h, dst_vma, haddr, page);
> >                       put_page(page);
> >
> >                       /* Allocate a temporary page to hold the copied
> >                        * contents.
> >                        */
> > -                     page = alloc_huge_page_vma(h, dst_vma, dst_addr);
> > +                     page = alloc_huge_page_vma(h, dst_vma, haddr);
> >                       if (!page) {
> >                               ret = -ENOMEM;
> >                               goto out;
> > @@ -6171,14 +6174,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> >               }
> >       } else {
> >               if (vm_shared &&
> > -                 hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) {
> > +                 hugetlbfs_pagecache_present(h, dst_vma, haddr)) {
> >                       put_page(*pagep);
> >                       ret = -EEXIST;
> >                       *pagep = NULL;
> >                       goto out;
> >               }
> >
> > -             page = alloc_huge_page(dst_vma, dst_addr, 0);
> > +             page = alloc_huge_page(dst_vma, haddr, 0);
> >               if (IS_ERR(page)) {
> >                       ret = -ENOMEM;
> >                       *pagep = NULL;
> > @@ -6216,8 +6219,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> >               page_in_pagecache = true;
> >       }
> >
> > -     ptl = huge_pte_lockptr(huge_page_shift(h), dst_mm, dst_pte);
> > -     spin_lock(ptl);
> > +     ptl = hugetlb_pte_lock(dst_mm, dst_hpte);
> >
> >       /*
> >        * Recheck the i_size after holding PT lock to make sure not
> > @@ -6239,14 +6241,16 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> >        * registered, we firstly wr-protect a none pte which has no page cache
> >        * page backing it, then access the page.
> >        */
> > -     if (!huge_pte_none_mostly(huge_ptep_get(dst_pte)))
> > +     if (!hugetlb_pte_none_mostly(dst_hpte))
> >               goto out_release_unlock;
> >
> > -     if (vm_shared) {
> > -             page_dup_file_rmap(page, true);
> > -     } else {
> > -             ClearHPageRestoreReserve(page);
> > -             hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
> > +     if (new_mapping) {
>
> IIUC you wanted to avoid the mapcount accountings when it's the sub-page
> that was going to be mapped.
>
> Is it a must we get this only from the caller?  Can we know we're doing
> sub-page mapping already here and make a decision with e.g. dst_hpte?
>
> It looks weird to me to pass this in explicitly from the caller, especially
> since at that point we don't really hold the pgtable lock, so I'm also
> wondering about possible race conditions from stale new_mapping values.

The only way to know what the correct value for `new_mapping` should
be is to know if we had to change the hstate-level P*D to non-none to
service this UFFDIO_CONTINUE request. I'll see if there is a nice way
to do that check in `hugetlb_mcopy_atomic_pte`. Right now there is no
race, because we synchronize on the per-hpage mutex.

>
> > +             if (vm_shared) {
> > +                     page_dup_file_rmap(page, true);
> > +             } else {
> > +                     ClearHPageRestoreReserve(page);
> > +                     hugepage_add_new_anon_rmap(page, dst_vma, haddr);
> > +             }
> >       }
> >
> >       /*
> > @@ -6258,7 +6262,11 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> >       else
> >               writable = dst_vma->vm_flags & VM_WRITE;
> >
> > -     _dst_pte = make_huge_pte(dst_vma, page, writable);
> > +     subpage = hugetlb_find_subpage(h, page, dst_addr);
> > +     if (subpage != page)
> > +             BUG_ON(!hugetlb_hgm_enabled(dst_vma));
> > +
> > +     _dst_pte = make_huge_pte(dst_vma, subpage, writable);
> >       /*
> >        * Always mark UFFDIO_COPY page dirty; note that this may not be
> >        * extremely important for hugetlbfs for now since swapping is not
> > @@ -6271,14 +6279,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> >       if (wp_copy)
> >               _dst_pte = huge_pte_mkuffd_wp(_dst_pte);
> >
> > -     set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
> > +     set_huge_pte_at(dst_mm, dst_addr, dst_hpte->ptep, _dst_pte);
> >
> > -     (void)huge_ptep_set_access_flags(dst_vma, dst_addr, dst_pte, _dst_pte,
> > -                                     dst_vma->vm_flags & VM_WRITE);
> > -     hugetlb_count_add(pages_per_huge_page(h), dst_mm);
> > +     (void)huge_ptep_set_access_flags(dst_vma, dst_addr, dst_hpte->ptep,
> > +                     _dst_pte, dst_vma->vm_flags & VM_WRITE);
> > +     hugetlb_count_add(hugetlb_pte_size(dst_hpte) / PAGE_SIZE, dst_mm);
> >
> >       /* No need to invalidate - it was non-present before */
> > -     update_mmu_cache(dst_vma, dst_addr, dst_pte);
> > +     update_mmu_cache(dst_vma, dst_addr, dst_hpte->ptep);
> >
> >       spin_unlock(ptl);
> >       if (!is_continue)
> > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > index 4f4892a5f767..ee40d98068bf 100644
> > --- a/mm/userfaultfd.c
> > +++ b/mm/userfaultfd.c
> > @@ -310,14 +310,16 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
> >  {
> >       int vm_shared = dst_vma->vm_flags & VM_SHARED;
> >       ssize_t err;
> > -     pte_t *dst_pte;
> >       unsigned long src_addr, dst_addr;
> >       long copied;
> >       struct page *page;
> > -     unsigned long vma_hpagesize;
> > +     unsigned long vma_hpagesize, vma_altpagesize;
> >       pgoff_t idx;
> >       u32 hash;
> >       struct address_space *mapping;
> > +     bool use_hgm = hugetlb_hgm_enabled(dst_vma) &&
> > +             mode == MCOPY_ATOMIC_CONTINUE;
> > +     struct hstate *h = hstate_vma(dst_vma);
> >
> >       /*
> >        * There is no default zero huge page for all huge page sizes as
> > @@ -335,12 +337,16 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
> >       copied = 0;
> >       page = NULL;
> >       vma_hpagesize = vma_kernel_pagesize(dst_vma);
> > +     if (use_hgm)
> > +             vma_altpagesize = PAGE_SIZE;
>
> Do we need to check the "len" to know whether we should use sub-page
> mapping or the original hpage size?  E.g. any old UFFDIO_CONTINUE code will
> still want the old behavior, I think.

I think that's a fair point; however, if we enable HGM and the address
and len happen to be hstate-aligned, we basically do the same thing as
if HGM weren't enabled. It could be a minor performance optimization to
do `vma_altpagesize = vma_hpagesize` in that case, but in terms of how
the page tables are set up, the end result would be the same.

>
> > +     else
> > +             vma_altpagesize = vma_hpagesize;
> >
> >       /*
> >        * Validate alignment based on huge page size
> >        */
> >       err = -EINVAL;
> > -     if (dst_start & (vma_hpagesize - 1) || len & (vma_hpagesize - 1))
> > +     if (dst_start & (vma_altpagesize - 1) || len & (vma_altpagesize - 1))
> >               goto out_unlock;
> >
> >  retry:
> > @@ -361,6 +367,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
> >               vm_shared = dst_vma->vm_flags & VM_SHARED;
> >       }
> >
> > +     BUG_ON(!vm_shared && use_hgm);
> > +
> >       /*
> >        * If not shared, ensure the dst_vma has a anon_vma.
> >        */
> > @@ -371,11 +379,13 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
> >       }
> >
> >       while (src_addr < src_start + len) {
> > +             struct hugetlb_pte hpte;
> > +             bool new_mapping;
> >               BUG_ON(dst_addr >= dst_start + len);
> >
> >               /*
> >                * Serialize via i_mmap_rwsem and hugetlb_fault_mutex.
> > -              * i_mmap_rwsem ensures the dst_pte remains valid even
> > +              * i_mmap_rwsem ensures the hpte.ptep remains valid even
> >                * in the case of shared pmds.  fault mutex prevents
> >                * races with other faulting threads.
> >                */
> > @@ -383,27 +393,47 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
> >               i_mmap_lock_read(mapping);
> >               idx = linear_page_index(dst_vma, dst_addr);
> >               hash = hugetlb_fault_mutex_hash(mapping, idx);
> > +             /* This lock will get expensive at 4K. */
> >               mutex_lock(&hugetlb_fault_mutex_table[hash]);
> >
> > -             err = -ENOMEM;
> > -             dst_pte = huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize);
> > -             if (!dst_pte) {
> > +             err = 0;
> > +
> > +             pte_t *ptep = huge_pte_alloc(dst_mm, dst_vma, dst_addr,
> > +                                          vma_hpagesize);
> > +             if (!ptep)
> > +                     err = -ENOMEM;
> > +             else {
> > +                     hugetlb_pte_populate(&hpte, ptep,
> > +                                     huge_page_shift(h));
> > +                     /*
> > +                      * If the hstate-level PTE is not none, then a mapping
> > +                      * was previously established.
> > +                      * The per-hpage mutex prevents double-counting.
> > +                      */
> > +                     new_mapping = hugetlb_pte_none(&hpte);
> > +                     if (use_hgm)
> > +                             err = hugetlb_alloc_largest_pte(&hpte, dst_mm, dst_vma,
> > +                                                             dst_addr,
> > +                                                             dst_start + len);
> > +             }
> > +
> > +             if (err) {
> >                       mutex_unlock(&hugetlb_fault_mutex_table[hash]);
> >                       i_mmap_unlock_read(mapping);
> >                       goto out_unlock;
> >               }
> >
> >               if (mode != MCOPY_ATOMIC_CONTINUE &&
> > -                 !huge_pte_none_mostly(huge_ptep_get(dst_pte))) {
> > +                 !hugetlb_pte_none_mostly(&hpte)) {
> >                       err = -EEXIST;
> >                       mutex_unlock(&hugetlb_fault_mutex_table[hash]);
> >                       i_mmap_unlock_read(mapping);
> >                       goto out_unlock;
> >               }
> >
> > -             err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma,
> > +             err = hugetlb_mcopy_atomic_pte(dst_mm, &hpte, dst_vma,
> >                                              dst_addr, src_addr, mode, &page,
> > -                                            wp_copy);
> > +                                            wp_copy, new_mapping);
> >
> >               mutex_unlock(&hugetlb_fault_mutex_table[hash]);
> >               i_mmap_unlock_read(mapping);
> > @@ -413,6 +443,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
> >               if (unlikely(err == -ENOENT)) {
> >                       mmap_read_unlock(dst_mm);
> >                       BUG_ON(!page);
> > +                     BUG_ON(hpte.shift != huge_page_shift(h));
> >
> >                       err = copy_huge_page_from_user(page,
> >                                               (const void __user *)src_addr,
> > @@ -430,9 +461,9 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
> >                       BUG_ON(page);
> >
> >               if (!err) {
> > -                     dst_addr += vma_hpagesize;
> > -                     src_addr += vma_hpagesize;
> > -                     copied += vma_hpagesize;
> > +                     dst_addr += hugetlb_pte_size(&hpte);
> > +                     src_addr += hugetlb_pte_size(&hpte);
> > +                     copied += hugetlb_pte_size(&hpte);
> >
> >                       if (fatal_signal_pending(current))
> >                               err = -EINTR;
> > --
> > 2.37.0.rc0.161.g10f37bed90-goog
> >
>
> --
> Peter Xu
>

Thanks, Peter! :)

- James

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 20/26] hugetlb: add support for high-granularity UFFDIO_CONTINUE
  2022-07-15 16:58     ` James Houghton
@ 2022-07-15 17:20       ` Peter Xu
  2022-07-20 20:58         ` James Houghton
  0 siblings, 1 reply; 128+ messages in thread
From: Peter Xu @ 2022-07-15 17:20 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Muchun Song, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Fri, Jul 15, 2022 at 09:58:10AM -0700, James Houghton wrote:
> On Fri, Jul 15, 2022 at 9:21 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Fri, Jun 24, 2022 at 05:36:50PM +0000, James Houghton wrote:
> > > The changes here are very similar to the changes made to
> > > hugetlb_no_page, where we do a high-granularity page table walk and
> > > do accounting slightly differently because we are mapping only a piece
> > > of a page.
> > >
> > > Signed-off-by: James Houghton <jthoughton@google.com>
> > > ---
> > >  fs/userfaultfd.c        |  3 +++
> > >  include/linux/hugetlb.h |  6 +++--
> > >  mm/hugetlb.c            | 54 +++++++++++++++++++++-----------------
> > >  mm/userfaultfd.c        | 57 +++++++++++++++++++++++++++++++----------
> > >  4 files changed, 82 insertions(+), 38 deletions(-)
> > >
> > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > > index e943370107d0..77c1b8a7d0b9 100644
> > > --- a/fs/userfaultfd.c
> > > +++ b/fs/userfaultfd.c
> > > @@ -245,6 +245,9 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
> > >       if (!ptep)
> > >               goto out;
> > >
> > > +     if (hugetlb_hgm_enabled(vma))
> > > +             goto out;
> > > +
> >
> > This is weird.  It means we'll never wait for sub-page mapping enabled
> > vmas.  Why?
> >
> 
> `ret` is true in this case, so we're actually *always* waiting.

Aha!  Then I think that's another problem, sorry. :) See Below.

> 
> > Not to mention hugetlb_hgm_enabled() currently is simply VM_SHARED, so it
> > means we'll stop waiting for all shared hugetlbfs uffd page faults..
> >
> > I'd expect in the in-house postcopy tests you should see vcpu threads
> > spinning on the page faults until it's serviced.
> >
> > IMO we still need to properly wait when the pgtable doesn't have the
> > faulted address covered.  For sub-page mapping it'll probably need to walk
> > into sub-page levels.
> 
> Ok, SGTM. I'll do that for the next version. I'm not sure of the
> consequences of returning `true` here when we should be returning
> `false`.

We've put ourselves onto the wait queue; if another concurrent
UFFDIO_CONTINUE happened and the pte is already installed, I think this
thread could be waiting forever on the next schedule().

The solution should be the same - walking the sub-page pgtable would work,
afaict.
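
As a sanity check of the waiting rule, here is a toy userspace model (not
the kernel's userfaultfd_huge_must_wait): with HGM the walk must descend
level by level, and the fault should only wait while some level covering
the address is still none; once the leaf is installed, queueing would risk
sleeping forever as described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: present[] stands in for "is the entry at this
 * page-table level non-none for the faulting address", ordered from
 * the hstate level down to the leaf (e.g. PUD, PMD, PTE for 1G). */
enum { LEVELS = 3 };

static bool must_wait(const bool present[LEVELS])
{
	for (int lvl = 0; lvl < LEVELS; lvl++)
		if (!present[lvl])
			return true;	/* hole: wait for UFFDIO_CONTINUE */
	return false;			/* leaf installed: retry the fault */
}
```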

[...]

> > > @@ -6239,14 +6241,16 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> > >        * registered, we firstly wr-protect a none pte which has no page cache
> > >        * page backing it, then access the page.
> > >        */
> > > -     if (!huge_pte_none_mostly(huge_ptep_get(dst_pte)))
> > > +     if (!hugetlb_pte_none_mostly(dst_hpte))
> > >               goto out_release_unlock;
> > >
> > > -     if (vm_shared) {
> > > -             page_dup_file_rmap(page, true);
> > > -     } else {
> > > -             ClearHPageRestoreReserve(page);
> > > -             hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
> > > +     if (new_mapping) {
> >
> > IIUC you wanted to avoid the mapcount accountings when it's the sub-page
> > that was going to be mapped.
> >
> > Is it a must we get this only from the caller?  Can we know we're doing
> > sub-page mapping already here and make a decision with e.g. dst_hpte?
> >
> > It looks weird to me to pass this explicitly from the caller, especially
> > that's when we don't really have the pgtable lock so I'm wondering about
> > possible race conditions too on having stale new_mapping values.
> 
> The only way to know what the correct value for `new_mapping` should
> be is to know if we had to change the hstate-level P*D to non-none to
> service this UFFDIO_CONTINUE request. I'll see if there is a nice way
> to do that check in `hugetlb_mcopy_atomic_pte`.
> Right now there is no

Would "new_mapping = dest_hpte->shift != huge_page_shift(hstate)" work (or
something alike)?

> race, because we synchronize on the per-hpage mutex.

Yeah, I'm not familiar enough with that mutex to tell; as long as it
guarantees no pgtable updates (hmm, then why do we need the pgtable lock
here???) it looks fine.
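
For reference, a toy model of the accounting James describes (illustrative
only; the real code holds the per-hpage fault mutex and consults actual
page-table state): only the CONTINUE that first makes the hstate-level
entry non-none counts a new mapping, so double-counting is impossible
while serialized.

```c
#include <assert.h>
#include <stdbool.h>

struct hpage_state {
	bool hstate_entry_present;	/* hstate-level P*D is non-none */
	int mapcount;			/* file rmap count for the hpage */
};

/* Called with the per-hpage fault mutex held in the real code. */
static bool continue_one_subpage(struct hpage_state *s)
{
	bool new_mapping = !s->hstate_entry_present; /* hugetlb_pte_none() */

	s->hstate_entry_present = true;	/* walk/alloc installs the entry */
	if (new_mapping)
		s->mapcount++;		/* rmap taken once per hpage */
	return new_mapping;
}
```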

[...]

> > > @@ -335,12 +337,16 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
> > >       copied = 0;
> > >       page = NULL;
> > >       vma_hpagesize = vma_kernel_pagesize(dst_vma);
> > > +     if (use_hgm)
> > > +             vma_altpagesize = PAGE_SIZE;
> >
> > Do we need to check the "len" to know whether we should use sub-page
> > mapping or original hpage size?  E.g. any old UFFDIO_CONTINUE code will
> > still want the old behavior I think.
> 
> I think that's a fair point; however, if we enable HGM and the address
> and len happen to be hstate-aligned

The address can, but len (note! not "end" here) cannot?

> , we basically do the same thing as
> if HGM wasn't enabled. It could be a minor performance optimization to
> do `vma_altpagesize=vma_hpagesize` in that case, but in terms of how
> the page tables are set up, the end result would be the same.

Thanks,

-- 
Peter Xu



* Re: [RFC PATCH 19/26] hugetlb: add HGM support for copy_hugetlb_page_range
  2022-07-12 18:06       ` Mike Kravetz
@ 2022-07-15 21:39         ` Axel Rasmussen
  0 siblings, 0 replies; 128+ messages in thread
From: Axel Rasmussen @ 2022-07-15 21:39 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: James Houghton, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, Linux MM, LKML

On Tue, Jul 12, 2022 at 11:07 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 07/12/22 10:19, James Houghton wrote:
> > On Mon, Jul 11, 2022 at 4:41 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > >
> > > On 06/24/22 17:36, James Houghton wrote:
> > > > This allows fork() to work with high-granularity mappings. The page
> > > > table structure is copied such that partially mapped regions will remain
> > > > partially mapped in the same way for the new process.
> > > >
> > > > Signed-off-by: James Houghton <jthoughton@google.com>
> > > > ---
> > > >  mm/hugetlb.c | 74 +++++++++++++++++++++++++++++++++++++++++-----------
> > > >  1 file changed, 59 insertions(+), 15 deletions(-)
> > >
> > > FYI -
> > > With https://lore.kernel.org/linux-mm/20220621235620.291305-5-mike.kravetz@oracle.com/
> > > copy_hugetlb_page_range() should never be called for shared mappings.
> > > Since HGM only works on shared mappings, code in this patch will never
> > > be executed.
> > >
> > > I have a TODO to remove shared mapping support from copy_hugetlb_page_range.
> >
> > Thanks Mike. If I understand things correctly, it seems like I don't
> > have to do anything to support fork() then; we just don't copy the
> > page table structure from the old VMA to the new one.
>
> Yes, for now.  We will not copy the page tables for shared mappings.
> When adding support for private mapping, we will need to handle the
> HGM case.
>
> >                                                       That is, as
> > opposed to having the same bits of the old VMA being mapped in the new
> > one, the new VMA will have an empty page table. This would slightly
> > change userfaultfd's behavior on the new VMA, but that seems fine
> > to me.
>
> Right.  Since the 'mapping size information' is essentially carried in
> the page tables, it will be lost if page tables are not copied.
>
> Not sure if anyone would depend on that behavior.
>
> Axel, this may also impact minor fault processing.  Any concerns?
> Patch is sitting in Andrew's tree for next merge window.

Sorry for the slow response, just catching up a bit here. :)

If I understand correctly, let's say we have a process where some
hugetlb pages are fully mapped (pages are in page cache, page table
entries exist). Once we fork(), we in the future won't copy the page
table entries, but I assume we do set up the underlying pages for CoW
still. So I guess this means in the old process no fault would happen
if the memory was touched, but in the forked process it would generate
a minor fault?

To me that seems fine. When userspace gets a minor fault it's always
fine for it to just say "don't care, just UFFDIO_CONTINUE, no work
needed". For VM migration I don't think it's unreasonable to expect
userspace to remember whether or not the page is clean (it already
does this anyway) and whether or not a fork (without exec) had
happened. It seems to me it should work fine.

> --
> Mike Kravetz


* Re: [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries
  2022-07-15 16:35       ` Peter Xu
@ 2022-07-15 21:52         ` Axel Rasmussen
  2022-07-15 23:03           ` Peter Xu
  0 siblings, 1 reply; 128+ messages in thread
From: Axel Rasmussen @ 2022-07-15 21:52 UTC (permalink / raw)
  To: Peter Xu
  Cc: Dr. David Alan Gilbert, Mike Kravetz, James Houghton,
	Muchun Song, David Hildenbrand, David Rientjes, Mina Almasry,
	Jue Wang, Manish Mishra, Linux MM, LKML

On Fri, Jul 15, 2022 at 9:35 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Jul 12, 2022 at 10:42:17AM +0100, Dr. David Alan Gilbert wrote:
> > * Mike Kravetz (mike.kravetz@oracle.com) wrote:
> > > On 06/24/22 17:36, James Houghton wrote:
> > > > After high-granularity mapping, page table entries for HugeTLB pages can
> > > > be of any size/type. (For example, we can have a 1G page mapped with a
> > > > mix of PMDs and PTEs.) This struct is to help keep track of a HugeTLB
> > > > PTE after we have done a page table walk.
> > >
> > > This has been rolling around in my head.
> > >
> > > Will this first use case (live migration) actually make use of this
> > > 'mixed mapping' model where hugetlb pages could be mapped at the PUD,
> > > PMD and PTE level all within the same vma?  I only understand the use
> > > case from a high level.  But, it seems that we would want to only want
> > > to migrate PTE (or PMD) sized pages and not necessarily a mix.
> >
> > I suspect we would pick one size and use that size for all transfers
> > when in postcopy; not sure if there are any side cases though.

Sorry for chiming in late. At least from my perspective, being able to
do multiple sizes is a nice-to-have optimization.

As talked about above, imagine a guest VM backed by 1G hugetlb pages.
We're going along doing demand paging at 4K; because we want each
request to complete as quickly as possible, we want very small
granularity.

Guest access in terms of "physical" memory address is basically
random. So, actually filling in all 262k 4K PTEs making up a
contiguous 1G region might take quite some time. Once we've completed
any of the various 2M contiguous regions, it would be nice to go ahead
and collapse those right away. The benefit is, the guest will see some
performance benefit from the 2G page already, without having to wait
for the full 1G page to complete. Once we do complete a 1G page, it
would be nice to collapse that one level further. If we do this, the
whole guest memory will be a mix of 1G, 2M, and 4K.
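
A userspace-side sketch of that staged collapse (the tracking structure is
hypothetical, and the promotion mechanism itself doesn't exist yet): count
installed 4K subpages per 2M region so the VMM knows exactly when a region
becomes eligible for promotion.

```c
#include <assert.h>
#include <stdbool.h>

#define SUBPAGES_PER_2M 512	/* 2M / 4K */

/* Hypothetical per-2M-region bookkeeping kept by the VMM during
 * postcopy; not part of the kernel series. */
struct region_tracker {
	unsigned short filled;			/* subpages installed */
	bool installed[SUBPAGES_PER_2M];
};

/* Returns true exactly when this install completes the 2M region,
 * i.e. when the VMM could request a collapse for it. */
static bool record_4k_install(struct region_tracker *t, unsigned int idx)
{
	if (t->installed[idx])
		return false;	/* duplicate CONTINUE, ignore */
	t->installed[idx] = true;
	return ++t->filled == SUBPAGES_PER_2M;
}
```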

>
> Yes, I'm also curious whether the series can be much simplified if we have
> a static way to do sub-page mappings, e.g., when sub-page mapping is
> enabled we always map at PAGE_SIZE only; if not, we keep the old hpage
> size mappings only.
>
> > > Looking to the future when supporting memory error handling/page poisoning
> > > it seems like we would certainly want multiple size mappings.
>
> If we treat page poisoning as a very rare event anyway, IMHO it'll even be
> acceptable if we always split 1G pages into 4K ones but only rule out the
> real poisoned 4K phys page.  After all, IIUC the major goal is reducing
> the poisoned memory footprint.
>
> It'll definitely be nicer if we can keep 511 2M pages and 511 4K pages in
> that case so the 511 2M pages perform slightly better, but that's
> something extra to me.  It can always be something built on top of a
> simpler version of sub-page mapping which is only PAGE_SIZE based.
>
> Thanks,
>
> --
> Peter Xu
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries
  2022-07-15 21:52         ` Axel Rasmussen
@ 2022-07-15 23:03           ` Peter Xu
  0 siblings, 0 replies; 128+ messages in thread
From: Peter Xu @ 2022-07-15 23:03 UTC (permalink / raw)
  To: Axel Rasmussen
  Cc: Dr. David Alan Gilbert, Mike Kravetz, James Houghton,
	Muchun Song, David Hildenbrand, David Rientjes, Mina Almasry,
	Jue Wang, Manish Mishra, Linux MM, LKML

On Fri, Jul 15, 2022 at 02:52:27PM -0700, Axel Rasmussen wrote:
> Guest access in terms of "physical" memory address is basically
> random. So, actually filling in all 262k 4K PTEs making up a
> contiguous 1G region might take quite some time. Once we've completed
> any of the various 2M contiguous regions, it would be nice to go ahead
> and collapse those right away. The benefit is, the guest will see some
> performance benefit from the 2G page already, without having to wait
> for the full 1G page to complete. Once we do complete a 1G page, it
> would be nice to collapse that one level further. If we do this, the
> whole guest memory will be a mix of 1G, 2M, and 4K.

Just to mention that we've got quite a few other things that drag perf down
much more than TLB pressure from page sizes during any VM migration.

For example, when we split & wr-protect pages during the starting phase of
migration on the src host, it's not a 10% or 20% drop but something much
more drastic.  In the postcopy case it's on the dest, but it's still part
of the whole migration process and probably guest-visible too.  If the
guest wants, it can simply start writing some pages continuously and it'll
see obvious slowdowns at any time during migration, I bet.

It'll always be nice to have multi-level sub-mappings and I fully agree.
IMHO it's a matter of whether keeping 4k-only would greatly simplify the
work, especially on the rework of hugetlb sub-page aware pgtable ops.

Thanks,

-- 
Peter Xu



* Re: [RFC PATCH 15/26] hugetlb: make unmapping compatible with high-granularity mappings
  2022-06-24 17:36 ` [RFC PATCH 15/26] hugetlb: make unmapping compatible with high-granularity mappings James Houghton
@ 2022-07-19 10:19   ` manish.mishra
  2022-07-19 15:58     ` James Houghton
  0 siblings, 1 reply; 128+ messages in thread
From: manish.mishra @ 2022-07-19 10:19 UTC (permalink / raw)
  To: James Houghton, Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Dr . David Alan Gilbert, linux-mm, linux-kernel


On 24/06/22 11:06 pm, James Houghton wrote:
> This enlightens __unmap_hugepage_range to deal with high-granularity
> mappings. This doesn't change its API; it still must be called with
> hugepage alignment, but it will correctly unmap hugepages that have been
> mapped at high granularity.
>
> Analogous to the mapcount rules introduced by hugetlb_no_page, we only
> drop mapcount in this case if we are unmapping an entire hugepage in one
> operation. This is the case when a VMA is destroyed.
>
> Eventually, functionality here can be expanded to allow users to call
> MADV_DONTNEED on PAGE_SIZE-aligned sections of a hugepage, but that is
> not done here.

Sorry, I may have misunderstood something here, but allowing something like
MADV_DONTNEED at PAGE_SIZE granularity in hugetlbfs can cause fragmentation
in the hugetlbfs pool, which seems to run counter to the purpose of hugetlbfs?

>
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>   include/asm-generic/tlb.h |  6 +--
>   mm/hugetlb.c              | 85 ++++++++++++++++++++++++++-------------
>   2 files changed, 59 insertions(+), 32 deletions(-)
>
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index ff3e82553a76..8daa3ae460d9 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -562,9 +562,9 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
>   		__tlb_remove_tlb_entry(tlb, ptep, address);	\
>   	} while (0)
>   
> -#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
> +#define tlb_remove_huge_tlb_entry(tlb, hpte, address)	\
>   	do {							\
> -		unsigned long _sz = huge_page_size(h);		\
> +		unsigned long _sz = hugetlb_pte_size(&hpte);	\
>   		if (_sz >= P4D_SIZE)				\
>   			tlb_flush_p4d_range(tlb, address, _sz);	\
>   		else if (_sz >= PUD_SIZE)			\
> @@ -573,7 +573,7 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
>   			tlb_flush_pmd_range(tlb, address, _sz);	\
>   		else						\
>   			tlb_flush_pte_range(tlb, address, _sz);	\
> -		__tlb_remove_tlb_entry(tlb, ptep, address);	\
> +		__tlb_remove_tlb_entry(tlb, hpte.ptep, address);\
>   	} while (0)
>   
>   /**
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index da30621656b8..51fc1d3f122f 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5120,24 +5120,20 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
>   {
>   	struct mm_struct *mm = vma->vm_mm;
>   	unsigned long address;
> -	pte_t *ptep;
> +	struct hugetlb_pte hpte;
>   	pte_t pte;
>   	spinlock_t *ptl;
> -	struct page *page;
> +	struct page *hpage, *subpage;
>   	struct hstate *h = hstate_vma(vma);
>   	unsigned long sz = huge_page_size(h);
>   	struct mmu_notifier_range range;
>   	bool force_flush = false;
> +	bool hgm_enabled = hugetlb_hgm_enabled(vma);
>   
>   	WARN_ON(!is_vm_hugetlb_page(vma));
>   	BUG_ON(start & ~huge_page_mask(h));
>   	BUG_ON(end & ~huge_page_mask(h));
>   
> -	/*
> -	 * This is a hugetlb vma, all the pte entries should point
> -	 * to huge page.
> -	 */
> -	tlb_change_page_size(tlb, sz);
>   	tlb_start_vma(tlb, vma);
>   
>   	/*
> @@ -5148,25 +5144,43 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
>   	adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
>   	mmu_notifier_invalidate_range_start(&range);
>   	address = start;
> -	for (; address < end; address += sz) {
> -		ptep = huge_pte_offset(mm, address, sz);
> -		if (!ptep)
> +
> +	while (address < end) {
> +		pte_t *ptep = huge_pte_offset(mm, address, sz);
> +
> +		if (!ptep) {
> +			address += sz;
>   			continue;
> +		}
> +		hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h));
> +		if (hgm_enabled) {
> +			int ret = huge_pte_alloc_high_granularity(
> +					&hpte, mm, vma, address, PAGE_SHIFT,
> +					HUGETLB_SPLIT_NEVER,
> +					/*write_locked=*/true);

I see that huge_pte_alloc_high_granularity with HUGETLB_SPLIT_NEVER just
does huge_tlb_walk. So is HUGETLB_SPLIT_NEVER even required? I mean, for
those cases you can directly do huge_tlb_walk? The name
huge_pte_alloc_high_granularity is confusing for those cases.

> +			/*
> +			 * We will never split anything, so this should always
> +			 * succeed.
> +			 */
> +			BUG_ON(ret);
> +		}
>   
> -		ptl = huge_pte_lock(h, mm, ptep);
> -		if (huge_pmd_unshare(mm, vma, &address, ptep)) {
> +		ptl = hugetlb_pte_lock(mm, &hpte);
> +		if (!hgm_enabled && huge_pmd_unshare(
> +					mm, vma, &address, hpte.ptep)) {
>   			spin_unlock(ptl);
>   			tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
>   			force_flush = true;
> -			continue;
> +			goto next_hpte;
>   		}
>   
> -		pte = huge_ptep_get(ptep);
> -		if (huge_pte_none(pte)) {
> +		if (hugetlb_pte_none(&hpte)) {
>   			spin_unlock(ptl);
> -			continue;
> +			goto next_hpte;
>   		}
>   
> +		pte = hugetlb_ptep_get(&hpte);
> +
>   		/*
>   		 * Migrating hugepage or HWPoisoned hugepage is already
>   		 * unmapped and its refcount is dropped, so just clear pte here.
> @@ -5180,24 +5194,27 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
>   			 */
>   			if (pte_swp_uffd_wp_any(pte) &&
>   			    !(zap_flags & ZAP_FLAG_DROP_MARKER))
> -				set_huge_pte_at(mm, address, ptep,
> +				set_huge_pte_at(mm, address, hpte.ptep,
>   						make_pte_marker(PTE_MARKER_UFFD_WP));
>   			else
> -				huge_pte_clear(mm, address, ptep, sz);
> +				huge_pte_clear(mm, address, hpte.ptep,
> +						hugetlb_pte_size(&hpte));
>   			spin_unlock(ptl);
> -			continue;
> +			goto next_hpte;
>   		}
>   
> -		page = pte_page(pte);
> +		subpage = pte_page(pte);
> +		BUG_ON(!subpage);
> +		hpage = compound_head(subpage);
>   		/*
>   		 * If a reference page is supplied, it is because a specific
>   		 * page is being unmapped, not a range. Ensure the page we
>   		 * are about to unmap is the actual page of interest.
>   		 */
>   		if (ref_page) {
> -			if (page != ref_page) {
> +			if (hpage != ref_page) {
>   				spin_unlock(ptl);
> -				continue;
> +				goto next_hpte;
>   			}
>   			/*
>   			 * Mark the VMA as having unmapped its page so that
> @@ -5207,25 +5224,35 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
>   			set_vma_resv_flags(vma, HPAGE_RESV_UNMAPPED);
>   		}
>   
> -		pte = huge_ptep_get_and_clear(mm, address, ptep);
> -		tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
> +		pte = huge_ptep_get_and_clear(mm, address, hpte.ptep);
> +		tlb_change_page_size(tlb, hugetlb_pte_size(&hpte));
> +		tlb_remove_huge_tlb_entry(tlb, hpte, address);
>   		if (huge_pte_dirty(pte))
> -			set_page_dirty(page);
> +			set_page_dirty(hpage);
>   		/* Leave a uffd-wp pte marker if needed */
>   		if (huge_pte_uffd_wp(pte) &&
>   		    !(zap_flags & ZAP_FLAG_DROP_MARKER))
> -			set_huge_pte_at(mm, address, ptep,
> +			set_huge_pte_at(mm, address, hpte.ptep,
>   					make_pte_marker(PTE_MARKER_UFFD_WP));
> -		hugetlb_count_sub(pages_per_huge_page(h), mm);
> -		page_remove_rmap(page, vma, true);
> +
> +		hugetlb_count_sub(hugetlb_pte_size(&hpte)/PAGE_SIZE, mm);
> +
> +		/*
> +		 * If we are unmapping the entire page, remove it from the
> +		 * rmap.
> +		 */
> +		if (IS_ALIGNED(address, sz) && address + sz <= end)
> +			page_remove_rmap(hpage, vma, true);
>   
>   		spin_unlock(ptl);
> -		tlb_remove_page_size(tlb, page, huge_page_size(h));
> +		tlb_remove_page_size(tlb, subpage, hugetlb_pte_size(&hpte));
>   		/*
>   		 * Bail out after unmapping reference page if supplied
>   		 */
>   		if (ref_page)
>   			break;
> +next_hpte:
> +		address += hugetlb_pte_size(&hpte);
>   	}
>   	mmu_notifier_invalidate_range_end(&range);
>   	tlb_end_vma(tlb, vma);


* Re: [RFC PATCH 17/26] hugetlb: update follow_hugetlb_page to support HGM
  2022-06-24 17:36 ` [RFC PATCH 17/26] hugetlb: update follow_hugetlb_page to support HGM James Houghton
@ 2022-07-19 10:48   ` manish.mishra
  2022-07-19 16:19     ` James Houghton
  0 siblings, 1 reply; 128+ messages in thread
From: manish.mishra @ 2022-07-19 10:48 UTC (permalink / raw)
  To: James Houghton, Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Jue Wang, Dr . David Alan Gilbert, linux-mm, linux-kernel


On 24/06/22 11:06 pm, James Houghton wrote:
> This enables support for GUP, and it is needed for the KVM demand paging
> self-test to work.
>
> One important change here is that, before, we never needed to grab the
> i_mmap_sem, but now, to prevent someone from collapsing the page tables
> out from under us, we grab it for reading when doing high-granularity PT
> walks.
>
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>   mm/hugetlb.c | 70 ++++++++++++++++++++++++++++++++++++++++++----------
>   1 file changed, 57 insertions(+), 13 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f9c7daa6c090..aadfcee947cf 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6298,14 +6298,18 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>   	unsigned long vaddr = *position;
>   	unsigned long remainder = *nr_pages;
>   	struct hstate *h = hstate_vma(vma);
> +	struct address_space *mapping = vma->vm_file->f_mapping;
>   	int err = -EFAULT, refs;
> +	bool has_i_mmap_sem = false;
>   
>   	while (vaddr < vma->vm_end && remainder) {
>   		pte_t *pte;
>   		spinlock_t *ptl = NULL;
>   		bool unshare = false;
>   		int absent;
> +		unsigned long pages_per_hpte;
>   		struct page *page;
> +		struct hugetlb_pte hpte;
>   
>   		/*
>   		 * If we have a pending SIGKILL, don't keep faulting pages and
> @@ -6325,9 +6329,23 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>   		 */
>   		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h),
>   				      huge_page_size(h));
> -		if (pte)
> -			ptl = huge_pte_lock(h, mm, pte);
> -		absent = !pte || huge_pte_none(huge_ptep_get(pte));
> +		if (pte) {
> +			hugetlb_pte_populate(&hpte, pte, huge_page_shift(h));
> +			if (hugetlb_hgm_enabled(vma)) {
> +				BUG_ON(has_i_mmap_sem);

Just thinking: can we do without i_mmap_lock_read in most cases? Earlier
this function was fine without i_mmap_lock_read while doing almost
everything that is happening now?

> +				i_mmap_lock_read(mapping);
> +				/*
> +				 * Need to hold the mapping semaphore for
> +				 * reading to do a HGM walk.
> +				 */
> +				has_i_mmap_sem = true;
> +				hugetlb_walk_to(mm, &hpte, vaddr, PAGE_SIZE,
> +						/*stop_at_none=*/true);
> +			}
> +			ptl = hugetlb_pte_lock(mm, &hpte);
> +		}
> +
> +		absent = !pte || hugetlb_pte_none(&hpte);
>   
>   		/*
>   		 * When coredumping, it suits get_dump_page if we just return
> @@ -6338,8 +6356,13 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>   		 */
>   		if (absent && (flags & FOLL_DUMP) &&
>   		    !hugetlbfs_pagecache_present(h, vma, vaddr)) {
> -			if (pte)
> +			if (pte) {
> +				if (has_i_mmap_sem) {
> +					i_mmap_unlock_read(mapping);
> +					has_i_mmap_sem = false;
> +				}
>   				spin_unlock(ptl);
> +			}
>   			remainder = 0;
>   			break;
>   		}
> @@ -6359,8 +6382,13 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>   			vm_fault_t ret;
>   			unsigned int fault_flags = 0;
>   
> -			if (pte)
> +			if (pte) {
> +				if (has_i_mmap_sem) {
> +					i_mmap_unlock_read(mapping);
> +					has_i_mmap_sem = false;
> +				}
>   				spin_unlock(ptl);
> +			}
>   			if (flags & FOLL_WRITE)
>   				fault_flags |= FAULT_FLAG_WRITE;
>   			else if (unshare)
> @@ -6403,8 +6431,11 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>   			continue;
>   		}
>   
> -		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
> -		page = pte_page(huge_ptep_get(pte));
> +		pfn_offset = (vaddr & ~hugetlb_pte_mask(&hpte)) >> PAGE_SHIFT;
> +		page = pte_page(hugetlb_ptep_get(&hpte));
> +		pages_per_hpte = hugetlb_pte_size(&hpte) / PAGE_SIZE;
> +		if (hugetlb_hgm_enabled(vma))
> +			page = compound_head(page);
>   
>   		VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
>   			       !PageAnonExclusive(page), page);
> @@ -6414,17 +6445,21 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>   		 * and skip the same_page loop below.
>   		 */
>   		if (!pages && !vmas && !pfn_offset &&
> -		    (vaddr + huge_page_size(h) < vma->vm_end) &&
> -		    (remainder >= pages_per_huge_page(h))) {
> -			vaddr += huge_page_size(h);
> -			remainder -= pages_per_huge_page(h);
> -			i += pages_per_huge_page(h);
> +		    (vaddr + pages_per_hpte < vma->vm_end) &&
> +		    (remainder >= pages_per_hpte)) {
> +			vaddr += pages_per_hpte;
> +			remainder -= pages_per_hpte;
> +			i += pages_per_hpte;
>   			spin_unlock(ptl);
> +			if (has_i_mmap_sem) {
> +				has_i_mmap_sem = false;
> +				i_mmap_unlock_read(mapping);
> +			}
>   			continue;
>   		}
>   
>   		/* vaddr may not be aligned to PAGE_SIZE */
> -		refs = min3(pages_per_huge_page(h) - pfn_offset, remainder,
> +		refs = min3(pages_per_hpte - pfn_offset, remainder,
>   		    (vma->vm_end - ALIGN_DOWN(vaddr, PAGE_SIZE)) >> PAGE_SHIFT);
>   
>   		if (pages || vmas)
> @@ -6447,6 +6482,10 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>   			if (WARN_ON_ONCE(!try_grab_folio(pages[i], refs,
>   							 flags))) {
>   				spin_unlock(ptl);
> +				if (has_i_mmap_sem) {
> +					has_i_mmap_sem = false;
> +					i_mmap_unlock_read(mapping);
> +				}
>   				remainder = 0;
>   				err = -ENOMEM;
>   				break;
> @@ -6458,8 +6497,13 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>   		i += refs;
>   
>   		spin_unlock(ptl);
> +		if (has_i_mmap_sem) {
> +			has_i_mmap_sem = false;
> +			i_mmap_unlock_read(mapping);
> +		}
>   	}
>   	*nr_pages = remainder;
> +	BUG_ON(has_i_mmap_sem);
>   	/*
>   	 * setting position is actually required only if remainder is
>   	 * not zero but it's faster not to add a "if (remainder)"

Thanks

Manish Mishra


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH 15/26] hugetlb: make unmapping compatible with high-granularity mappings
  2022-07-19 10:19   ` manish.mishra
@ 2022-07-19 15:58     ` James Houghton
  0 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-07-19 15:58 UTC (permalink / raw)
  To: manish.mishra
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Tue, Jul 19, 2022 at 3:20 AM manish.mishra <manish.mishra@nutanix.com> wrote:
>
>
> On 24/06/22 11:06 pm, James Houghton wrote:
> > This enlightens __unmap_hugepage_range to deal with high-granularity
> > mappings. This doesn't change its API; it still must be called with
> > hugepage alignment, but it will correctly unmap hugepages that have been
> > mapped at high granularity.
> >
> > Analogous to the mapcount rules introduced by hugetlb_no_page, we only
> > drop mapcount in this case if we are unmapping an entire hugepage in one
> > operation. This is the case when a VMA is destroyed.
> >
> > Eventually, functionality here can be expanded to allow users to call
> > MADV_DONTNEED on PAGE_SIZE-aligned sections of a hugepage, but that is
> > not done here.
>
> Sorry, I may have misunderstood something here, but wouldn't allowing
> MADV_DONTNEED at PAGE_SIZE granularity in hugetlbfs cause fragmentation
> in the hugetlbfs pool, which seems to run counter to the purpose of
> hugetlbfs?

It can be helpful for some applications, like if we want to get page
fault notifications through userfaultfd on a 4K piece of a hugepage.
It kind of goes against the purpose of HugeTLB, but we sort of get
this functionality automatically with this patch.

>
> >
> > Signed-off-by: James Houghton <jthoughton@google.com>
> > ---
> >   include/asm-generic/tlb.h |  6 +--
> >   mm/hugetlb.c              | 85 ++++++++++++++++++++++++++-------------
> >   2 files changed, 59 insertions(+), 32 deletions(-)
> >
> > diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> > index ff3e82553a76..8daa3ae460d9 100644
> > --- a/include/asm-generic/tlb.h
> > +++ b/include/asm-generic/tlb.h
> > @@ -562,9 +562,9 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
> >               __tlb_remove_tlb_entry(tlb, ptep, address);     \
> >       } while (0)
> >
> > -#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)     \
> > +#define tlb_remove_huge_tlb_entry(tlb, hpte, address)        \
> >       do {                                                    \
> > -             unsigned long _sz = huge_page_size(h);          \
> > +             unsigned long _sz = hugetlb_pte_size(&hpte);    \
> >               if (_sz >= P4D_SIZE)                            \
> >                       tlb_flush_p4d_range(tlb, address, _sz); \
> >               else if (_sz >= PUD_SIZE)                       \
> > @@ -573,7 +573,7 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
> >                       tlb_flush_pmd_range(tlb, address, _sz); \
> >               else                                            \
> >                       tlb_flush_pte_range(tlb, address, _sz); \
> > -             __tlb_remove_tlb_entry(tlb, ptep, address);     \
> > +             __tlb_remove_tlb_entry(tlb, hpte.ptep, address);\
> >       } while (0)
> >
> >   /**
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index da30621656b8..51fc1d3f122f 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -5120,24 +5120,20 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
> >   {
> >       struct mm_struct *mm = vma->vm_mm;
> >       unsigned long address;
> > -     pte_t *ptep;
> > +     struct hugetlb_pte hpte;
> >       pte_t pte;
> >       spinlock_t *ptl;
> > -     struct page *page;
> > +     struct page *hpage, *subpage;
> >       struct hstate *h = hstate_vma(vma);
> >       unsigned long sz = huge_page_size(h);
> >       struct mmu_notifier_range range;
> >       bool force_flush = false;
> > +     bool hgm_enabled = hugetlb_hgm_enabled(vma);
> >
> >       WARN_ON(!is_vm_hugetlb_page(vma));
> >       BUG_ON(start & ~huge_page_mask(h));
> >       BUG_ON(end & ~huge_page_mask(h));
> >
> > -     /*
> > -      * This is a hugetlb vma, all the pte entries should point
> > -      * to huge page.
> > -      */
> > -     tlb_change_page_size(tlb, sz);
> >       tlb_start_vma(tlb, vma);
> >
> >       /*
> > @@ -5148,25 +5144,43 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
> >       adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
> >       mmu_notifier_invalidate_range_start(&range);
> >       address = start;
> > -     for (; address < end; address += sz) {
> > -             ptep = huge_pte_offset(mm, address, sz);
> > -             if (!ptep)
> > +
> > +     while (address < end) {
> > +             pte_t *ptep = huge_pte_offset(mm, address, sz);
> > +
> > +             if (!ptep) {
> > +                     address += sz;
> >                       continue;
> > +             }
> > +             hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h));
> > +             if (hgm_enabled) {
> > +                     int ret = huge_pte_alloc_high_granularity(
> > +                                     &hpte, mm, vma, address, PAGE_SHIFT,
> > +                                     HUGETLB_SPLIT_NEVER,
> > +                                     /*write_locked=*/true);
>
> I see that huge_pte_alloc_high_granularity with HUGETLB_SPLIT_NEVER just
> does huge_tlb_walk. So is HUGETLB_SPLIT_NEVER even required? For those
> cases you can directly do huge_tlb_walk. The name
> huge_pte_alloc_high_granularity is also confusing for those cases.

Agreed. huge_pte_alloc_high_granularity with HUGETLB_SPLIT_NEVER is
pretty much the same as hugetlb_walk_to (+hugetlb_pte_init). It is
confusing to have two ways of doing the exact same thing, so I'll get
rid of HUGETLB_SPLIT_NEVER (and the "alloc" name is confusing in this
case too, yeah).

>
> > +                     /*
> > +                      * We will never split anything, so this should always
> > +                      * succeed.
> > +                      */
> > +                     BUG_ON(ret);
> > +             }
> >
> > -             ptl = huge_pte_lock(h, mm, ptep);
> > -             if (huge_pmd_unshare(mm, vma, &address, ptep)) {
> > +             ptl = hugetlb_pte_lock(mm, &hpte);
> > +             if (!hgm_enabled && huge_pmd_unshare(
> > +                                     mm, vma, &address, hpte.ptep)) {
> >                       spin_unlock(ptl);
> >                       tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
> >                       force_flush = true;
> > -                     continue;
> > +                     goto next_hpte;
> >               }
> >
> > -             pte = huge_ptep_get(ptep);
> > -             if (huge_pte_none(pte)) {
> > +             if (hugetlb_pte_none(&hpte)) {
> >                       spin_unlock(ptl);
> > -                     continue;
> > +                     goto next_hpte;
> >               }
> >
> > +             pte = hugetlb_ptep_get(&hpte);
> > +
> >               /*
> >                * Migrating hugepage or HWPoisoned hugepage is already
> >                * unmapped and its refcount is dropped, so just clear pte here.
> > @@ -5180,24 +5194,27 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
> >                        */
> >                       if (pte_swp_uffd_wp_any(pte) &&
> >                           !(zap_flags & ZAP_FLAG_DROP_MARKER))
> > -                             set_huge_pte_at(mm, address, ptep,
> > +                             set_huge_pte_at(mm, address, hpte.ptep,
> >                                               make_pte_marker(PTE_MARKER_UFFD_WP));
> >                       else
> > -                             huge_pte_clear(mm, address, ptep, sz);
> > +                             huge_pte_clear(mm, address, hpte.ptep,
> > +                                             hugetlb_pte_size(&hpte));
> >                       spin_unlock(ptl);
> > -                     continue;
> > +                     goto next_hpte;
> >               }
> >
> > -             page = pte_page(pte);
> > +             subpage = pte_page(pte);
> > +             BUG_ON(!subpage);
> > +             hpage = compound_head(subpage);
> >               /*
> >                * If a reference page is supplied, it is because a specific
> >                * page is being unmapped, not a range. Ensure the page we
> >                * are about to unmap is the actual page of interest.
> >                */
> >               if (ref_page) {
> > -                     if (page != ref_page) {
> > +                     if (hpage != ref_page) {
> >                               spin_unlock(ptl);
> > -                             continue;
> > +                             goto next_hpte;
> >                       }
> >                       /*
> >                        * Mark the VMA as having unmapped its page so that
> > @@ -5207,25 +5224,35 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
> >                       set_vma_resv_flags(vma, HPAGE_RESV_UNMAPPED);
> >               }
> >
> > -             pte = huge_ptep_get_and_clear(mm, address, ptep);
> > -             tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
> > +             pte = huge_ptep_get_and_clear(mm, address, hpte.ptep);
> > +             tlb_change_page_size(tlb, hugetlb_pte_size(&hpte));
> > +             tlb_remove_huge_tlb_entry(tlb, hpte, address);
> >               if (huge_pte_dirty(pte))
> > -                     set_page_dirty(page);
> > +                     set_page_dirty(hpage);
> >               /* Leave a uffd-wp pte marker if needed */
> >               if (huge_pte_uffd_wp(pte) &&
> >                   !(zap_flags & ZAP_FLAG_DROP_MARKER))
> > -                     set_huge_pte_at(mm, address, ptep,
> > +                     set_huge_pte_at(mm, address, hpte.ptep,
> >                                       make_pte_marker(PTE_MARKER_UFFD_WP));
> > -             hugetlb_count_sub(pages_per_huge_page(h), mm);
> > -             page_remove_rmap(page, vma, true);
> > +
> > +             hugetlb_count_sub(hugetlb_pte_size(&hpte)/PAGE_SIZE, mm);
> > +
> > +             /*
> > +              * If we are unmapping the entire page, remove it from the
> > +              * rmap.
> > +              */
> > +             if (IS_ALIGNED(address, sz) && address + sz <= end)
> > +                     page_remove_rmap(hpage, vma, true);
> >
> >               spin_unlock(ptl);
> > -             tlb_remove_page_size(tlb, page, huge_page_size(h));
> > +             tlb_remove_page_size(tlb, subpage, hugetlb_pte_size(&hpte));
> >               /*
> >                * Bail out after unmapping reference page if supplied
> >                */
> >               if (ref_page)
> >                       break;
> > +next_hpte:
> > +             address += hugetlb_pte_size(&hpte);
> >       }
> >       mmu_notifier_invalidate_range_end(&range);
> >       tlb_end_vma(tlb, vma);


* Re: [RFC PATCH 17/26] hugetlb: update follow_hugetlb_page to support HGM
  2022-07-19 10:48   ` manish.mishra
@ 2022-07-19 16:19     ` James Houghton
  0 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-07-19 16:19 UTC (permalink / raw)
  To: manish.mishra
  Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
	David Rientjes, Axel Rasmussen, Mina Almasry, Jue Wang,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Tue, Jul 19, 2022 at 3:48 AM manish.mishra <manish.mishra@nutanix.com> wrote:
>
>
> On 24/06/22 11:06 pm, James Houghton wrote:
> > This enables support for GUP, and it is needed for the KVM demand paging
> > self-test to work.
> >
> > One important change here is that, before, we never needed to grab the
> > i_mmap_sem, but now, to prevent someone from collapsing the page tables
> > out from under us, we grab it for reading when doing high-granularity PT
> > walks.
> >
> > Signed-off-by: James Houghton <jthoughton@google.com>
> > ---
> >   mm/hugetlb.c | 70 ++++++++++++++++++++++++++++++++++++++++++----------
> >   1 file changed, 57 insertions(+), 13 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index f9c7daa6c090..aadfcee947cf 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -6298,14 +6298,18 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> >       unsigned long vaddr = *position;
> >       unsigned long remainder = *nr_pages;
> >       struct hstate *h = hstate_vma(vma);
> > +     struct address_space *mapping = vma->vm_file->f_mapping;
> >       int err = -EFAULT, refs;
> > +     bool has_i_mmap_sem = false;
> >
> >       while (vaddr < vma->vm_end && remainder) {
> >               pte_t *pte;
> >               spinlock_t *ptl = NULL;
> >               bool unshare = false;
> >               int absent;
> > +             unsigned long pages_per_hpte;
> >               struct page *page;
> > +             struct hugetlb_pte hpte;
> >
> >               /*
> >                * If we have a pending SIGKILL, don't keep faulting pages and
> > @@ -6325,9 +6329,23 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> >                */
> >               pte = huge_pte_offset(mm, vaddr & huge_page_mask(h),
> >                                     huge_page_size(h));
> > -             if (pte)
> > -                     ptl = huge_pte_lock(h, mm, pte);
> > -             absent = !pte || huge_pte_none(huge_ptep_get(pte));
> > +             if (pte) {
> > +                     hugetlb_pte_populate(&hpte, pte, huge_page_shift(h));
> > +                     if (hugetlb_hgm_enabled(vma)) {
> > +                             BUG_ON(has_i_mmap_sem);
>
> Just thinking: can we do without i_mmap_lock_read in most cases? Earlier
> this function did almost everything it does now without taking
> i_mmap_lock_read.

We need something to prevent the page tables from being rearranged
while we're walking them. In this RFC, I used the i_mmap_lock. I'm
going to change it, probably to a per-VMA lock (or maybe a per-hpage
lock. I'm trying to figure out if a system with PTLs/hugetlb_pte_lock
could work too :)).

>
> > +                             i_mmap_lock_read(mapping);
> > +                             /*
> > +                              * Need to hold the mapping semaphore for
> > +                              * reading to do a HGM walk.
> > +                              */
> > +                             has_i_mmap_sem = true;
> > +                             hugetlb_walk_to(mm, &hpte, vaddr, PAGE_SIZE,
> > +                                             /*stop_at_none=*/true);
> > +                     }
> > +                     ptl = hugetlb_pte_lock(mm, &hpte);
> > +             }
> > +
> > +             absent = !pte || hugetlb_pte_none(&hpte);
> >
> >               /*
> >                * When coredumping, it suits get_dump_page if we just return
> > @@ -6338,8 +6356,13 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> >                */
> >               if (absent && (flags & FOLL_DUMP) &&
> >                   !hugetlbfs_pagecache_present(h, vma, vaddr)) {
> > -                     if (pte)
> > +                     if (pte) {
> > +                             if (has_i_mmap_sem) {
> > +                                     i_mmap_unlock_read(mapping);
> > +                                     has_i_mmap_sem = false;
> > +                             }
> >                               spin_unlock(ptl);
> > +                     }
> >                       remainder = 0;
> >                       break;
> >               }
> > @@ -6359,8 +6382,13 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> >                       vm_fault_t ret;
> >                       unsigned int fault_flags = 0;
> >
> > -                     if (pte)
> > +                     if (pte) {
> > +                             if (has_i_mmap_sem) {
> > +                                     i_mmap_unlock_read(mapping);
> > +                                     has_i_mmap_sem = false;
> > +                             }
> >                               spin_unlock(ptl);
> > +                     }
> >                       if (flags & FOLL_WRITE)
> >                               fault_flags |= FAULT_FLAG_WRITE;
> >                       else if (unshare)
> > @@ -6403,8 +6431,11 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> >                       continue;
> >               }
> >
> > -             pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
> > -             page = pte_page(huge_ptep_get(pte));
> > +             pfn_offset = (vaddr & ~hugetlb_pte_mask(&hpte)) >> PAGE_SHIFT;
> > +             page = pte_page(hugetlb_ptep_get(&hpte));
> > +             pages_per_hpte = hugetlb_pte_size(&hpte) / PAGE_SIZE;
> > +             if (hugetlb_hgm_enabled(vma))
> > +                     page = compound_head(page);
> >
> >               VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
> >                              !PageAnonExclusive(page), page);
> > @@ -6414,17 +6445,21 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> >                * and skip the same_page loop below.
> >                */
> >               if (!pages && !vmas && !pfn_offset &&
> > -                 (vaddr + huge_page_size(h) < vma->vm_end) &&
> > -                 (remainder >= pages_per_huge_page(h))) {
> > -                     vaddr += huge_page_size(h);
> > -                     remainder -= pages_per_huge_page(h);
> > -                     i += pages_per_huge_page(h);
> > +                 (vaddr + pages_per_hpte < vma->vm_end) &&
> > +                 (remainder >= pages_per_hpte)) {
> > +                     vaddr += pages_per_hpte;
> > +                     remainder -= pages_per_hpte;
> > +                     i += pages_per_hpte;
> >                       spin_unlock(ptl);
> > +                     if (has_i_mmap_sem) {
> > +                             has_i_mmap_sem = false;
> > +                             i_mmap_unlock_read(mapping);
> > +                     }
> >                       continue;
> >               }
> >
> >               /* vaddr may not be aligned to PAGE_SIZE */
> > -             refs = min3(pages_per_huge_page(h) - pfn_offset, remainder,
> > +             refs = min3(pages_per_hpte - pfn_offset, remainder,
> >                   (vma->vm_end - ALIGN_DOWN(vaddr, PAGE_SIZE)) >> PAGE_SHIFT);
> >
> >               if (pages || vmas)
> > @@ -6447,6 +6482,10 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> >                       if (WARN_ON_ONCE(!try_grab_folio(pages[i], refs,
> >                                                        flags))) {
> >                               spin_unlock(ptl);
> > +                             if (has_i_mmap_sem) {
> > +                                     has_i_mmap_sem = false;
> > +                                     i_mmap_unlock_read(mapping);
> > +                             }
> >                               remainder = 0;
> >                               err = -ENOMEM;
> >                               break;
> > @@ -6458,8 +6497,13 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> >               i += refs;
> >
> >               spin_unlock(ptl);
> > +             if (has_i_mmap_sem) {
> > +                     has_i_mmap_sem = false;
> > +                     i_mmap_unlock_read(mapping);
> > +             }
> >       }
> >       *nr_pages = remainder;
> > +     BUG_ON(has_i_mmap_sem);
> >       /*
> >        * setting position is actually required only if remainder is
> >        * not zero but it's faster not to add a "if (remainder)"
>
> Thanks
>
> Manish Mishra
>


* Re: [RFC PATCH 20/26] hugetlb: add support for high-granularity UFFDIO_CONTINUE
  2022-07-15 17:20       ` Peter Xu
@ 2022-07-20 20:58         ` James Houghton
  2022-07-21 19:09           ` Peter Xu
  0 siblings, 1 reply; 128+ messages in thread
From: James Houghton @ 2022-07-20 20:58 UTC (permalink / raw)
  To: Peter Xu
  Cc: Mike Kravetz, Muchun Song, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Fri, Jul 15, 2022 at 10:20 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Jul 15, 2022 at 09:58:10AM -0700, James Houghton wrote:
> > On Fri, Jul 15, 2022 at 9:21 AM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Fri, Jun 24, 2022 at 05:36:50PM +0000, James Houghton wrote:
> > > > The changes here are very similar to the changes made to
> > > > hugetlb_no_page, where we do a high-granularity page table walk and
> > > > do accounting slightly differently because we are mapping only a piece
> > > > of a page.
> > > >
> > > > Signed-off-by: James Houghton <jthoughton@google.com>
> > > > ---
> > > >  fs/userfaultfd.c        |  3 +++
> > > >  include/linux/hugetlb.h |  6 +++--
> > > >  mm/hugetlb.c            | 54 +++++++++++++++++++++-----------------
> > > >  mm/userfaultfd.c        | 57 +++++++++++++++++++++++++++++++----------
> > > >  4 files changed, 82 insertions(+), 38 deletions(-)
> > > >
> > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > > > index e943370107d0..77c1b8a7d0b9 100644
> > > > --- a/fs/userfaultfd.c
> > > > +++ b/fs/userfaultfd.c
> > > > @@ -245,6 +245,9 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
> > > >       if (!ptep)
> > > >               goto out;
> > > >
> > > > +     if (hugetlb_hgm_enabled(vma))
> > > > +             goto out;
> > > > +
> > >
> > > This is weird.  It means we'll never wait for sub-page mapping enabled
> > > vmas.  Why?
> > >
> >
> > `ret` is true in this case, so we're actually *always* waiting.
>
> Aha!  Then I think that's another problem, sorry. :) See Below.
>
> >
> > > Not to mention hugetlb_hgm_enabled() currently is simply VM_SHARED, so it
> > > means we'll stop waiting for all shared hugetlbfs uffd page faults..
> > >
> > > I'd expect in the in-house postcopy tests you should see vcpu threads
> > > spinning on the page faults until it's serviced.
> > >
> > > IMO we still need to properly wait when the pgtable doesn't have the
> > > faulted address covered.  For sub-page mapping it'll probably need to walk
> > > into sub-page levels.
> >
> > Ok, SGTM. I'll do that for the next version. I'm not sure of the
> > consequences of returning `true` here when we should be returning
> > `false`.
>
> We've put ourselves onto the wait queue, if another concurrent
> UFFDIO_CONTINUE happened and pte is already installed, I think this thread
> could be waiting forever on the next schedule().
>
> The solution should be the same - walking the sub-page pgtable would work,
> afaict.
>
> [...]
>
> > > > @@ -6239,14 +6241,16 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> > > >        * registered, we firstly wr-protect a none pte which has no page cache
> > > >        * page backing it, then access the page.
> > > >        */
> > > > -     if (!huge_pte_none_mostly(huge_ptep_get(dst_pte)))
> > > > +     if (!hugetlb_pte_none_mostly(dst_hpte))
> > > >               goto out_release_unlock;
> > > >
> > > > -     if (vm_shared) {
> > > > -             page_dup_file_rmap(page, true);
> > > > -     } else {
> > > > -             ClearHPageRestoreReserve(page);
> > > > -             hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
> > > > +     if (new_mapping) {
> > >
> > > IIUC you wanted to avoid the mapcount accountings when it's the sub-page
> > > that was going to be mapped.
> > >
> > > Is it a must we get this only from the caller?  Can we know we're doing
> > > sub-page mapping already here and make a decision with e.g. dst_hpte?
> > >
> > > It looks weird to me to pass this explicitly from the caller, especially
> > > that's when we don't really have the pgtable lock so I'm wondering about
> > > possible race conditions too on having stale new_mapping values.
> >
> > The only way to know what the correct value for `new_mapping` should
> > be is to know if we had to change the hstate-level P*D to non-none to
> > service this UFFDIO_CONTINUE request. I'll see if there is a nice way
> > to do that check in `hugetlb_mcopy_atomic_pte`.
> > Right now there is no
>
> Would "new_mapping = dest_hpte->shift != huge_page_shift(hstate)" work (or
> something alike)?

This works in the hugetlb_fault case, because in the hugetlb_fault
case, we install the largest PTE possible. If we are mapping a page
for the first time, we will use an hstate-sized PTE. But for
UFFDIO_CONTINUE, we may be installing a 4K PTE as the first PTE for
the whole hpage.

>
> > race, because we synchronize on the per-hpage mutex.
>
> Yeah not familiar with that mutex enough to tell, as long as that mutex
> guarantees no pgtable update (hmm, then why we need the pgtable lock
> here???) then it looks fine.

Let me take a closer look at this. I'll have a more detailed
explanation for the next version of the RFC.

>
> [...]
>
> > > > @@ -335,12 +337,16 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
> > > >       copied = 0;
> > > >       page = NULL;
> > > >       vma_hpagesize = vma_kernel_pagesize(dst_vma);
> > > > +     if (use_hgm)
> > > > +             vma_altpagesize = PAGE_SIZE;
> > >
> > > Do we need to check the "len" to know whether we should use sub-page
> > > mapping or original hpage size?  E.g. any old UFFDIO_CONTINUE code will
> > > still want the old behavior I think.
> >
> > I think that's a fair point; however, if we enable HGM and the address
> > and len happen to be hstate-aligned
>
> The address can, but len (note! not "end" here) cannot?

They both (dst_start and len) need to be hpage-aligned, otherwise we
won't be able to install hstate-sized PTEs. Like if we're installing
4K at the beginning of a 1G hpage, we can't install a PUD, because we
only want to install that 4K.

>
> > , we basically do the same thing as
> > if HGM wasn't enabled. It could be a minor performance optimization to
> > do `vma_altpagesize=vma_hpagesize` in that case, but in terms of how
> > the page tables are set up, the end result would be the same.
>
> Thanks,

Thanks!

>
> --
> Peter Xu
>


* Re: [RFC PATCH 20/26] hugetlb: add support for high-granularity UFFDIO_CONTINUE
  2022-07-20 20:58         ` James Houghton
@ 2022-07-21 19:09           ` Peter Xu
  2022-07-21 19:44             ` James Houghton
  0 siblings, 1 reply; 128+ messages in thread
From: Peter Xu @ 2022-07-21 19:09 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Muchun Song, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Wed, Jul 20, 2022 at 01:58:06PM -0700, James Houghton wrote:
> > > > > @@ -335,12 +337,16 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
> > > > >       copied = 0;
> > > > >       page = NULL;
> > > > >       vma_hpagesize = vma_kernel_pagesize(dst_vma);
> > > > > +     if (use_hgm)
> > > > > +             vma_altpagesize = PAGE_SIZE;
> > > >
> > > > Do we need to check the "len" to know whether we should use sub-page
> > > > mapping or original hpage size?  E.g. any old UFFDIO_CONTINUE code will
> > > > still want the old behavior I think.
> > >
> > > I think that's a fair point; however, if we enable HGM and the address
> > > and len happen to be hstate-aligned
> >
> > The address can, but len (note! not "end" here) cannot?
> 
> They both (dst_start and len) need to be hpage-aligned, otherwise we
> won't be able to install hstate-sized PTEs. Like if we're installing
> 4K at the beginning of a 1G hpage, we can't install a PUD, because we
> only want to install that 4K.

I'm still confused...

Shouldn't one of the major goals of sub-page mapping be to grant the user the
capability to do UFFDIO_CONTINUE with len<hpagesize (so we install pages at
sub-page level)?  If so, why does len always need to be hpagesize-aligned?

-- 
Peter Xu



* Re: [RFC PATCH 20/26] hugetlb: add support for high-granularity UFFDIO_CONTINUE
  2022-07-21 19:09           ` Peter Xu
@ 2022-07-21 19:44             ` James Houghton
  2022-07-21 19:53               ` Peter Xu
  0 siblings, 1 reply; 128+ messages in thread
From: James Houghton @ 2022-07-21 19:44 UTC (permalink / raw)
  To: Peter Xu
  Cc: Mike Kravetz, Muchun Song, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Thu, Jul 21, 2022 at 12:09 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Jul 20, 2022 at 01:58:06PM -0700, James Houghton wrote:
> > > > > > @@ -335,12 +337,16 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
> > > > > >       copied = 0;
> > > > > >       page = NULL;
> > > > > >       vma_hpagesize = vma_kernel_pagesize(dst_vma);
> > > > > > +     if (use_hgm)
> > > > > > +             vma_altpagesize = PAGE_SIZE;
> > > > >
> > > > > Do we need to check the "len" to know whether we should use sub-page
> > > > > mapping or original hpage size?  E.g. any old UFFDIO_CONTINUE code will
> > > > > still want the old behavior I think.
> > > >
> > > > I think that's a fair point; however, if we enable HGM and the address
> > > > and len happen to be hstate-aligned
> > >
> > > The address can, but len (note! not "end" here) cannot?
> >
> > They both (dst_start and len) need to be hpage-aligned, otherwise we
> > won't be able to install hstate-sized PTEs. Like if we're installing
> > 4K at the beginning of a 1G hpage, we can't install a PUD, because we
> > only want to install that 4K.
>
> I'm still confused...
>
> Shouldn't one of the major goals of sub-page mapping is to grant user the
> capability to do UFFDIO_CONTINUE with len<hpagesize (so we install pages in
> sub-page level)?  If so, why len needs to be always hpagesize aligned?

Sorry I misunderstood what you were asking. We allow both to be
PAGE_SIZE-aligned. :) That is indeed the goal of HGM.

If dst_start and len were both hpage-aligned, then we *could* set
`use_hgm = false`, and everything would still work. That's what I
thought you were asking about. I don't see any reason to do this
though, as `use_hgm = true` will only grant additional functionality,
and `use_hgm = false` would only -- at best -- be a minor performance
optimization in this case.

- James

>
> --
> Peter Xu
>


* Re: [RFC PATCH 20/26] hugetlb: add support for high-granularity UFFDIO_CONTINUE
  2022-07-21 19:44             ` James Houghton
@ 2022-07-21 19:53               ` Peter Xu
  0 siblings, 0 replies; 128+ messages in thread
From: Peter Xu @ 2022-07-21 19:53 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Muchun Song, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Thu, Jul 21, 2022 at 12:44:58PM -0700, James Houghton wrote:
> On Thu, Jul 21, 2022 at 12:09 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Wed, Jul 20, 2022 at 01:58:06PM -0700, James Houghton wrote:
> > > > > > > @@ -335,12 +337,16 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
> > > > > > >       copied = 0;
> > > > > > >       page = NULL;
> > > > > > >       vma_hpagesize = vma_kernel_pagesize(dst_vma);
> > > > > > > +     if (use_hgm)
> > > > > > > +             vma_altpagesize = PAGE_SIZE;
> > > > > >
> > > > > > Do we need to check the "len" to know whether we should use sub-page
> > > > > > mapping or original hpage size?  E.g. any old UFFDIO_CONTINUE code will
> > > > > > still want the old behavior I think.
> > > > >
> > > > > I think that's a fair point; however, if we enable HGM and the address
> > > > > and len happen to be hstate-aligned
> > > >
> > > > The address can, but len (note! not "end" here) cannot?
> > >
> > > They both (dst_start and len) need to be hpage-aligned, otherwise we
> > > won't be able to install hstate-sized PTEs. Like if we're installing
> > > 4K at the beginning of a 1G hpage, we can't install a PUD, because we
> > > only want to install that 4K.
> >
> > I'm still confused...
> >
> > Shouldn't one of the major goals of sub-page mapping is to grant user the
> > capability to do UFFDIO_CONTINUE with len<hpagesize (so we install pages in
> > sub-page level)?  If so, why len needs to be always hpagesize aligned?
> 
> Sorry I misunderstood what you were asking. We allow both to be
> PAGE_SIZE-aligned. :) That is indeed the goal of HGM.

Ah OK. :)

> 
> If dst_start and len were both hpage-aligned, then we *could* set
> `use_hgm = false`, and everything would still work. That's what I
> thought you were asking about. I don't see any reason to do this
> though, as `use_hgm = true` will only grant additional functionality,
> and `use_hgm = false` would only -- at best -- be a minor performance
> optimization in this case.

I just want to make sure this patch won't break existing uffd-minor users,
or it'll be a kernel ABI breakage.

We'd still want existing compiled apps to run as before, which IIUC means
we should only use sub-page mapping when len != hpagesize here.

I'm not sure it's only about perf - the app may not even be prepared to
receive yet another page fault within the same huge page range.

-- 
Peter Xu



* Re: [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries
  2022-06-24 17:36 ` [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries James Houghton
                     ` (4 preceding siblings ...)
  2022-07-11 23:32   ` Mike Kravetz
@ 2022-09-08 17:38   ` Peter Xu
  2022-09-08 17:54     ` James Houghton
  5 siblings, 1 reply; 128+ messages in thread
From: Peter Xu @ 2022-09-08 17:38 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Muchun Song, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

James,

On Fri, Jun 24, 2022 at 05:36:37PM +0000, James Houghton wrote:
> +static inline
> +spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
> +{
> +
> +	BUG_ON(!hpte->ptep);
> +	// Only use huge_pte_lockptr if we are at leaf-level. Otherwise use
> +	// the regular page table lock.
> +	if (hugetlb_pte_none(hpte) || hugetlb_pte_present_leaf(hpte))
> +		return huge_pte_lockptr(hugetlb_pte_shift(hpte),
> +				mm, hpte->ptep);
> +	return &mm->page_table_lock;
> +}

Today when I re-read part of this thread, I found that I'm not sure whether
this is safe.  IIUC, taking different locks depending on the state of the
pte may lead to issues.

For example, could the race below happen, where two threads end up taking
different locks even though they stumbled over the same pmd entry?

         thread 1                          thread 2
         --------                          --------

    hugetlb_pte_lockptr (for pmd level)
      pte_none()==true,
        take pmd lock
    pmd_alloc()
                                hugetlb_pte_lockptr (for pmd level)
                                  pte is pgtable entry (so !none, !present_leaf)
                                    take page_table_lock
                                (can run concurrently with thread 1...)
    pte_alloc()
    ...

-- 
Peter Xu



* Re: [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries
  2022-09-08 17:38   ` Peter Xu
@ 2022-09-08 17:54     ` James Houghton
  0 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-09-08 17:54 UTC (permalink / raw)
  To: Peter Xu
  Cc: Mike Kravetz, Muchun Song, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Thu, Sep 8, 2022 at 10:38 AM Peter Xu <peterx@redhat.com> wrote:
>
> James,
>
> On Fri, Jun 24, 2022 at 05:36:37PM +0000, James Houghton wrote:
> > +static inline
> > +spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
> > +{
> > +
> > +     BUG_ON(!hpte->ptep);
> > +     // Only use huge_pte_lockptr if we are at leaf-level. Otherwise use
> > +     // the regular page table lock.
> > +     if (hugetlb_pte_none(hpte) || hugetlb_pte_present_leaf(hpte))
> > +             return huge_pte_lockptr(hugetlb_pte_shift(hpte),
> > +                             mm, hpte->ptep);
> > +     return &mm->page_table_lock;
> > +}
>
> Today when I re-read part of this thread, I found that I'm not sure whether
> this is safe.  IIUC taking different locks depending on the state of pte
> may lead to issues.
>
> For example, could below race happen where two threads can be taking
> different locks even if stumbled over the same pmd entry?
>
>          thread 1                          thread 2
>          --------                          --------
>
>     hugetlb_pte_lockptr (for pmd level)
>       pte_none()==true,
>         take pmd lock
>     pmd_alloc()
>                                 hugetlb_pte_lockptr (for pmd level)
>                                   pte is pgtable entry (so !none, !present_leaf)
>                                     take page_table_lock
>                                 (can run concurrently with thread 1...)
>     pte_alloc()
>     ...

Thanks for pointing out this race. Yes, it is wrong to change which
lock we take depending on the value of the PTE, as we would need to
lock the PTE first to correctly make the decision. This has already
been fixed in the next version of this series :). That is, we choose
which lock to grab based on the PTE's page table level.

- James

>
> --
> Peter Xu
>


* Re: [RFC PATCH 09/26] hugetlb: add hugetlb_hgm_enabled
  2022-06-24 17:36 ` [RFC PATCH 09/26] hugetlb: add hugetlb_hgm_enabled James Houghton
  2022-06-27 12:55   ` manish.mishra
  2022-06-28 20:33   ` Mina Almasry
@ 2022-09-08 18:07   ` Peter Xu
  2022-09-08 18:13     ` James Houghton
  2 siblings, 1 reply; 128+ messages in thread
From: Peter Xu @ 2022-09-08 18:07 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Muchun Song, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Fri, Jun 24, 2022 at 05:36:39PM +0000, James Houghton wrote:
> +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> +bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
> +{
> +	/* All shared VMAs have HGM enabled. */
> +	return vma->vm_flags & VM_SHARED;
> +}
> +#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */

Another nitpick: I suggest renaming this to "hugetlb_***_supported()" (with
whatever the new name ends up being), since it cannot be "disabled". :)

-- 
Peter Xu



* Re: [RFC PATCH 09/26] hugetlb: add hugetlb_hgm_enabled
  2022-09-08 18:07   ` Peter Xu
@ 2022-09-08 18:13     ` James Houghton
  0 siblings, 0 replies; 128+ messages in thread
From: James Houghton @ 2022-09-08 18:13 UTC (permalink / raw)
  To: Peter Xu
  Cc: Mike Kravetz, Muchun Song, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Thu, Sep 8, 2022 at 11:07 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Jun 24, 2022 at 05:36:39PM +0000, James Houghton wrote:
> > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> > +bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
> > +{
> > +     /* All shared VMAs have HGM enabled. */
> > +     return vma->vm_flags & VM_SHARED;
> > +}
> > +#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
>
> Another nitpick: suggest to rename this to "hugetlb_***_supported()" (with
> whatever the new name could be..), as long as it cannot be "disabled". :)
>

Will do. :)

- James

> --
> Peter Xu
>


* Re: [RFC PATCH 11/26] hugetlb: add hugetlb_walk_to to do PT walks
  2022-06-24 17:36 ` [RFC PATCH 11/26] hugetlb: add hugetlb_walk_to to do PT walks James Houghton
  2022-06-27 13:07   ` manish.mishra
@ 2022-09-08 18:20   ` Peter Xu
  1 sibling, 0 replies; 128+ messages in thread
From: Peter Xu @ 2022-09-08 18:20 UTC (permalink / raw)
  To: James Houghton
  Cc: Mike Kravetz, Muchun Song, David Hildenbrand, David Rientjes,
	Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
	Dr . David Alan Gilbert, linux-mm, linux-kernel

On Fri, Jun 24, 2022 at 05:36:41PM +0000, James Houghton wrote:
> This adds it for architectures that use GENERAL_HUGETLB, including x86.
> 
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>  include/linux/hugetlb.h |  2 ++
>  mm/hugetlb.c            | 45 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 47 insertions(+)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index e7a6b944d0cc..605aa19d8572 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -258,6 +258,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
>  			unsigned long addr, unsigned long sz);
>  pte_t *huge_pte_offset(struct mm_struct *mm,
>  		       unsigned long addr, unsigned long sz);
> +int hugetlb_walk_to(struct mm_struct *mm, struct hugetlb_pte *hpte,
> +		    unsigned long addr, unsigned long sz, bool stop_at_none);
>  int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
>  				unsigned long *addr, pte_t *ptep);
>  void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 557b0afdb503..3ec2a921ee6f 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6981,6 +6981,51 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
>  	return (pte_t *)pmd;
>  }
>  
> +int hugetlb_walk_to(struct mm_struct *mm, struct hugetlb_pte *hpte,
> +		    unsigned long addr, unsigned long sz, bool stop_at_none)
> +{
> +	pte_t *ptep;
> +
> +	if (!hpte->ptep) {
> +		pgd_t *pgd = pgd_offset(mm, addr);
> +
> +		if (!pgd)
> +			return -ENOMEM;
> +		ptep = (pte_t *)p4d_alloc(mm, pgd, addr);
> +		if (!ptep)
> +			return -ENOMEM;
> +		hugetlb_pte_populate(hpte, ptep, P4D_SHIFT);
> +	}
> +
> +	while (hugetlb_pte_size(hpte) > sz &&
> +			!hugetlb_pte_present_leaf(hpte) &&
> +			!(stop_at_none && hugetlb_pte_none(hpte))) {
> +		if (hpte->shift == PMD_SHIFT) {
> +			ptep = pte_alloc_map(mm, (pmd_t *)hpte->ptep, addr);

I had a feeling that the pairing pte_unmap() was lost.

I think most distros don't enable CONFIG_HIGHPTE at all, but still..

> +			if (!ptep)
> +				return -ENOMEM;
> +			hpte->shift = PAGE_SHIFT;
> +			hpte->ptep = ptep;
> +		} else if (hpte->shift == PUD_SHIFT) {
> +			ptep = (pte_t *)pmd_alloc(mm, (pud_t *)hpte->ptep,
> +						  addr);
> +			if (!ptep)
> +				return -ENOMEM;
> +			hpte->shift = PMD_SHIFT;
> +			hpte->ptep = ptep;
> +		} else if (hpte->shift == P4D_SHIFT) {
> +			ptep = (pte_t *)pud_alloc(mm, (p4d_t *)hpte->ptep,
> +						  addr);
> +			if (!ptep)
> +				return -ENOMEM;
> +			hpte->shift = PUD_SHIFT;
> +			hpte->ptep = ptep;
> +		} else
> +			BUG();
> +	}
> +	return 0;
> +}
> +
>  #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
>  
>  #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> -- 
> 2.37.0.rc0.161.g10f37bed90-goog
> 

-- 
Peter Xu




Thread overview: 128+ messages
2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
2022-06-24 17:36 ` [RFC PATCH 01/26] hugetlb: make hstate accessor functions const James Houghton
2022-06-24 18:43   ` Mina Almasry
     [not found]   ` <e55f90f5-ba14-5d6e-8f8f-abf731b9095e@nutanix.com>
2022-06-27 12:09     ` manish.mishra
2022-06-28 17:08       ` James Houghton
2022-06-29  6:18   ` Muchun Song
2022-06-24 17:36 ` [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates James Houghton
2022-06-24 18:51   ` Mina Almasry
2022-06-27 12:08   ` manish.mishra
2022-06-28 15:35     ` James Houghton
2022-06-27 18:42   ` Mike Kravetz
2022-06-28 15:40     ` James Houghton
2022-06-29  6:39       ` Muchun Song
2022-06-29 21:06         ` Mike Kravetz
2022-06-29 21:13           ` James Houghton
2022-06-24 17:36 ` [RFC PATCH 03/26] hugetlb: add make_huge_pte_with_shift James Houghton
2022-06-24 19:01   ` Mina Almasry
2022-06-27 12:13   ` manish.mishra
2022-06-24 17:36 ` [RFC PATCH 04/26] hugetlb: make huge_pte_lockptr take an explicit shift argument James Houghton
2022-06-27 12:26   ` manish.mishra
2022-06-27 20:51   ` Mike Kravetz
2022-06-28 15:29     ` James Houghton
2022-06-29  6:09     ` Muchun Song
2022-06-29 21:03       ` Mike Kravetz
2022-06-29 21:39         ` James Houghton
2022-06-29 22:24           ` Mike Kravetz
2022-06-30  9:35             ` Muchun Song
2022-06-30 16:23               ` James Houghton
2022-06-30 17:40                 ` Mike Kravetz
2022-07-01  3:32                 ` Muchun Song
2022-06-24 17:36 ` [RFC PATCH 05/26] hugetlb: add CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING James Houghton
2022-06-27 12:28   ` manish.mishra
2022-06-28 20:03     ` Mina Almasry
2022-06-24 17:36 ` [RFC PATCH 06/26] mm: make free_p?d_range functions public James Houghton
2022-06-27 12:31   ` manish.mishra
2022-06-28 20:35   ` Mike Kravetz
2022-07-12 20:52     ` James Houghton
2022-06-24 17:36 ` [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries James Houghton
2022-06-25  2:59   ` kernel test robot
2022-06-27 12:47   ` manish.mishra
2022-06-29 16:28     ` James Houghton
2022-06-28 20:25   ` Mina Almasry
2022-06-29 16:42     ` James Houghton
2022-06-28 20:44   ` Mike Kravetz
2022-06-29 16:24     ` James Houghton
2022-07-11 23:32   ` Mike Kravetz
2022-07-12  9:42     ` Dr. David Alan Gilbert
2022-07-12 17:51       ` Mike Kravetz
2022-07-15 16:35       ` Peter Xu
2022-07-15 21:52         ` Axel Rasmussen
2022-07-15 23:03           ` Peter Xu
2022-09-08 17:38   ` Peter Xu
2022-09-08 17:54     ` James Houghton
2022-06-24 17:36 ` [RFC PATCH 08/26] hugetlb: add hugetlb_free_range to free PT structures James Houghton
2022-06-27 12:52   ` manish.mishra
2022-06-28 20:27   ` Mina Almasry
2022-06-24 17:36 ` [RFC PATCH 09/26] hugetlb: add hugetlb_hgm_enabled James Houghton
2022-06-27 12:55   ` manish.mishra
2022-06-28 20:33   ` Mina Almasry
2022-09-08 18:07   ` Peter Xu
2022-09-08 18:13     ` James Houghton
2022-06-24 17:36 ` [RFC PATCH 10/26] hugetlb: add for_each_hgm_shift James Houghton
2022-06-27 13:01   ` manish.mishra
2022-06-28 21:58   ` Mina Almasry
2022-07-07 21:39     ` Mike Kravetz
2022-07-08 15:52     ` James Houghton
2022-07-09 21:55       ` Mina Almasry
2022-06-24 17:36 ` [RFC PATCH 11/26] hugetlb: add hugetlb_walk_to to do PT walks James Houghton
2022-06-27 13:07   ` manish.mishra
2022-07-07 23:03     ` Mike Kravetz
2022-09-08 18:20   ` Peter Xu
2022-06-24 17:36 ` [RFC PATCH 12/26] hugetlb: add HugeTLB splitting functionality James Houghton
2022-06-27 13:50   ` manish.mishra
2022-06-29 16:10     ` James Houghton
2022-06-29 14:33   ` manish.mishra
2022-06-29 16:20     ` James Houghton
2022-06-24 17:36 ` [RFC PATCH 13/26] hugetlb: add huge_pte_alloc_high_granularity James Houghton
2022-06-29 14:11   ` manish.mishra
2022-06-24 17:36 ` [RFC PATCH 14/26] hugetlb: add HGM support for hugetlb_fault and hugetlb_no_page James Houghton
2022-06-29 14:40   ` manish.mishra
2022-06-29 15:56     ` James Houghton
2022-06-24 17:36 ` [RFC PATCH 15/26] hugetlb: make unmapping compatible with high-granularity mappings James Houghton
2022-07-19 10:19   ` manish.mishra
2022-07-19 15:58     ` James Houghton
2022-06-24 17:36 ` [RFC PATCH 16/26] hugetlb: make hugetlb_change_protection compatible with HGM James Houghton
2022-06-24 17:36 ` [RFC PATCH 17/26] hugetlb: update follow_hugetlb_page to support HGM James Houghton
2022-07-19 10:48   ` manish.mishra
2022-07-19 16:19     ` James Houghton
2022-06-24 17:36 ` [RFC PATCH 18/26] hugetlb: use struct hugetlb_pte for walk_hugetlb_range James Houghton
2022-06-25  2:58   ` kernel test robot
2022-06-25  2:59   ` kernel test robot
2022-06-24 17:36 ` [RFC PATCH 19/26] hugetlb: add HGM support for copy_hugetlb_page_range James Houghton
2022-07-11 23:41   ` Mike Kravetz
2022-07-12 17:19     ` James Houghton
2022-07-12 18:06       ` Mike Kravetz
2022-07-15 21:39         ` Axel Rasmussen
2022-06-24 17:36 ` [RFC PATCH 20/26] hugetlb: add support for high-granularity UFFDIO_CONTINUE James Houghton
2022-07-15 16:21   ` Peter Xu
2022-07-15 16:58     ` James Houghton
2022-07-15 17:20       ` Peter Xu
2022-07-20 20:58         ` James Houghton
2022-07-21 19:09           ` Peter Xu
2022-07-21 19:44             ` James Houghton
2022-07-21 19:53               ` Peter Xu
2022-06-24 17:36 ` [RFC PATCH 21/26] hugetlb: add hugetlb_collapse James Houghton
2022-06-24 17:36 ` [RFC PATCH 22/26] madvise: add uapi for HugeTLB HGM collapse: MADV_COLLAPSE James Houghton
2022-06-24 17:36 ` [RFC PATCH 23/26] userfaultfd: add UFFD_FEATURE_MINOR_HUGETLBFS_HGM James Houghton
2022-06-24 17:36 ` [RFC PATCH 24/26] arm64/hugetlb: add support for high-granularity mappings James Houghton
2022-06-24 17:36 ` [RFC PATCH 25/26] selftests: add HugeTLB HGM to userfaultfd selftest James Houghton
2022-06-24 17:36 ` [RFC PATCH 26/26] selftests: add HugeTLB HGM to KVM demand paging selftest James Houghton
2022-06-24 18:29 ` [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping Matthew Wilcox
2022-06-27 16:36   ` James Houghton
2022-06-27 17:56     ` Dr. David Alan Gilbert
2022-06-27 20:31       ` James Houghton
2022-06-28  0:04         ` Nadav Amit
2022-06-30 19:21           ` Peter Xu
2022-07-01  5:54             ` Nadav Amit
2022-06-28  8:20         ` Dr. David Alan Gilbert
2022-06-30 16:09           ` Peter Xu
2022-06-24 18:41 ` Mina Almasry
2022-06-27 16:27   ` James Houghton
2022-06-28 14:17     ` Muchun Song
2022-06-28 17:26     ` Mina Almasry
2022-06-28 17:56       ` Dr. David Alan Gilbert
2022-06-29 18:31         ` James Houghton
2022-06-29 20:39       ` Axel Rasmussen
2022-06-24 18:47 ` Matthew Wilcox
2022-06-27 16:48   ` James Houghton
