linux-mm.kvack.org archive mirror
* [PATCH 0/5] mm: Break COW for pinned pages during fork()
@ 2020-09-21 21:17 Peter Xu
  2020-09-21 21:17 ` [PATCH 1/5] mm: Introduce mm_struct.has_pinned Peter Xu
                   ` (5 more replies)
  0 siblings, 6 replies; 110+ messages in thread
From: Peter Xu @ 2020-09-21 21:17 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Jason Gunthorpe, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Peter Xu,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

I'm finally posting formal patches, since the series keeps growing.  Also,
we've already discussed quite a few issues, so I feel it's now clearer what we
need to do, and how.

This series is largely inspired by the previous discussion on the list [1],
starting from Jason's report of the rdma test failure.  Linus proposed the
solution, which seems to be a very nice approach for avoiding breakage of
userspace apps that didn't use MADV_DONTFORK properly before.  More
information can be found in that thread too.
\r
I believe the initial plan was to consider merging something like this for
rc7/rc8.  However, now I'm not sure, because the code change in
copy_pte_range() is probably bigger than expected, so it carries some risk.
I'll leave this question to the reviewers...

I tested it myself by calling fork() after vfio-pinning a bunch of device
pages, and I verified that the new copy-pte logic worked as expected, at least
in the most general path.  However, I haven't tested the thp case yet, because
afaict vfio does not support thp-backed dma pages.  Luckily, the pmd/pud thp
patch is much more straightforward than the pte one, so hopefully it can be
verified directly by code review plus some heavier rdma tests.

Patch 1:      Introduce mm_struct.has_pinned (as a single patch, as suggested by Jason)
Patch 2-3:    Slight rework of the copy_page_range() path as preparation
Patch 4:      Early cow solution for pte copies of pinned pages
Patch 5:      Same as above, but for thp (pmd/pud).

The hugetlbfs fix is still missing but, as planned, it's not urgent and can be
built on top later.  Comments are greatly welcomed.

Thanks.

Peter Xu (5):
  mm: Introduce mm_struct.has_pinned
  mm/fork: Pass new vma pointer into copy_page_range()
  mm: Rework return value for copy_one_pte()
  mm: Do early cow for pinned pages during fork() for ptes
  mm/thp: Split huge pmds/puds if they're pinned when fork()

 include/linux/mm.h       |   2 +-
 include/linux/mm_types.h |  10 ++
 kernel/fork.c            |   3 +-
 mm/gup.c                 |   6 ++
 mm/huge_memory.c         |  26 +++++
 mm/memory.c              | 226 +++++++++++++++++++++++++++++++++++----
 6 files changed, 248 insertions(+), 25 deletions(-)

-- 
2.26.2




^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-21 21:17 [PATCH 0/5] mm: Break COW for pinned pages during fork() Peter Xu
@ 2020-09-21 21:17 ` Peter Xu
  2020-09-21 21:43   ` Jann Horn
                     ` (2 more replies)
  2020-09-21 21:17 ` [PATCH 2/5] mm/fork: Pass new vma pointer into copy_page_range() Peter Xu
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 110+ messages in thread
From: Peter Xu @ 2020-09-21 21:17 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Jason Gunthorpe, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Peter Xu,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

(Commit message collected from Jason Gunthorpe)

Reduce the chance of false positives from page_maybe_dma_pinned() by keeping
track of whether the mm_struct has ever been used with pin_user_pages().
mm_structs that have never been passed to pin_user_pages() cannot have a
positive page_maybe_dma_pinned() by definition. This allows cases that might
drive up the page ref_count to avoid any penalty from handling dma_pinned
pages.

Due to complexities with unpinning, this trivial version is a permanent sticky
bit; future work will be needed to make this a counter.

Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/mm_types.h | 10 ++++++++++
 kernel/fork.c            |  1 +
 mm/gup.c                 |  6 ++++++
 3 files changed, 17 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 496c3ff97cce..6f291f8b74c6 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -441,6 +441,16 @@ struct mm_struct {
 #endif
 		int map_count;			/* number of VMAs */
 
+		/**
+		 * @has_pinned: Whether this mm has pinned any pages.  This can
+		 * be either replaced in the future by @pinned_vm when it
+		 * becomes stable, or grow into a counter on its own. We're
+		 * aggressive on this bit now - even if the pinned pages were
+		 * unpinned later on, we'll still keep this bit set for the
+		 * lifecycle of this mm just for simplicity.
+		 */
+		int has_pinned;
+
 		spinlock_t page_table_lock; /* Protects page tables and some
 					     * counters
 					     */
diff --git a/kernel/fork.c b/kernel/fork.c
index 49677d668de4..7237d418e7b5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1011,6 +1011,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	mm_pgtables_bytes_init(mm);
 	mm->map_count = 0;
 	mm->locked_vm = 0;
+	mm->has_pinned = 0;
 	atomic64_set(&mm->pinned_vm, 0);
 	memset(&mm->rss_stat, 0, sizeof(mm->rss_stat));
 	spin_lock_init(&mm->page_table_lock);
diff --git a/mm/gup.c b/mm/gup.c
index e5739a1974d5..2d9019bf1773 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1255,6 +1255,9 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
 		BUG_ON(*locked != 1);
 	}
 
+	if (flags & FOLL_PIN)
+		WRITE_ONCE(mm->has_pinned, 1);
+
 	/*
 	 * FOLL_PIN and FOLL_GET are mutually exclusive. Traditional behavior
 	 * is to set FOLL_GET if the caller wants pages[] filled in (but has
@@ -2660,6 +2663,9 @@ static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
 				       FOLL_FAST_ONLY)))
 		return -EINVAL;
 
+	if (gup_flags & FOLL_PIN)
+		WRITE_ONCE(current->mm->has_pinned, 1);
+
 	if (!(gup_flags & FOLL_FAST_ONLY))
 		might_lock_read(&current->mm->mmap_lock);
 
-- 
2.26.2




* [PATCH 2/5] mm/fork: Pass new vma pointer into copy_page_range()
  2020-09-21 21:17 [PATCH 0/5] mm: Break COW for pinned pages during fork() Peter Xu
  2020-09-21 21:17 ` [PATCH 1/5] mm: Introduce mm_struct.has_pinned Peter Xu
@ 2020-09-21 21:17 ` Peter Xu
  2020-09-21 21:17 ` [PATCH 3/5] mm: Rework return value for copy_one_pte() Peter Xu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 110+ messages in thread
From: Peter Xu @ 2020-09-21 21:17 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Jason Gunthorpe, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Peter Xu,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

This prepares for future work that will trigger early cow on pinned pages
during fork().  No functional change intended.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/mm.h |  2 +-
 kernel/fork.c      |  2 +-
 mm/memory.c        | 14 +++++++++-----
 3 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ca6e6a81576b..bf1ac54be55e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1644,7 +1644,7 @@ struct mmu_notifier_range;
 void free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
 		unsigned long end, unsigned long floor, unsigned long ceiling);
 int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
-			struct vm_area_struct *vma);
+		    struct vm_area_struct *vma, struct vm_area_struct *new);
 int follow_pte_pmd(struct mm_struct *mm, unsigned long address,
 		   struct mmu_notifier_range *range,
 		   pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp);
diff --git a/kernel/fork.c b/kernel/fork.c
index 7237d418e7b5..843807ade6dd 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -589,7 +589,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 
 		mm->map_count++;
 		if (!(tmp->vm_flags & VM_WIPEONFORK))
-			retval = copy_page_range(mm, oldmm, mpnt);
+			retval = copy_page_range(mm, oldmm, mpnt, tmp);
 
 		if (tmp->vm_ops && tmp->vm_ops->open)
 			tmp->vm_ops->open(tmp);
diff --git a/mm/memory.c b/mm/memory.c
index 469af373ae76..7525147908c4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -814,6 +814,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
 static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		   pmd_t *dst_pmd, pmd_t *src_pmd, struct vm_area_struct *vma,
+		   struct vm_area_struct *new,
 		   unsigned long addr, unsigned long end)
 {
 	pte_t *orig_src_pte, *orig_dst_pte;
@@ -877,6 +878,7 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
 static inline int copy_pmd_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		pud_t *dst_pud, pud_t *src_pud, struct vm_area_struct *vma,
+		struct vm_area_struct *new,
 		unsigned long addr, unsigned long end)
 {
 	pmd_t *src_pmd, *dst_pmd;
@@ -903,7 +905,7 @@ static inline int copy_pmd_range(struct mm_struct *dst_mm, struct mm_struct *src
 		if (pmd_none_or_clear_bad(src_pmd))
 			continue;
 		if (copy_pte_range(dst_mm, src_mm, dst_pmd, src_pmd,
-						vma, addr, next))
+				   vma, new, addr, next))
 			return -ENOMEM;
 	} while (dst_pmd++, src_pmd++, addr = next, addr != end);
 	return 0;
@@ -911,6 +913,7 @@ static inline int copy_pmd_range(struct mm_struct *dst_mm, struct mm_struct *src
 
 static inline int copy_pud_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		p4d_t *dst_p4d, p4d_t *src_p4d, struct vm_area_struct *vma,
+		struct vm_area_struct *new,
 		unsigned long addr, unsigned long end)
 {
 	pud_t *src_pud, *dst_pud;
@@ -937,7 +940,7 @@ static inline int copy_pud_range(struct mm_struct *dst_mm, struct mm_struct *src
 		if (pud_none_or_clear_bad(src_pud))
 			continue;
 		if (copy_pmd_range(dst_mm, src_mm, dst_pud, src_pud,
-						vma, addr, next))
+				   vma, new, addr, next))
 			return -ENOMEM;
 	} while (dst_pud++, src_pud++, addr = next, addr != end);
 	return 0;
@@ -945,6 +948,7 @@ static inline int copy_pud_range(struct mm_struct *dst_mm, struct mm_struct *src
 
 static inline int copy_p4d_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		pgd_t *dst_pgd, pgd_t *src_pgd, struct vm_area_struct *vma,
+		struct vm_area_struct *new,
 		unsigned long addr, unsigned long end)
 {
 	p4d_t *src_p4d, *dst_p4d;
@@ -959,14 +963,14 @@ static inline int copy_p4d_range(struct mm_struct *dst_mm, struct mm_struct *src
 		if (p4d_none_or_clear_bad(src_p4d))
 			continue;
 		if (copy_pud_range(dst_mm, src_mm, dst_p4d, src_p4d,
-						vma, addr, next))
+				   vma, new, addr, next))
 			return -ENOMEM;
 	} while (dst_p4d++, src_p4d++, addr = next, addr != end);
 	return 0;
 }
 
 int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
-		struct vm_area_struct *vma)
+		    struct vm_area_struct *vma, struct vm_area_struct *new)
 {
 	pgd_t *src_pgd, *dst_pgd;
 	unsigned long next;
@@ -1021,7 +1025,7 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		if (pgd_none_or_clear_bad(src_pgd))
 			continue;
 		if (unlikely(copy_p4d_range(dst_mm, src_mm, dst_pgd, src_pgd,
-					    vma, addr, next))) {
+					    vma, new, addr, next))) {
 			ret = -ENOMEM;
 			break;
 		}
-- 
2.26.2




* [PATCH 3/5] mm: Rework return value for copy_one_pte()
  2020-09-21 21:17 [PATCH 0/5] mm: Break COW for pinned pages during fork() Peter Xu
  2020-09-21 21:17 ` [PATCH 1/5] mm: Introduce mm_struct.has_pinned Peter Xu
  2020-09-21 21:17 ` [PATCH 2/5] mm/fork: Pass new vma pointer into copy_page_range() Peter Xu
@ 2020-09-21 21:17 ` Peter Xu
  2020-09-22  7:11   ` John Hubbard
                     ` (2 more replies)
  2020-09-21 21:20 ` [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes Peter Xu
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 110+ messages in thread
From: Peter Xu @ 2020-09-21 21:17 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Jason Gunthorpe, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Peter Xu,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

There's one special path in copy_one_pte() for swap entries, in which
add_swap_count_continuation(GFP_ATOMIC) might fail.  In that case we return
the swp_entry_t so that the caller can release the locks and redo the same
thing with GFP_KERNEL.

It's confusing that copy_one_pte() must return a swp_entry_t (even when all
the ptes are non-swap entries).  More importantly, we face another requirement
to extend this "we need to do something else, but without the locks held" case.

Rework the return value into something easier to understand, using the newly
defined COPY_MM_* return codes.  We'll pass the swp_entry_t back using the
newly introduced struct copy_mm_data parameter.

Another trivial change is to move the reset of the "progress" counter into the
retry path, so that it's also reset for other retry reasons.

This prepares us for adding new return codes very soon.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/memory.c | 42 +++++++++++++++++++++++++++++-------------
 1 file changed, 29 insertions(+), 13 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 7525147908c4..1530bb1070f4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -689,16 +689,24 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 }
 #endif
 
+#define  COPY_MM_DONE               0
+#define  COPY_MM_SWAP_CONT          1
+
+struct copy_mm_data {
+	/* COPY_MM_SWAP_CONT */
+	swp_entry_t entry;
+};
+
 /*
  * copy one vm_area from one task to the other. Assumes the page tables
  * already present in the new task to be cleared in the whole range
  * covered by this vma.
  */
 
-static inline unsigned long
+static inline int
 copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *vma,
-		unsigned long addr, int *rss)
+		unsigned long addr, int *rss, struct copy_mm_data *data)
 {
 	unsigned long vm_flags = vma->vm_flags;
 	pte_t pte = *src_pte;
@@ -709,8 +717,10 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		swp_entry_t entry = pte_to_swp_entry(pte);
 
 		if (likely(!non_swap_entry(entry))) {
-			if (swap_duplicate(entry) < 0)
-				return entry.val;
+			if (swap_duplicate(entry) < 0) {
+				data->entry = entry;
+				return COPY_MM_SWAP_CONT;
+			}
 
 			/* make sure dst_mm is on swapoff's mmlist. */
 			if (unlikely(list_empty(&dst_mm->mmlist))) {
@@ -809,7 +819,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
 out_set_pte:
 	set_pte_at(dst_mm, addr, dst_pte, pte);
-	return 0;
+	return COPY_MM_DONE;
 }
 
 static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
@@ -820,9 +830,9 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	pte_t *orig_src_pte, *orig_dst_pte;
 	pte_t *src_pte, *dst_pte;
 	spinlock_t *src_ptl, *dst_ptl;
-	int progress = 0;
+	int progress, copy_ret = COPY_MM_DONE;
 	int rss[NR_MM_COUNTERS];
-	swp_entry_t entry = (swp_entry_t){0};
+	struct copy_mm_data data;
 
 again:
 	init_rss_vec(rss);
@@ -837,6 +847,7 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	orig_dst_pte = dst_pte;
 	arch_enter_lazy_mmu_mode();
 
+	progress = 0;
 	do {
 		/*
 		 * We are holding two locks at this point - either of them
@@ -852,9 +863,9 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			progress++;
 			continue;
 		}
-		entry.val = copy_one_pte(dst_mm, src_mm, dst_pte, src_pte,
-							vma, addr, rss);
-		if (entry.val)
+		copy_ret = copy_one_pte(dst_mm, src_mm, dst_pte, src_pte,
+					vma, addr, rss, &data);
+		if (copy_ret != COPY_MM_DONE)
 			break;
 		progress += 8;
 	} while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end);
@@ -866,13 +877,18 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	pte_unmap_unlock(orig_dst_pte, dst_ptl);
 	cond_resched();
 
-	if (entry.val) {
-		if (add_swap_count_continuation(entry, GFP_KERNEL) < 0)
+	switch (copy_ret) {
+	case COPY_MM_SWAP_CONT:
+		if (add_swap_count_continuation(data.entry, GFP_KERNEL) < 0)
 			return -ENOMEM;
-		progress = 0;
+		break;
+	default:
+		break;
 	}
+
 	if (addr != end)
 		goto again;
+
 	return 0;
 }
 
-- 
2.26.2




* [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes
  2020-09-21 21:17 [PATCH 0/5] mm: Break COW for pinned pages during fork() Peter Xu
                   ` (2 preceding siblings ...)
  2020-09-21 21:17 ` [PATCH 3/5] mm: Rework return value for copy_one_pte() Peter Xu
@ 2020-09-21 21:20 ` Peter Xu
  2020-09-21 21:55   ` Jann Horn
                     ` (2 more replies)
  2020-09-21 21:20 ` [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork() Peter Xu
  2020-09-23 10:21 ` [PATCH 0/5] mm: Break COW for pinned pages during fork() Leon Romanovsky
  5 siblings, 3 replies; 110+ messages in thread
From: Peter Xu @ 2020-09-21 21:20 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: peterx, Linus Torvalds, Michal Hocko, Kirill Shutemov, Jann Horn,
	Oleg Nesterov, Kirill Tkhai, Hugh Dickins, Leon Romanovsky,
	Jan Kara, John Hubbard, Christoph Hellwig, Andrew Morton,
	Jason Gunthorpe, Andrea Arcangeli

This patch is greatly inspired by the discussions on the list from Linus, Jason
Gunthorpe and others [1].

It allows copy_pte_range() to do early cow if the pages were pinned on the
source mm.  Currently we don't have an accurate way to know whether a page is
pinned or not; the only thing we have is page_maybe_dma_pinned().  However,
that's good enough for now, especially with the newly added mm->has_pinned
flag making sure we won't affect processes that never pinned any pages.

It would be easier if we could do GFP_KERNEL allocations within
copy_one_pte().  Unluckily, we can't, because we hold the page table locks for
both the parent and child processes.  So the page copy needs to be done
outside copy_one_pte().

The new COPY_MM_BREAK_COW return code is introduced for this - copy_one_pte()
returns it when it finds any pte that may need an early break of cow.

page_duplicate() is used to handle the page copy in copy_pte_range().  Of
course, we need to do that after releasing the locks.

The slightly tricky part is that page_duplicate() fills in copy_mm_data with
the newly copied page, and we then need to re-install the pte with the page
table locks held again.  That's done in pte_install_copied_page().

The whole procedure looks quite similar to wp_page_copy(), but it's simpler,
because we know the page is special (pinned) and we know we don't need tlb
flushes, since no one is referencing the new mm yet.

We still have to be very careful to maintain the two pages (one old source
page, one newly allocated page) across all the lock taking/releasing, and to
make sure neither of them gets lost.

[1] https://lore.kernel.org/lkml/20200914143829.GA1424636@nvidia.com/

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/memory.c | 174 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 167 insertions(+), 7 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 1530bb1070f4..8f3521be80ca 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -691,12 +691,72 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 
 #define  COPY_MM_DONE               0
 #define  COPY_MM_SWAP_CONT          1
+#define  COPY_MM_BREAK_COW          2
 
 struct copy_mm_data {
 	/* COPY_MM_SWAP_CONT */
 	swp_entry_t entry;
+	/* COPY_MM_BREAK_COW */
+	struct {
+		struct page *cow_old_page; /* Released by page_duplicate() */
+		struct page *cow_new_page; /* Released by page_release_cow() */
+		pte_t cow_oldpte;
+	};
 };
 
+static inline void page_release_cow(struct copy_mm_data *data)
+{
+	/* The old page should only be released in page_duplicate() */
+	WARN_ON_ONCE(data->cow_old_page);
+
+	if (data->cow_new_page) {
+		put_page(data->cow_new_page);
+		data->cow_new_page = NULL;
+	}
+}
+
+/*
+ * Duplicate the page for this PTE.  Returns zero if page copied (so we need to
+ * retry on the same PTE again to arm the copied page very soon), or negative
+ * if error happened.  In all cases, the old page will be properly released.
+ */
+static int page_duplicate(struct mm_struct *src_mm, struct vm_area_struct *vma,
+			  unsigned long address, struct copy_mm_data *data)
+{
+	struct page *new_page = NULL;
+	int ret;
+
+	/* This should have been set in copy_one_pte() when we reach here */
+	WARN_ON_ONCE(!data->cow_old_page);
+
+	new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
+	if (!new_page) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	copy_user_highpage(new_page, data->cow_old_page, address, vma);
+	ret = mem_cgroup_charge(new_page, src_mm, GFP_KERNEL);
+	if (ret) {
+		put_page(new_page);
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	cgroup_throttle_swaprate(new_page, GFP_KERNEL);
+	__SetPageUptodate(new_page);
+
+	/* So far so good; arm the new page for the next attempt */
+	data->cow_new_page = new_page;
+
+out:
+	/* Always release the old page */
+	put_page(data->cow_old_page);
+	data->cow_old_page = NULL;
+
+	return ret;
+}
+
 /*
  * copy one vm_area from one task to the other. Assumes the page tables
  * already present in the new task to be cleared in the whole range
@@ -711,6 +771,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	unsigned long vm_flags = vma->vm_flags;
 	pte_t pte = *src_pte;
 	struct page *page;
+	bool wp;
 
 	/* pte contains position in swap or file, so copy. */
 	if (unlikely(!pte_present(pte))) {
@@ -789,10 +850,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	 * If it's a COW mapping, write protect it both
 	 * in the parent and the child
 	 */
-	if (is_cow_mapping(vm_flags) && pte_write(pte)) {
-		ptep_set_wrprotect(src_mm, addr, src_pte);
-		pte = pte_wrprotect(pte);
-	}
+	wp = is_cow_mapping(vm_flags) && pte_write(pte);
 
 	/*
 	 * If it's a shared mapping, mark it clean in
@@ -813,15 +871,80 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	page = vm_normal_page(vma, addr, pte);
 	if (page) {
 		get_page(page);
+
+		/*
+		 * If the page is pinned in source mm, do early cow right now
+		 * so that the pinned page won't be replaced by another random
+		 * page without being noticed after the fork().
+		 *
+		 * Note: there can be some very rare cases that we'll do
+		 * unnecessary cow here, due to page_maybe_dma_pinned() is
+		 * sometimes bogus, and has_pinned flag is currently aggressive
+		 * too.  However this should be good enough for us for now as
+		 * long as we covered all the pinned pages.  We can make this
+		 * better in the future by providing an accurate accounting for
+		 * pinned pages.
+		 *
+		 * Because we'll need to release the locks before doing cow,
+		 * pass this work to upper layer.
+		 */
+		if (READ_ONCE(src_mm->has_pinned) && wp &&
+		    page_maybe_dma_pinned(page)) {
+			/* We've got the page already; we're safe */
+			data->cow_old_page = page;
+			data->cow_oldpte = *src_pte;
+			return COPY_MM_BREAK_COW;
+		}
+
 		page_dup_rmap(page, false);
 		rss[mm_counter(page)]++;
 	}
 
+	if (wp) {
+		ptep_set_wrprotect(src_mm, addr, src_pte);
+		pte = pte_wrprotect(pte);
+	}
+
 out_set_pte:
 	set_pte_at(dst_mm, addr, dst_pte, pte);
 	return COPY_MM_DONE;
 }
 
+/*
+ * Install the pte with the copied page stored in `data'.  Returns true when
+ * installation completes, or false when src pte has changed.
+ */
+static int pte_install_copied_page(struct mm_struct *dst_mm,
+				   struct vm_area_struct *new,
+				   pte_t *src_pte, pte_t *dst_pte,
+				   unsigned long addr, int *rss,
+				   struct copy_mm_data *data)
+{
+	struct page *new_page = data->cow_new_page;
+	pte_t entry;
+
+	if (!pte_same(*src_pte, data->cow_oldpte)) {
+		/* PTE has changed under us.  Release the page and retry */
+		page_release_cow(data);
+		return false;
+	}
+
+	entry = mk_pte(new_page, new->vm_page_prot);
+	entry = pte_sw_mkyoung(entry);
+	entry = maybe_mkwrite(pte_mkdirty(entry), new);
+	page_add_new_anon_rmap(new_page, new, addr, false);
+	set_pte_at(dst_mm, addr, dst_pte, entry);
+	rss[mm_counter(new_page)]++;
+
+	/*
+	 * Manually clear the new page pointer since we've moved ownership to
+	 * the newly armed PTE.
+	 */
+	data->cow_new_page = NULL;
+
+	return true;
+}
+
 static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		   pmd_t *dst_pmd, pmd_t *src_pmd, struct vm_area_struct *vma,
 		   struct vm_area_struct *new,
@@ -830,16 +953,23 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	pte_t *orig_src_pte, *orig_dst_pte;
 	pte_t *src_pte, *dst_pte;
 	spinlock_t *src_ptl, *dst_ptl;
-	int progress, copy_ret = COPY_MM_DONE;
+	int progress, ret, copy_ret = COPY_MM_DONE;
 	int rss[NR_MM_COUNTERS];
 	struct copy_mm_data data;
 
 again:
+	/* We don't reset this for COPY_MM_BREAK_COW */
+	memset(&data, 0, sizeof(data));
+
+again_break_cow:
 	init_rss_vec(rss);
 
 	dst_pte = pte_alloc_map_lock(dst_mm, dst_pmd, addr, &dst_ptl);
-	if (!dst_pte)
+	if (!dst_pte) {
+		/* Guarantee that the new page is released if there is */
+		page_release_cow(&data);
 		return -ENOMEM;
+	}
 	src_pte = pte_offset_map(src_pmd, addr);
 	src_ptl = pte_lockptr(src_mm, src_pmd);
 	spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
@@ -859,6 +989,25 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			    spin_needbreak(src_ptl) || spin_needbreak(dst_ptl))
 				break;
 		}
+
+		if (unlikely(data.cow_new_page)) {
+			/*
+			 * If cow_new_page set, we must be at the 2nd round of
+			 * a previous COPY_MM_BREAK_COW.  Try to arm the new
+			 * page now.  Note that in all cases pte_install_copied_page()
+			 * will properly release the objects in copy_mm_data.
+			 */
+			WARN_ON_ONCE(copy_ret != COPY_MM_BREAK_COW);
+			if (pte_install_copied_page(dst_mm, new, src_pte,
+						    dst_pte, addr, rss,
+						    &data)) {
+				/* We installed the pte successfully; move on */
+				progress++;
+				continue;
+			}
+			/* PTE changed.  Retry this pte (falls through) */
+		}
+
 		if (pte_none(*src_pte)) {
 			progress++;
 			continue;
@@ -882,8 +1031,19 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		if (add_swap_count_continuation(data.entry, GFP_KERNEL) < 0)
 			return -ENOMEM;
 		break;
-	default:
+	case COPY_MM_BREAK_COW:
+		/* Do accounting onto parent mm directly */
+		ret = page_duplicate(src_mm, vma, addr, &data);
+		if (ret)
+			return ret;
+		goto again_break_cow;
+	case COPY_MM_DONE:
+		/* This means we're all good. */
 		break;
+	default:
+		/* This should mean copy_ret < 0.  Time to fail this fork().. */
+		WARN_ON_ONCE(copy_ret >= 0);
+		return copy_ret;
 	}
 
 	if (addr != end)
-- 
2.26.2




* [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork()
  2020-09-21 21:17 [PATCH 0/5] mm: Break COW for pinned pages during fork() Peter Xu
                   ` (3 preceding siblings ...)
  2020-09-21 21:20 ` [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes Peter Xu
@ 2020-09-21 21:20 ` Peter Xu
  2020-09-22  6:41   ` John Hubbard
  2020-09-22 12:05   ` Jason Gunthorpe
  2020-09-23 10:21 ` [PATCH 0/5] mm: Break COW for pinned pages during fork() Leon Romanovsky
  5 siblings, 2 replies; 110+ messages in thread
From: Peter Xu @ 2020-09-21 21:20 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: peterx, Linus Torvalds, Michal Hocko, Kirill Shutemov, Jann Horn,
	Oleg Nesterov, Kirill Tkhai, Hugh Dickins, Leon Romanovsky,
	Jan Kara, John Hubbard, Christoph Hellwig, Andrew Morton,
	Jason Gunthorpe, Andrea Arcangeli

Pinned pages shouldn't be write-protected when fork() happens, because
follow-up copy-on-write on these pages could cause the pinned pages to be
replaced by random, newly allocated pages.

For huge PMDs, we split the huge pmd if pinning is detected, so that future
handling will be done at the PTE level (with our latest changes, each of the
small pages will be copied).  We achieve this by letting copy_huge_pmd()
return -EAGAIN for pinned pages, so that we fall through in copy_pmd_range()
and finally land in the copy_pte_range() call.

Huge PUDs are even more special: so far they do not support anonymous pages.
But they can be handled the same way as huge PMDs, even though splitting a
huge PUD means erasing the PUD entries.  That guarantees that follow-up
fault-ins will remap the same pages in either parent or child later.

This might not be the most efficient way, but it should be easy and clean
enough.  It should be fine, since we're tackling a very rare case, just to
make sure userspace that pinned some thps will still work even without
MADV_DONTFORK after they fork()ed.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/huge_memory.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7ff29cc3d55c..c40aac0ad87e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1074,6 +1074,23 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
 	src_page = pmd_page(pmd);
 	VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
+
+	/*
+	 * If this page is a potentially pinned page, split and retry the fault
+	 * with smaller page size.  Normally this should not happen because the
+	 * userspace should use MADV_DONTFORK upon pinned regions.  This is a
+	 * best effort that the pinned pages won't be replaced by another
+	 * random page during the coming copy-on-write.
+	 */
+	if (unlikely(READ_ONCE(src_mm->has_pinned) &&
+		     page_maybe_dma_pinned(src_page))) {
+		pte_free(dst_mm, pgtable);
+		spin_unlock(src_ptl);
+		spin_unlock(dst_ptl);
+		__split_huge_pmd(vma, src_pmd, addr, false, NULL);
+		return -EAGAIN;
+	}
+
 	get_page(src_page);
 	page_dup_rmap(src_page, true);
 	add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
@@ -1177,6 +1194,15 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		/* No huge zero pud yet */
 	}
 
+	/* Please refer to comments in copy_huge_pmd() */
+	if (unlikely(READ_ONCE(src_mm->has_pinned) &&
+		     page_maybe_dma_pinned(pud_page(pud)))) {
+		spin_unlock(src_ptl);
+		spin_unlock(dst_ptl);
+		__split_huge_pud(vma, src_pud, addr);
+		return -EAGAIN;
+	}
+
 	pudp_set_wrprotect(src_mm, addr, src_pud);
 	pud = pud_mkold(pud_wrprotect(pud));
 	set_pud_at(dst_mm, addr, dst_pud, pud);
-- 
2.26.2




* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-21 21:17 ` [PATCH 1/5] mm: Introduce mm_struct.has_pinned Peter Xu
@ 2020-09-21 21:43   ` Jann Horn
  2020-09-21 22:30     ` Peter Xu
  2020-09-21 23:53   ` John Hubbard
  2020-09-27  0:41   ` [mm] 698ac7610f: will-it-scale.per_thread_ops 8.2% improvement kernel test robot
  2 siblings, 1 reply; 110+ messages in thread
From: Jann Horn @ 2020-09-21 21:43 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linux-MM, kernel list, Jason Gunthorpe, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds

On Mon, Sep 21, 2020 at 11:17 PM Peter Xu <peterx@redhat.com> wrote:
>
> (Commit message collected from Jason Gunthorpe)
>
> Reduce the chance of false positive from page_maybe_dma_pinned() by keeping
> track if the mm_struct has ever been used with pin_user_pages(). mm_structs
> that have never been passed to pin_user_pages() cannot have a positive
> page_maybe_dma_pinned() by definition.

There are some caveats here, right? E.g. this isn't necessarily true
for pagecache pages, I think?


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes
  2020-09-21 21:20 ` [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes Peter Xu
@ 2020-09-21 21:55   ` Jann Horn
  2020-09-21 22:18     ` John Hubbard
  2020-09-21 22:27     ` Peter Xu
  2020-09-22 11:48   ` Oleg Nesterov
  2020-09-24 11:48   ` Kirill Tkhai
  2 siblings, 2 replies; 110+ messages in thread
From: Jann Horn @ 2020-09-21 21:55 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linux-MM, kernel list, Linus Torvalds, Michal Hocko,
	Kirill Shutemov, Oleg Nesterov, Kirill Tkhai, Hugh Dickins,
	Leon Romanovsky, Jan Kara, John Hubbard, Christoph Hellwig,
	Andrew Morton, Jason Gunthorpe, Andrea Arcangeli

On Mon, Sep 21, 2020 at 11:20 PM Peter Xu <peterx@redhat.com> wrote:
> This patch is greatly inspired by the discussions on the list from Linus, Jason
> Gunthorpe and others [1].
>
> It allows copy_pte_range() to do early cow if the pages were pinned on the
> source mm.  Currently we don't have an accurate way to know whether a page is
> pinned or not.  The only thing we have is page_maybe_dma_pinned().  However
> that's good enough for now.  Especially, with the newly added mm->has_pinned
> flag to make sure we won't affect processes that never pinned any pages.

To clarify: This patch only handles pin_user_pages() callers and
doesn't try to address other GUP users, right? E.g. if task A uses
process_vm_write() on task B while task B is going through fork(),
that can still race in such a way that the written data only shows up
in the child and not in B, right?

I dislike the whole pin_user_pages() concept because (as far as I
understand) it fundamentally tries to fix a problem in the subset of
cases that are more likely to occur in practice (long-term pins
overlapping with things like writeback), and ignores the rarer cases
("short-term" GUP).


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes
  2020-09-21 21:55   ` Jann Horn
@ 2020-09-21 22:18     ` John Hubbard
  2020-09-21 22:27       ` Jann Horn
  2020-09-21 22:27     ` Peter Xu
  1 sibling, 1 reply; 110+ messages in thread
From: John Hubbard @ 2020-09-21 22:18 UTC (permalink / raw)
  To: Jann Horn, Peter Xu
  Cc: Linux-MM, kernel list, Linus Torvalds, Michal Hocko,
	Kirill Shutemov, Oleg Nesterov, Kirill Tkhai, Hugh Dickins,
	Leon Romanovsky, Jan Kara, Christoph Hellwig, Andrew Morton,
	Jason Gunthorpe, Andrea Arcangeli

On 9/21/20 2:55 PM, Jann Horn wrote:
> On Mon, Sep 21, 2020 at 11:20 PM Peter Xu <peterx@redhat.com> wrote:
...
> I dislike the whole pin_user_pages() concept because (as far as I
> understand) it fundamentally tries to fix a problem in the subset of
> cases that are more likely to occur in practice (long-term pins
> overlapping with things like writeback), and ignores the rarer cases
> ("short-term" GUP).
> 

Well, no, that's not really fair. pin_user_pages() provides a key
prerequisite to fixing *all* of the bugs in that area, not just a
subset. The 5 cases in Documentation/core-api/pin_user_pages.rst cover
this pretty well. Or if they don't, let me know and I'll have another
pass at it.

The case for a "pin count" that is (logically) separate from a
page->_refcount is real, and it fixes real problems. An elevated
refcount can be caused by a lot of things, but it can normally be waited
for and/or retried. The FOLL_PIN pages cannot.

Of course, a valid remaining criticism of the situation is, "why not
just *always* mark any of these pages as "dma-pinned"? In other words,
why even have a separate gup/pup API? And in fact, perhaps eventually
we'll just get rid of the get_user_pages*() side of it. But the pin
count will need to remain, in order to discern between DMA pins and
temporary refcount boosts.

thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes
  2020-09-21 21:55   ` Jann Horn
  2020-09-21 22:18     ` John Hubbard
@ 2020-09-21 22:27     ` Peter Xu
  1 sibling, 0 replies; 110+ messages in thread
From: Peter Xu @ 2020-09-21 22:27 UTC (permalink / raw)
  To: Jann Horn
  Cc: Linux-MM, kernel list, Linus Torvalds, Michal Hocko,
	Kirill Shutemov, Oleg Nesterov, Kirill Tkhai, Hugh Dickins,
	Leon Romanovsky, Jan Kara, John Hubbard, Christoph Hellwig,
	Andrew Morton, Jason Gunthorpe, Andrea Arcangeli

Hi, Jann,

On Mon, Sep 21, 2020 at 11:55:06PM +0200, Jann Horn wrote:
> On Mon, Sep 21, 2020 at 11:20 PM Peter Xu <peterx@redhat.com> wrote:
> > This patch is greatly inspired by the discussions on the list from Linus, Jason
> > Gunthorpe and others [1].
> >
> > It allows copy_pte_range() to do early cow if the pages were pinned on the
> > source mm.  Currently we don't have an accurate way to know whether a page is
> > pinned or not.  The only thing we have is page_maybe_dma_pinned().  However
> > that's good enough for now.  Especially, with the newly added mm->has_pinned
> > flag to make sure we won't affect processes that never pinned any pages.
> 
> To clarify: This patch only handles pin_user_pages() callers and
> doesn't try to address other GUP users, right? E.g. if task A uses
> process_vm_write() on task B while task B is going through fork(),
> that can still race in such a way that the written data only shows up
> in the child and not in B, right?

I saw that process_vm_write() uses pin_user_pages_remote(), so I think that
after this patch is applied the data will be written to B rather than to the
child.  When B fork()s with these temporarily pinned pages, it will copy the
pages rather than write-protect them.  IIUC the child could still see partial
data, but in the end (after unpinning) B should always have the complete data
set.

> 
> I dislike the whole pin_user_pages() concept because (as far as I
> understand) it fundamentally tries to fix a problem in the subset of
> cases that are more likely to occur in practice (long-term pins
> overlapping with things like writeback), and ignores the rarer cases
> ("short-term" GUP).

John/Jason or others may be better placed to comment on this one.  From my
own understanding, I thought it was the right thing to do so that we'll
always guarantee process B gets the whole data.  From that pov this patch
should make sense even for short-term gups.  But maybe I've missed something.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes
  2020-09-21 22:18     ` John Hubbard
@ 2020-09-21 22:27       ` Jann Horn
  2020-09-22  0:08         ` John Hubbard
  0 siblings, 1 reply; 110+ messages in thread
From: Jann Horn @ 2020-09-21 22:27 UTC (permalink / raw)
  To: John Hubbard
  Cc: Peter Xu, Linux-MM, kernel list, Linus Torvalds, Michal Hocko,
	Kirill Shutemov, Oleg Nesterov, Kirill Tkhai, Hugh Dickins,
	Leon Romanovsky, Jan Kara, Christoph Hellwig, Andrew Morton,
	Jason Gunthorpe, Andrea Arcangeli

On Tue, Sep 22, 2020 at 12:18 AM John Hubbard <jhubbard@nvidia.com> wrote:
> On 9/21/20 2:55 PM, Jann Horn wrote:
> > On Mon, Sep 21, 2020 at 11:20 PM Peter Xu <peterx@redhat.com> wrote:
> ...
> > I dislike the whole pin_user_pages() concept because (as far as I
> > understand) it fundamentally tries to fix a problem in the subset of
> > cases that are more likely to occur in practice (long-term pins
> > overlapping with things like writeback), and ignores the rarer cases
> > ("short-term" GUP).
> >
>
> Well, no, that's not really fair. pin_user_pages() provides a key
> prerequisite to fixing *all* of the bugs in that area, not just a
> subset. The 5 cases in Documentation/core-api/pin_user_pages.rst cover
> this pretty well. Or if they don't, let me know and I'll have another
> pass at it.
>
> The case for a "pin count" that is (logically) separate from a
> page->_refcount is real, and it fixes real problems. An elevated
> refcount can be caused by a lot of things, but it can normally be waited
> for and/or retried. The FOLL_PIN pages cannot.
>
> Of course, a valid remaining criticism of the situation is, "why not
> just *always* mark any of these pages as "dma-pinned"? In other words,
> why even have a separate gup/pup API? And in fact, perhaps eventually
> we'll just get rid of the get_user_pages*() side of it. But the pin
> count will need to remain, in order to discern between DMA pins and
> temporary refcount boosts.

Ah... the documentation you linked implies that FOLL_WRITE should more
or less imply FOLL_PIN? I didn't realize that.

Whoops, and actually, process_vm_writev() does use FOLL_PIN
already, and I just grepped the code the wrong way.

Thanks for the enlightenment; I take back everything I said.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-21 21:43   ` Jann Horn
@ 2020-09-21 22:30     ` Peter Xu
  2020-09-21 22:47       ` Jann Horn
  0 siblings, 1 reply; 110+ messages in thread
From: Peter Xu @ 2020-09-21 22:30 UTC (permalink / raw)
  To: Jann Horn
  Cc: Linux-MM, kernel list, Jason Gunthorpe, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds

On Mon, Sep 21, 2020 at 11:43:38PM +0200, Jann Horn wrote:
> On Mon, Sep 21, 2020 at 11:17 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > (Commit message collected from Jason Gunthorpe)
> >
> > Reduce the chance of false positive from page_maybe_dma_pinned() by keeping
> > track if the mm_struct has ever been used with pin_user_pages(). mm_structs
> > that have never been passed to pin_user_pages() cannot have a positive
> > page_maybe_dma_pinned() by definition.
> 
> There are some caveats here, right? E.g. this isn't necessarily true
> for pagecache pages, I think?

Sorry I didn't follow here.  Could you help explain with some details?

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-21 22:30     ` Peter Xu
@ 2020-09-21 22:47       ` Jann Horn
  2020-09-22 11:54         ` Jason Gunthorpe
  0 siblings, 1 reply; 110+ messages in thread
From: Jann Horn @ 2020-09-21 22:47 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linux-MM, kernel list, Jason Gunthorpe, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds

On Tue, Sep 22, 2020 at 12:30 AM Peter Xu <peterx@redhat.com> wrote:
> On Mon, Sep 21, 2020 at 11:43:38PM +0200, Jann Horn wrote:
> > On Mon, Sep 21, 2020 at 11:17 PM Peter Xu <peterx@redhat.com> wrote:
> > > (Commit message collected from Jason Gunthorpe)
> > >
> > > Reduce the chance of false positive from page_maybe_dma_pinned() by keeping
> > > track if the mm_struct has ever been used with pin_user_pages(). mm_structs
> > > that have never been passed to pin_user_pages() cannot have a positive
> > > page_maybe_dma_pinned() by definition.
> >
> > There are some caveats here, right? E.g. this isn't necessarily true
> > for pagecache pages, I think?
>
> Sorry I didn't follow here.  Could you help explain with some details?

The commit message says "mm_structs that have never been passed to
pin_user_pages() cannot have a positive page_maybe_dma_pinned() by
definition"; but that is not true for pages which may also be mapped
in a second mm and may have been passed to pin_user_pages() through
that second mm (meaning they must be writable over there and not
shared with us via CoW).

For example:

Process A:

fd_a = open("/foo/bar", O_RDWR);
mapping_a = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_SHARED, fd_a, 0);
pin_user_pages(mapping_a, 1, ...);

Process B:

fd_b = open("/foo/bar", O_RDONLY);
mapping_b = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd_b, 0);
*(volatile char *)mapping_b;

At this point, process B has never called pin_user_pages(), but
page_maybe_dma_pinned() on the page at mapping_b would return true.


I don't think this is a problem for the use of page_maybe_dma_pinned()
in fork(), but I do think that the commit message is not entirely
correct.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-21 21:17 ` [PATCH 1/5] mm: Introduce mm_struct.has_pinned Peter Xu
  2020-09-21 21:43   ` Jann Horn
@ 2020-09-21 23:53   ` John Hubbard
  2020-09-22  0:01     ` John Hubbard
  2020-09-22 15:17     ` Peter Xu
  2020-09-27  0:41   ` [mm] 698ac7610f: will-it-scale.per_thread_ops 8.2% improvement kernel test robot
  2 siblings, 2 replies; 110+ messages in thread
From: John Hubbard @ 2020-09-21 23:53 UTC (permalink / raw)
  To: Peter Xu, linux-mm, linux-kernel
  Cc: Jason Gunthorpe, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Leon Romanovsky, Linus Torvalds,
	Jann Horn

On 9/21/20 2:17 PM, Peter Xu wrote:
> (Commit message collected from Jason Gunthorpe)
> 
> Reduce the chance of false positive from page_maybe_dma_pinned() by keeping

Not yet, it doesn't. :)  More:

> track if the mm_struct has ever been used with pin_user_pages(). mm_structs
> that have never been passed to pin_user_pages() cannot have a positive
> page_maybe_dma_pinned() by definition. This allows cases that might drive up
> the page ref_count to avoid any penalty from handling dma_pinned pages.
> 
> Due to complexities with unpinning, this trivial version is a permanent sticky
> bit, future work will be needed to make this a counter.

How about this instead:

Subsequent patches intend to reduce the chance of false positives from
page_maybe_dma_pinned(), by also considering whether or not a page has
even been part of an mm struct that has ever had pin_user_pages*()
applied to any of its pages.

In order to allow that, provide a boolean value (even though it's not
implemented exactly as a boolean type) within the mm struct, that is
simply set once and never cleared. This will suffice for an early, rough
implementation that fixes a few problems.

Future work is planned, to provide a more sophisticated solution, likely
involving a counter, and *not* involving something that is set and never
cleared.

> 
> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   include/linux/mm_types.h | 10 ++++++++++
>   kernel/fork.c            |  1 +
>   mm/gup.c                 |  6 ++++++
>   3 files changed, 17 insertions(+)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 496c3ff97cce..6f291f8b74c6 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -441,6 +441,16 @@ struct mm_struct {
>   #endif
>   		int map_count;			/* number of VMAs */
>   
> +		/**
> +		 * @has_pinned: Whether this mm has pinned any pages.  This can
> +		 * be either replaced in the future by @pinned_vm when it
> +		 * becomes stable, or grow into a counter on its own. We're
> +		 * aggresive on this bit now - even if the pinned pages were
> +		 * unpinned later on, we'll still keep this bit set for the
> +		 * lifecycle of this mm just for simplicity.
> +		 */
> +		int has_pinned;

I think this would be elegant as an atomic_t, using atomic_set() and
atomic_read(), which seem even more self-documenting than what you have here.

But it's admittedly a cosmetic point, combined with my perennial fear that
I'm missing something when I look at a READ_ONCE()/WRITE_ONCE() pair. :)

It's completely OK to just ignore this comment, but I didn't want to completely
miss the opportunity to make it a tiny bit cleaner to the reader.

thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-21 23:53   ` John Hubbard
@ 2020-09-22  0:01     ` John Hubbard
  2020-09-22 15:17     ` Peter Xu
  1 sibling, 0 replies; 110+ messages in thread
From: John Hubbard @ 2020-09-22  0:01 UTC (permalink / raw)
  To: Peter Xu, linux-mm, linux-kernel
  Cc: Jason Gunthorpe, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Leon Romanovsky, Linus Torvalds,
	Jann Horn

On 9/21/20 4:53 PM, John Hubbard wrote:
> On 9/21/20 2:17 PM, Peter Xu wrote:
>> (Commit message collected from Jason Gunthorpe)
>>
>> Reduce the chance of false positive from page_maybe_dma_pinned() by keeping
> 
> Not yet, it doesn't. :)  More:
> 
>> track if the mm_struct has ever been used with pin_user_pages(). mm_structs
>> that have never been passed to pin_user_pages() cannot have a positive
>> page_maybe_dma_pinned() by definition. This allows cases that might drive up
>> the page ref_count to avoid any penalty from handling dma_pinned pages.
>>
>> Due to complexities with unpinning, this trivial version is a permanent sticky
>> bit, future work will be needed to make this a counter.
> 
> How about this instead:
> 
> Subsequent patches intend to reduce the chance of false positives from
> page_maybe_dma_pinned(), by also considering whether or not a page has
> even been part of an mm struct that has ever had pin_user_pages*()


arggh, correction: please make that:

     "...whether or not a page is part of an mm struct that...".

(Present tense.) Otherwise, people start wondering about the checkered past
of a page's past lives, and it badly distracts from the main point here. :)


thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes
  2020-09-21 22:27       ` Jann Horn
@ 2020-09-22  0:08         ` John Hubbard
  0 siblings, 0 replies; 110+ messages in thread
From: John Hubbard @ 2020-09-22  0:08 UTC (permalink / raw)
  To: Jann Horn
  Cc: Peter Xu, Linux-MM, kernel list, Linus Torvalds, Michal Hocko,
	Kirill Shutemov, Oleg Nesterov, Kirill Tkhai, Hugh Dickins,
	Leon Romanovsky, Jan Kara, Christoph Hellwig, Andrew Morton,
	Jason Gunthorpe, Andrea Arcangeli

On 9/21/20 3:27 PM, Jann Horn wrote:
> On Tue, Sep 22, 2020 at 12:18 AM John Hubbard <jhubbard@nvidia.com> wrote:
>> On 9/21/20 2:55 PM, Jann Horn wrote:
>>> On Mon, Sep 21, 2020 at 11:20 PM Peter Xu <peterx@redhat.com> wrote:
>> ...
> Ah... the documentation you linked implies that FOLL_WRITE should more
> or less imply FOLL_PIN? I didn't realize that.
> 

hmmm, that does seem like a pretty close approximation. It's certainly
true that if we were only doing reads, and never marking pages dirty,
the file system writeback code would be OK.

For completeness we should add: even just reading a page is still a
problem, if one also marks the page as dirty (which is inconsistent and
wrong, but still). That's because the file system code can then break,
during writeback in particular.


thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork()
  2020-09-21 21:20 ` [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork() Peter Xu
@ 2020-09-22  6:41   ` John Hubbard
  2020-09-22 10:33     ` Jan Kara
  2020-09-23 16:06     ` Peter Xu
  2020-09-22 12:05   ` Jason Gunthorpe
  1 sibling, 2 replies; 110+ messages in thread
From: John Hubbard @ 2020-09-22  6:41 UTC (permalink / raw)
  To: Peter Xu, linux-mm, linux-kernel
  Cc: Linus Torvalds, Michal Hocko, Kirill Shutemov, Jann Horn,
	Oleg Nesterov, Kirill Tkhai, Hugh Dickins, Leon Romanovsky,
	Jan Kara, Christoph Hellwig, Andrew Morton, Jason Gunthorpe,
	Andrea Arcangeli

On 9/21/20 2:20 PM, Peter Xu wrote:
...
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 7ff29cc3d55c..c40aac0ad87e 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1074,6 +1074,23 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>   
>   	src_page = pmd_page(pmd);
>   	VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
> +
> +	/*
> +	 * If this page is a potentially pinned page, split and retry the fault
> +	 * with a smaller page size.  Normally this should not happen because
> +	 * userspace should use MADV_DONTFORK upon pinned regions.  This is a
> +	 * best-effort attempt to ensure the pinned pages won't be replaced by
> +	 * random pages during the coming copy-on-write.
> +	 */
> +	if (unlikely(READ_ONCE(src_mm->has_pinned) &&
> +		     page_maybe_dma_pinned(src_page))) {

This condition would make a good static inline function. It's used in 3 places,
and the condition is quite special and worth documenting, and having a separate
function helps with that, because the function name adds to the story. I'd suggest
approximately:

     page_likely_dma_pinned()

for the name.

> +		pte_free(dst_mm, pgtable);
> +		spin_unlock(src_ptl);
> +		spin_unlock(dst_ptl);
> +		__split_huge_pmd(vma, src_pmd, addr, false, NULL);
> +		return -EAGAIN;
> +	}


Why wait until we are so deep into this routine to detect this and unwind?
It seems like you could do the check near the beginning of this routine and
handle it there, with less unwinding. In fact, after taking only the src_ptl,
the check could be made, right?


thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()
  2020-09-21 21:17 ` [PATCH 3/5] mm: Rework return value for copy_one_pte() Peter Xu
@ 2020-09-22  7:11   ` John Hubbard
  2020-09-22 15:29     ` Peter Xu
  2020-09-22 10:08   ` Oleg Nesterov
  2020-09-23 17:16   ` Linus Torvalds
  2 siblings, 1 reply; 110+ messages in thread
From: John Hubbard @ 2020-09-22  7:11 UTC (permalink / raw)
  To: Peter Xu, linux-mm, linux-kernel
  Cc: Jason Gunthorpe, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Leon Romanovsky, Linus Torvalds,
	Jann Horn

On 9/21/20 2:17 PM, Peter Xu wrote:
> There's one special path for copy_one_pte() with swap entries, in which
> add_swap_count_continuation(GFP_ATOMIC) might fail.  In that case we'll return

I might be looking at the wrong place, but the existing code seems to call
add_swap_count_continuation(GFP_KERNEL), not with GFP_ATOMIC?

> the swp_entry_t so that the caller will release the locks and redo the same
> thing with GFP_KERNEL.
> 
> It's confusing when copy_one_pte() must return a swp_entry_t (even if all the
> ptes are non-swap entries).  More importantly, we face other requirement to
> extend this "we need to do something else, but without the locks held" case.
> 
> Rework the return value into something easier to understand, as defined in enum
> copy_mm_ret.  We'll pass the swp_entry_t back using the newly introduced union

I like the documentation here, but it doesn't match what you did in the patch.
Actually, the documentation had the right idea (enum, rather than #define, for
COPY_MM_* items). Below...

> copy_mm_data parameter.
> 
> Another trivial change is to move the reset of the "progress" counter into the
> retry path, so that we'll reset it for other reasons too.
> 
> This should prepare us with adding new return codes, very soon.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   mm/memory.c | 42 +++++++++++++++++++++++++++++-------------
>   1 file changed, 29 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 7525147908c4..1530bb1070f4 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -689,16 +689,24 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
>   }
>   #endif
>   
> +#define  COPY_MM_DONE               0
> +#define  COPY_MM_SWAP_CONT          1

Those should be enums, so as to get a little type safety and other goodness from
using non-macro items.

...
> @@ -866,13 +877,18 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>   	pte_unmap_unlock(orig_dst_pte, dst_ptl);
>   	cond_resched();
>   
> -	if (entry.val) {
> -		if (add_swap_count_continuation(entry, GFP_KERNEL) < 0)
> +	switch (copy_ret) {
> +	case COPY_MM_SWAP_CONT:
> +		if (add_swap_count_continuation(data.entry, GFP_KERNEL) < 0)
>   			return -ENOMEM;
> -		progress = 0;

Yes. Definitely a little cleaner to reset this above, instead of here.

> +		break;
> +	default:
> +		break;

I assume this no-op noise is to placate the compiler and/or static checkers. :)

I'm unable to find any actual problems with the diffs, aside from the nit about
using an enum.

thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()
  2020-09-21 21:17 ` [PATCH 3/5] mm: Rework return value for copy_one_pte() Peter Xu
  2020-09-22  7:11   ` John Hubbard
@ 2020-09-22 10:08   ` Oleg Nesterov
  2020-09-22 10:18     ` Oleg Nesterov
  2020-09-23 17:16   ` Linus Torvalds
  2 siblings, 1 reply; 110+ messages in thread
From: Oleg Nesterov @ 2020-09-22 10:08 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-mm, linux-kernel, Jason Gunthorpe, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On 09/21, Peter Xu wrote:
>
> @@ -866,13 +877,18 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  	pte_unmap_unlock(orig_dst_pte, dst_ptl);
>  	cond_resched();
>  
> -	if (entry.val) {
> -		if (add_swap_count_continuation(entry, GFP_KERNEL) < 0)
> +	switch (copy_ret) {
> +	case COPY_MM_SWAP_CONT:
> +		if (add_swap_count_continuation(data.entry, GFP_KERNEL) < 0)
>  			return -ENOMEM;
> -		progress = 0;
> +		break;

Note that you didn't clear copy_ret, it is still COPY_MM_SWAP_CONT,

> +	default:
> +		break;
>  	}
> +
>  	if (addr != end)
>  		goto again;

After that the main loop can stop again because of need_resched(), and
in this case add_swap_count_continuation(data.entry) will be called again?

Oleg.



^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()
  2020-09-22 10:08   ` Oleg Nesterov
@ 2020-09-22 10:18     ` Oleg Nesterov
  2020-09-22 15:36       ` Peter Xu
  0 siblings, 1 reply; 110+ messages in thread
From: Oleg Nesterov @ 2020-09-22 10:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-mm, linux-kernel, Jason Gunthorpe, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On 09/22, Oleg Nesterov wrote:
>
> On 09/21, Peter Xu wrote:
> >
> > @@ -866,13 +877,18 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >  	pte_unmap_unlock(orig_dst_pte, dst_ptl);
> >  	cond_resched();
> >
> > -	if (entry.val) {
> > -		if (add_swap_count_continuation(entry, GFP_KERNEL) < 0)
> > +	switch (copy_ret) {
> > +	case COPY_MM_SWAP_CONT:
> > +		if (add_swap_count_continuation(data.entry, GFP_KERNEL) < 0)
> >  			return -ENOMEM;
> > -		progress = 0;
> > +		break;
>
> Note that you didn't clear copy_ret, it is still COPY_MM_SWAP_CONT,
>
> > +	default:
> > +		break;
> >  	}
> > +
> >  	if (addr != end)
> >  		goto again;
>
> After that the main loop can stop again because of need_resched(), and
> in this case add_swap_count_continuation(data.entry) will be called again?

No, this is not possible: copy_one_pte() will be called at least once,
since progress is reset to 0 before the restart. Sorry for the noise.

Oleg.



^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork()
  2020-09-22  6:41   ` John Hubbard
@ 2020-09-22 10:33     ` Jan Kara
  2020-09-22 20:01       ` John Hubbard
  2020-09-23 16:06     ` Peter Xu
  1 sibling, 1 reply; 110+ messages in thread
From: Jan Kara @ 2020-09-22 10:33 UTC (permalink / raw)
  To: John Hubbard
  Cc: Peter Xu, linux-mm, linux-kernel, Linus Torvalds, Michal Hocko,
	Kirill Shutemov, Jann Horn, Oleg Nesterov, Kirill Tkhai,
	Hugh Dickins, Leon Romanovsky, Jan Kara, Christoph Hellwig,
	Andrew Morton, Jason Gunthorpe, Andrea Arcangeli

On Mon 21-09-20 23:41:16, John Hubbard wrote:
> On 9/21/20 2:20 PM, Peter Xu wrote:
> ...
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 7ff29cc3d55c..c40aac0ad87e 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1074,6 +1074,23 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >   	src_page = pmd_page(pmd);
> >   	VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
> > +
> > +	/*
> > +	 * If this page is a potentially pinned page, split and retry the fault
> > +	 * with a smaller page size.  Normally this should not happen because
> > +	 * userspace should use MADV_DONTFORK upon pinned regions.  This is a
> > +	 * best-effort attempt to ensure the pinned pages won't be replaced by
> > +	 * random pages during the coming copy-on-write.
> > +	 */
> > +	if (unlikely(READ_ONCE(src_mm->has_pinned) &&
> > +		     page_maybe_dma_pinned(src_page))) {
> 
> This condition would make a good static inline function. It's used in 3
> places, and the condition is quite special and worth documenting, and
> having a separate function helps with that, because the function name
> adds to the story. I'd suggest approximately:
> 
>     page_likely_dma_pinned()
> 
> for the name.

Well, but we should also capture that this really only works for anonymous
pages. For file pages mm->has_pinned does not work because the page may
still be pinned by a completely unrelated process, as Jann already properly
pointed out earlier in the thread. So maybe anon_page_likely_pinned()?
Possibly also assert PageAnon(page) in it if we want to be paranoid...

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes
  2020-09-21 21:20 ` [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes Peter Xu
  2020-09-21 21:55   ` Jann Horn
@ 2020-09-22 11:48   ` Oleg Nesterov
  2020-09-22 12:40     ` Oleg Nesterov
  2020-09-24 11:48   ` Kirill Tkhai
  2 siblings, 1 reply; 110+ messages in thread
From: Oleg Nesterov @ 2020-09-22 11:48 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-mm, linux-kernel, Linus Torvalds, Michal Hocko,
	Kirill Shutemov, Jann Horn, Kirill Tkhai, Hugh Dickins,
	Leon Romanovsky, Jan Kara, John Hubbard, Christoph Hellwig,
	Andrew Morton, Jason Gunthorpe, Andrea Arcangeli

On 09/21, Peter Xu wrote:
>
> @@ -859,6 +989,25 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  			    spin_needbreak(src_ptl) || spin_needbreak(dst_ptl))
>  				break;
>  		}
> +
> +		if (unlikely(data.cow_new_page)) {
> +			/*
> +			 * If cow_new_page set, we must be at the 2nd round of
> +			 * a previous COPY_MM_BREAK_COW.  Try to arm the new
> +			 * page now.  Note that in all cases page_break_cow()
> +			 * will properly release the objects in copy_mm_data.
> +			 */
> +			WARN_ON_ONCE(copy_ret != COPY_MM_BREAK_COW);
> +			if (pte_install_copied_page(dst_mm, new, src_pte,
> +						    dst_pte, addr, rss,
> +						    &data)) {
> +				/* We installed the pte successfully; move on */
> +				progress++;
> +				continue;

I'm afraid I misread this patch too ;)

But it seems to me in this case the main loop can really "leak"
COPY_MM_BREAK_COW. Suppose the next 31 pte's are pte_none() and
need_resched() is true.

No?

Oleg.




* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-21 22:47       ` Jann Horn
@ 2020-09-22 11:54         ` Jason Gunthorpe
  2020-09-22 14:28           ` Peter Xu
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Gunthorpe @ 2020-09-22 11:54 UTC (permalink / raw)
  To: Jann Horn
  Cc: Peter Xu, Linux-MM, kernel list, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds

On Tue, Sep 22, 2020 at 12:47:11AM +0200, Jann Horn wrote:
> On Tue, Sep 22, 2020 at 12:30 AM Peter Xu <peterx@redhat.com> wrote:
> > On Mon, Sep 21, 2020 at 11:43:38PM +0200, Jann Horn wrote:
> > > On Mon, Sep 21, 2020 at 11:17 PM Peter Xu <peterx@redhat.com> wrote:
> > > > (Commit message collected from Jason Gunthorpe)
> > > >
> > > > Reduce the chance of false positive from page_maybe_dma_pinned() by keeping
> > > > track if the mm_struct has ever been used with pin_user_pages(). mm_structs
> > > > that have never been passed to pin_user_pages() cannot have a positive
> > > > page_maybe_dma_pinned() by definition.
> > >
> > > There are some caveats here, right? E.g. this isn't necessarily true
> > > for pagecache pages, I think?
> >
> > Sorry I didn't follow here.  Could you help explain with some details?
> 
> The commit message says "mm_structs that have never been passed to
> pin_user_pages() cannot have a positive page_maybe_dma_pinned() by
> definition"; but that is not true for pages which may also be mapped
> in a second mm and may have been passed to pin_user_pages() through
> that second mm (meaning they must be writable over there and not
> shared with us via CoW).

The message does need a few more words to explain this trick can only
be used with COW'able pages.
 
> Process A:
> 
> fd_a = open("/foo/bar", O_RDWR);
> mapping_a = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_SHARED, fd_a, 0);
> pin_user_pages(mapping_a, 1, ...);
> 
> Process B:
> 
> fd_b = open("/foo/bar", O_RDONLY);
> mapping_b = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd_b, 0);
> *(volatile char *)mapping_b;
> 
> At this point, process B has never called pin_user_pages(), but
> page_maybe_dma_pinned() on the page at mapping_b would return true.

My expectation is the pin_user_pages() should have already broken the
COW for the MAP_PRIVATE, so process B should not have a
page_maybe_dma_pinned()

Jason



* Re: [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork()
  2020-09-21 21:20 ` [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork() Peter Xu
  2020-09-22  6:41   ` John Hubbard
@ 2020-09-22 12:05   ` Jason Gunthorpe
  2020-09-23 15:24     ` Peter Xu
  1 sibling, 1 reply; 110+ messages in thread
From: Jason Gunthorpe @ 2020-09-22 12:05 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-mm, linux-kernel, Linus Torvalds, Michal Hocko,
	Kirill Shutemov, Jann Horn, Oleg Nesterov, Kirill Tkhai,
	Hugh Dickins, Leon Romanovsky, Jan Kara, John Hubbard,
	Christoph Hellwig, Andrew Morton, Andrea Arcangeli

On Mon, Sep 21, 2020 at 05:20:31PM -0400, Peter Xu wrote:
> Pinned pages shouldn't be write-protected when fork() happens, because
> follow-up copy-on-write on these pages could cause the pinned pages to be
> replaced by random newly allocated pages.
> 
> For huge PMDs, we split the huge pmd if pinning is detected, so that future
> handling will be done at the PTE level (with our latest changes, each of the
> small pages will be copied).  We can achieve this by letting copy_huge_pmd()
> return -EAGAIN for pinned pages, so that we'll fall through in copy_pmd_range()
> and finally land in the next copy_pte_range() call.
> 
> Huge PUDs will be even more special - so far they do not support anonymous
> pages.  But they can actually be handled the same way as huge PMDs, even if
> splitting huge PUDs means erasing the PUD entries.  It'll guarantee that the
> follow-up fault-ins will remap the same pages in either parent/child later.
> 
> This might not be the most efficient way, but it should be easy and clean
> enough.  It should be fine, since we're tackling a very rare case just to
> make sure userspaces that pinned some thps will still work even without
> MADV_DONTFORK and after they fork()ed.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
>  mm/huge_memory.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 7ff29cc3d55c..c40aac0ad87e 100644
> +++ b/mm/huge_memory.c
> @@ -1074,6 +1074,23 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  
>  	src_page = pmd_page(pmd);
>  	VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
> +
> +	/*
> +	 * If this page is a potentially pinned page, split and retry the fault
> +	 * with smaller page size.  Normally this should not happen because the
> +	 * userspace should use MADV_DONTFORK upon pinned regions.  This is a
> +	 * best effort that the pinned pages won't be replaced by another
> +	 * random page during the coming copy-on-write.
> +	 */
> +	if (unlikely(READ_ONCE(src_mm->has_pinned) &&
> +		     page_maybe_dma_pinned(src_page))) {
> +		pte_free(dst_mm, pgtable);
> +		spin_unlock(src_ptl);
> +		spin_unlock(dst_ptl);
> +		__split_huge_pmd(vma, src_pmd, addr, false, NULL);
> +		return -EAGAIN;
> +	}

Not sure why, but the PMD stuff here is not calling is_cow_mapping()
before doing the write protect. Seems like it might be an existing
bug?

In any event, the has_pinned logic shouldn't be used without also
checking is_cow_mapping(), so it should be added to that test. Same
remarks for PUD

Jason



* Re: [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes
  2020-09-22 11:48   ` Oleg Nesterov
@ 2020-09-22 12:40     ` Oleg Nesterov
  2020-09-22 15:58       ` Peter Xu
  0 siblings, 1 reply; 110+ messages in thread
From: Oleg Nesterov @ 2020-09-22 12:40 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-mm, linux-kernel, Linus Torvalds, Michal Hocko,
	Kirill Shutemov, Jann Horn, Kirill Tkhai, Hugh Dickins,
	Leon Romanovsky, Jan Kara, John Hubbard, Christoph Hellwig,
	Andrew Morton, Jason Gunthorpe, Andrea Arcangeli

On 09/22, Oleg Nesterov wrote:
>
> On 09/21, Peter Xu wrote:
> >
> > @@ -859,6 +989,25 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >  			    spin_needbreak(src_ptl) || spin_needbreak(dst_ptl))
> >  				break;
> >  		}
> > +
> > +		if (unlikely(data.cow_new_page)) {
> > +			/*
> > +			 * If cow_new_page set, we must be at the 2nd round of
> > +			 * a previous COPY_MM_BREAK_COW.  Try to arm the new
> > +			 * page now.  Note that in all cases page_break_cow()
> > +			 * will properly release the objects in copy_mm_data.
> > +			 */
> > +			WARN_ON_ONCE(copy_ret != COPY_MM_BREAK_COW);
> > +			if (pte_install_copied_page(dst_mm, new, src_pte,
> > +						    dst_pte, addr, rss,
> > +						    &data)) {
> > +				/* We installed the pte successfully; move on */
> > +				progress++;
> > +				continue;
>
> I'm afraid I misread this patch too ;)
>
> But it seems to me in this case the main loop can really "leak"
> COPY_MM_BREAK_COW. Suppose the next 31 pte's are pte_none() and
> need_resched() is true.
>
> No?

If yes, perhaps we can simplify the copy_ret/cow_new_page logic and make
it more explicit?

Something like below, on top of this patch...

Oleg.


--- x/mm/memory.c
+++ x/mm/memory.c
@@ -704,17 +704,6 @@
 	};
 };
 
-static inline void page_release_cow(struct copy_mm_data *data)
-{
-	/* The old page should only be released in page_duplicate() */
-	WARN_ON_ONCE(data->cow_old_page);
-
-	if (data->cow_new_page) {
-		put_page(data->cow_new_page);
-		data->cow_new_page = NULL;
-	}
-}
-
 /*
  * Duplicate the page for this PTE.  Returns zero if page copied (so we need to
  * retry on the same PTE again to arm the copied page very soon), or negative
@@ -925,7 +914,7 @@
 
 	if (!pte_same(*src_pte, data->cow_oldpte)) {
 		/* PTE has changed under us.  Release the page and retry */
-		page_release_cow(data);
+		put_page(data->cow_new_page);
 		return false;
 	}
 
@@ -936,12 +925,6 @@
 	set_pte_at(dst_mm, addr, dst_pte, entry);
 	rss[mm_counter(new_page)]++;
 
-	/*
-	 * Manually clear the new page pointer since we've moved ownership to
-	 * the newly armed PTE.
-	 */
-	data->cow_new_page = NULL;
-
 	return true;
 }
 
@@ -958,16 +941,12 @@
 	struct copy_mm_data data;
 
 again:
-	/* We don't reset this for COPY_MM_BREAK_COW */
-	memset(&data, 0, sizeof(data));
-
-again_break_cow:
 	init_rss_vec(rss);
 
 	dst_pte = pte_alloc_map_lock(dst_mm, dst_pmd, addr, &dst_ptl);
 	if (!dst_pte) {
-		/* Guarantee that the new page is released if there is */
-		page_release_cow(&data);
+		if (unlikely(copy_ret == COPY_MM_BREAK_COW))
+			put_page(data.cow_new_page);
 		return -ENOMEM;
 	}
 	src_pte = pte_offset_map(src_pmd, addr);
@@ -978,6 +957,22 @@
 	arch_enter_lazy_mmu_mode();
 
 	progress = 0;
+	if (unlikely(copy_ret == COPY_MM_BREAK_COW)) {
+		/*
+		 * Note that in all cases pte_install_copied_page()
+		 * will properly release the objects in copy_mm_data.
+		 */
+		copy_ret = COPY_MM_DONE;
+		if (pte_install_copied_page(dst_mm, new, src_pte,
+					    dst_pte, addr, rss,
+					    &data)) {
+			/* We installed the pte successfully; move on */
+			progress++;
+			goto next;
+		}
+		/* PTE changed.  Retry this pte (falls through) */
+	}
+
 	do {
 		/*
 		 * We are holding two locks at this point - either of them
@@ -990,24 +985,6 @@
 				break;
 		}
 
-		if (unlikely(data.cow_new_page)) {
-			/*
-			 * If cow_new_page set, we must be at the 2nd round of
-			 * a previous COPY_MM_BREAK_COW.  Try to arm the new
-			 * page now.  Note that in all cases page_break_cow()
-			 * will properly release the objects in copy_mm_data.
-			 */
-			WARN_ON_ONCE(copy_ret != COPY_MM_BREAK_COW);
-			if (pte_install_copied_page(dst_mm, new, src_pte,
-						    dst_pte, addr, rss,
-						    &data)) {
-				/* We installed the pte successfully; move on */
-				progress++;
-				continue;
-			}
-			/* PTE changed.  Retry this pte (falls through) */
-		}
-
 		if (pte_none(*src_pte)) {
 			progress++;
 			continue;
@@ -1017,6 +994,7 @@
 		if (copy_ret != COPY_MM_DONE)
 			break;
 		progress += 8;
+next:
 	} while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end);
 
 	arch_leave_lazy_mmu_mode();
@@ -1030,13 +1008,14 @@
 	case COPY_MM_SWAP_CONT:
 		if (add_swap_count_continuation(data.entry, GFP_KERNEL) < 0)
 			return -ENOMEM;
-		break;
+		copy_ret = COPY_MM_DONE;
+		goto again;
 	case COPY_MM_BREAK_COW:
 		/* Do accounting onto parent mm directly */
 		ret = page_duplicate(src_mm, vma, addr, &data);
 		if (ret)
 			return ret;
-		goto again_break_cow;
+		goto again;
 	case COPY_MM_DONE:
 		/* This means we're all good. */
 		break;




* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-22 11:54         ` Jason Gunthorpe
@ 2020-09-22 14:28           ` Peter Xu
  2020-09-22 15:56             ` Jason Gunthorpe
  0 siblings, 1 reply; 110+ messages in thread
From: Peter Xu @ 2020-09-22 14:28 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jann Horn, Linux-MM, kernel list, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds

On Tue, Sep 22, 2020 at 08:54:36AM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 22, 2020 at 12:47:11AM +0200, Jann Horn wrote:
> > On Tue, Sep 22, 2020 at 12:30 AM Peter Xu <peterx@redhat.com> wrote:
> > > On Mon, Sep 21, 2020 at 11:43:38PM +0200, Jann Horn wrote:
> > > > On Mon, Sep 21, 2020 at 11:17 PM Peter Xu <peterx@redhat.com> wrote:
> > > > > (Commit message collected from Jason Gunthorpe)
> > > > >
> > > > > Reduce the chance of false positive from page_maybe_dma_pinned() by keeping
> > > > > track if the mm_struct has ever been used with pin_user_pages(). mm_structs
> > > > > that have never been passed to pin_user_pages() cannot have a positive
> > > > > page_maybe_dma_pinned() by definition.
> > > >
> > > > There are some caveats here, right? E.g. this isn't necessarily true
> > > > for pagecache pages, I think?
> > >
> > > Sorry I didn't follow here.  Could you help explain with some details?
> > 
> > The commit message says "mm_structs that have never been passed to
> > pin_user_pages() cannot have a positive page_maybe_dma_pinned() by
> > definition"; but that is not true for pages which may also be mapped
> > in a second mm and may have been passed to pin_user_pages() through
> > that second mm (meaning they must be writable over there and not
> > shared with us via CoW).
> 
> The message does need a few more words to explain this trick can only
> be used with COW'able pages.
>  
> > Process A:
> > 
> > fd_a = open("/foo/bar", O_RDWR);
> > mapping_a = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_SHARED, fd_a, 0);
> > pin_user_pages(mapping_a, 1, ...);
> > 
> > Process B:
> > 
> > fd_b = open("/foo/bar", O_RDONLY);
> > mapping_b = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd_b, 0);
> > *(volatile char *)mapping_b;
> > 
> > At this point, process B has never called pin_user_pages(), but
> > page_maybe_dma_pinned() on the page at mapping_b would return true.
> 
> My expectation is the pin_user_pages() should have already broken the
> COW for the MAP_PRIVATE, so process B should not have a
> page_maybe_dma_pinned()

When process B maps with PROT_READ only (w/o PROT_WRITE) then it seems the same
page will be mapped.

I think I get the point from Jann now.  Maybe it's easier if I just remove the
whole "mm_structs that have never been passed to pin_user_pages() cannot have a
positive page_maybe_dma_pinned() by definition" sentence if that's misleading,
because the rest seems to be clear enough on what this new field is used for.

Thanks,

-- 
Peter Xu




* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-21 23:53   ` John Hubbard
  2020-09-22  0:01     ` John Hubbard
@ 2020-09-22 15:17     ` Peter Xu
  2020-09-22 16:10       ` Jason Gunthorpe
                         ` (2 more replies)
  1 sibling, 3 replies; 110+ messages in thread
From: Peter Xu @ 2020-09-22 15:17 UTC (permalink / raw)
  To: John Hubbard
  Cc: linux-mm, linux-kernel, Jason Gunthorpe, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Mon, Sep 21, 2020 at 04:53:38PM -0700, John Hubbard wrote:
> On 9/21/20 2:17 PM, Peter Xu wrote:
> > (Commit message collected from Jason Gunthorpe)
> > 
> > Reduce the chance of false positive from page_maybe_dma_pinned() by keeping
> 
> Not yet, it doesn't. :)  More:
> 
> > track if the mm_struct has ever been used with pin_user_pages(). mm_structs
> > that have never been passed to pin_user_pages() cannot have a positive
> > page_maybe_dma_pinned() by definition. This allows cases that might drive up
> > the page ref_count to avoid any penalty from handling dma_pinned pages.
> > 
> > Due to complexities with unpinning, this trivial version is a permanent sticky
> > bit, future work will be needed to make this a counter.
> 
> How about this instead:
> 
> Subsequent patches intend to reduce the chance of false positives from
> page_maybe_dma_pinned(), by also considering whether or not a page has
> even been part of an mm struct that has ever had pin_user_pages*()
> applied to any of its pages.
> 
> In order to allow that, provide a boolean value (even though it's not
> implemented exactly as a boolean type) within the mm struct, that is
> simply set once and never cleared. This will suffice for an early, rough
> implementation that fixes a few problems.
> 
> Future work is planned, to provide a more sophisticated solution, likely
> involving a counter, and *not* involving something that is set and never
> cleared.

This looks good, thanks.  Though I think Jason's version is good too (as long
as we remove the confusing sentence; that's the one starting with "mm_structs
that have never been passed... ").  Before I drop Jason's version, I think I'd
better figure out what's the major thing we missed so that maybe we can add
another paragraph.  E.g., "future work will be needed to make this a counter"
already means "involving a counter, and *not* involving something that is set
and never cleared" to me... Because otherwise it won't be called a counter..

> 
> > 
> > Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >   include/linux/mm_types.h | 10 ++++++++++
> >   kernel/fork.c            |  1 +
> >   mm/gup.c                 |  6 ++++++
> >   3 files changed, 17 insertions(+)
> > 
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 496c3ff97cce..6f291f8b74c6 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -441,6 +441,16 @@ struct mm_struct {
> >   #endif
> >   		int map_count;			/* number of VMAs */
> > +		/**
> > +		 * @has_pinned: Whether this mm has pinned any pages.  This can
> > +		 * be either replaced in the future by @pinned_vm when it
> > +		 * becomes stable, or grow into a counter on its own. We're
> > +		 * aggressive on this bit now - even if the pinned pages were
> > +		 * unpinned later on, we'll still keep this bit set for the
> > +		 * lifecycle of this mm just for simplicity.
> > +		 */
> > +		int has_pinned;
> 
> I think this would be elegant as an atomic_t, and using atomic_set() and
> atomic_read(), which seem even more self-documenting that what you have here.
> 
> But it's admittedly a cosmetic point, combined with my perennial fear that
> I'm missing something when I look at a READ_ONCE()/WRITE_ONCE() pair. :)

Yeah but I hope I'm using it right.. :) I used READ_ONCE/WRITE_ONCE explicitly
because I think they're cheaper than atomic operations (which will, iiuc, lock
the bus).

> 
> It's completely OK to just ignore this comment, but I didn't want to completely
> miss the opportunity to make it a tiny bit cleaner to the reader.

This can always become an atomic in the future, or am I wrong?  Actually if
we're going the counter way, I feel like it's a must.

Thanks,

-- 
Peter Xu




* Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()
  2020-09-22  7:11   ` John Hubbard
@ 2020-09-22 15:29     ` Peter Xu
  0 siblings, 0 replies; 110+ messages in thread
From: Peter Xu @ 2020-09-22 15:29 UTC (permalink / raw)
  To: John Hubbard
  Cc: linux-mm, linux-kernel, Jason Gunthorpe, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Tue, Sep 22, 2020 at 12:11:29AM -0700, John Hubbard wrote:
> On 9/21/20 2:17 PM, Peter Xu wrote:
> > There's one special path for copy_one_pte() with swap entries, in which
> > add_swap_count_continuation(GFP_ATOMIC) might fail.  In that case we'll return
> 
> I might be looking at the wrong place, but the existing code seems to call
> add_swap_count_continuation(GFP_KERNEL), not with GFP_ATOMIC?

Ah, I wanted to reference the one in swap_duplicate().

> 
> > the swp_entry_t so that the caller will release the locks and redo the same
> > thing with GFP_KERNEL.
> > 
> > It's confusing when copy_one_pte() must return a swp_entry_t (even if all the
> > ptes are non-swap entries).  More importantly, we face other requirement to
> > extend this "we need to do something else, but without the locks held" case.
> > 
> > Rework the return value into something easier to understand, as defined in enum
> > copy_mm_ret.  We'll pass the swp_entry_t back using the newly introduced union
> 
> I like the documentation here, but it doesn't match what you did in the patch.
> Actually, the documentation had the right idea (enum, rather than #define, for
> COPY_MM_* items). Below...

Yeah actually my very initial version had it as an enum, then I changed it to
macros because I started to want it to return negatives as errors.  However,
funnily, in the current version copy_one_pte() won't return an error anymore...
So probably, yes, it should be a good idea to get the enum back.

Also we should be able to drop the negative handling too with copy_ret, though
it should be in the next patch.

> 
> > copy_mm_data parameter.
> > 
> > Another trivial change is to move the reset of the "progress" counter into the
> > retry path, so that we'll reset it for other reasons too.
> > 
> > This should prepare us with adding new return codes, very soon.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >   mm/memory.c | 42 +++++++++++++++++++++++++++++-------------
> >   1 file changed, 29 insertions(+), 13 deletions(-)
> > 
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 7525147908c4..1530bb1070f4 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -689,16 +689,24 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
> >   }
> >   #endif
> > +#define  COPY_MM_DONE               0
> > +#define  COPY_MM_SWAP_CONT          1
> 
> Those should be enums, so as to get a little type safety and other goodness from
> using non-macro items.
> 
> ...
> > @@ -866,13 +877,18 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >   	pte_unmap_unlock(orig_dst_pte, dst_ptl);
> >   	cond_resched();
> > -	if (entry.val) {
> > -		if (add_swap_count_continuation(entry, GFP_KERNEL) < 0)
> > +	switch (copy_ret) {
> > +	case COPY_MM_SWAP_CONT:
> > +		if (add_swap_count_continuation(data.entry, GFP_KERNEL) < 0)
> >   			return -ENOMEM;
> > -		progress = 0;
> 
> Yes. Definitely a little cleaner to reset this above, instead of here.
> 
> > +		break;
> > +	default:
> > +		break;
> 
> I assume this no-op noise is to placate the compiler and/or static checkers. :)

This is (so far) for COPY_MM_DONE.  I normally will cover all cases in a
"switch()" and here "default" is for it.  Even if I covered all the
possibilities, I may still tend to keep one "default" and a WARN_ON_ONCE(1) to
make sure I haven't missed anything.  Not sure whether that's the ideal way, though.

Thanks,

-- 
Peter Xu




* Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()
  2020-09-22 10:18     ` Oleg Nesterov
@ 2020-09-22 15:36       ` Peter Xu
  2020-09-22 15:48         ` Oleg Nesterov
  0 siblings, 1 reply; 110+ messages in thread
From: Peter Xu @ 2020-09-22 15:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-mm, linux-kernel, Jason Gunthorpe, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Tue, Sep 22, 2020 at 12:18:16PM +0200, Oleg Nesterov wrote:
> On 09/22, Oleg Nesterov wrote:
> >
> > On 09/21, Peter Xu wrote:
> > >
> > > @@ -866,13 +877,18 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> > >  	pte_unmap_unlock(orig_dst_pte, dst_ptl);
> > >  	cond_resched();
> > >
> > > -	if (entry.val) {
> > > -		if (add_swap_count_continuation(entry, GFP_KERNEL) < 0)
> > > +	switch (copy_ret) {
> > > +	case COPY_MM_SWAP_CONT:
> > > +		if (add_swap_count_continuation(data.entry, GFP_KERNEL) < 0)
> > >  			return -ENOMEM;
> > > -		progress = 0;
> > > +		break;
> >
> > Note that you didn't clear copy_ret, it is still COPY_MM_SWAP_CONT,
> >
> > > +	default:
> > > +		break;
> > >  	}
> > > +
> > >  	if (addr != end)
> > >  		goto again;
> >
> > After that the main loop can stop again because of need_resched(), and
> > in this case add_swap_count_continuation(data.entry) will be called again?
> 
> No, this is not possible, copy_one_pte() should be called at least once,
> progress = 0 before restart. Sorry for noise.

Oh wait, I think you're right... when we get a COPY_MM_SWAP_CONT, goto "again",
then if there're 32 pte_none() ptes _plus_ a need_resched(), then we might
reach the same add_swap_count_continuation() again with the same swp entry.

However since I didn't change this logic in this patch, it probably means this
bug is also in the original code before this series...  I'm thinking maybe I
should prepare a standalone patch to clear the swp_entry_t and cc stable.

Thanks,

-- 
Peter Xu




* Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()
  2020-09-22 15:36       ` Peter Xu
@ 2020-09-22 15:48         ` Oleg Nesterov
  2020-09-22 16:03           ` Peter Xu
  0 siblings, 1 reply; 110+ messages in thread
From: Oleg Nesterov @ 2020-09-22 15:48 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-mm, linux-kernel, Jason Gunthorpe, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On 09/22, Peter Xu wrote:
>
> On Tue, Sep 22, 2020 at 12:18:16PM +0200, Oleg Nesterov wrote:
> > On 09/22, Oleg Nesterov wrote:
> > >
> > > On 09/21, Peter Xu wrote:
> > > >
> > > > @@ -866,13 +877,18 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> > > >  	pte_unmap_unlock(orig_dst_pte, dst_ptl);
> > > >  	cond_resched();
> > > >
> > > > -	if (entry.val) {
> > > > -		if (add_swap_count_continuation(entry, GFP_KERNEL) < 0)
> > > > +	switch (copy_ret) {
> > > > +	case COPY_MM_SWAP_CONT:
> > > > +		if (add_swap_count_continuation(data.entry, GFP_KERNEL) < 0)
> > > >  			return -ENOMEM;
> > > > -		progress = 0;
> > > > +		break;
> > >
> > > Note that you didn't clear copy_ret, it is still COPY_MM_SWAP_CONT,
> > >
> > > > +	default:
> > > > +		break;
> > > >  	}
> > > > +
> > > >  	if (addr != end)
> > > >  		goto again;
> > >
> > > After that the main loop can stop again because of need_resched(), and
> > > in this case add_swap_count_continuation(data.entry) will be called again?
> >
> > No, this is not possible, copy_one_pte() should be called at least once,
> > progress = 0 before restart. Sorry for noise.
>
> Oh wait, I think you're right... when we get a COPY_MM_SWAP_CONT, goto "again",
> then if there're 32 pte_none() ptes _plus_ a need_resched(), then we might
> reach the same add_swap_count_continuation() again with the same swp entry.

Yes, please see my reply to 4/5 ;)

> However since I didn't change this logic in this patch, it probably means this
> bug is also in the original code before this series...  I'm thinking maybe I
> should prepare a standalone patch to clear the swp_entry_t and cc stable.

Well, if copy_one_pte(src_pte) hits a swap entry and returns entry.val != 0, then
pte_none(*src_pte) is not possible after restart? This means that copy_one_pte()
will be called at least once.

So I _think_ that the current code is fine, but I can easily be wrong and I agree
this doesn't look clean.

Oleg.




* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-22 14:28           ` Peter Xu
@ 2020-09-22 15:56             ` Jason Gunthorpe
  2020-09-22 16:25               ` Linus Torvalds
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Gunthorpe @ 2020-09-22 15:56 UTC (permalink / raw)
  To: Peter Xu
  Cc: Jann Horn, Linux-MM, kernel list, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds

On Tue, Sep 22, 2020 at 10:28:02AM -0400, Peter Xu wrote:
> On Tue, Sep 22, 2020 at 08:54:36AM -0300, Jason Gunthorpe wrote:
> > On Tue, Sep 22, 2020 at 12:47:11AM +0200, Jann Horn wrote:
> > > On Tue, Sep 22, 2020 at 12:30 AM Peter Xu <peterx@redhat.com> wrote:
> > > > On Mon, Sep 21, 2020 at 11:43:38PM +0200, Jann Horn wrote:
> > > > > On Mon, Sep 21, 2020 at 11:17 PM Peter Xu <peterx@redhat.com> wrote:
> > > > > > (Commit message collected from Jason Gunthorpe)
> > > > > >
> > > > > > Reduce the chance of false positive from page_maybe_dma_pinned() by keeping
> > > > > > track if the mm_struct has ever been used with pin_user_pages(). mm_structs
> > > > > > that have never been passed to pin_user_pages() cannot have a positive
> > > > > > page_maybe_dma_pinned() by definition.
> > > > >
> > > > > There are some caveats here, right? E.g. this isn't necessarily true
> > > > > for pagecache pages, I think?
> > > >
> > > > Sorry I didn't follow here.  Could you help explain with some details?
> > > 
> > > The commit message says "mm_structs that have never been passed to
> > > pin_user_pages() cannot have a positive page_maybe_dma_pinned() by
> > > definition"; but that is not true for pages which may also be mapped
> > > in a second mm and may have been passed to pin_user_pages() through
> > > that second mm (meaning they must be writable over there and not
> > > shared with us via CoW).
> > 
> > The message does need a few more words to explain this trick can only
> > be used with COW'able pages.
> >  
> > > Process A:
> > > 
> > > fd_a = open("/foo/bar", O_RDWR);
> > > mapping_a = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_SHARED, fd_a, 0);
> > > pin_user_pages(mapping_a, 1, ...);
> > > 
> > > Process B:
> > > 
> > > fd_b = open("/foo/bar", O_RDONLY);
> > > mapping_b = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd_b, 0);
> > > *(volatile char *)mapping_b;
> > > 
> > > At this point, process B has never called pin_user_pages(), but
> > > page_maybe_dma_pinned() on the page at mapping_b would return true.
> > 
> > My expectation is the pin_user_pages() should have already broken the
> > COW for the MAP_PRIVATE, so process B should not have a
> > page_maybe_dma_pinned()
> 
> When process B maps with PROT_READ only (w/o PROT_WRITE) then it seems the same
> page will be mapped.

I thought MAP_PRIVATE without PROT_WRITE was nonsensical; it only has
meaning for writes initiated by the mapping. MAP_SHARED/PROT_READ has
the same behavior on Linux, IIRC.

But, yes, you certainly can end up with B having
page_maybe_dma_pinned() pages in shared VMA, just not in COW'able
mappings.

> I think I get the point from Jann now.  Maybe it's easier if I just remove the
> whole "mm_structs that have never been passed to pin_user_pages() cannot have a
> positive page_maybe_dma_pinned() by definition" sentence if that's misleading,
> because the rest seems to be clear enough on what this new field is used for.

"for COW" I think is still the important detail here, see for instance
my remark on the PUD/PMD splitting where it is necessary to test for
cow before using this.

Perhaps we should call it "has_pinned_for_cow" to place emphasis on
this detail? Due to the shared pages issue It really doesn't have any
broader utility, eg for file back pages or otherwise.

Jason



* Re: [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes
  2020-09-22 12:40     ` Oleg Nesterov
@ 2020-09-22 15:58       ` Peter Xu
  2020-09-22 16:52         ` Oleg Nesterov
  0 siblings, 1 reply; 110+ messages in thread
From: Peter Xu @ 2020-09-22 15:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-mm, linux-kernel, Linus Torvalds, Michal Hocko,
	Kirill Shutemov, Jann Horn, Kirill Tkhai, Hugh Dickins,
	Leon Romanovsky, Jan Kara, John Hubbard, Christoph Hellwig,
	Andrew Morton, Jason Gunthorpe, Andrea Arcangeli

On Tue, Sep 22, 2020 at 02:40:14PM +0200, Oleg Nesterov wrote:
> On 09/22, Oleg Nesterov wrote:
> >
> > On 09/21, Peter Xu wrote:
> > >
> > > @@ -859,6 +989,25 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> > >  			    spin_needbreak(src_ptl) || spin_needbreak(dst_ptl))
> > >  				break;
> > >  		}
> > > +
> > > +		if (unlikely(data.cow_new_page)) {
> > > +			/*
> > > +			 * If cow_new_page set, we must be at the 2nd round of
> > > +			 * a previous COPY_MM_BREAK_COW.  Try to arm the new
> > > +			 * page now.  Note that in all cases page_break_cow()
> > > +			 * will properly release the objects in copy_mm_data.
> > > +			 */
> > > +			WARN_ON_ONCE(copy_ret != COPY_MM_BREAK_COW);
> > > +			if (pte_install_copied_page(dst_mm, new, src_pte,
> > > +						    dst_pte, addr, rss,
> > > +						    &data)) {
> > > +				/* We installed the pte successfully; move on */
> > > +				progress++;
> > > +				continue;
> >
> > I'm afraid I misread this patch too ;)
> >
> > But it seems to me in this case the main loop can really "leak"
> > COPY_MM_BREAK_COW. Suppose the next 31 pte's are pte_none() and
> > need_resched() is true.
> >
> > No?

I still think it's a no...

Note that now we'll reset "progress" every time before the do loop, so we'll
never reach need_resched() (since progress<32) before pte_install_copied_page()
when needed.

I explicitly put the pte_install_copied_page() into the loop just...

> 
> If yes, perhaps we can simplify the copy_ret/cow_new_page logic and make
> it more explicit?
> 
> Something like below, on top of this patch...
> 
> Oleg.
> 
> 
> --- x/mm/memory.c
> +++ x/mm/memory.c
> @@ -704,17 +704,6 @@
>  	};
>  };
>  
> -static inline void page_release_cow(struct copy_mm_data *data)
> -{
> -	/* The old page should only be released in page_duplicate() */
> -	WARN_ON_ONCE(data->cow_old_page);
> -
> -	if (data->cow_new_page) {
> -		put_page(data->cow_new_page);
> -		data->cow_new_page = NULL;
> -	}
> -}

(I'm not sure whether I should drop this helper.  I wanted to have more
 spots to check that everything is right and to warn if something goes wrong,
 and I also wanted cow_new_page to never contain an invalid page pointer,
 since after the put_page() it's invalid (otherwise we'd need to explicitly
 set it to NULL after every put_page()).  I still tend to keep this if there's
 no strong opinion, or I can drop it if there's another vote.)

> -
>  /*
>   * Duplicate the page for this PTE.  Returns zero if page copied (so we need to
>   * retry on the same PTE again to arm the copied page very soon), or negative
> @@ -925,7 +914,7 @@
>  
>  	if (!pte_same(*src_pte, data->cow_oldpte)) {
>  		/* PTE has changed under us.  Release the page and retry */
> -		page_release_cow(data);
> +		put_page(data->cow_new_page);
>  		return false;
>  	}
>  
> @@ -936,12 +925,6 @@
>  	set_pte_at(dst_mm, addr, dst_pte, entry);
>  	rss[mm_counter(new_page)]++;
>  
> -	/*
> -	 * Manually clear the new page pointer since we've moved ownership to
> -	 * the newly armed PTE.
> -	 */
> -	data->cow_new_page = NULL;
> -
>  	return true;
>  }
>  
> @@ -958,16 +941,12 @@
>  	struct copy_mm_data data;
>  
>  again:
> -	/* We don't reset this for COPY_MM_BREAK_COW */
> -	memset(&data, 0, sizeof(data));
> -
> -again_break_cow:
>  	init_rss_vec(rss);
>  
>  	dst_pte = pte_alloc_map_lock(dst_mm, dst_pmd, addr, &dst_ptl);
>  	if (!dst_pte) {
> -		/* Guarantee that the new page is released if there is */
> -		page_release_cow(&data);
> +		if (unlikely(copy_ret == COPY_MM_BREAK_COW))
> +			put_page(data.cow_new_page);
>  		return -ENOMEM;
>  	}
>  	src_pte = pte_offset_map(src_pmd, addr);
> @@ -978,6 +957,22 @@
>  	arch_enter_lazy_mmu_mode();
>  
>  	progress = 0;
> +	if (unlikely(copy_ret == COPY_MM_BREAK_COW)) {
> +		/*
> +		 * Note that in all cases pte_install_copied_page()
> +		 * will properly release the objects in copy_mm_data.
> +		 */
> +		copy_ret = COPY_MM_DONE;
> +		if (pte_install_copied_page(dst_mm, new, src_pte,
> +					    dst_pte, addr, rss,
> +					    &data)) {
> +			/* We installed the pte successfully; move on */
> +			progress++;
> +			goto next;

... to avoid jumps like this because I think it's really tricky. :)

> +		}
> +		/* PTE changed.  Retry this pte (falls through) */
> +	}
> +
>  	do {
>  		/*
>  		 * We are holding two locks at this point - either of them
> @@ -990,24 +985,6 @@
>  				break;
>  		}
>  
> -		if (unlikely(data.cow_new_page)) {
> -			/*
> -			 * If cow_new_page set, we must be at the 2nd round of
> -			 * a previous COPY_MM_BREAK_COW.  Try to arm the new
> -			 * page now.  Note that in all cases page_break_cow()
> -			 * will properly release the objects in copy_mm_data.
> -			 */
> -			WARN_ON_ONCE(copy_ret != COPY_MM_BREAK_COW);
> -			if (pte_install_copied_page(dst_mm, new, src_pte,
> -						    dst_pte, addr, rss,
> -						    &data)) {
> -				/* We installed the pte successfully; move on */
> -				progress++;
> -				continue;
> -			}
> -			/* PTE changed.  Retry this pte (falls through) */
> -		}
> -
>  		if (pte_none(*src_pte)) {
>  			progress++;
>  			continue;
> @@ -1017,6 +994,7 @@
>  		if (copy_ret != COPY_MM_DONE)
>  			break;
>  		progress += 8;
> +next:
>  	} while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end);
>  
>  	arch_leave_lazy_mmu_mode();
> @@ -1030,13 +1008,14 @@
>  	case COPY_MM_SWAP_CONT:
>  		if (add_swap_count_continuation(data.entry, GFP_KERNEL) < 0)
>  			return -ENOMEM;
> -		break;
> +		copy_ret = COPY_MM_DONE;

Kind of a continuation of the discussion from the previous patch: I think we'd
better reset copy_ret not only for this case, but move the reset after the
switch (just in case new cases are added).  The new BREAK_COW uses a goto, so
it's quite special.

> +		goto again;

I feel like this could go wrong without the "addr != end" check later, when
this is the last pte to check.

Thanks,

>  	case COPY_MM_BREAK_COW:
>  		/* Do accounting onto parent mm directly */
>  		ret = page_duplicate(src_mm, vma, addr, &data);
>  		if (ret)
>  			return ret;
> -		goto again_break_cow;
> +		goto again;
>  	case COPY_MM_DONE:
>  		/* This means we're all good. */
>  		break;
> 

-- 
Peter Xu




* Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()
  2020-09-22 15:48         ` Oleg Nesterov
@ 2020-09-22 16:03           ` Peter Xu
  2020-09-22 16:53             ` Oleg Nesterov
  0 siblings, 1 reply; 110+ messages in thread
From: Peter Xu @ 2020-09-22 16:03 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-mm, linux-kernel, Jason Gunthorpe, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Tue, Sep 22, 2020 at 05:48:46PM +0200, Oleg Nesterov wrote:
> > However since I didn't change this logic in this patch, it probably means this
> > bug is also in the original code before this series...  I'm thinking maybe I
> > should prepare a standalone patch to clear the swp_entry_t and cc stable.
> 
> Well, if copy_one_pte(src_pte) hits a swap entry and returns entry.val != 0, then
> pte_none(*src_pte) is not possible after restart? This means that copy_one_pte()
> will be called at least once.

Note that we've released the page table locks, so afaict the old swp entry can
be gone under us when we go back to the "do" loop... :) It's an extreme corner
case, but maybe still worth fixing; extra clarity as a (good) side effect.

-- 
Peter Xu




* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-22 15:17     ` Peter Xu
@ 2020-09-22 16:10       ` Jason Gunthorpe
  2020-09-22 17:54         ` Peter Xu
  2020-09-22 18:02       ` John Hubbard
  2020-09-22 19:11       ` John Hubbard
  2 siblings, 1 reply; 110+ messages in thread
From: Jason Gunthorpe @ 2020-09-22 16:10 UTC (permalink / raw)
  To: Peter Xu
  Cc: John Hubbard, linux-mm, linux-kernel, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Tue, Sep 22, 2020 at 11:17:36AM -0400, Peter Xu wrote:

> > But it's admittedly a cosmetic point, combined with my perennial fear that
> > I'm missing something when I look at a READ_ONCE()/WRITE_ONCE() pair. :)
> 
> Yeah but I hope I'm using it right.. :) I used READ_ONCE/WRITE_ONCE explicitly
> because I think they're cheaper than atomic operations (which will, iiuc, lock
> the bus).

It is worth thinking a bit about racing fork with
pin_user_pages(). The desired outcome is:

  If fork wins the page is write protected, and pin_user_pages_fast()
  will COW it.

  If pin_user_pages_fast() wins then fork must see the READ_ONCE and
  the pin.

As get_user_pages_fast() is lockless it looks like the ordering has to
be like this:

  pin_user_pages_fast()                   fork()
   atomic_set(has_pinned, 1);
   [..]
   atomic_add(page->_refcount)
   ordered check write protect()
                                          ordered set write protect()
                                          atomic_read(page->_refcount)
                                          atomic_read(has_pinned)

Such that in all the degenerate racy cases the outcome is that both
sides COW, never neither.

Thus I think it does have to be atomics purely from an ordering
perspective, observing an increased _refcount requires that has_pinned
!= 0 if we are pinning.

So, to make this 100% this ordering will need to be touched up too.

Jason



* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-22 15:56             ` Jason Gunthorpe
@ 2020-09-22 16:25               ` Linus Torvalds
  0 siblings, 0 replies; 110+ messages in thread
From: Linus Torvalds @ 2020-09-22 16:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Peter Xu, Jann Horn, Linux-MM, kernel list, Andrew Morton,
	Jan Kara, Michal Hocko, Kirill Tkhai, Kirill Shutemov,
	Hugh Dickins, Christoph Hellwig, Andrea Arcangeli, John Hubbard,
	Oleg Nesterov, Leon Romanovsky

On Tue, Sep 22, 2020 at 8:56 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> I thought MAP_PRIVATE without PROT_WRITE was nonsensical,

MAP_PRIVATE without PROT_WRITE is very common.

It's what happens for every executable mapping, for example.

And no, it's not the same as MAP_SHARED, for a couple of simple
reasons. It does end up being similar for all the *normal* cases, but
there are various cases where it isn't.

 - mprotect() and friends. MAP_PRIVATE is fixed, but it might have
been writable in the past, and it might become writable in the future.

 - breakpoints and ptrace. This will punch through even a non-writable
mapping and force a COW (since that's the whole point: executables are
not writable, but to do a SW breakpoint you have to write to the page)

So no, MAP_PRIVATE is not nonsensical without PROT_WRITE, and it's not
even remotely unusual.

              Linus



* Re: [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes
  2020-09-22 15:58       ` Peter Xu
@ 2020-09-22 16:52         ` Oleg Nesterov
  2020-09-22 18:34           ` Peter Xu
  0 siblings, 1 reply; 110+ messages in thread
From: Oleg Nesterov @ 2020-09-22 16:52 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-mm, linux-kernel, Linus Torvalds, Michal Hocko,
	Kirill Shutemov, Jann Horn, Kirill Tkhai, Hugh Dickins,
	Leon Romanovsky, Jan Kara, John Hubbard, Christoph Hellwig,
	Andrew Morton, Jason Gunthorpe, Andrea Arcangeli

On 09/22, Peter Xu wrote:
>
> On Tue, Sep 22, 2020 at 02:40:14PM +0200, Oleg Nesterov wrote:
> > On 09/22, Oleg Nesterov wrote:
> > >
> > > On 09/21, Peter Xu wrote:
> > > >
> > > > @@ -859,6 +989,25 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> > > >  			    spin_needbreak(src_ptl) || spin_needbreak(dst_ptl))
> > > >  				break;
> > > >  		}
> > > > +
> > > > +		if (unlikely(data.cow_new_page)) {
> > > > +			/*
> > > > +			 * If cow_new_page set, we must be at the 2nd round of
> > > > +			 * a previous COPY_MM_BREAK_COW.  Try to arm the new
> > > > +			 * page now.  Note that in all cases page_break_cow()
> > > > +			 * will properly release the objects in copy_mm_data.
> > > > +			 */
> > > > +			WARN_ON_ONCE(copy_ret != COPY_MM_BREAK_COW);
> > > > +			if (pte_install_copied_page(dst_mm, new, src_pte,
> > > > +						    dst_pte, addr, rss,
> > > > +						    &data)) {
> > > > +				/* We installed the pte successfully; move on */
> > > > +				progress++;
> > > > +				continue;
> > >
> > > I'm afraid I misread this patch too ;)
> > >
> > > But it seems to me in this case the main loop can really "leak"
> > > COPY_MM_BREAK_COW. Suppose the next 31 pte's are pte_none() and
> > > need_resched() is true.
> > >
> > > No?
>
> I still think it's a no...
>
> Note that now we'll reset "progress" every time before the do loop, so we'll
> never reach need_resched() (since progress<32) before pte_install_copied_page()
> when needed.

Yes. But copy_ret is still COPY_MM_BREAK_COW after pte_install_copied_page().
Now suppose that the next 31 pte's are pte_none(); progress will be incremented
every time.

> I explicitly put the pte_install_copied_page() into the loop just...
...
> >  	progress = 0;
> > +	if (unlikely(copy_ret == COPY_MM_BREAK_COW)) {
> > +		/*
> > +		 * Note that in all cases pte_install_copied_page()
> > +		 * will properly release the objects in copy_mm_data.
> > +		 */
> > +		copy_ret = COPY_MM_DONE;
> > +		if (pte_install_copied_page(dst_mm, new, src_pte,
> > +					    dst_pte, addr, rss,
> > +					    &data)) {
> > +			/* We installed the pte successfully; move on */
> > +			progress++;
> > +			goto next;
>
> ... to avoid jumps like this because I think it's really tricky. :)

To me it looks better before the main loop because we know that
data.cow_new_page != NULL is only possible at the 1st iteration after
restart ;)

But I agree, this is subjective, please ignore. However, I still think
it is better to rely on the copy_ret == COPY_MM_BREAK_COW check rather
than data.cow_new_page != NULL.

> >  	case COPY_MM_SWAP_CONT:
> >  		if (add_swap_count_continuation(data.entry, GFP_KERNEL) < 0)
> >  			return -ENOMEM;
> > -		break;
> > +		copy_ret = COPY_MM_DONE;
>
> Kind of a continuation of the discussion from previous patch - I think we'd
> better reset copy_ret not only for this case, but move it after the switch
> (just in case there'll be new ones).  The new BREAK_COW uses goto so it's quite
> special.
>
> > +		goto again;
>
> I feel like this could go wrong without the "addr != end" check later, when
> this is the last pte to check.

How? We know that copy_one_pte() failed and returned COPY_MM_SWAP_CONT
before reaching addr == end.

And this matters for "case COPY_MM_BREAK_COW" below, which does "goto again"
without the "addr != end" check.

Oleg.




* Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()
  2020-09-22 16:03           ` Peter Xu
@ 2020-09-22 16:53             ` Oleg Nesterov
  2020-09-22 18:13               ` Peter Xu
  0 siblings, 1 reply; 110+ messages in thread
From: Oleg Nesterov @ 2020-09-22 16:53 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-mm, linux-kernel, Jason Gunthorpe, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On 09/22, Peter Xu wrote:
>
> On Tue, Sep 22, 2020 at 05:48:46PM +0200, Oleg Nesterov wrote:
> > > However since I didn't change this logic in this patch, it probably means this
> > > bug is also in the original code before this series...  I'm thinking maybe I
> > > should prepare a standalone patch to clear the swp_entry_t and cc stable.
> >
> > Well, if copy_one_pte(src_pte) hits a swap entry and returns entry.val != 0, then
> > pte_none(*src_pte) is not possible after restart? This means that copy_one_pte()
> > will be called at least once.
>
> Note that we've released the page table locks, so afaict the old swp entry can
> be gone under us when we go back to the "do" loop... :)

But how?

I am just curious, I don't understand this code enough.

Oleg.




* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-22 16:10       ` Jason Gunthorpe
@ 2020-09-22 17:54         ` Peter Xu
  2020-09-22 19:11           ` Jason Gunthorpe
  0 siblings, 1 reply; 110+ messages in thread
From: Peter Xu @ 2020-09-22 17:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: John Hubbard, linux-mm, linux-kernel, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Tue, Sep 22, 2020 at 01:10:46PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 22, 2020 at 11:17:36AM -0400, Peter Xu wrote:
> 
> > > But it's admittedly a cosmetic point, combined with my perennial fear that
> > > I'm missing something when I look at a READ_ONCE()/WRITE_ONCE() pair. :)
> > 
> > Yeah but I hope I'm using it right.. :) I used READ_ONCE/WRITE_ONCE explicitly
> > because I think they're cheaper than atomic operations (which will, iiuc, lock
> > the bus).
> 
> It is worth thinking a bit about racing fork with
> pin_user_pages(). The desired outcome is:
> 
>   If fork wins the page is write protected, and pin_user_pages_fast()
>   will COW it.
> 
>   If pin_user_pages_fast() wins then fork must see the READ_ONCE and
>   the pin.
> 
> As get_user_pages_fast() is lockless it looks like the ordering has to
> be like this:
> 
>   pin_user_pages_fast()                   fork()
>    atomic_set(has_pinned, 1);
>    [..]
>    atomic_add(page->_refcount)
>    ordered check write protect()
>                                           ordered set write protect()
>                                           atomic_read(page->_refcount)
>                                           atomic_read(has_pinned)
> 
> Such that in all the degenerate racy cases the outcome is that both
> sides COW, never neither.
> 
> Thus I think it does have to be atomics purely from an ordering
> perspective, observing an increased _refcount requires that has_pinned
> != 0 if we are pinning.
> 
> So, to make this 100% this ordering will need to be touched up too.

Thanks for spotting this.  So something like below should work, right?

diff --git a/mm/memory.c b/mm/memory.c
index 8f3521be80ca..6591f3f33299 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -888,8 +888,8 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
                 * Because we'll need to release the locks before doing cow,
                 * pass this work to upper layer.
                 */
-               if (READ_ONCE(src_mm->has_pinned) && wp &&
-                   page_maybe_dma_pinned(page)) {
+               if (wp && page_maybe_dma_pinned(page) &&
+                   READ_ONCE(src_mm->has_pinned)) {
                        /* We've got the page already; we're safe */
                        data->cow_old_page = page;
                        data->cow_oldpte = *src_pte;

I can also add some more comment to emphasize this.

I think the WRITE_ONCE/READ_ONCE can actually be kept, because atomic ops
should already contain proper memory barriers, so the memory access order
should be guaranteed (e.g., atomic_add() will have an implicit wmb(), with an
rmb() for the other side).  However, maybe it's even simpler to change
has_pinned into an atomic as John suggested.  Thanks,

-- 
Peter Xu




* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-22 15:17     ` Peter Xu
  2020-09-22 16:10       ` Jason Gunthorpe
@ 2020-09-22 18:02       ` John Hubbard
  2020-09-22 18:15         ` Peter Xu
  2020-09-22 19:11       ` John Hubbard
  2 siblings, 1 reply; 110+ messages in thread
From: John Hubbard @ 2020-09-22 18:02 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-mm, linux-kernel, Jason Gunthorpe, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On 9/22/20 8:17 AM, Peter Xu wrote:
> On Mon, Sep 21, 2020 at 04:53:38PM -0700, John Hubbard wrote:
>> On 9/21/20 2:17 PM, Peter Xu wrote:
>>> (Commit message collected from Jason Gunthorpe)
>>>
>>> Reduce the chance of false positive from page_maybe_dma_pinned() by keeping
>>
>> Not yet, it doesn't. :)  More:
>>
>>> track if the mm_struct has ever been used with pin_user_pages(). mm_structs
>>> that have never been passed to pin_user_pages() cannot have a positive
>>> page_maybe_dma_pinned() by definition. This allows cases that might drive up
>>> the page ref_count to avoid any penalty from handling dma_pinned pages.
>>>
>>> Due to complexities with unpining this trivial version is a permanent sticky
>>> bit, future work will be needed to make this a counter.
>>
>> How about this instead:
>>
>> Subsequent patches intend to reduce the chance of false positives from
>> page_maybe_dma_pinned(), by also considering whether or not a page has
>> even been part of an mm struct that has ever had pin_user_pages*()
>> applied to any of its pages.
>>
>> In order to allow that, provide a boolean value (even though it's not
>> implemented exactly as a boolean type) within the mm struct, that is
>> simply set once and never cleared. This will suffice for an early, rough
>> implementation that fixes a few problems.
>>
>> Future work is planned, to provide a more sophisticated solution, likely
>> involving a counter, and *not* involving something that is set and never
>> cleared.
> 
> This looks good, thanks.  Though I think Jason's version is good too (as long
> as we remove the confusing sentence, that's the one starting with "mm_structs
> that have never been passed... ").  Before I drop Jason's version, I think I'd
> better figure out what's the major thing we missed so that maybe we can add
> another paragraph.  E.g., "future work will be needed to make this a counter"
> already means "involving a counter, and *not* involving something that is set
> and never cleared" to me... Because otherwise it won't be called a counter..
> 

That's just a bit of harmless redundancy, intended to help clarify where this
is going. But if the redundancy isn't actually helping, you could simply
truncate it to the first half of the sentence, like this:

"Future work is planned, to provide a more sophisticated solution, likely
involving a counter."


thanks,
-- 
John Hubbard
NVIDIA



* Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()
  2020-09-22 16:53             ` Oleg Nesterov
@ 2020-09-22 18:13               ` Peter Xu
  2020-09-22 18:23                 ` Oleg Nesterov
  0 siblings, 1 reply; 110+ messages in thread
From: Peter Xu @ 2020-09-22 18:13 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-mm, linux-kernel, Jason Gunthorpe, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Tue, Sep 22, 2020 at 06:53:55PM +0200, Oleg Nesterov wrote:
> On 09/22, Peter Xu wrote:
> >
> > On Tue, Sep 22, 2020 at 05:48:46PM +0200, Oleg Nesterov wrote:
> > > > However since I didn't change this logic in this patch, it probably means this
> > > > bug is also in the original code before this series...  I'm thinking maybe I
> > > > should prepare a standalone patch to clear the swp_entry_t and cc stable.
> > >
> > > Well, if copy_one_pte(src_pte) hits a swap entry and returns entry.val != 0, then
> > > pte_none(*src_pte) is not possible after restart? This means that copy_one_pte()
> > > will be called at least once.
> >
> > Note that we've released the page table locks, so afaict the old swp entry can
> > be gone under us when we go back to the "do" loop... :)
> 
> But how?
> 
> I am just curious, I don't understand this code enough.

Me neither.

The point is that I think we can't assume *src_pte will read the same if we have
released the src_ptl in copy_pte_range(), because imho the src_ptl is the only
thing protecting it.  Or to be more explicit, we need pte_alloc_map_lock() to
read a stable pmd/pte before any update (since src_ptl itself could change too).

Thanks,

-- 
Peter Xu




* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-22 18:02       ` John Hubbard
@ 2020-09-22 18:15         ` Peter Xu
  0 siblings, 0 replies; 110+ messages in thread
From: Peter Xu @ 2020-09-22 18:15 UTC (permalink / raw)
  To: John Hubbard
  Cc: linux-mm, linux-kernel, Jason Gunthorpe, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Tue, Sep 22, 2020 at 11:02:03AM -0700, John Hubbard wrote:
> On 9/22/20 8:17 AM, Peter Xu wrote:
> > On Mon, Sep 21, 2020 at 04:53:38PM -0700, John Hubbard wrote:
> > > On 9/21/20 2:17 PM, Peter Xu wrote:
> > > > (Commit message collected from Jason Gunthorpe)
> > > > 
> > > > Reduce the chance of false positive from page_maybe_dma_pinned() by keeping
> > > 
> > > Not yet, it doesn't. :)  More:
> > > 
> > > > track if the mm_struct has ever been used with pin_user_pages(). mm_structs
> > > > that have never been passed to pin_user_pages() cannot have a positive
> > > > page_maybe_dma_pinned() by definition. This allows cases that might drive up
> > > > the page ref_count to avoid any penalty from handling dma_pinned pages.
> > > > 
> > > > Due to complexities with unpining this trivial version is a permanent sticky
> > > > bit, future work will be needed to make this a counter.
> > > 
> > > How about this instead:
> > > 
> > > Subsequent patches intend to reduce the chance of false positives from
> > > page_maybe_dma_pinned(), by also considering whether or not a page has
> > > even been part of an mm struct that has ever had pin_user_pages*()
> > > applied to any of its pages.
> > > 
> > > In order to allow that, provide a boolean value (even though it's not
> > > implemented exactly as a boolean type) within the mm struct, that is
> > > simply set once and never cleared. This will suffice for an early, rough
> > > implementation that fixes a few problems.
> > > 
> > > Future work is planned, to provide a more sophisticated solution, likely
> > > involving a counter, and *not* involving something that is set and never
> > > cleared.
> > 
> > This looks good, thanks.  Though I think Jason's version is good too (as long
> > as we remove the confusing sentence, that's the one starting with "mm_structs
> > that have never been passed... ").  Before I drop Jason's version, I think I'd
> > better figure out what's the major thing we missed so that maybe we can add
> > another paragraph.  E.g., "future work will be needed to make this a counter"
> > already means "involving a counter, and *not* involving something that is set
> > and never cleared" to me... Because otherwise it won't be called a counter..
> > 
> 
> That's just a bit of harmless redundancy, intended to help clarify where this
> is going. But if the redundancy isn't actually helping, you could simply
> truncate it to the first half of the sentence, like this:
> 
> "Future work is planned, to provide a more sophisticated solution, likely
> involving a counter."

Will do.  Thanks.

-- 
Peter Xu




* Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()
  2020-09-22 18:13               ` Peter Xu
@ 2020-09-22 18:23                 ` Oleg Nesterov
  2020-09-22 18:49                   ` Peter Xu
  0 siblings, 1 reply; 110+ messages in thread
From: Oleg Nesterov @ 2020-09-22 18:23 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-mm, linux-kernel, Jason Gunthorpe, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On 09/22, Peter Xu wrote:
>
> On Tue, Sep 22, 2020 at 06:53:55PM +0200, Oleg Nesterov wrote:
> > On 09/22, Peter Xu wrote:
> > >
> > > On Tue, Sep 22, 2020 at 05:48:46PM +0200, Oleg Nesterov wrote:
> > > > > However since I didn't change this logic in this patch, it probably means this
> > > > > bug is also in the original code before this series...  I'm thinking maybe I
> > > > > should prepare a standalone patch to clear the swp_entry_t and cc stable.
> > > >
> > > > Well, if copy_one_pte(src_pte) hits a swap entry and returns entry.val != 0, then
> > > > pte_none(*src_pte) is not possible after restart? This means that copy_one_pte()
> > > > will be called at least once.
> > >
> > > Note that we've released the page table locks, so afaict the old swp entry can
> > > be gone under us when we go back to the "do" loop... :)
> >
> > But how?
> >
> > I am just curious, I don't understand this code enough.
>
> Me neither.
>
> The point is I think we can't assume *src_pte will read the same if we have
> released the src_ptl in copy_pte_range(),

This is clear.

But I still think that a !pte_none() -> pte_none() transition is not possible
under mmap_write_lock()...

OK, let me repeat that I don't understand these code paths enough; let me reword:
I don't see how this transition is possible.

Oleg.




* Re: [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes
  2020-09-22 16:52         ` Oleg Nesterov
@ 2020-09-22 18:34           ` Peter Xu
  2020-09-22 18:44             ` Oleg Nesterov
  0 siblings, 1 reply; 110+ messages in thread
From: Peter Xu @ 2020-09-22 18:34 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-mm, linux-kernel, Linus Torvalds, Michal Hocko,
	Kirill Shutemov, Jann Horn, Kirill Tkhai, Hugh Dickins,
	Leon Romanovsky, Jan Kara, John Hubbard, Christoph Hellwig,
	Andrew Morton, Jason Gunthorpe, Andrea Arcangeli

On Tue, Sep 22, 2020 at 06:52:17PM +0200, Oleg Nesterov wrote:
> On 09/22, Peter Xu wrote:
> >
> > On Tue, Sep 22, 2020 at 02:40:14PM +0200, Oleg Nesterov wrote:
> > > On 09/22, Oleg Nesterov wrote:
> > > >
> > > > On 09/21, Peter Xu wrote:
> > > > >
> > > > > @@ -859,6 +989,25 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> > > > >  			    spin_needbreak(src_ptl) || spin_needbreak(dst_ptl))
> > > > >  				break;
> > > > >  		}
> > > > > +
> > > > > +		if (unlikely(data.cow_new_page)) {
> > > > > +			/*
> > > > > +			 * If cow_new_page set, we must be at the 2nd round of
> > > > > +			 * a previous COPY_MM_BREAK_COW.  Try to arm the new
> > > > > +			 * page now.  Note that in all cases page_break_cow()
> > > > > +			 * will properly release the objects in copy_mm_data.
> > > > > +			 */
> > > > > +			WARN_ON_ONCE(copy_ret != COPY_MM_BREAK_COW);
> > > > > +			if (pte_install_copied_page(dst_mm, new, src_pte,
> > > > > +						    dst_pte, addr, rss,
> > > > > +						    &data)) {
> > > > > +				/* We installed the pte successfully; move on */
> > > > > +				progress++;
> > > > > +				continue;
> > > >
> > > > I'm afraid I misread this patch too ;)
> > > >
> > > > But it seems to me in this case the main loop can really "leak"
> > > > COPY_MM_BREAK_COW. Suppose the next 31 pte's are pte_none() and
> > > > need_resched() is true.
> > > >
> > > > No?
> >
> > I still think it's a no...
> >
> > Note that now we'll reset "progress" every time before the do loop, so we'll
> > never reach need_resched() (since progress<32) before pte_install_copied_page()
> > when needed.
> 
> Yes. But copy_ret is still COPY_MM_BREAK_COW after pte_install_copied_page().
> Now suppose that the next 31 pte's are pte_none(), progress will be incremented
> every time.

Yes, I think you're right - I'll need to reset that.

> 
> > I explicitly put the pte_install_copied_page() into the loop just...
> ...
> > >  	progress = 0;
> > > +	if (unlikely(copy_ret == COPY_MM_BREAK_COW)) {
> > > +		/*
> > > +		 * Note that in all cases pte_install_copied_page()
> > > +		 * will properly release the objects in copy_mm_data.
> > > +		 */
> > > +		copy_ret = COPY_MM_DONE;
> > > +		if (pte_install_copied_page(dst_mm, new, src_pte,
> > > +					    dst_pte, addr, rss,
> > > +					    &data)) {
> > > +			/* We installed the pte successfully; move on */
> > > +			progress++;
> > > +			goto next;
> >
> > ... to avoid jumps like this because I think it's really tricky. :)
> 
> To me it looks better before the main loop because we know that
> data.cow_new_page != NULL is only possible at the 1st iteration after
> restart ;)
> 
> But I agree, this is subjective, please ignore.

Thanks.  For simplicity, I'll keep the code mostly as is.  But I'm still open
to changing it if e.g. someone else also prefers the other way.

> However, I still think
> it is better to rely on the copy_ret == COPY_MM_BREAK_COW check rather
> than data.cow_new_page != NULL.

Yes.  Logically we should check both, but now I've written it as:

        if (unlikely(data.cow_new_page)) {
                WARN_ON_ONCE(copy_ret != COPY_MM_BREAK_COW);
                ...
        }

I think it's even safer because it actually checks both, and also warns if
cow_new_page is set without COPY_MM_BREAK_COW, which should never happen anyway.

Or I can also do it in inverted order if you think better:

        if (unlikely(copy_ret == COPY_MM_BREAK_COW)) {
                WARN_ON_ONCE(!data.cow_new_page);
                ...
        }

> 
> > >  	case COPY_MM_SWAP_CONT:
> > >  		if (add_swap_count_continuation(data.entry, GFP_KERNEL) < 0)
> > >  			return -ENOMEM;
> > > -		break;
> > > +		copy_ret = COPY_MM_DONE;
> >
> > Kind of a continuation of the discussion from previous patch - I think we'd
> > better reset copy_ret not only for this case, but move it after the switch
> > (just in case there'll be new ones).  The new BREAK_COW uses goto so it's quite
> > special.
> >
> > > +		goto again;
> >
> > I feel like this could go wrong without the "addr != end" check later, when
> > this is the last pte to check.
> 
> How? We know that copy_one_pte() failed and returned COPY_MM_SWAP_CONT
> before addr = end.

I think you're right, again. :)

Thanks,

-- 
Peter Xu




* Re: [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes
  2020-09-22 18:34           ` Peter Xu
@ 2020-09-22 18:44             ` Oleg Nesterov
  2020-09-23  1:03               ` Peter Xu
  0 siblings, 1 reply; 110+ messages in thread
From: Oleg Nesterov @ 2020-09-22 18:44 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-mm, linux-kernel, Linus Torvalds, Michal Hocko,
	Kirill Shutemov, Jann Horn, Kirill Tkhai, Hugh Dickins,
	Leon Romanovsky, Jan Kara, John Hubbard, Christoph Hellwig,
	Andrew Morton, Jason Gunthorpe, Andrea Arcangeli

On 09/22, Peter Xu wrote:
>
> Or I can also do it in inverted order if you think better:
>
>         if (unlikely(copy_ret == COPY_MM_BREAK_COW)) {
>                 WARN_ON_ONCE(!data.cow_new_page);
>                 ...
>         }

Peter, let me say this again. I don't understand this code enough, you
can safely ignore me ;)

However. Personally I strongly prefer the above. Personally I really
dislike this part of 4/5:

	 again:
	+	/* We don't reset this for COPY_MM_BREAK_COW */
	+	memset(&data, 0, sizeof(data));
	+
	+again_break_cow:

If we rely on "copy_ret == COPY_MM_BREAK_COW" we can unify "again" and
"again_break_cow", we don't need to clear ->cow_new_page, this makes the
logic more understandable. To me at least ;)

Oleg.




* Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()
  2020-09-22 18:23                 ` Oleg Nesterov
@ 2020-09-22 18:49                   ` Peter Xu
  2020-09-23  6:52                     ` Oleg Nesterov
  0 siblings, 1 reply; 110+ messages in thread
From: Peter Xu @ 2020-09-22 18:49 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-mm, linux-kernel, Jason Gunthorpe, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Tue, Sep 22, 2020 at 08:23:18PM +0200, Oleg Nesterov wrote:
> On 09/22, Peter Xu wrote:
> >
> > On Tue, Sep 22, 2020 at 06:53:55PM +0200, Oleg Nesterov wrote:
> > > On 09/22, Peter Xu wrote:
> > > >
> > > > On Tue, Sep 22, 2020 at 05:48:46PM +0200, Oleg Nesterov wrote:
> > > > > > However since I didn't change this logic in this patch, it probably means this
> > > > > > bug is also in the original code before this series...  I'm thinking maybe I
> > > > > > should prepare a standalone patch to clear the swp_entry_t and cc stable.
> > > > >
> > > > > Well, if copy_one_pte(src_pte) hits a swap entry and returns entry.val != 0, then
> > > > > pte_none(*src_pte) is not possible after restart? This means that copy_one_pte()
> > > > > will be called at least once.
> > > >
> > > > Note that we've released the page table locks, so afaict the old swp entry can
> > > > be gone under us when we go back to the "do" loop... :)
> > >
> > > But how?
> > >
> > > I am just curious, I don't understand this code enough.
> >
> > Me neither.
> >
> > The point is I think we can't assume *src_pte will read the same if we have
> > released the src_ptl in copy_pte_range(),
> 
> This is clear.
> 
> But I still think that !pte_none() -> pte_none() transition is not possible
> under mmap_write_lock()...
> 
> OK, let me repeat I don't understans these code paths enough, let me reword:
> I don't see how this transition is possible.

Though I guess I'll keep my wording, because I still think it's accurate to
me. :)

Can we e.g. punch a page hole without changing vmas?

-- 
Peter Xu




* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-22 15:17     ` Peter Xu
  2020-09-22 16:10       ` Jason Gunthorpe
  2020-09-22 18:02       ` John Hubbard
@ 2020-09-22 19:11       ` John Hubbard
  2 siblings, 0 replies; 110+ messages in thread
From: John Hubbard @ 2020-09-22 19:11 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-mm, linux-kernel, Jason Gunthorpe, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On 9/22/20 8:17 AM, Peter Xu wrote:
> On Mon, Sep 21, 2020 at 04:53:38PM -0700, John Hubbard wrote:
>> On 9/21/20 2:17 PM, Peter Xu wrote:
...
>>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>>> index 496c3ff97cce..6f291f8b74c6 100644
>>> --- a/include/linux/mm_types.h
>>> +++ b/include/linux/mm_types.h
>>> @@ -441,6 +441,16 @@ struct mm_struct {
>>>    #endif
>>>    		int map_count;			/* number of VMAs */
>>> +		/**
>>> +		 * @has_pinned: Whether this mm has pinned any pages.  This can
>>> +		 * be either replaced in the future by @pinned_vm when it
>>> +		 * becomes stable, or grow into a counter on its own. We're
>>> +		 * aggresive on this bit now - even if the pinned pages were
>>> +		 * unpinned later on, we'll still keep this bit set for the
>>> +		 * lifecycle of this mm just for simplicity.
>>> +		 */
>>> +		int has_pinned;
>>
>> I think this would be elegant as an atomic_t, and using atomic_set() and
>> atomic_read(), which seem even more self-documenting that what you have here.
>>
>> But it's admittedly a cosmetic point, combined with my perennial fear that
>> I'm missing something when I look at a READ_ONCE()/WRITE_ONCE() pair. :)
> 
> Yeah but I hope I'm using it right.. :) I used READ_ONCE/WRITE_ONCE explicitly
> because I think they're cheaper than atomic operations, (which will, iiuc, lock
> the bus).
> 

The "cheaper" argument vanishes if the longer-term plan is to use atomic ops.
Given the intent of this patchset, simpler is better, and "simpler that has the
same performance as the longer term solution" is definitely OK.

>>
>> It's completely OK to just ignore this comment, but I didn't want to completely
>> miss the opportunity to make it a tiny bit cleaner to the reader.
> 
> This can always become an atomic in the future, or am I wrong?  Actually if
> we're going to the counter way I feel like it's a must.
> 

Yes it can change. But you can get the simplicity and clarity now, rather
than waiting.

thanks,
-- 
John Hubbard
NVIDIA



* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-22 17:54         ` Peter Xu
@ 2020-09-22 19:11           ` Jason Gunthorpe
  2020-09-23  0:27             ` Peter Xu
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Gunthorpe @ 2020-09-22 19:11 UTC (permalink / raw)
  To: Peter Xu
  Cc: John Hubbard, linux-mm, linux-kernel, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Tue, Sep 22, 2020 at 01:54:15PM -0400, Peter Xu wrote:
> diff --git a/mm/memory.c b/mm/memory.c
> index 8f3521be80ca..6591f3f33299 100644
> +++ b/mm/memory.c
> @@ -888,8 +888,8 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>                  * Because we'll need to release the locks before doing cow,
>                  * pass this work to upper layer.
>                  */
> -               if (READ_ONCE(src_mm->has_pinned) && wp &&
> -                   page_maybe_dma_pinned(page)) {
> +               if (wp && page_maybe_dma_pinned(page) &&
> +                   READ_ONCE(src_mm->has_pinned)) {
>                         /* We've got the page already; we're safe */
>                         data->cow_old_page = page;
>                         data->cow_oldpte = *src_pte;
> 
> I can also add some more comment to emphasize this.

It is not just that, but the ptep_set_wrprotect() has to be done
earlier.

Otherwise it races like:

   pin_user_pages_fast()                   fork()
    atomic_set(has_pinned, 1);
    [..]
                                           atomic_read(page->_refcount) //false
                                           // skipped atomic_read(has_pinned)
    atomic_add(page->_refcount)
    ordered check write protect()
                                           ordered set write protect()

And now we have a write protect on a DMA pinned page, which violates the
invariant we are trying to create.

The best algorithm I've thought of is something like:

 pte_map_lock()
  if (page) {
      if (wp) {
	  ptep_set_wrprotect()
	  /* Order with try_grab_compound_head(), either we see
	   * page_maybe_dma_pinned(), or they see the wrprotect */
	  get_page();

	  if (page_maybe_dma_pinned() && READ_ONCE(src_mm->has_pinned)) {
	       put_page();
	       ptep_clear_wrprotect()

	       // do copy
	       return
	  }
      } else {
	  get_page();
      }
      page_dup_rmap()
 pte_unmap_lock()

Then the do_wp_page() path would have to detect that the page is not
write protected under the pte lock inside the fault handler and just
do nothing. Ie the set/clear could be visible to the CPU and trigger a
spurious fault, but never trigger a COW.

Thus 'wp' becomes a 'lock' that prevents GUP from returning this page.

Very tricky, deserves a huge comment near the ptep_clear_wrprotect()

Consider the above algorithm beside the gup_fast() algorithm:

		if (!pte_access_permitted(pte, flags & FOLL_WRITE))
			goto pte_unmap;
                [..]
		head = try_grab_compound_head(page, 1, flags);
		if (!head)
			goto pte_unmap;
		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
			put_compound_head(head, 1, flags);
			goto pte_unmap;

That last *ptep will check that the WP is not set after making
page_maybe_dma_pinned() true.

It still looks reasonable, the extra work is still just the additional
atomic in page_maybe_dma_pinned(), just everything else has to be very
carefully sequenced due to unlocked page table accessors.

> I think the WRITE_ONCE/READ_ONCE can actually be kept, because atomic ops
> should contain proper memory barriers already so the memory access orders
> should be guaranteed 

I always have to carefully check ORDERING in
Documentation/atomic_t.txt when asking those questions..

It seems very subtle to me, but yes, try_grab_compound_head() and
page_maybe_dma_pinned() are already paired ordering barriers, so both
the pte_val() on the GUP side and the READ_ONCE(has_pinned) look OK.

Jason



* Re: [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork()
  2020-09-22 10:33     ` Jan Kara
@ 2020-09-22 20:01       ` John Hubbard
  2020-09-23  9:22         ` Jan Kara
  0 siblings, 1 reply; 110+ messages in thread
From: John Hubbard @ 2020-09-22 20:01 UTC (permalink / raw)
  To: Jan Kara
  Cc: Peter Xu, linux-mm, linux-kernel, Linus Torvalds, Michal Hocko,
	Kirill Shutemov, Jann Horn, Oleg Nesterov, Kirill Tkhai,
	Hugh Dickins, Leon Romanovsky, Christoph Hellwig, Andrew Morton,
	Jason Gunthorpe, Andrea Arcangeli

On 9/22/20 3:33 AM, Jan Kara wrote:
> On Mon 21-09-20 23:41:16, John Hubbard wrote:
>> On 9/21/20 2:20 PM, Peter Xu wrote:
>> ...
>>> +	if (unlikely(READ_ONCE(src_mm->has_pinned) &&
>>> +		     page_maybe_dma_pinned(src_page))) {
>>
>> This condition would make a good static inline function. It's used in 3
>> places, and the condition is quite special and worth documenting, and
>> having a separate function helps with that, because the function name
>> adds to the story. I'd suggest approximately:
>>
>>      page_likely_dma_pinned()
>>
>> for the name.
> 
> Well, but we should also capture that this really only works for anonymous
> pages. For file pages mm->has_pinned does not work because the page may be
> still pinned by completely unrelated process as Jann already properly
> pointed out earlier in the thread. So maybe anon_page_likely_pinned()?
> Possibly also assert PageAnon(page) in it if we want to be paranoid...
> 
> 								Honza

The file-backed case doesn't really change anything, though:
page_maybe_dma_pinned() is already a "fuzzy yes" in the same sense: you
can get a false positive. Just like here, with an mm->has_pinned that
could be a false positive for a process.

And for that reason, I'm also not sure an "assert PageAnon(page)" is
desirable. That assertion would prevent file-backed callers from being
able to call a function that provides a fuzzy answer, but I don't see
why you'd want or need to do that. The goal here is to make the fuzzy
answer a little bit more definite, but it's not "broken" just because
the result is still fuzzy, right?

Apologies if I'm missing a huge point here... :)


thanks,
-- 
John Hubbard
NVIDIA



* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-22 19:11           ` Jason Gunthorpe
@ 2020-09-23  0:27             ` Peter Xu
  2020-09-23 13:10               ` Peter Xu
  2020-09-23 17:07               ` Jason Gunthorpe
  0 siblings, 2 replies; 110+ messages in thread
From: Peter Xu @ 2020-09-23  0:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: John Hubbard, linux-mm, linux-kernel, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Tue, Sep 22, 2020 at 04:11:16PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 22, 2020 at 01:54:15PM -0400, Peter Xu wrote:
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 8f3521be80ca..6591f3f33299 100644
> > +++ b/mm/memory.c
> > @@ -888,8 +888,8 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >                  * Because we'll need to release the locks before doing cow,
> >                  * pass this work to upper layer.
> >                  */
> > -               if (READ_ONCE(src_mm->has_pinned) && wp &&
> > -                   page_maybe_dma_pinned(page)) {
> > +               if (wp && page_maybe_dma_pinned(page) &&
> > +                   READ_ONCE(src_mm->has_pinned)) {
> >                         /* We've got the page already; we're safe */
> >                         data->cow_old_page = page;
> >                         data->cow_oldpte = *src_pte;
> > 
> > I can also add some more comment to emphasize this.
> 
> It is not just that, but the ptep_set_wrprotect() has to be done
> earlier.

Now I think I understand your point..  So I guess it's not only about
has_pinned; it's a race between the fast-gup and the fork() code, even if
has_pinned is always set.

> 
> Otherwise it races like:
> 
>    pin_user_pages_fast()                   fork()
>     atomic_set(has_pinned, 1);
>     [..]
>                                            atomic_read(page->_refcount) //false
>                                            // skipped atomic_read(has_pinned)
>     atomic_add(page->_refcount)
>     ordered check write protect()
>                                            ordered set write protect()
> 
> And now have a write protect on a DMA pinned page, which is the
> invarient we are trying to create.
> 
> The best algorithm I've thought of is something like:
> 
>  pte_map_lock()
>   if (page) {
>       if (wp) {
> 	  ptep_set_wrprotect()
> 	  /* Order with try_grab_compound_head(), either we see
> 	   * page_maybe_dma_pinned(), or they see the wrprotect */
> 	  get_page();

Does this get_page() have to come explicitly after ptep_set_wrprotect()?  IIUC
what we need is to order ptep_set_wrprotect() and page_maybe_dma_pinned() here.
E.g., would a "mb()" work?

Another thing is, do we need a similar thing for e.g. gup_pte_range(), so as
to guarantee ordering of try_grab_compound_head() and the pte change check?

> 
> 	  if (page_maybe_dma_pinned() && READ_ONCE(src_mm->has_pinned)) {
> 	       put_page();
> 	       ptep_clear_wrprotect()
> 
> 	       // do copy
> 	       return
> 	  }
>       } else {
> 	  get_page();
>       }
>       page_dup_rmap()
>  pte_unmap_lock()
> 
> Then the do_wp_page() path would have to detect that the page is not
> write protected under the pte lock inside the fault handler and just
> do nothing.

Yes, iiuc do_wp_page() should be able to handle spurious write page faults like
this already, as below:

	vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
	spin_lock(vmf->ptl);
        ...
	if (vmf->flags & FAULT_FLAG_WRITE) {
		if (!pte_write(entry))
			return do_wp_page(vmf);
		entry = pte_mkdirty(entry);
	}

So when spin_lock() returns:

  - When it's a real cow (not pinned pages; we write-protected it and it keeps
    write-protected), we should do cow here as usual.

  - When it's a fake cow (pinned pages), the write bit should have been
    recovered before the page table lock released, and we'll skip do_wp_page()
    and retry the page fault immediately.

> Ie the set/clear could be visible to the CPU and trigger a
> spurious fault, but never trigger a COW.
> 
> Thus 'wp' becomes a 'lock' that prevents GUP from returning this page.

Another question is, how about read fast-gup for pinning?  Because we can't use
the write-protect mechanism to block a read gup.  I remember we've discussed
similar things and iirc your point is "pinned pages should always be with
WRITE".  However I still have doubts, because a read gup still seems legal (as
I mentioned previously - when the device purely writes to the page and the
processor only reads from it).

> 
> Very tricky, deserves a huge comment near the ptep_clear_wrprotect()
> 
> Consider the above algorithm beside the gup_fast() algorithm:
> 
> 		if (!pte_access_permitted(pte, flags & FOLL_WRITE))
> 			goto pte_unmap;
>                 [..]
> 		head = try_grab_compound_head(page, 1, flags);
> 		if (!head)
> 			goto pte_unmap;
> 		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> 			put_compound_head(head, 1, flags);
> 			goto pte_unmap;
> 
> That last *ptep will check that the WP is not set after making
> page_maybe_dma_pinned() true.
> 
> It still looks reasonable, the extra work is still just the additional
> atomic in page_maybe_dma_pinned(), just everything else has to be very
> carefully sequenced due to unlocked page table accessors.

Tricky!  I'm still thinking about an easier way, but without much of a clue so far.
Hopefully we'll figure out something solid soon.

Thanks,

-- 
Peter Xu




* Re: [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes
  2020-09-22 18:44             ` Oleg Nesterov
@ 2020-09-23  1:03               ` Peter Xu
  2020-09-23 20:25                 ` Linus Torvalds
  0 siblings, 1 reply; 110+ messages in thread
From: Peter Xu @ 2020-09-23  1:03 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-mm, linux-kernel, Linus Torvalds, Michal Hocko,
	Kirill Shutemov, Jann Horn, Kirill Tkhai, Hugh Dickins,
	Leon Romanovsky, Jan Kara, John Hubbard, Christoph Hellwig,
	Andrew Morton, Jason Gunthorpe, Andrea Arcangeli

On Tue, Sep 22, 2020 at 08:44:00PM +0200, Oleg Nesterov wrote:
> On 09/22, Peter Xu wrote:
> >
> > Or I can also do it in inverted order if you think better:
> >
> >         if (unlikely(copy_ret == COPY_MM_BREAK_COW)) {
> >                 WARN_ON_ONCE(!data.cow_new_page);
> >                 ...
> >         }
> 
> Peter, let me say this again. I don't understand this code enough, you
> can safely ignore me ;)

Why? I appreciate every single comment from you! :)

> 
> However. Personally I strongly prefer the above. Personally I really
> dislike this part of 4/5:
> 
> 	 again:
> 	+	/* We don't reset this for COPY_MM_BREAK_COW */
> 	+	memset(&data, 0, sizeof(data));
> 	+
> 	+again_break_cow:
> 
> If we rely on "copy_ret == COPY_MM_BREAK_COW" we can unify "again" and
> "again_break_cow", we don't need to clear ->cow_new_page, this makes the
> logic more understandable. To me at least ;)

I see your point.  I'll definitely try it out.  I think I'll at least use what
you preferred above since it's logically the same as before.  Then I'll
consider dropping the again_break_cow label, as long as I'm still confident
that nothing leaks after I make the change :).

Thanks,

-- 
Peter Xu




* Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()
  2020-09-22 18:49                   ` Peter Xu
@ 2020-09-23  6:52                     ` Oleg Nesterov
  0 siblings, 0 replies; 110+ messages in thread
From: Oleg Nesterov @ 2020-09-23  6:52 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-mm, linux-kernel, Jason Gunthorpe, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On 09/22, Peter Xu wrote:
>
> On Tue, Sep 22, 2020 at 08:23:18PM +0200, Oleg Nesterov wrote:
> >
> > But I still think that !pte_none() -> pte_none() transition is not possible
> > under mmap_write_lock()...
> >
> > OK, let me repeat I don't understans these code paths enough, let me reword:
> > I don't see how this transition is possible.
>
> Though I guess I'll keep my wording, because I still think it's accurate to
> me. :)
>
> Can we e.g. punch a page hole without changing vmas?

punch a hole? I don't know what that means...

However, I think you are right anyway. I forgot that (at least) truncate can
clear this pte without mmap_sem after pte_unmap_unlock().

So I think you are right, the current code is wrong too.

Thanks!

Oleg.




* Re: [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork()
  2020-09-22 20:01       ` John Hubbard
@ 2020-09-23  9:22         ` Jan Kara
  2020-09-23 13:50           ` Peter Xu
  0 siblings, 1 reply; 110+ messages in thread
From: Jan Kara @ 2020-09-23  9:22 UTC (permalink / raw)
  To: John Hubbard
  Cc: Jan Kara, Peter Xu, linux-mm, linux-kernel, Linus Torvalds,
	Michal Hocko, Kirill Shutemov, Jann Horn, Oleg Nesterov,
	Kirill Tkhai, Hugh Dickins, Leon Romanovsky, Christoph Hellwig,
	Andrew Morton, Jason Gunthorpe, Andrea Arcangeli

On Tue 22-09-20 13:01:13, John Hubbard wrote:
> On 9/22/20 3:33 AM, Jan Kara wrote:
> > On Mon 21-09-20 23:41:16, John Hubbard wrote:
> > > On 9/21/20 2:20 PM, Peter Xu wrote:
> > > ...
> > > > +	if (unlikely(READ_ONCE(src_mm->has_pinned) &&
> > > > +		     page_maybe_dma_pinned(src_page))) {
> > > 
> > > This condition would make a good static inline function. It's used in 3
> > > places, and the condition is quite special and worth documenting, and
> > > having a separate function helps with that, because the function name
> > > adds to the story. I'd suggest approximately:
> > > 
> > >      page_likely_dma_pinned()
> > > 
> > > for the name.
> > 
> > Well, but we should also capture that this really only works for anonymous
> > pages. For file pages mm->has_pinned does not work because the page may be
> > still pinned by completely unrelated process as Jann already properly
> > pointed out earlier in the thread. So maybe anon_page_likely_pinned()?
> > Possibly also assert PageAnon(page) in it if we want to be paranoid...
> > 
> > 								Honza
> 
> The file-backed case doesn't really change anything, though:
> page_maybe_dma_pinned() is already a "fuzzy yes" in the same sense: you
> can get a false positive. Just like here, with an mm->has_pinned that
> could be a false positive for a process.
> 
> And for that reason, I'm also not sure an "assert PageAnon(page)" is
> desirable. That assertion would prevent file-backed callers from being
> able to call a function that provides a fuzzy answer, but I don't see
> why you'd want or need to do that. The goal here is to make the fuzzy
> answer a little bit more definite, but it's not "broken" just because
> the result is still fuzzy, right?
> 
> Apologies if I'm missing a huge point here... :)

But the problem is that if you apply the mm->has_pinned check to file pages,
you can get false negatives now. And that's not acceptable...

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR



* Re: [PATCH 0/5] mm: Break COW for pinned pages during fork()
  2020-09-21 21:17 [PATCH 0/5] mm: Break COW for pinned pages during fork() Peter Xu
                   ` (4 preceding siblings ...)
  2020-09-21 21:20 ` [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork() Peter Xu
@ 2020-09-23 10:21 ` Leon Romanovsky
  2020-09-23 15:37   ` Peter Xu
  5 siblings, 1 reply; 110+ messages in thread
From: Leon Romanovsky @ 2020-09-23 10:21 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-mm, linux-kernel, Jason Gunthorpe, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard, Oleg Nesterov,
	Linus Torvalds, Jann Horn

On Mon, Sep 21, 2020 at 05:17:39PM -0400, Peter Xu wrote:
> Finally I start to post formal patches because it's growing.  And also since
> we've discussed quite some issues already, so I feel like it's clearer on what
> we need to do, and how.
>
> This series is majorly inspired by the previous discussion on the list [1],
> starting from the report from Jason on the rdma test failure.  Linus proposed
> the solution, which seems to be a very nice approach to avoid the breakage of
> userspace apps that didn't use MADV_DONTFORK properly before.  More information
> can be found in that thread too.
>
> I believe the initial plan was to consider merging something like this for
> rc7/rc8.  However now I'm not sure due to the fact that the code change in
> copy_pte_range() is probably more than expected, so it can be with some risk.
> I'll leave this question to the reviewers...
>
> I tested it myself with fork() after vfio pinning a bunch of device pages, and
> I verified that the new copy pte logic worked as expected at least in the most
> general path.  However I didn't test thp case yet because afaict vfio does not
> support thp backed dma pages.  Luckily, the pmd/pud thp patch is much more
> straightforward than the pte one, so hopefully it can be directly verified by
> some code review plus some more heavy-weight rdma tests.
>
> Patch 1:      Introduce mm.has_pinned (as single patch as suggested by Jason)
> Patch 2-3:    Some slight rework on copy_page_range() path as preparation
> Patch 4:      Early cow solution for pte copy for pinned pages
> Patch 5:      Same as above, but for thp (pmd/pud).
>
> Hugetlbfs fix is still missing, but as planned, that's not urgent so we can
> work upon.  Comments greatly welcomed.

Hi Peter,

I'm aware that this series is under ongoing review and probably not
final, but we tested it anyway and it solves our RDMA failures.

Thanks

>
> Thanks.
>
> Peter Xu (5):
>   mm: Introduce mm_struct.has_pinned
>   mm/fork: Pass new vma pointer into copy_page_range()
>   mm: Rework return value for copy_one_pte()
>   mm: Do early cow for pinned pages during fork() for ptes
>   mm/thp: Split huge pmds/puds if they're pinned when fork()
>
>  include/linux/mm.h       |   2 +-
>  include/linux/mm_types.h |  10 ++
>  kernel/fork.c            |   3 +-
>  mm/gup.c                 |   6 ++
>  mm/huge_memory.c         |  26 +++++
>  mm/memory.c              | 226 +++++++++++++++++++++++++++++++++++----
>  6 files changed, 248 insertions(+), 25 deletions(-)
>
> --
> 2.26.2
>
>



* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-23  0:27             ` Peter Xu
@ 2020-09-23 13:10               ` Peter Xu
  2020-09-23 14:20                 ` Jan Kara
  2020-09-23 17:07               ` Jason Gunthorpe
  1 sibling, 1 reply; 110+ messages in thread
From: Peter Xu @ 2020-09-23 13:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: John Hubbard, linux-mm, linux-kernel, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Tue, Sep 22, 2020 at 08:27:35PM -0400, Peter Xu wrote:
> On Tue, Sep 22, 2020 at 04:11:16PM -0300, Jason Gunthorpe wrote:
> > On Tue, Sep 22, 2020 at 01:54:15PM -0400, Peter Xu wrote:
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 8f3521be80ca..6591f3f33299 100644
> > > +++ b/mm/memory.c
> > > @@ -888,8 +888,8 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> > >                  * Because we'll need to release the locks before doing cow,
> > >                  * pass this work to upper layer.
> > >                  */
> > > -               if (READ_ONCE(src_mm->has_pinned) && wp &&
> > > -                   page_maybe_dma_pinned(page)) {
> > > +               if (wp && page_maybe_dma_pinned(page) &&
> > > +                   READ_ONCE(src_mm->has_pinned)) {
> > >                         /* We've got the page already; we're safe */
> > >                         data->cow_old_page = page;
> > >                         data->cow_oldpte = *src_pte;
> > > 
> > > I can also add some more comment to emphasize this.
> > 
> > It is not just that, but the ptep_set_wrprotect() has to be done
> > earlier.
> 
> Now I understand your point, I think..  So I guess it's not only about
> has_pinned, but it should be a race between the fast-gup and the fork() code,
> even if has_pinned is always set.
> 
> > 
> > Otherwise it races like:
> > 
> >    pin_user_pages_fast()                   fork()
> >     atomic_set(has_pinned, 1);
> >     [..]
> >                                            atomic_read(page->_refcount) //false
> >                                            // skipped atomic_read(has_pinned)
> >     atomic_add(page->_refcount)
> >     ordered check write protect()
> >                                            ordered set write protect()
> > 
> > And now we have a write protect on a DMA pinned page, which is the
> > invariant we are trying to create.
> > 
> > The best algorithm I've thought of is something like:
> > 
> >  pte_map_lock()
> >   if (page) {
> >       if (wp) {
> > 	  ptep_set_wrprotect()
> > 	  /* Order with try_grab_compound_head(), either we see
> > 	   * page_maybe_dma_pinned(), or they see the wrprotect */
> > 	  get_page();
> 
> Is this get_page() a must to be after ptep_set_wrprotect() explicitly?  IIUC
> what we need is to order ptep_set_wrprotect() and page_maybe_dma_pinned() here.
> E.g., would a "mb()" work?
> 
> Another thing is, do we need similar thing for e.g. gup_pte_range(), so that
> to guarantee ordering of try_grab_compound_head() and the pte change check?
> 
> > 
> > 	  if (page_maybe_dma_pinned() && READ_ONCE(src_mm->has_pinned)) {
> > 	       put_page();
> > 	       ptep_clear_wrprotect()
> > 
> > 	       // do copy
> > 	       return
> > 	  }
> >       } else {
> > 	  get_page();
> >       }
> >       page_dup_rmap()
> >  pte_unmap_lock()
> > 
> > Then the do_wp_page() path would have to detect that the page is not
> > write protected under the pte lock inside the fault handler and just
> > do nothing.
> 
> Yes, iiuc do_wp_page() should be able to handle spurious write page faults like
> this already, as below:
> 
> 	vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
> 	spin_lock(vmf->ptl);
>         ...
> 	if (vmf->flags & FAULT_FLAG_WRITE) {
> 		if (!pte_write(entry))
> 			return do_wp_page(vmf);
> 		entry = pte_mkdirty(entry);
> 	}
> 
> So when spin_lock() returns:
> 
>   - When it's a real cow (not pinned pages; we write-protected it and it keeps
>     write-protected), we should do cow here as usual.
> 
>   - When it's a fake cow (pinned pages), the write bit should have been
>     recovered before the page table lock released, and we'll skip do_wp_page()
>     and retry the page fault immediately.
> 
> > Ie the set/clear could be visible to the CPU and trigger a
> > spurious fault, but never trigger a COW.
> > 
> > Thus 'wp' becomes a 'lock' that prevents GUP from returning this page.
> 
> Another question is, how about read fast-gup for pinning?  Because we can't use
> the write-protect mechanism to block a read gup.  I remember we've discussed
> similar things and iirc your point is "pinned pages should always be with
> WRITE".  However now I still doubt it...  Because I feel like read gup is still
> legal (as I mentioned previously - when device purely writes to the page and
> the processor only reads from it).
> 
> > 
> > Very tricky, deserves a huge comment near the ptep_clear_wrprotect()
> > 
> > Consider the above algorithm beside the gup_fast() algorithm:
> > 
> > 		if (!pte_access_permitted(pte, flags & FOLL_WRITE))
> > 			goto pte_unmap;
> >                 [..]
> > 		head = try_grab_compound_head(page, 1, flags);
> > 		if (!head)
> > 			goto pte_unmap;
> > 		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> > 			put_compound_head(head, 1, flags);
> > 			goto pte_unmap;
> > 
> > That last *ptep will check that the WP is not set after making
> > page_maybe_dma_pinned() true.
> > 
> > It still looks reasonable, the extra work is still just the additional
> > atomic in page_maybe_dma_pinned(), just everything else has to be very
> > carefully sequenced due to unlocked page table accessors.
> 
> Tricky!  I'm still thinking about some easier way but no much clue so far.
> Hopefully we'll figure out something solid soon.

Hmm, how about something like below?  Would this be acceptable?

------8<--------
diff --git a/mm/gup.c b/mm/gup.c
index 2d9019bf1773..698bc2b520ac 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2136,6 +2136,18 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
        struct dev_pagemap *pgmap = NULL;
        int nr_start = *nr, ret = 0;
        pte_t *ptep, *ptem;
+       spinlock_t *ptl = NULL;
+
+       /*
+        * More strict with FOLL_PIN, otherwise it could race with fork().  The
+        * page table lock guarantees that fork() will capture all the pinned
+        * pages when dup_mm() and do proper page copy on them.
+        */
+       if (flags & FOLL_PIN) {
+               ptl = pte_lockptr(mm, pmd);
+               if (!spin_trylock(ptl))
+                       return 0;
+       }
 
        ptem = ptep = pte_offset_map(&pmd, addr);
        do {
@@ -2200,6 +2212,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
        ret = 1;
 
 pte_unmap:
+       if (ptl)
+               spin_unlock(ptl);
        if (pgmap)
                put_dev_pagemap(pgmap);
        pte_unmap(ptem);
------8<--------

Both solutions would fail some fast-gups that might have succeeded in the
past.  The latter one might fail even more often (because the pmd lock covers
definitely more than a single pte wrprotect), however afaict it's still a
very, very rare corner case, since it requires fast-gup+FOLL_PIN+lockfail
(and, not to mention, fast-gup should be allowed to fail).

To confirm it can fail, I also checked that we have only one caller of
pin_user_pages_fast_only(), which is i915_gem_userptr_get_pages().  It does:

	if (mm == current->mm) {
		pvec = kvmalloc_array(num_pages, sizeof(struct page *),
				      GFP_KERNEL |
				      __GFP_NORETRY |
				      __GFP_NOWARN);
		if (pvec) {
			/* defer to worker if malloc fails */
			if (!i915_gem_object_is_readonly(obj))
				gup_flags |= FOLL_WRITE;
			pinned = pin_user_pages_fast_only(obj->userptr.ptr,
							  num_pages, gup_flags,
							  pvec);
		}
	}

So it looks like it can fall back to something slower even when purely
unlucky, which means either of the solutions above looks safe so far.

-- 
Peter Xu




* Re: [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork()
  2020-09-23  9:22         ` Jan Kara
@ 2020-09-23 13:50           ` Peter Xu
  2020-09-23 14:01             ` Jan Kara
  0 siblings, 1 reply; 110+ messages in thread
From: Peter Xu @ 2020-09-23 13:50 UTC (permalink / raw)
  To: Jan Kara
  Cc: John Hubbard, linux-mm, linux-kernel, Linus Torvalds,
	Michal Hocko, Kirill Shutemov, Jann Horn, Oleg Nesterov,
	Kirill Tkhai, Hugh Dickins, Leon Romanovsky, Christoph Hellwig,
	Andrew Morton, Jason Gunthorpe, Andrea Arcangeli

On Wed, Sep 23, 2020 at 11:22:05AM +0200, Jan Kara wrote:
> On Tue 22-09-20 13:01:13, John Hubbard wrote:
> > On 9/22/20 3:33 AM, Jan Kara wrote:
> > > On Mon 21-09-20 23:41:16, John Hubbard wrote:
> > > > On 9/21/20 2:20 PM, Peter Xu wrote:
> > > > ...
> > > > > +	if (unlikely(READ_ONCE(src_mm->has_pinned) &&
> > > > > +		     page_maybe_dma_pinned(src_page))) {
> > > > 
> > > > This condition would make a good static inline function. It's used in 3
> > > > places, and the condition is quite special and worth documenting, and
> > > > having a separate function helps with that, because the function name
> > > > adds to the story. I'd suggest approximately:
> > > > 
> > > >      page_likely_dma_pinned()
> > > > 
> > > > for the name.
> > > 
> > > Well, but we should also capture that this really only works for anonymous
> > > pages. For file pages mm->has_pinned does not work because the page may be
> > > still pinned by completely unrelated process as Jann already properly
> > > pointed out earlier in the thread. So maybe anon_page_likely_pinned()?
> > > Possibly also assert PageAnon(page) in it if we want to be paranoid...
> > > 
> > > 								Honza
> > 
> > The file-backed case doesn't really change anything, though:
> > page_maybe_dma_pinned() is already a "fuzzy yes" in the same sense: you
> > can get a false positive. Just like here, with an mm->has_pinned that
> > could be a false positive for a process.
> > 
> > And for that reason, I'm also not sure an "assert PageAnon(page)" is
> > desirable. That assertion would prevent file-backed callers from being
> > able to call a function that provides a fuzzy answer, but I don't see
> > why you'd want or need to do that. The goal here is to make the fuzzy
> > answer a little bit more definite, but it's not "broken" just because
> > the result is still fuzzy, right?
> > 
> > Apologies if I'm missing a huge point here... :)
> 
> But the problem is that if you apply mm->has_pinned check on file pages,
> you can get false negatives now. And that's not acceptable...

Do you mean the case where proc A pinned page P from a file, then proc B mapped
the same page P of the file, then fork() on proc B?

If proc B didn't explicitly pin page P in B's address space too, shouldn't
we return "false" for page_likely_dma_pinned(P)?  Because if proc B didn't pin
the page in its own address space, I'd think it's ok to get the page replaced
at any time as long as the content stays the same.  Or couldn't we?

Thanks,

-- 
Peter Xu




* Re: [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork()
  2020-09-23 13:50           ` Peter Xu
@ 2020-09-23 14:01             ` Jan Kara
  2020-09-23 15:44               ` Peter Xu
  0 siblings, 1 reply; 110+ messages in thread
From: Jan Kara @ 2020-09-23 14:01 UTC (permalink / raw)
  To: Peter Xu
  Cc: Jan Kara, John Hubbard, linux-mm, linux-kernel, Linus Torvalds,
	Michal Hocko, Kirill Shutemov, Jann Horn, Oleg Nesterov,
	Kirill Tkhai, Hugh Dickins, Leon Romanovsky, Christoph Hellwig,
	Andrew Morton, Jason Gunthorpe, Andrea Arcangeli

On Wed 23-09-20 09:50:04, Peter Xu wrote:
> On Wed, Sep 23, 2020 at 11:22:05AM +0200, Jan Kara wrote:
> > On Tue 22-09-20 13:01:13, John Hubbard wrote:
> > > On 9/22/20 3:33 AM, Jan Kara wrote:
> > > > On Mon 21-09-20 23:41:16, John Hubbard wrote:
> > > > > On 9/21/20 2:20 PM, Peter Xu wrote:
> > > > > ...
> > > > > > +	if (unlikely(READ_ONCE(src_mm->has_pinned) &&
> > > > > > +		     page_maybe_dma_pinned(src_page))) {
> > > > > 
> > > > > This condition would make a good static inline function. It's used in 3
> > > > > places, and the condition is quite special and worth documenting, and
> > > > > having a separate function helps with that, because the function name
> > > > > adds to the story. I'd suggest approximately:
> > > > > 
> > > > >      page_likely_dma_pinned()
> > > > > 
> > > > > for the name.
> > > > 
> > > > Well, but we should also capture that this really only works for anonymous
> > > > pages. For file pages mm->has_pinned does not work because the page may be
> > > > still pinned by completely unrelated process as Jann already properly
> > > > pointed out earlier in the thread. So maybe anon_page_likely_pinned()?
> > > > Possibly also assert PageAnon(page) in it if we want to be paranoid...
> > > > 
> > > > 								Honza
> > > 
> > > The file-backed case doesn't really change anything, though:
> > > page_maybe_dma_pinned() is already a "fuzzy yes" in the same sense: you
> > > can get a false positive. Just like here, with an mm->has_pinned that
> > > could be a false positive for a process.
> > > 
> > > And for that reason, I'm also not sure an "assert PageAnon(page)" is
> > > desirable. That assertion would prevent file-backed callers from being
> > > able to call a function that provides a fuzzy answer, but I don't see
> > > why you'd want or need to do that. The goal here is to make the fuzzy
> > > answer a little bit more definite, but it's not "broken" just because
> > > the result is still fuzzy, right?
> > > 
> > > Apologies if I'm missing a huge point here... :)
> > 
> > But the problem is that if you apply mm->has_pinned check on file pages,
> > you can get false negatives now. And that's not acceptable...
> 
> Do you mean the case where proc A pinned page P from a file, then proc B
> mapped the same page P on the file, then fork() on proc B?

Yes.

> If proc B didn't explicitly pinned page P in B's address space too,
> shouldn't we return "false" for page_likely_dma_pinned(P)?  Because if
> proc B didn't pin the page in its own address space, I'd think it's ok to
> get the page replaced at any time as long as the content keeps the same.
> Or couldn't we?

So it depends on the reason why you call page_likely_dma_pinned(). For your
COW purposes the check is correct but e.g. for "can filesystem safely
writeback this page" the page_likely_dma_pinned() would be wrong. So I'm
not objecting to the mechanism as such. I'm mainly objecting to the generic
function name which suggests something else than what it really checks and
thus it could be used in wrong places in the future... That's why I'd
prefer to restrict the function to PageAnon pages where there's no risk of
confusion what the check actually does.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR



* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-23 13:10               ` Peter Xu
@ 2020-09-23 14:20                 ` Jan Kara
  2020-09-23 17:12                   ` Jason Gunthorpe
  0 siblings, 1 reply; 110+ messages in thread
From: Jan Kara @ 2020-09-23 14:20 UTC (permalink / raw)
  To: Peter Xu
  Cc: Jason Gunthorpe, John Hubbard, linux-mm, linux-kernel,
	Andrew Morton, Jan Kara, Michal Hocko, Kirill Tkhai,
	Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Leon Romanovsky, Linus Torvalds,
	Jann Horn

On Wed 23-09-20 09:10:43, Peter Xu wrote:
> On Tue, Sep 22, 2020 at 08:27:35PM -0400, Peter Xu wrote:
> > On Tue, Sep 22, 2020 at 04:11:16PM -0300, Jason Gunthorpe wrote:
> > > On Tue, Sep 22, 2020 at 01:54:15PM -0400, Peter Xu wrote:
> > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > index 8f3521be80ca..6591f3f33299 100644
> > > > +++ b/mm/memory.c
> > > > @@ -888,8 +888,8 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> > > >                  * Because we'll need to release the locks before doing cow,
> > > >                  * pass this work to upper layer.
> > > >                  */
> > > > -               if (READ_ONCE(src_mm->has_pinned) && wp &&
> > > > -                   page_maybe_dma_pinned(page)) {
> > > > +               if (wp && page_maybe_dma_pinned(page) &&
> > > > +                   READ_ONCE(src_mm->has_pinned)) {
> > > >                         /* We've got the page already; we're safe */
> > > >                         data->cow_old_page = page;
> > > >                         data->cow_oldpte = *src_pte;
> > > > 
> > > > I can also add some more comment to emphasize this.
> > > 
> > > It is not just that, but the ptep_set_wrprotect() has to be done
> > > earlier.
> > 
> > Now I understand your point, I think..  So I guess it's not only about
> > has_pinned, but it should be a race between the fast-gup and the fork() code,
> > even if has_pinned is always set.
> > 
> > > 
> > > Otherwise it races like:
> > > 
> > >    pin_user_pages_fast()                   fork()
> > >     atomic_set(has_pinned, 1);
> > >     [..]
> > >                                            atomic_read(page->_refcount) //false
> > >                                            // skipped atomic_read(has_pinned)
> > >     atomic_add(page->_refcount)
> > >     ordered check write protect()
> > >                                            ordered set write protect()
> > > 
> > > And now we have a write protect on a DMA pinned page, which is the
> > > invariant we are trying to create.
> > > 
> > > The best algorithm I've thought of is something like:
> > > 
> > >  pte_map_lock()
> > >   if (page) {
> > >       if (wp) {
> > > 	  ptep_set_wrprotect()
> > > 	  /* Order with try_grab_compound_head(), either we see
> > > 	   * page_maybe_dma_pinned(), or they see the wrprotect */
> > > 	  get_page();
> > 
> > Is this get_page() a must to be after ptep_set_wrprotect() explicitly?  IIUC
> > what we need is to order ptep_set_wrprotect() and page_maybe_dma_pinned() here.
> > E.g., would a "mb()" work?
> > 
> > Another thing is, do we need similar thing for e.g. gup_pte_range(), so that
> > to guarantee ordering of try_grab_compound_head() and the pte change check?
> > 
> > > 
> > > 	  if (page_maybe_dma_pinned() && READ_ONCE(src_mm->has_pinned)) {
> > > 	       put_page();
> > > 	       ptep_clear_wrprotect()
> > > 
> > > 	       // do copy
> > > 	       return
> > > 	  }
> > >       } else {
> > > 	  get_page();
> > >       }
> > >       page_dup_rmap()
> > >  pte_unmap_lock()
> > > 
> > > Then the do_wp_page() path would have to detect that the page is not
> > > write protected under the pte lock inside the fault handler and just
> > > do nothing.
> > 
> > Yes, iiuc do_wp_page() should be able to handle spurious write page faults like
> > this already, as below:
> > 
> > 	vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
> > 	spin_lock(vmf->ptl);
> >         ...
> > 	if (vmf->flags & FAULT_FLAG_WRITE) {
> > 		if (!pte_write(entry))
> > 			return do_wp_page(vmf);
> > 		entry = pte_mkdirty(entry);
> > 	}
> > 
> > So when spin_lock() returns:
> > 
> >   - When it's a real cow (not pinned pages; we write-protected it and it keeps
> >     write-protected), we should do cow here as usual.
> > 
> >   - When it's a fake cow (pinned pages), the write bit should have been
> >     recovered before the page table lock released, and we'll skip do_wp_page()
> >     and retry the page fault immediately.
> > 
> > > Ie the set/clear could be visible to the CPU and trigger a
> > > spurious fault, but never trigger a COW.
> > > 
> > > Thus 'wp' becomes a 'lock' that prevents GUP from returning this page.
> > 
> > Another question is, how about read fast-gup for pinning?  Because we can't use
> > the write-protect mechanism to block a read gup.  I remember we've discussed
> > similar things and iirc your point is "pinned pages should always be with
> > WRITE".  However now I still doubt it...  Because I feel like read gup is still
> > legal (as I mentioned previously - when device purely writes to the page and
> > the processor only reads from it).
> > 
> > > 
> > > Very tricky, deserves a huge comment near the ptep_clear_wrprotect()
> > > 
> > > Consider the above algorithm beside the gup_fast() algorithm:
> > > 
> > > 		if (!pte_access_permitted(pte, flags & FOLL_WRITE))
> > > 			goto pte_unmap;
> > >                 [..]
> > > 		head = try_grab_compound_head(page, 1, flags);
> > > 		if (!head)
> > > 			goto pte_unmap;
> > > 		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> > > 			put_compound_head(head, 1, flags);
> > > 			goto pte_unmap;
> > > 
> > > That last *ptep will check that the WP is not set after making
> > > page_maybe_dma_pinned() true.
> > > 
> > > It still looks reasonable, the extra work is still just the additional
> > > atomic in page_maybe_dma_pinned(), just everything else has to be very
> > > carefully sequenced due to unlocked page table accessors.
> > 
> > Tricky!  I'm still thinking about some easier way but no much clue so far.
> > Hopefully we'll figure out something solid soon.
> 
> Hmm, how about something like below?  Would this be acceptable?
> 
> ------8<--------
> diff --git a/mm/gup.c b/mm/gup.c
> index 2d9019bf1773..698bc2b520ac 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2136,6 +2136,18 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>         struct dev_pagemap *pgmap = NULL;
>         int nr_start = *nr, ret = 0;
>         pte_t *ptep, *ptem;
> +       spinlock_t *ptl = NULL;
> +
> +       /*
> +        * More strict with FOLL_PIN, otherwise it could race with fork().  The
> +        * page table lock guarantees that fork() will capture all the pinned
> +        * pages when dup_mm() and do proper page copy on them.
> +        */
> +       if (flags & FOLL_PIN) {
> +               ptl = pte_lockptr(mm, pmd);
> +               if (!spin_trylock(ptl))
> +                       return 0;
> +       }

I'd hate to take spinlock in the GUP-fast path. Also I don't think this is
quite correct because GUP-fast-only can be called from interrupt context
and page table locks are not interrupt safe. That being said I don't see
what's wrong with the solution Jason proposed of first setting writeprotect 
and then checking page_maybe_dma_pinned() during fork(). That should work
just fine AFAICT... BTW note that the GUP-fast code first updates
page->_refcount and then rechecks that the PTE didn't change (and this is
deliberate because e.g. DAX depends on it), and the page->_refcount update is
actually done using atomic_add_unless() so that it cannot be reordered wrt the
PTE check. So the fork() code only needs to add barriers to pair with this.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR



* Re: [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork()
  2020-09-22 12:05   ` Jason Gunthorpe
@ 2020-09-23 15:24     ` Peter Xu
  2020-09-23 16:07       ` Yang Shi
  2020-09-23 17:17       ` Jason Gunthorpe
  0 siblings, 2 replies; 110+ messages in thread
From: Peter Xu @ 2020-09-23 15:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-mm, linux-kernel, Linus Torvalds, Michal Hocko,
	Kirill Shutemov, Jann Horn, Oleg Nesterov, Kirill Tkhai,
	Hugh Dickins, Leon Romanovsky, Jan Kara, John Hubbard,
	Christoph Hellwig, Andrew Morton, Andrea Arcangeli

On Tue, Sep 22, 2020 at 09:05:05AM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 21, 2020 at 05:20:31PM -0400, Peter Xu wrote:
> > Pinned pages shouldn't be write-protected when fork() happens, because follow
> > up copy-on-write on these pages could cause the pinned pages to be replaced by
> > random newly allocated pages.
> > 
> > For huge PMDs, we split the huge pmd if pinning is detected.  So that future
> > handling will be done by the PTE level (with our latest changes, each of the
> > small pages will be copied).  We can achieve this by let copy_huge_pmd() return
> > -EAGAIN for pinned pages, so that we'll fallthrough in copy_pmd_range() and
> > finally land the next copy_pte_range() call.
> > 
> > Huge PUDs will be even more special - so far it does not support anonymous
> > pages.  But it can actually be done the same as the huge PMDs even if the split
> > huge PUDs means to erase the PUD entries.  It'll guarantee the follow up fault
> > ins will remap the same pages in either parent/child later.
> > 
> > This might not be the most efficient way, but it should be easy and clean
> > enough.  It should be fine, since we're tackling with a very rare case just to
> > make sure userspaces that pinned some thps will still work even without
> > MADV_DONTFORK and after they fork()ed.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> >  mm/huge_memory.c | 26 ++++++++++++++++++++++++++
> >  1 file changed, 26 insertions(+)
> > 
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 7ff29cc3d55c..c40aac0ad87e 100644
> > +++ b/mm/huge_memory.c
> > @@ -1074,6 +1074,23 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >  
> >  	src_page = pmd_page(pmd);
> >  	VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
> > +
> > +	/*
> > +	 * If this page is a potentially pinned page, split and retry the fault
> > +	 * with smaller page size.  Normally this should not happen because the
> > +	 * userspace should use MADV_DONTFORK upon pinned regions.  This is a
> > +	 * best effort that the pinned pages won't be replaced by another
> > +	 * random page during the coming copy-on-write.
> > +	 */
> > +	if (unlikely(READ_ONCE(src_mm->has_pinned) &&
> > +		     page_maybe_dma_pinned(src_page))) {
> > +		pte_free(dst_mm, pgtable);
> > +		spin_unlock(src_ptl);
> > +		spin_unlock(dst_ptl);
> > +		__split_huge_pmd(vma, src_pmd, addr, false, NULL);
> > +		return -EAGAIN;
> > +	}
> 
> Not sure why, but the PMD stuff here is not calling is_cow_mapping()
> before doing the write protect. Seems like it might be an existing
> bug?

IMHO it's not a bug, because splitting a huge pmd should always be safe.

One thing I can think of that might be special here is when the pmd is
anonymously mapped but also shared (shared tmpfs thp, I think?); then here
we'll also mark it as wrprotected even if we don't need to (or maybe we need it
for some reason..).  But again, I think it's safe anyway: when a page fault
happens, wp_huge_pmd() should split it into smaller pages unconditionally.  I
just don't know whether it's the ideal way for the shared case.  Andrea should
definitely know better (that code has been there since the first day of thp).

> 
> In any event, the has_pinned logic shouldn't be used without also
> checking is_cow_mapping(), so it should be added to that test. Same
> remarks for PUD

I think the case mentioned above is also the special case here where we didn't
check is_cow_mapping().  The major difference is whether we'll split the page
right now, or postpone it until the next write in each mm.  But I think, yes,
maybe I'd better still keep the is_cow_mapping() check to be explicit.

Thanks,

-- 
Peter Xu




* Re: [PATCH 0/5] mm: Break COW for pinned pages during fork()
  2020-09-23 10:21 ` [PATCH 0/5] mm: Break COW for pinned pages during fork() Leon Romanovsky
@ 2020-09-23 15:37   ` Peter Xu
  0 siblings, 0 replies; 110+ messages in thread
From: Peter Xu @ 2020-09-23 15:37 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: linux-mm, linux-kernel, Jason Gunthorpe, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, John Hubbard, Oleg Nesterov,
	Linus Torvalds, Jann Horn

On Wed, Sep 23, 2020 at 01:21:19PM +0300, Leon Romanovsky wrote:
> Hi Peter,
> 
> I'm aware that this series is under ongoing review and probably not
> final, but we tested anyway and it solves our RDMA failures.

Hi, Leon,

Yes I think there'll definitely be more version(s), but thank you for the quick
follow-up!  It's very valuable to know that we're at least starting from a
good base point.

Thanks,

-- 
Peter Xu




* Re: [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork()
  2020-09-23 14:01             ` Jan Kara
@ 2020-09-23 15:44               ` Peter Xu
  2020-09-23 20:19                 ` John Hubbard
  0 siblings, 1 reply; 110+ messages in thread
From: Peter Xu @ 2020-09-23 15:44 UTC (permalink / raw)
  To: Jan Kara
  Cc: John Hubbard, linux-mm, linux-kernel, Linus Torvalds,
	Michal Hocko, Kirill Shutemov, Jann Horn, Oleg Nesterov,
	Kirill Tkhai, Hugh Dickins, Leon Romanovsky, Christoph Hellwig,
	Andrew Morton, Jason Gunthorpe, Andrea Arcangeli

On Wed, Sep 23, 2020 at 04:01:14PM +0200, Jan Kara wrote:
> On Wed 23-09-20 09:50:04, Peter Xu wrote:
> > On Wed, Sep 23, 2020 at 11:22:05AM +0200, Jan Kara wrote:
> > > On Tue 22-09-20 13:01:13, John Hubbard wrote:
> > > > On 9/22/20 3:33 AM, Jan Kara wrote:
> > > > > On Mon 21-09-20 23:41:16, John Hubbard wrote:
> > > > > > On 9/21/20 2:20 PM, Peter Xu wrote:
> > > > > > ...
> > > > > > > +	if (unlikely(READ_ONCE(src_mm->has_pinned) &&
> > > > > > > +		     page_maybe_dma_pinned(src_page))) {
> > > > > > 
> > > > > > This condition would make a good static inline function. It's used in 3
> > > > > > places, and the condition is quite special and worth documenting, and
> > > > > > having a separate function helps with that, because the function name
> > > > > > adds to the story. I'd suggest approximately:
> > > > > > 
> > > > > >      page_likely_dma_pinned()
> > > > > > 
> > > > > > for the name.
> > > > > 
> > > > > Well, but we should also capture that this really only works for anonymous
> > > > > pages. For file pages mm->has_pinned does not work because the page may be
> > > > > still pinned by completely unrelated process as Jann already properly
> > > > > pointed out earlier in the thread. So maybe anon_page_likely_pinned()?
> > > > > Possibly also assert PageAnon(page) in it if we want to be paranoid...
> > > > > 
> > > > > 								Honza
> > > > 
> > > > The file-backed case doesn't really change anything, though:
> > > > page_maybe_dma_pinned() is already a "fuzzy yes" in the same sense: you
> > > > can get a false positive. Just like here, with an mm->has_pinned that
> > > > could be a false positive for a process.
> > > > 
> > > > And for that reason, I'm also not sure an "assert PageAnon(page)" is
> > > > desirable. That assertion would prevent file-backed callers from being
> > > > able to call a function that provides a fuzzy answer, but I don't see
> > > > why you'd want or need to do that. The goal here is to make the fuzzy
> > > > answer a little bit more definite, but it's not "broken" just because
> > > > the result is still fuzzy, right?
> > > > 
> > > > Apologies if I'm missing a huge point here... :)
> > > 
> > > But the problem is that if you apply mm->has_pinned check on file pages,
> > > you can get false negatives now. And that's not acceptable...
> > 
> > Do you mean the case where proc A pinned page P from a file, then proc B
> > mapped the same page P on the file, then fork() on proc B?
> 
> Yes.
> 
> > If proc B didn't explicitly pin page P in B's address space too,
> > shouldn't we return "false" for page_likely_dma_pinned(P)?  Because if
> > proc B didn't pin the page in its own address space, I'd think it's ok to
> > get the page replaced at any time as long as the content stays the same.
> > Or couldn't we?
> 
> So it depends on the reason why you call page_likely_dma_pinned(). For your
> COW purposes the check is correct but e.g. for "can filesystem safely
> writeback this page" the page_likely_dma_pinned() would be wrong. So I'm
> not objecting to the mechanism as such. I'm mainly objecting to the generic
> function name which suggests something else than what it really checks and
> thus it could be used in wrong places in the future... That's why I'd
> prefer to restrict the function to PageAnon pages where there's no risk of
> confusion what the check actually does.

How about I introduce the helper as John suggested, but rename it to

  page_maybe_dma_pinned_by_mm()

?

Then we also don't need to judge which is more likely to happen (between
"maybe" and "likely", since those words will confuse me if I read them alone..).

I didn't use any extra suffix like "cow" because I think it might be useful for
things besides cow.  Fundamentally the new helper will be mm-based, so "by_mm"
seems to suit it better.

Does that sound ok?

-- 
Peter Xu




* Re: [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork()
  2020-09-22  6:41   ` John Hubbard
  2020-09-22 10:33     ` Jan Kara
@ 2020-09-23 16:06     ` Peter Xu
  1 sibling, 0 replies; 110+ messages in thread
From: Peter Xu @ 2020-09-23 16:06 UTC (permalink / raw)
  To: John Hubbard
  Cc: linux-mm, linux-kernel, Linus Torvalds, Michal Hocko,
	Kirill Shutemov, Jann Horn, Oleg Nesterov, Kirill Tkhai,
	Hugh Dickins, Leon Romanovsky, Jan Kara, Christoph Hellwig,
	Andrew Morton, Jason Gunthorpe, Andrea Arcangeli

On Mon, Sep 21, 2020 at 11:41:16PM -0700, John Hubbard wrote:
> On 9/21/20 2:20 PM, Peter Xu wrote:
> ...
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 7ff29cc3d55c..c40aac0ad87e 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1074,6 +1074,23 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >   	src_page = pmd_page(pmd);
> >   	VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
> > +
> > +	/*
> > +	 * If this page is a potentially pinned page, split and retry the fault
> > +	 * with smaller page size.  Normally this should not happen because the
> > +	 * userspace should use MADV_DONTFORK upon pinned regions.  This is a
> > +	 * best effort that the pinned pages won't be replaced by another
> > +	 * random page during the coming copy-on-write.
> > +	 */
> > +	if (unlikely(READ_ONCE(src_mm->has_pinned) &&
> > +		     page_maybe_dma_pinned(src_page))) {

[...]

> > +		pte_free(dst_mm, pgtable);
> > +		spin_unlock(src_ptl);
> > +		spin_unlock(dst_ptl);
> > +		__split_huge_pmd(vma, src_pmd, addr, false, NULL);
> > +		return -EAGAIN;
> > +	}
> 
> 
> Why wait until we are so deep into this routine to detect this and unwind?
> It seems like if you could do a check near the beginning of this routine, and
> handle it there, with less unwinding? In fact, after taking only the src_ptl,
> the check could be made, right?

Because that's where we've fetched the page from the pmd, so I can directly
reference src_page.  Also, I think I at least need to check against swp entries
first?  So it seems easier to keep the check here, considering it's an unlikely path.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork()
  2020-09-23 15:24     ` Peter Xu
@ 2020-09-23 16:07       ` Yang Shi
  2020-09-24 15:47         ` Peter Xu
  2020-09-23 17:17       ` Jason Gunthorpe
  1 sibling, 1 reply; 110+ messages in thread
From: Yang Shi @ 2020-09-23 16:07 UTC (permalink / raw)
  To: Peter Xu
  Cc: Jason Gunthorpe, Linux MM, Linux Kernel Mailing List,
	Linus Torvalds, Michal Hocko, Kirill Shutemov, Jann Horn,
	Oleg Nesterov, Kirill Tkhai, Hugh Dickins, Leon Romanovsky,
	Jan Kara, John Hubbard, Christoph Hellwig, Andrew Morton,
	Andrea Arcangeli

On Wed, Sep 23, 2020 at 8:26 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Sep 22, 2020 at 09:05:05AM -0300, Jason Gunthorpe wrote:
> > On Mon, Sep 21, 2020 at 05:20:31PM -0400, Peter Xu wrote:
> > > Pinned pages shouldn't be write-protected when fork() happens, because follow
> > > up copy-on-write on these pages could cause the pinned pages to be replaced by
> > > random newly allocated pages.
> > >
> > > For huge PMDs, we split the huge pmd if pinning is detected.  So that future
> > > handling will be done by the PTE level (with our latest changes, each of the
> > > small pages will be copied).  We can achieve this by letting copy_huge_pmd()
> > > return -EAGAIN for pinned pages, so that we'll fall through in copy_pmd_range()
> > > and finally land in the next copy_pte_range() call.
> > >
> > > Huge PUDs will be even more special - so far it does not support anonymous
> > > pages.  But it can actually be done the same as the huge PMDs even if the split
> > > huge PUDs means to erase the PUD entries.  It'll guarantee the follow up fault
> > > ins will remap the same pages in either parent/child later.
> > >
> > > This might not be the most efficient way, but it should be easy and clean
> > > enough.  It should be fine, since we're tackling a very rare case just to
> > > make sure userspaces that pinned some thps will still work even without
> > > MADV_DONTFORK and after they fork()ed.
> > >
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > >  mm/huge_memory.c | 26 ++++++++++++++++++++++++++
> > >  1 file changed, 26 insertions(+)
> > >
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index 7ff29cc3d55c..c40aac0ad87e 100644
> > > +++ b/mm/huge_memory.c
> > > @@ -1074,6 +1074,23 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> > >
> > >     src_page = pmd_page(pmd);
> > >     VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
> > > +
> > > +   /*
> > > +    * If this page is a potentially pinned page, split and retry the fault
> > > +    * with smaller page size.  Normally this should not happen because the
> > > +    * userspace should use MADV_DONTFORK upon pinned regions.  This is a
> > > +    * best effort that the pinned pages won't be replaced by another
> > > +    * random page during the coming copy-on-write.
> > > +    */
> > > +   if (unlikely(READ_ONCE(src_mm->has_pinned) &&
> > > +                page_maybe_dma_pinned(src_page))) {
> > > +           pte_free(dst_mm, pgtable);
> > > +           spin_unlock(src_ptl);
> > > +           spin_unlock(dst_ptl);
> > > +           __split_huge_pmd(vma, src_pmd, addr, false, NULL);
> > > +           return -EAGAIN;
> > > +   }
> >
> > Not sure why, but the PMD stuff here is not calling is_cow_mapping()
> > before doing the write protect. Seems like it might be an existing
> > bug?
>
> IMHO it's not a bug, because splitting a huge pmd should always be safe.
>
> One thing I can think of that might be special here is when the pmd is
> anonymously mapped but also shared (shared, tmpfs thp, I think?), then here
> we'll also mark it as wrprotected even if we don't need to (or maybe we need it
> for some reason..).  But again I think it's safe anyways - when page fault

For a tmpfs mapping, the pmd split just clears the pmd entry without
reinstalling ptes (whereas an anonymous mapping would reinstall ptes).  It
looks like this patch intends to copy at the pte level by splitting the pmd,
but I'm afraid this may not work for tmpfs mappings.

> happens, wp_huge_pmd() should split it into smaller pages unconditionally.  I
> just don't know whether it's the ideal way for the shared case.  Andrea should
> definitely know it better (because it is there since the 1st day of thp).
>
> >
> > In any event, the has_pinned logic shouldn't be used without also
> > checking is_cow_mapping(), so it should be added to that test. Same
> > remarks for PUD
>
> I think the case mentioned above is also the special case here when we didn't
> check is_cow_mapping().  The major difference is whether we'll split the page
> right now, or postpone it until the next write to each mm.  But I think, yes,
> maybe I should better still keep the is_cow_mapping() to be explicit.
>
> Thanks,
>
> --
> Peter Xu
>


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-23  0:27             ` Peter Xu
  2020-09-23 13:10               ` Peter Xu
@ 2020-09-23 17:07               ` Jason Gunthorpe
  2020-09-24 14:35                 ` Peter Xu
  1 sibling, 1 reply; 110+ messages in thread
From: Jason Gunthorpe @ 2020-09-23 17:07 UTC (permalink / raw)
  To: Peter Xu
  Cc: John Hubbard, linux-mm, linux-kernel, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Tue, Sep 22, 2020 at 08:27:35PM -0400, Peter Xu wrote:
> On Tue, Sep 22, 2020 at 04:11:16PM -0300, Jason Gunthorpe wrote:
> > On Tue, Sep 22, 2020 at 01:54:15PM -0400, Peter Xu wrote:
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 8f3521be80ca..6591f3f33299 100644
> > > +++ b/mm/memory.c
> > > @@ -888,8 +888,8 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> > >                  * Because we'll need to release the locks before doing cow,
> > >                  * pass this work to upper layer.
> > >                  */
> > > -               if (READ_ONCE(src_mm->has_pinned) && wp &&
> > > -                   page_maybe_dma_pinned(page)) {
> > > +               if (wp && page_maybe_dma_pinned(page) &&
> > > +                   READ_ONCE(src_mm->has_pinned)) {
> > >                         /* We've got the page already; we're safe */
> > >                         data->cow_old_page = page;
> > >                         data->cow_oldpte = *src_pte;
> > > 
> > > I can also add some more comment to emphasize this.
> > 
> > It is not just that, but the ptep_set_wrprotect() has to be done
> > earlier.
> 
> Now I understand your point, I think..  So I guess it's not only about
> has_pinned, but it should be a race between the fast-gup and the fork() code,
> even if has_pinned is always set.

Yes

> > The best algorithm I've thought of is something like:
> > 
> >  pte_map_lock()
> >   if (page) {
> >       if (wp) {
> > 	  ptep_set_wrprotect()
> > 	  /* Order with try_grab_compound_head(), either we see
> > 	   * page_maybe_dma_pinned(), or they see the wrprotect */
> > 	  get_page();
> 
> Is this get_page() a must to be after ptep_set_wrprotect()
> explicitly?  

No, just before page_maybe_dma_pinned()

> IIUC what we need is to order ptep_set_wrprotect() and
> page_maybe_dma_pinned() here.  E.g., would a "mb()" work?

mb() is not needed because page_maybe_dma_pinned() has an atomic
barrier too. I'd like to see get_page() followed immediately by
page_maybe_dma_pinned(), since they are accessing the same atomic and
could be fused together someday.
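
The fork()-side sequence described here can be modelled in user space roughly as below (all names are simplified stand-ins; the real argument is about cross-CPU ordering against GUP-fast, which a single-threaded model can only annotate, not exercise):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct page { atomic_int refcount; };	/* models page->_refcount */
struct pte  { _Atomic bool writable; };	/* models the PTE write bit */

#define GUP_PIN_COUNTING_BIAS 1024

static void ptep_set_wrprotect(struct pte *pte)
{
	atomic_store(&pte->writable, false);
}

static void get_page(struct page *page)
{
	atomic_fetch_add(&page->refcount, 1);
}

static bool page_maybe_dma_pinned(struct page *page)
{
	return atomic_load(&page->refcount) >= GUP_PIN_COUNTING_BIAS;
}

/* Returns true when fork() should copy the page early instead of
 * relying on later COW. */
static bool fork_needs_early_copy(struct pte *pte, struct page *page,
				  bool mm_has_pinned)
{
	ptep_set_wrprotect(pte);	/* 1: wrprotect before sampling the pin */
	get_page(page);			/* 2: atomic op, orders with the GUP
					 *    side's try_grab_compound_head() */
	/* 3: either we observe the pin here, or a racing GUP-fast will
	 *    observe the wrprotected PTE after its pin and back off. */
	return mm_has_pinned && page_maybe_dma_pinned(page);
}
```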

> Another thing is, do we need similar thing for e.g. gup_pte_range(), so that
> to guarantee ordering of try_grab_compound_head() and the pte change
> check?

gup_pte_range() is as I quoted? The gup slow path ends up in
follow_page_pte(), which uses the pte lock, so it is OK.

> 
> Another question is, how about read fast-gup for pinning?  Because we can't use
> the write-protect mechanism to block a read gup.  I remember we've discussed
> similar things and iirc your point is "pinned pages should always be with
> WRITE".  However now I still doubt it...  Because I feel like read gup is still
> legal (as I mentioned previously - when device purely writes to the page and
> the processor only reads from it).

We need a definition for what FOLL_PIN means. After this work on fork
I propose that FOLL_PIN means:

  The page is in-use for DMA and the CPU PTE should not be changed
  without explicit involvement of the application (eg via mmap/munmap)

If GUP encounters a read-only page during FOLL_PIN the behavior should
depend on what the fault handler would do. If the fault handler would
trigger COW and replace the PTE then it violates the above. GUP should
do the COW before pinning.

If the fault handler would SIGSEGV then GUP can keep the read-only
page and allow !FOLL_WRITE access. The PTE should not be replaced for
other reasons (though I think there is work there too).

For COW related issues the idea is the mm_struct doing the pin will
never trigger a COW. When other processes hit the COW they copy the
page into their mm and don't touch the source MM's PTE.

Today we do this roughly with FOLL_FORCE and FOLL_WRITE in the users,
but a more nuanced version and documentation would be much clearer.

Unfortunately just doing simple read GUP potentially exposes things to
various COW related data corruption races.

This is a discussion beyond this series though..

Jason


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-23 14:20                 ` Jan Kara
@ 2020-09-23 17:12                   ` Jason Gunthorpe
  2020-09-24  7:44                     ` Jan Kara
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Gunthorpe @ 2020-09-23 17:12 UTC (permalink / raw)
  To: Jan Kara
  Cc: Peter Xu, John Hubbard, linux-mm, linux-kernel, Andrew Morton,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Wed, Sep 23, 2020 at 04:20:03PM +0200, Jan Kara wrote:

> I'd hate to take spinlock in the GUP-fast path. Also I don't think this is
> quite correct because GUP-fast-only can be called from interrupt context
> and page table locks are not interrupt safe. 

Yes, IIRC, that is a key element of GUP-fast. Was it something to do
with futexes?

> and then checking page_may_be_dma_pinned() during fork(). That should work
> just fine AFAICT... BTW note that GUP-fast code is (and this is deliberated
> because e.g. DAX depends on this) first updating page->_refcount and then
> rechecking PTE didn't change and the page->_refcount update is actually
> done using atomic_add_unless() so that it cannot be reordered wrt the PTE
> check. So the fork() code only needs to add barriers to pair with this.

It is not just DAX, everything needs this check.

After the page is pinned it is prevented from being freed and
recycled. After GUP has the pin it must check that the PTE still
points at the same page, otherwise it might have pinned a page that is
already freed - and that would be a use-after-free issue.
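
The protocol being referenced (take the reference first with something like atomic_add_unless(), then recheck the PTE, and drop the reference on mismatch) can be sketched in user space as follows (single-threaded model with simplified stand-ins; in the kernel the PTE is genuinely re-read after the refcount update):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct page { atomic_int refcount; };	/* models page->_refcount */

/* Models atomic_add_unless(&page->_refcount, 1, 0): refuse to
 * resurrect a page whose refcount already hit zero. */
static bool try_get_page(struct page *page)
{
	int old = atomic_load(&page->refcount);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&page->refcount,
						 &old, old + 1))
			return true;
	}
	return false;
}

static void put_page(struct page *page)
{
	atomic_fetch_sub(&page->refcount, 1);
}

/* "snap" is the page the PTE pointed at before the grab; "pte_now"
 * stands in for re-reading the PTE after the refcount bump. */
static struct page *gup_fast_one(struct page *snap, struct page *pte_now)
{
	if (!try_get_page(snap))
		return NULL;		/* page was already being freed */
	if (pte_now != snap) {
		put_page(snap);		/* PTE changed under us: back off */
		return NULL;
	}
	return snap;			/* pin kept: no use-after-free */
}
```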

Jason


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()
  2020-09-21 21:17 ` [PATCH 3/5] mm: Rework return value for copy_one_pte() Peter Xu
  2020-09-22  7:11   ` John Hubbard
  2020-09-22 10:08   ` Oleg Nesterov
@ 2020-09-23 17:16   ` Linus Torvalds
  2020-09-23 21:24     ` Linus Torvalds
  2 siblings, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2020-09-23 17:16 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linux-MM, Linux Kernel Mailing List, Jason Gunthorpe,
	Andrew Morton, Jan Kara, Michal Hocko, Kirill Tkhai,
	Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, John Hubbard, Oleg Nesterov, Leon Romanovsky,
	Jann Horn

[-- Attachment #1: Type: text/plain, Size: 2240 bytes --]

On Mon, Sep 21, 2020 at 2:18 PM Peter Xu <peterx@redhat.com> wrote:
>
> There's one special path for copy_one_pte() with swap entries, in which
> add_swap_count_continuation(GFP_ATOMIC) might fail.  In that case we'll return
> the swp_entry_t so that the caller will release the locks and redo the same
> thing with GFP_KERNEL.
>
> It's confusing when copy_one_pte() must return a swp_entry_t (even if all the
> ptes are non-swap entries).  More importantly, we face other requirement to
> extend this "we need to do something else, but without the locks held" case.
>
> Rework the return value into something easier to understand, as defined in enum
> copy_mm_ret.  We'll pass the swp_entry_t back using the newly introduced union
> copy_mm_data parameter.

Ok, I'm reading this series, and I do hate this.

And I think it's unnecessary.

There's a very simple way to avoid this all: split out the
"!pte_present(pte)" case from the function entirely.

That actually makes the code much more legible: that non-present case
is very different, and it's also unlikely() and causes deeper
indentation etc.

Because it's unlikely, it probably also shouldn't be inline.

That unlikely case is also why we then have that special
"out_set_pte" label, which should just go away and be copied into the
(now uninlined) function.

Once that re-organization has been done, the second step is to then
just move the "pte_present()" check into the caller, and suddenly all
the ugly return value games will go entirely away.

I'm attaching the two patches that do this here, but I do want to note
how that first patch is much more legible with "--ignore-all-space",
and then you really see that the diff is a _pure_ code movement thing.
Otherwise it looks like it's doing a big change.

Comments?

NOTE! The intent here is that now we can easily add new argument (a
pre-allocated page or NULL) and a return value to
"copy_present_page()": it can return "I needed a temporary page but
you hadn't allocated one yet" or "I used up the temporary page you
gave me" or "all good, keep the temporary page around for the future".

But these two patches are very intentionally meant to be just "this
clearly changes NO semantics at all".
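
The three outcomes in the NOTE above could be encoded as an explicit result, along these lines (hypothetical names and shape; the eventual patch may well encode it differently):

```c
#include <assert.h>
#include <stddef.h>

struct page;			/* opaque in this sketch */

enum copy_page_result {
	COPY_OK,		/* all good, keep any prealloc for later */
	COPY_NEED_PREALLOC,	/* drop locks, allocate a page, retry */
	COPY_USED_PREALLOC,	/* prealloc consumed, replenish before reuse */
};

/* "needs_copy" stands in for the early-COW decision on a pinned page. */
static enum copy_page_result
copy_present_page_sketch(struct page **prealloc, int needs_copy)
{
	if (!needs_copy)
		return COPY_OK;
	if (*prealloc == NULL)
		return COPY_NEED_PREALLOC;
	*prealloc = NULL;	/* page handed over to the child mapping */
	return COPY_USED_PREALLOC;
}
```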

                   Linus

[-- Attachment #2: 0001-mm-split-out-the-non-present-case-from-copy_one_pte.patch --]
[-- Type: text/x-patch, Size: 6716 bytes --]

From df3a57d1f6072d07978bafa7dbd9904cdf8f3e13 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, 23 Sep 2020 09:56:59 -0700
Subject: [PATCH 1/2] mm: split out the non-present case from copy_one_pte()

This is a purely mechanical split of the copy_one_pte() function.  It's
not immediately obvious when looking at the diff because of the
indentation change, but the way to see what is going on in this commit
is to use the "-w" flag to not show pure whitespace changes, and you see
how the first part of copy_one_pte() is simply lifted out into a
separate function.

And since the non-present case is marked unlikely, don't make the new
function be inlined.  Not that gcc really seems to care, since it looks
like it will inline it anyway due to the whole "single callsite for
static function" logic.  In fact, code generation with the function
split is almost identical to before.  But not marking it inline is the
right thing to do.

This is pure prep-work and cleanup for subsequent changes.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/memory.c | 152 ++++++++++++++++++++++++++++------------------------
 1 file changed, 82 insertions(+), 70 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 469af373ae76..31a3ab7d9aa3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -695,6 +695,84 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
  * covered by this vma.
  */
 
+static unsigned long
+copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
+		pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *vma,
+		unsigned long addr, int *rss)
+{
+	unsigned long vm_flags = vma->vm_flags;
+	pte_t pte = *src_pte;
+	struct page *page;
+	swp_entry_t entry = pte_to_swp_entry(pte);
+
+	if (likely(!non_swap_entry(entry))) {
+		if (swap_duplicate(entry) < 0)
+			return entry.val;
+
+		/* make sure dst_mm is on swapoff's mmlist. */
+		if (unlikely(list_empty(&dst_mm->mmlist))) {
+			spin_lock(&mmlist_lock);
+			if (list_empty(&dst_mm->mmlist))
+				list_add(&dst_mm->mmlist,
+						&src_mm->mmlist);
+			spin_unlock(&mmlist_lock);
+		}
+		rss[MM_SWAPENTS]++;
+	} else if (is_migration_entry(entry)) {
+		page = migration_entry_to_page(entry);
+
+		rss[mm_counter(page)]++;
+
+		if (is_write_migration_entry(entry) &&
+				is_cow_mapping(vm_flags)) {
+			/*
+			 * COW mappings require pages in both
+			 * parent and child to be set to read.
+			 */
+			make_migration_entry_read(&entry);
+			pte = swp_entry_to_pte(entry);
+			if (pte_swp_soft_dirty(*src_pte))
+				pte = pte_swp_mksoft_dirty(pte);
+			if (pte_swp_uffd_wp(*src_pte))
+				pte = pte_swp_mkuffd_wp(pte);
+			set_pte_at(src_mm, addr, src_pte, pte);
+		}
+	} else if (is_device_private_entry(entry)) {
+		page = device_private_entry_to_page(entry);
+
+		/*
+		 * Update rss count even for unaddressable pages, as
+		 * they should treated just like normal pages in this
+		 * respect.
+		 *
+		 * We will likely want to have some new rss counters
+		 * for unaddressable pages, at some point. But for now
+		 * keep things as they are.
+		 */
+		get_page(page);
+		rss[mm_counter(page)]++;
+		page_dup_rmap(page, false);
+
+		/*
+		 * We do not preserve soft-dirty information, because so
+		 * far, checkpoint/restore is the only feature that
+		 * requires that. And checkpoint/restore does not work
+		 * when a device driver is involved (you cannot easily
+		 * save and restore device driver state).
+		 */
+		if (is_write_device_private_entry(entry) &&
+		    is_cow_mapping(vm_flags)) {
+			make_device_private_entry_read(&entry);
+			pte = swp_entry_to_pte(entry);
+			if (pte_swp_uffd_wp(*src_pte))
+				pte = pte_swp_mkuffd_wp(pte);
+			set_pte_at(src_mm, addr, src_pte, pte);
+		}
+	}
+	set_pte_at(dst_mm, addr, dst_pte, pte);
+	return 0;
+}
+
 static inline unsigned long
 copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *vma,
@@ -705,75 +783,10 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	struct page *page;
 
 	/* pte contains position in swap or file, so copy. */
-	if (unlikely(!pte_present(pte))) {
-		swp_entry_t entry = pte_to_swp_entry(pte);
-
-		if (likely(!non_swap_entry(entry))) {
-			if (swap_duplicate(entry) < 0)
-				return entry.val;
-
-			/* make sure dst_mm is on swapoff's mmlist. */
-			if (unlikely(list_empty(&dst_mm->mmlist))) {
-				spin_lock(&mmlist_lock);
-				if (list_empty(&dst_mm->mmlist))
-					list_add(&dst_mm->mmlist,
-							&src_mm->mmlist);
-				spin_unlock(&mmlist_lock);
-			}
-			rss[MM_SWAPENTS]++;
-		} else if (is_migration_entry(entry)) {
-			page = migration_entry_to_page(entry);
-
-			rss[mm_counter(page)]++;
-
-			if (is_write_migration_entry(entry) &&
-					is_cow_mapping(vm_flags)) {
-				/*
-				 * COW mappings require pages in both
-				 * parent and child to be set to read.
-				 */
-				make_migration_entry_read(&entry);
-				pte = swp_entry_to_pte(entry);
-				if (pte_swp_soft_dirty(*src_pte))
-					pte = pte_swp_mksoft_dirty(pte);
-				if (pte_swp_uffd_wp(*src_pte))
-					pte = pte_swp_mkuffd_wp(pte);
-				set_pte_at(src_mm, addr, src_pte, pte);
-			}
-		} else if (is_device_private_entry(entry)) {
-			page = device_private_entry_to_page(entry);
-
-			/*
-			 * Update rss count even for unaddressable pages, as
-			 * they should treated just like normal pages in this
-			 * respect.
-			 *
-			 * We will likely want to have some new rss counters
-			 * for unaddressable pages, at some point. But for now
-			 * keep things as they are.
-			 */
-			get_page(page);
-			rss[mm_counter(page)]++;
-			page_dup_rmap(page, false);
-
-			/*
-			 * We do not preserve soft-dirty information, because so
-			 * far, checkpoint/restore is the only feature that
-			 * requires that. And checkpoint/restore does not work
-			 * when a device driver is involved (you cannot easily
-			 * save and restore device driver state).
-			 */
-			if (is_write_device_private_entry(entry) &&
-			    is_cow_mapping(vm_flags)) {
-				make_device_private_entry_read(&entry);
-				pte = swp_entry_to_pte(entry);
-				if (pte_swp_uffd_wp(*src_pte))
-					pte = pte_swp_mkuffd_wp(pte);
-				set_pte_at(src_mm, addr, src_pte, pte);
-			}
-		}
-		goto out_set_pte;
-	}
+	if (unlikely(!pte_present(pte)))
+		return copy_nonpresent_pte(dst_mm, src_mm,
+					   dst_pte, src_pte, vma,
+					   addr, rss);
 
 	/*
 	 * If it's a COW mapping, write protect it both
@@ -807,7 +820,6 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		rss[mm_counter(page)]++;
 	}
 
-out_set_pte:
 	set_pte_at(dst_mm, addr, dst_pte, pte);
 	return 0;
 }
-- 
2.28.0.218.gc12ef3d349


[-- Attachment #3: 0002-mm-move-the-copy_one_pte-pte_present-check-into-the-.patch --]
[-- Type: text/x-patch, Size: 2839 bytes --]

From 79a1971c5f14ea3a6e2b0c4caf73a1760db7cab8 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, 23 Sep 2020 10:04:16 -0700
Subject: [PATCH 2/2] mm: move the copy_one_pte() pte_present check into the
 caller

This completes the split of the non-present and present pte cases by
moving the check for the source pte being present into the single
caller, which also means that we clearly separate out the very different
return value case for a non-present pte.

The present pte case currently always succeeds.

This is a pure code re-organization with no semantic change: the intent
is to make it much easier to add a new return case to the present pte
case for when we do early COW at page table copy time.

This was split out from the previous commit simply to make it easy to
visually see that there were no semantic changes from this code
re-organization.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/memory.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 31a3ab7d9aa3..e315b1f1ef08 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -773,8 +773,8 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	return 0;
 }
 
-static inline unsigned long
-copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
+static inline void
+copy_present_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *vma,
 		unsigned long addr, int *rss)
 {
@@ -782,12 +782,6 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	pte_t pte = *src_pte;
 	struct page *page;
 
-	/* pte contains position in swap or file, so copy. */
-	if (unlikely(!pte_present(pte)))
-		return copy_nonpresent_pte(dst_mm, src_mm,
-					   dst_pte, src_pte, vma,
-					   addr, rss);
-
 	/*
 	 * If it's a COW mapping, write protect it both
 	 * in the parent and the child
@@ -821,7 +815,6 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	}
 
 	set_pte_at(dst_mm, addr, dst_pte, pte);
-	return 0;
 }
 
 static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
@@ -863,10 +856,17 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			progress++;
 			continue;
 		}
-		entry.val = copy_one_pte(dst_mm, src_mm, dst_pte, src_pte,
+		if (unlikely(!pte_present(*src_pte))) {
+			entry.val = copy_nonpresent_pte(dst_mm, src_mm,
+							dst_pte, src_pte,
 							vma, addr, rss);
-		if (entry.val)
-			break;
+			if (entry.val)
+				break;
+			progress += 8;
+			continue;
+		}
+		copy_present_pte(dst_mm, src_mm, dst_pte, src_pte,
+				 vma, addr, rss);
 		progress += 8;
 	} while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end);
 
-- 
2.28.0.218.gc12ef3d349


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork()
  2020-09-23 15:24     ` Peter Xu
  2020-09-23 16:07       ` Yang Shi
@ 2020-09-23 17:17       ` Jason Gunthorpe
  1 sibling, 0 replies; 110+ messages in thread
From: Jason Gunthorpe @ 2020-09-23 17:17 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-mm, linux-kernel, Linus Torvalds, Michal Hocko,
	Kirill Shutemov, Jann Horn, Oleg Nesterov, Kirill Tkhai,
	Hugh Dickins, Leon Romanovsky, Jan Kara, John Hubbard,
	Christoph Hellwig, Andrew Morton, Andrea Arcangeli

On Wed, Sep 23, 2020 at 11:24:09AM -0400, Peter Xu wrote:
> On Tue, Sep 22, 2020 at 09:05:05AM -0300, Jason Gunthorpe wrote:
> > On Mon, Sep 21, 2020 at 05:20:31PM -0400, Peter Xu wrote:
> > > Pinned pages shouldn't be write-protected when fork() happens, because follow
> > > up copy-on-write on these pages could cause the pinned pages to be replaced by
> > > random newly allocated pages.
> > > 
> > > For huge PMDs, we split the huge pmd if pinning is detected.  So that future
> > > handling will be done by the PTE level (with our latest changes, each of the
> > > small pages will be copied).  We can achieve this by letting copy_huge_pmd()
> > > return -EAGAIN for pinned pages, so that we'll fall through in copy_pmd_range()
> > > and finally land in the next copy_pte_range() call.
> > > 
> > > Huge PUDs will be even more special - so far it does not support anonymous
> > > pages.  But it can actually be done the same as the huge PMDs even if the split
> > > huge PUDs means to erase the PUD entries.  It'll guarantee the follow up fault
> > > ins will remap the same pages in either parent/child later.
> > > 
> > > This might not be the most efficient way, but it should be easy and clean
> > > enough.  It should be fine, since we're tackling a very rare case just to
> > > make sure userspaces that pinned some thps will still work even without
> > > MADV_DONTFORK and after they fork()ed.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > >  mm/huge_memory.c | 26 ++++++++++++++++++++++++++
> > >  1 file changed, 26 insertions(+)
> > > 
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index 7ff29cc3d55c..c40aac0ad87e 100644
> > > +++ b/mm/huge_memory.c
> > > @@ -1074,6 +1074,23 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> > >  
> > >  	src_page = pmd_page(pmd);
> > >  	VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
> > > +
> > > +	/*
> > > +	 * If this page is a potentially pinned page, split and retry the fault
> > > +	 * with smaller page size.  Normally this should not happen because the
> > > +	 * userspace should use MADV_DONTFORK upon pinned regions.  This is a
> > > +	 * best effort that the pinned pages won't be replaced by another
> > > +	 * random page during the coming copy-on-write.
> > > +	 */
> > > +	if (unlikely(READ_ONCE(src_mm->has_pinned) &&
> > > +		     page_maybe_dma_pinned(src_page))) {
> > > +		pte_free(dst_mm, pgtable);
> > > +		spin_unlock(src_ptl);
> > > +		spin_unlock(dst_ptl);
> > > +		__split_huge_pmd(vma, src_pmd, addr, false, NULL);
> > > +		return -EAGAIN;
> > > +	}
> > 
> > Not sure why, but the PMD stuff here is not calling is_cow_mapping()
> > before doing the write protect. Seems like it might be an existing
> > bug?
> 
> IMHO it's not a bug, because splitting a huge pmd should always be safe.

Sure, splitting is safe, but testing has_pinned without checking COW is
not, for the reasons Jann explained.

The 'maybe' in page_maybe_dma_pinned() means it can return true when
the correct answer is false. It can never return false when the
correct answer is true.

It is the same when has_pinned is involved, the combined expression
must never return false when true is correct. Which means it can only
be applied for COW cases.
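
In other words, because the check is one-sided, it can only gate paths where a false positive costs performance rather than correctness. A minimal sketch of the combined guard (user-space stand-ins; the flag values mirror the kernel headers, but everything else is simplified):

```c
#include <assert.h>
#include <stdbool.h>

#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL

struct page { int refcount; };		/* models page->_refcount */
#define GUP_PIN_COUNTING_BIAS 1024

/* Private writable mappings are the COW case. */
static bool is_cow_mapping(unsigned long vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/* May return true when the page is not pinned (false positive),
 * never false when it is pinned. */
static bool page_maybe_dma_pinned(struct page *page)
{
	return page->refcount >= GUP_PIN_COUNTING_BIAS;
}

/* The combined expression is only meaningful for COW mappings: a
 * false positive there merely causes an extra split/copy. */
static bool pmd_needs_pin_handling(unsigned long vm_flags,
				   bool mm_has_pinned, struct page *page)
{
	return is_cow_mapping(vm_flags) &&
	       mm_has_pinned && page_maybe_dma_pinned(page);
}
```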

Jason


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork()
  2020-09-23 15:44               ` Peter Xu
@ 2020-09-23 20:19                 ` John Hubbard
  2020-09-24 18:49                   ` Peter Xu
  0 siblings, 1 reply; 110+ messages in thread
From: John Hubbard @ 2020-09-23 20:19 UTC (permalink / raw)
  To: Peter Xu, Jan Kara
  Cc: linux-mm, linux-kernel, Linus Torvalds, Michal Hocko,
	Kirill Shutemov, Jann Horn, Oleg Nesterov, Kirill Tkhai,
	Hugh Dickins, Leon Romanovsky, Christoph Hellwig, Andrew Morton,
	Jason Gunthorpe, Andrea Arcangeli

On 9/23/20 8:44 AM, Peter Xu wrote:
> On Wed, Sep 23, 2020 at 04:01:14PM +0200, Jan Kara wrote:
>> On Wed 23-09-20 09:50:04, Peter Xu wrote:
...
>>>> But the problem is that if you apply mm->has_pinned check on file pages,
>>>> you can get false negatives now. And that's not acceptable...
>>>
>>> Do you mean the case where proc A pinned page P from a file, then proc B
>>> mapped the same page P on the file, then fork() on proc B?
>>
>> Yes.

aha, thanks for spelling out the false negative problem.

>>
>>> If proc B didn't explicitly pinned page P in B's address space too,
>>> shouldn't we return "false" for page_likely_dma_pinned(P)?  Because if
>>> proc B didn't pin the page in its own address space, I'd think it's ok to
>>> get the page replaced at any time as long as the content keeps the same.
>>> Or couldn't we?
>>
>> So it depends on the reason why you call page_likely_dma_pinned(). For your
>> COW purposes the check is correct but e.g. for "can filesystem safely
>> writeback this page" the page_likely_dma_pinned() would be wrong. So I'm
>> not objecting to the mechanism as such. I'm mainly objecting to the generic
>> function name which suggests something else than what it really checks and
>> thus it could be used in wrong places in the future... That's why I'd
>> prefer to restrict the function to PageAnon pages where there's no risk of
>> confusion what the check actually does.
> 
> How about I introduce the helper as John suggested, but rename it to
> 
>    page_maybe_dma_pinned_by_mm()
> 
> ?
> 
> Then we also don't need to judge on which is more likely to happen (between
> "maybe" and "likely", since that will confuse me if I only read these words..).
>

You're right, it is too subtle of a distinction after all. I agree that sticking
with "_maybe_" avoids that confusion.


> I didn't use any extra suffix like "cow" because I think it might be useful for
> things besides cow.  Fundamentally the new helper will be mm-based, so "by_mm"
> seems to suite better to me.
> 
> Does that sound ok?
> 

Actually, Jan nailed it. I just wasn't understanding his scenario, but now that
I do, and considering your other point about wording, I think we end up with:

     anon_page_maybe_pinned()

as a pretty good name for a helper function. (We don't want "_mm" because that
refers more to the mechanism used internally, rather than the behavior of the
function. "anon_" adds more meaning.)

...now I better go and try to grok what Jason is recommending for the new
meaning of FOLL_PIN, in another tributary of this thread. I don't *think* it affects
this naming point, though. :)

thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes
  2020-09-23  1:03               ` Peter Xu
@ 2020-09-23 20:25                 ` Linus Torvalds
  2020-09-24 15:08                   ` Peter Xu
  0 siblings, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2020-09-23 20:25 UTC (permalink / raw)
  To: Peter Xu
  Cc: Oleg Nesterov, Linux-MM, Linux Kernel Mailing List, Michal Hocko,
	Kirill Shutemov, Jann Horn, Kirill Tkhai, Hugh Dickins,
	Leon Romanovsky, Jan Kara, John Hubbard, Christoph Hellwig,
	Andrew Morton, Jason Gunthorpe, Andrea Arcangeli

[-- Attachment #1: Type: text/plain, Size: 1023 bytes --]

On Tue, Sep 22, 2020 at 6:03 PM Peter Xu <peterx@redhat.com> wrote:
>
> > If we rely on "copy_ret == COPY_MM_BREAK_COW" we can unify "again" and
> > "again_break_cow", we don't need to clear ->cow_new_page, this makes the
> > logic more understandable. To me at least ;)
>
> I see your point.  I'll definitely try it out.  I think I'll at least use what
> you preferred above since it's actually the same as before, logically.  Then
> I'll consider dropping the again_break_cow, as long as I'm still confident
> after the change that nothing is leaked :).

So the two patches I sent out to re-organize copy_one_pte() were
literally meant to make all this mess go away.

IOW, the third patch would be something (COMPLETELY UNTESTED) like the attached.

I think the logic for the preallocation is fairly obvious, but it
might be better to allocate a batch of pages for all I know. That
said, I can't really make myself care about the performance of a
fork() after you've pinned pages in it, so..
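To make the retry shape concrete, here is a rough userspace sketch (every name here is hypothetical and the "lock" is only notional; this models the pattern of the attached patch, not the kernel code): copy entries under the lock, bail out when an entry needs a fresh page, allocate with the lock dropped, then retry the same entry with the page in hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct pte_slot { int val; bool needs_copy; bool copied; };

static int allocations;

/* stand-in for alloc_page_vma(); must not be called under the "lock" */
static int *alloc_outside_lock(void)
{
    allocations++;
    return malloc(sizeof(int));
}

/* Returns how many times the "lock" had to be dropped for an allocation. */
static int copy_range(struct pte_slot *ptes, int n)
{
    int *prealloc = NULL;
    int i = 0, lock_drops = 0;

again:
    /* ---- page-table locks notionally taken here ---- */
    for (; i < n; i++) {
        if (ptes[i].needs_copy && !prealloc)
            break;                     /* can't sleep under the lock: bail out */
        if (ptes[i].needs_copy) {
            *prealloc = ptes[i].val;   /* early COW into the fresh page */
            free(prealloc);            /* "install" it; ownership moves on */
            prealloc = NULL;
        }
        ptes[i].copied = true;
    }
    /* ---- locks notionally dropped here ---- */
    if (i < n) {                       /* we bailed out needing a page */
        prealloc = alloc_outside_lock();
        if (!prealloc)
            return -1;                 /* would be -ENOMEM in the kernel */
        lock_drops++;
        goto again;
    }
    free(prealloc);                    /* leftover page, if any */
    return lock_drops;
}
```

The point of the shape is that the allocation can sleep, which is only legal once the page-table locks are dropped; the retry then lands on the same entry with the preallocated page ready.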

                 Linus

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 2943 bytes --]

 mm/memory.c | 38 +++++++++++++++++++++++++++++++-------
 1 file changed, 31 insertions(+), 7 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index e315b1f1ef08..524aa7183971 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -773,10 +773,14 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	return 0;
 }
 
-static inline void
+/*
+ * This returns 0 for success, >0 for "success, and I used the prealloc page",
+ * and <0 for "you need to preallocate a page and retry".
+ */
+static inline int
 copy_present_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *vma,
-		unsigned long addr, int *rss)
+		unsigned long addr, int *rss, struct page *prealloc)
 {
 	unsigned long vm_flags = vma->vm_flags;
 	pte_t pte = *src_pte;
@@ -815,6 +819,7 @@ copy_present_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	}
 
 	set_pte_at(dst_mm, addr, dst_pte, pte);
+	return 0;
 }
 
 static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
@@ -824,16 +829,19 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	pte_t *orig_src_pte, *orig_dst_pte;
 	pte_t *src_pte, *dst_pte;
 	spinlock_t *src_ptl, *dst_ptl;
-	int progress = 0;
+	int progress, used_page;
 	int rss[NR_MM_COUNTERS];
 	swp_entry_t entry = (swp_entry_t){0};
+	struct page *prealloc = NULL;
 
 again:
+	progress = 0;
+	used_page = 0;
 	init_rss_vec(rss);
 
 	dst_pte = pte_alloc_map_lock(dst_mm, dst_pmd, addr, &dst_ptl);
 	if (!dst_pte)
-		return -ENOMEM;
+		goto out_of_memory;
 	src_pte = pte_offset_map(src_pmd, addr);
 	src_ptl = pte_lockptr(src_mm, src_pmd);
 	spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
@@ -865,8 +873,12 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			progress += 8;
 			continue;
 		}
-		copy_present_pte(dst_mm, src_mm, dst_pte, src_pte,
-				 vma, addr, rss);
+		/* copy_present_page() may need to have a pre-allocated temporary page */
+		used_page = copy_present_pte(dst_mm, src_mm, dst_pte, src_pte, vma, addr, rss, prealloc);
+		if (used_page < 0)
+			break;
+		if (used_page)
+			prealloc = NULL;
 		progress += 8;
 	} while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end);
 
@@ -879,12 +891,24 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
 	if (entry.val) {
 		if (add_swap_count_continuation(entry, GFP_KERNEL) < 0)
+			goto out_of_memory;
+	}
+	/* Did we exit from the pte lock because we needed a new page? */
+	if (used_page < 0) {
+		prealloc = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, addr);
+		if (!prealloc)
 			return -ENOMEM;
-		progress = 0;
 	}
 	if (addr != end)
 		goto again;
+	if (prealloc)
+		free_unref_page(prealloc);
 	return 0;
+
+out_of_memory:
+	if (prealloc)
+		free_unref_page(prealloc);
+	return -ENOMEM;
 }
 
 static inline int copy_pmd_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()
  2020-09-23 17:16   ` Linus Torvalds
@ 2020-09-23 21:24     ` Linus Torvalds
  0 siblings, 0 replies; 110+ messages in thread
From: Linus Torvalds @ 2020-09-23 21:24 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linux-MM, Linux Kernel Mailing List, Jason Gunthorpe,
	Andrew Morton, Jan Kara, Michal Hocko, Kirill Tkhai,
	Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, John Hubbard, Oleg Nesterov, Leon Romanovsky,
	Jann Horn

On Wed, Sep 23, 2020 at 10:16 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> But these two patches are very intentionally meant to be just "this
> clearly changes NO semantics at all".

The more I look at these, the more I go "this is a cleanup
regardless", so I'll just keep these in my tree as-is.

                 Linus


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-23 17:12                   ` Jason Gunthorpe
@ 2020-09-24  7:44                     ` Jan Kara
  2020-09-24 14:02                       ` Jason Gunthorpe
  0 siblings, 1 reply; 110+ messages in thread
From: Jan Kara @ 2020-09-24  7:44 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jan Kara, Peter Xu, John Hubbard, linux-mm, linux-kernel,
	Andrew Morton, Michal Hocko, Kirill Tkhai, Kirill Shutemov,
	Hugh Dickins, Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Wed 23-09-20 14:12:07, Jason Gunthorpe wrote:
> On Wed, Sep 23, 2020 at 04:20:03PM +0200, Jan Kara wrote:
> 
> > I'd hate to take spinlock in the GUP-fast path. Also I don't think this is
> > quite correct because GUP-fast-only can be called from interrupt context
> > and page table locks are not interrupt safe. 
> 
> Yes, IIRC, that is a key element of GUP-fast. Was it something to do
> with futexes?

Honestly, I'm not sure.

> > and then checking page_may_be_dma_pinned() during fork(). That should work
> > just fine AFAICT... BTW note that GUP-fast code is (and this is deliberate
> > because e.g. DAX depends on this) first updating page->_refcount and then
> > rechecking PTE didn't change and the page->_refcount update is actually
> > done using atomic_add_unless() so that it cannot be reordered wrt the PTE
> > check. So the fork() code only needs to add barriers to pair with this.
> 
> It is not just DAX, everything needs this check.
> 
> After the page is pinned it is prevented from being freed and
> recycled. After GUP has the pin it must check that the PTE still
> points at the same page, otherwise it might have pinned a page that is
> already freed - and that would be a use-after-free issue.

I don't think a page use-after-free is really the reason - we add the page
reference through page_ref_add_unless(page, x, 0) - i.e., it will fail for an
already freed page. It's more about being able to make sure the page is not
accessible anymore - and for that, modifying the pte and then checking the page
refcount is a *reliable* way to synchronize with GUP-fast...
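The GUP-fast side of that synchronization can be sketched with userspace C11 atomics (all names here are toy models, not the kernel helpers): take the reference only if the refcount is still non-zero, then re-read the PTE and back off if it changed in the meantime.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Toy page with only a refcount; the real struct page is far richer. */
struct page { atomic_int refcount; };

/* Model of page_ref_add_unless(page, nr, u): add nr to the refcount
 * unless it already equals u (u == 0 means "page already freed"). */
static int page_ref_add_unless(struct page *p, int nr, int u)
{
    int old = atomic_load(&p->refcount);

    do {
        if (old == u)
            return 0;                       /* refuse: page is being freed */
    } while (!atomic_compare_exchange_weak(&p->refcount, &old, old + nr));
    return 1;
}

/* GUP-fast order: (1) grab the reference, (2) re-read the PTE. */
static struct page *gup_fast_pin(atomic_uintptr_t *pte)
{
    uintptr_t before = atomic_load(pte);
    struct page *page = (struct page *)before;

    if (!page || !page_ref_add_unless(page, 1, 0))
        return NULL;
    if (atomic_load(pte) != before) {       /* PTE changed: stale page */
        atomic_fetch_sub(&page->refcount, 1);
        return NULL;
    }
    return page;
}
```

Anyone who clears the PTE and then checks the refcount therefore either sees the extra reference, or is guaranteed the second PTE read on the GUP side will fail.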

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes
  2020-09-21 21:20 ` [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes Peter Xu
  2020-09-21 21:55   ` Jann Horn
  2020-09-22 11:48   ` Oleg Nesterov
@ 2020-09-24 11:48   ` Kirill Tkhai
  2020-09-24 15:16     ` Peter Xu
  2 siblings, 1 reply; 110+ messages in thread
From: Kirill Tkhai @ 2020-09-24 11:48 UTC (permalink / raw)
  To: Peter Xu, linux-mm, linux-kernel
  Cc: Linus Torvalds, Michal Hocko, Kirill Shutemov, Jann Horn,
	Oleg Nesterov, Hugh Dickins, Leon Romanovsky, Jan Kara,
	John Hubbard, Christoph Hellwig, Andrew Morton, Jason Gunthorpe,
	Andrea Arcangeli

On 22.09.2020 00:20, Peter Xu wrote:
> This patch is greatly inspired by the discussions on the list from Linus, Jason
> Gunthorpe and others [1].
> 
> It allows copy_pte_range() to do early cow if the pages were pinned on the
> source mm.  Currently we don't have an accurate way to know whether a page is
> pinned or not.  The only thing we have is page_maybe_dma_pinned().  However
> that's good enough for now.  Especially, with the newly added mm->has_pinned
> flag to make sure we won't affect processes that never pinned any pages.
> 
> It would be easier if we can do GFP_KERNEL allocation within copy_one_pte().
> Unluckily, we can't because we're with the page table locks held for both the
> parent and child processes.  So the page copy process needs to be done outside
> copy_one_pte().
> 
> The new COPY_MM_BREAK_COW is introduced for this - copy_one_pte() would return
> this when it finds any pte that may need an early breaking of cow.
> 
> page_duplicate() is used to handle the page copy process in copy_pte_range().
> Of course we need to do that after releasing of the locks.
> 
> The slightly tricky part is page_duplicate() will fill in the copy_mm_data with
> the new page copied and we'll need to re-install the pte again with page table
> locks held again.  That's done in pte_install_copied_page().
> 
> The whole procedure looks quite similar to wp_page_copy() however it's simpler
> because we know the page is special (pinned) and we know we don't need tlb
> flushings because no one is referencing the new mm yet.
> 
> Though we still have to be very careful on maintaining the two pages (one old
> source page, one new allocated page) across all these lock taking/releasing
> process and make sure neither of them will get lost.
> 
> [1] https://lore.kernel.org/lkml/20200914143829.GA1424636@nvidia.com/
> 
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  mm/memory.c | 174 +++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 167 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 1530bb1070f4..8f3521be80ca 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -691,12 +691,72 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
>  
>  #define  COPY_MM_DONE               0
>  #define  COPY_MM_SWAP_CONT          1
> +#define  COPY_MM_BREAK_COW          2
>  
>  struct copy_mm_data {
>  	/* COPY_MM_SWAP_CONT */
>  	swp_entry_t entry;
> +	/* COPY_MM_BREAK_COW */
> +	struct {
> +		struct page *cow_old_page; /* Released by page_duplicate() */
> +		struct page *cow_new_page; /* Released by page_release_cow() */
> +		pte_t cow_oldpte;
> +	};
>  };
>  
> +static inline void page_release_cow(struct copy_mm_data *data)
> +{
> +	/* The old page should only be released in page_duplicate() */
> +	WARN_ON_ONCE(data->cow_old_page);
> +
> +	if (data->cow_new_page) {
> +		put_page(data->cow_new_page);
> +		data->cow_new_page = NULL;
> +	}
> +}
> +
> +/*
> + * Duplicate the page for this PTE.  Returns zero if page copied (so we need to
> + * retry on the same PTE again to arm the copied page very soon), or negative
> + * if error happened.  In all cases, the old page will be properly released.
> + */
> +static int page_duplicate(struct mm_struct *src_mm, struct vm_area_struct *vma,
> +			  unsigned long address, struct copy_mm_data *data)
> +{
> +	struct page *new_page = NULL;
> +	int ret;
> +
> +	/* This should have been set in change_one_pte() when reach here */
> +	WARN_ON_ONCE(!data->cow_old_page);

Despite WARN() being preferred over BUG() in the kernel, it looks a little
strange that we catch a WARN once here, but then still panic in put_page() later.

> +	new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
> +	if (!new_page) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	copy_user_highpage(new_page, data->cow_old_page, address, vma);
> +	ret = mem_cgroup_charge(new_page, src_mm, GFP_KERNEL);

All failing operations should go first, while copy_user_highpage() should go last.

> +	if (ret) {
> +		put_page(new_page);
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	cgroup_throttle_swaprate(new_page, GFP_KERNEL);
> +	__SetPageUptodate(new_page);
> +
> +	/* So far so good; arm the new page for the next attempt */
> +	data->cow_new_page = new_page;
> +
> +out:
> +	/* Always release the old page */
> +	put_page(data->cow_old_page);
> +	data->cow_old_page = NULL;
> +
> +	return ret;
> +}
> +
>  /*
>   * copy one vm_area from one task to the other. Assumes the page tables
>   * already present in the new task to be cleared in the whole range
> @@ -711,6 +771,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  	unsigned long vm_flags = vma->vm_flags;
>  	pte_t pte = *src_pte;
>  	struct page *page;
> +	bool wp;
>  
>  	/* pte contains position in swap or file, so copy. */
>  	if (unlikely(!pte_present(pte))) {
> @@ -789,10 +850,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  	 * If it's a COW mapping, write protect it both
>  	 * in the parent and the child
>  	 */
> -	if (is_cow_mapping(vm_flags) && pte_write(pte)) {
> -		ptep_set_wrprotect(src_mm, addr, src_pte);
> -		pte = pte_wrprotect(pte);
> -	}
> +	wp = is_cow_mapping(vm_flags) && pte_write(pte);
>  
>  	/*
>  	 * If it's a shared mapping, mark it clean in
> @@ -813,15 +871,80 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  	page = vm_normal_page(vma, addr, pte);
>  	if (page) {
>  		get_page(page);
> +
> +		/*
> +		 * If the page is pinned in source mm, do early cow right now
> +		 * so that the pinned page won't be replaced by another random
> +		 * page without being noticed after the fork().
> +		 *
> +		 * Note: there can be some very rare cases that we'll do
> +		 * unnecessary cow here, due to page_maybe_dma_pinned() is
> +		 * sometimes bogus, and has_pinned flag is currently aggresive
> +		 * too.  However this should be good enough for us for now as
> +		 * long as we covered all the pinned pages.  We can make this
> +		 * better in the future by providing an accurate accounting for
> +		 * pinned pages.
> +		 *
> +		 * Because we'll need to release the locks before doing cow,
> +		 * pass this work to upper layer.
> +		 */
> +		if (READ_ONCE(src_mm->has_pinned) && wp &&
> +		    page_maybe_dma_pinned(page)) {
> +			/* We've got the page already; we're safe */
> +			data->cow_old_page = page;
> +			data->cow_oldpte = *src_pte;
> +			return COPY_MM_BREAK_COW;
> +		}
> +
>  		page_dup_rmap(page, false);
>  		rss[mm_counter(page)]++;
>  	}
>  
> +	if (wp) {
> +		ptep_set_wrprotect(src_mm, addr, src_pte);
> +		pte = pte_wrprotect(pte);
> +	}
> +
>  out_set_pte:
>  	set_pte_at(dst_mm, addr, dst_pte, pte);
>  	return COPY_MM_DONE;
>  }
>  
> +/*
> + * Install the pte with the copied page stored in `data'.  Returns true when
> + * installation completes, or false when src pte has changed.
> + */
> +static int pte_install_copied_page(struct mm_struct *dst_mm,
> +				   struct vm_area_struct *new,
> +				   pte_t *src_pte, pte_t *dst_pte,
> +				   unsigned long addr, int *rss,
> +				   struct copy_mm_data *data)
> +{
> +	struct page *new_page = data->cow_new_page;
> +	pte_t entry;
> +
> +	if (!pte_same(*src_pte, data->cow_oldpte)) {
> +		/* PTE has changed under us.  Release the page and retry */
> +		page_release_cow(data);
> +		return false;
> +	}
> +
> +	entry = mk_pte(new_page, new->vm_page_prot);
> +	entry = pte_sw_mkyoung(entry);
> +	entry = maybe_mkwrite(pte_mkdirty(entry), new);
> +	page_add_new_anon_rmap(new_page, new, addr, false);
> +	set_pte_at(dst_mm, addr, dst_pte, entry);
> +	rss[mm_counter(new_page)]++;
> +
> +	/*
> +	 * Manually clear the new page pointer since we've moved ownership to
> +	 * the newly armed PTE.
> +	 */
> +	data->cow_new_page = NULL;
> +
> +	return true;
> +}
> +
>  static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  		   pmd_t *dst_pmd, pmd_t *src_pmd, struct vm_area_struct *vma,
>  		   struct vm_area_struct *new,
> @@ -830,16 +953,23 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  	pte_t *orig_src_pte, *orig_dst_pte;
>  	pte_t *src_pte, *dst_pte;
>  	spinlock_t *src_ptl, *dst_ptl;
> -	int progress, copy_ret = COPY_MM_DONE;
> +	int progress, ret, copy_ret = COPY_MM_DONE;
>  	int rss[NR_MM_COUNTERS];
>  	struct copy_mm_data data;
>  
>  again:
> +	/* We don't reset this for COPY_MM_BREAK_COW */
> +	memset(&data, 0, sizeof(data));
> +
> +again_break_cow:
>  	init_rss_vec(rss);
>  
>  	dst_pte = pte_alloc_map_lock(dst_mm, dst_pmd, addr, &dst_ptl);
> -	if (!dst_pte)
> +	if (!dst_pte) {
> +		/* Guarantee that the new page is released if there is */
> +		page_release_cow(&data);
>  		return -ENOMEM;
> +	}
>  	src_pte = pte_offset_map(src_pmd, addr);
>  	src_ptl = pte_lockptr(src_mm, src_pmd);
>  	spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
> @@ -859,6 +989,25 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  			    spin_needbreak(src_ptl) || spin_needbreak(dst_ptl))
>  				break;
>  		}
> +
> +		if (unlikely(data.cow_new_page)) {
> +			/*
> +			 * If cow_new_page set, we must be at the 2nd round of
> +			 * a previous COPY_MM_BREAK_COW.  Try to arm the new
> +			 * page now.  Note that in all cases page_break_cow()
> +			 * will properly release the objects in copy_mm_data.
> +			 */
> +			WARN_ON_ONCE(copy_ret != COPY_MM_BREAK_COW);
> +			if (pte_install_copied_page(dst_mm, new, src_pte,
> +						    dst_pte, addr, rss,
> +						    &data)) {

It looks a little confusing that all helpers in this function return 0 in case
of success, while pte_install_copied_page() returns true. Wouldn't it be better
to return 0 and -EAGAIN from it instead?

> +				/* We installed the pte successfully; move on */
> +				progress++;
> +				continue;
> +			}
> +			/* PTE changed.  Retry this pte (falls through) */
> +		}
> +
>  		if (pte_none(*src_pte)) {
>  			progress++;
>  			continue;
> @@ -882,8 +1031,19 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  		if (add_swap_count_continuation(data.entry, GFP_KERNEL) < 0)
>  			return -ENOMEM;
>  		break;
> -	default:
> +	case COPY_MM_BREAK_COW:
> +		/* Do accounting onto parent mm directly */
> +		ret = page_duplicate(src_mm, vma, addr, &data);
> +		if (ret)
> +			return ret;
> +		goto again_break_cow;
> +	case COPY_MM_DONE:
> +		/* This means we're all good. */
>  		break;
> +	default:
> +		/* This should mean copy_ret < 0.  Time to fail this fork().. */
> +		WARN_ON_ONCE(copy_ret >= 0);
> +		return copy_ret;
>  	}
>  
>  	if (addr != end)
> 



^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-24  7:44                     ` Jan Kara
@ 2020-09-24 14:02                       ` Jason Gunthorpe
  2020-09-24 14:45                         ` Jan Kara
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Gunthorpe @ 2020-09-24 14:02 UTC (permalink / raw)
  To: Jan Kara
  Cc: Peter Xu, John Hubbard, linux-mm, linux-kernel, Andrew Morton,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Thu, Sep 24, 2020 at 09:44:09AM +0200, Jan Kara wrote:
> > After the page is pinned it is prevented from being freed and
> > recycled. After GUP has the pin it must check that the PTE still
> > points at the same page, otherwise it might have pinned a page that is
> > already freed - and that would be a use-after-free issue.
> 
> I don't think a page use-after-free is really the reason - we add page
> reference through page_ref_add_unless(page, x, 0) - i.e., it will fail for
> already freed page. 

I mean, the page could have been freed and already reallocated with a
positive refcount, so the add_unless check isn't protective.

The add_unless prevents the page from being freed. The 2nd pte read
ensures it wasn't already freed/reassigned before the pin.

If something drives the page refcount to zero then it is already
synchronized with GUP fast because of the atomic add_unless, no need
to re-check the pte for that case?? But I don't know what the DAX case
is you mentioned.

Jason


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-23 17:07               ` Jason Gunthorpe
@ 2020-09-24 14:35                 ` Peter Xu
  2020-09-24 16:51                   ` Jason Gunthorpe
  0 siblings, 1 reply; 110+ messages in thread
From: Peter Xu @ 2020-09-24 14:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: John Hubbard, linux-mm, linux-kernel, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Wed, Sep 23, 2020 at 02:07:59PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 22, 2020 at 08:27:35PM -0400, Peter Xu wrote:
> > On Tue, Sep 22, 2020 at 04:11:16PM -0300, Jason Gunthorpe wrote:
> > > On Tue, Sep 22, 2020 at 01:54:15PM -0400, Peter Xu wrote:
> > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > index 8f3521be80ca..6591f3f33299 100644
> > > > +++ b/mm/memory.c
> > > > @@ -888,8 +888,8 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> > > >                  * Because we'll need to release the locks before doing cow,
> > > >                  * pass this work to upper layer.
> > > >                  */
> > > > -               if (READ_ONCE(src_mm->has_pinned) && wp &&
> > > > -                   page_maybe_dma_pinned(page)) {
> > > > +               if (wp && page_maybe_dma_pinned(page) &&
> > > > +                   READ_ONCE(src_mm->has_pinned)) {
> > > >                         /* We've got the page already; we're safe */
> > > >                         data->cow_old_page = page;
> > > >                         data->cow_oldpte = *src_pte;
> > > > 
> > > > I can also add some more comment to emphasize this.
> > > 
> > > It is not just that, but the ptep_set_wrprotect() has to be done
> > > earlier.
> > 
> > Now I understand your point, I think..  So I guess it's not only about
> > has_pinned, but it should be a race between the fast-gup and the fork() code,
> > even if has_pinned is always set.
> 
> Yes
> 
> > > The best algorithm I've thought of is something like:
> > > 
> > >  pte_map_lock()
> > >   if (page) {
> > >       if (wp) {
> > > 	  ptep_set_wrprotect()
> > > 	  /* Order with try_grab_compound_head(), either we see
> > > 	   * page_maybe_dma_pinned(), or they see the wrprotect */
> > > 	  get_page();
> > 
> > Must this get_page() be placed after ptep_set_wrprotect()
> > explicitly?
> 
> No, just before page_maybe_dma_pinned()
> 
> > IIUC what we need is to order ptep_set_wrprotect() and
> > page_maybe_dma_pinned() here.  E.g., would a "mb()" work?
> 
> mb() is not needed because page_maybe_dma_pinned() has an atomic
> barrier too. I like to see get_page() followed immediately by
> page_maybe_dma_pinned() since they are accessing the same atomic and
> could be fused together someday.

If so, I'd hope you won't disagree that I still move the get_page() out of the
"if (wp)".  Not only is it a shared operation regardless of "if (wp)", but I'm
also afraid it would confuse future readers with a special ordering between
get_page() and wrprotect(), especially with the comment above.
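The ordering being negotiated here can be shown as a toy model (hypothetical names, userspace atomics standing in for the real primitives): fork() write-protects the PTE before reading the pin count, while GUP-fast raises the pin count before re-reading the PTE, so in any interleaving at least one side observes the other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool pte_writable = true;   /* the COW pte starts writable */
static atomic_int  pincount;              /* stands in for the pin refcount */

/* fork() side: write-protect FIRST, then check for pins. */
static bool fork_sees_pin(void)
{
    atomic_store(&pte_writable, false);   /* ptep_set_wrprotect() */
    return atomic_load(&pincount) > 0;    /* page_maybe_dma_pinned() */
}

/* GUP-fast side: pin FIRST, then re-check the PTE. */
static bool gup_pin_succeeds(void)
{
    atomic_fetch_add(&pincount, 1);       /* try_grab_compound_head() */
    if (!atomic_load(&pte_writable)) {    /* PTE changed under us: back off */
        atomic_fetch_sub(&pincount, 1);
        return false;
    }
    return true;
}
```

Whichever side completes first is observed by the other: fork() either sees the pin (and can break COW early) or GUP-fast sees the wrprotect (and backs off to the slow path), so a pin can never slip past the copy unnoticed.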

> 
> > Another thing is, do we need similar thing for e.g. gup_pte_range(), so that
> > to guarantee ordering of try_grab_compound_head() and the pte change
> > check?
> 
> gup_pte_range() is as I quoted?  The gup slow path ends up in
> follow_page_pte(), which uses the pte lock, so it is OK.
> > 
> > Another question is, how about read fast-gup for pinning?  Because we can't use
> > the write-protect mechanism to block a read gup.  I remember we've discussed
> > similar things and iirc your point is "pinned pages should always be with
> > WRITE".  However now I still doubt it...  Because I feel like read gup is still
> > legal (as I mentioned previously - when device purely writes to the page and
> > the processor only reads from it).
> 
> We need a definition for what FOLL_PIN means. After this work on fork
> I propose that FOLL_PIN means:
> 
>   The page is in-use for DMA and the CPU PTE should not be changed
>   without explicit involvement of the application (eg via mmap/munmap)
> 
> If GUP encounters a read-only page during FOLL_PIN the behavior should
> depend on what the fault handler would do. If the fault handler would
> trigger COW and replace the PTE then it violates the above. GUP should
> do the COW before pinning.
> 
> If the fault handler would SIGSEGV then GUP can keep the read-only
> page and allow !FOLL_WRITE access. The PTE should not be replaced for
> other reasons (though I think there is work there too).
> 
> For COW related issues the idea is the mm_struct doing the pin will
> never trigger a COW. When other processes hit the COW they copy the
> page into their mm and don't touch the source MM's PTE.
> 
> Today we do this roughly with FOLL_FORCE and FOLL_WRITE in the users,
> but a more nuanced version and documentation would be much clearer.
> 
> Unfortunately just doing simple read GUP potentially exposes things to
> various COW related data corruption races.
> 
> This is a discussion beyond this series though..

Yes.  It's kind of related here, though, on whether we can still use
wrprotect() to guard against fast-gup.  My understanding is that we still at
least need the other patch [1] that I proposed in the other thread to force
break-cow for read-only gups (that patch is not only for fast-gup, of course).

But I agree that should be another, bigger topic.  I hope we won't need to pick
that patch up someday because of another dma report on read-only pinned pages...

Regarding the solution here, I think we can also cover read-only fast-gup in
the future - IIUC what we need to do is to use pte_protnone() instead of
pte_wrprotect(), and then in the fault handler we should distinguish this
special pte_protnone() from numa balancing (change_prot_numa()).  I think it
should work fine too, because I don't think we should migrate a page at all if
it's pinned for any reason...

So I think I'll focus on the wrprotect() solution for now.  Thanks!

[1] https://lore.kernel.org/lkml/20200915151746.GB2949@xz-x1/

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-24 14:02                       ` Jason Gunthorpe
@ 2020-09-24 14:45                         ` Jan Kara
  0 siblings, 0 replies; 110+ messages in thread
From: Jan Kara @ 2020-09-24 14:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jan Kara, Peter Xu, John Hubbard, linux-mm, linux-kernel,
	Andrew Morton, Michal Hocko, Kirill Tkhai, Kirill Shutemov,
	Hugh Dickins, Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Thu 24-09-20 11:02:37, Jason Gunthorpe wrote:
> On Thu, Sep 24, 2020 at 09:44:09AM +0200, Jan Kara wrote:
> > > After the page is pinned it is prevented from being freed and
> > > recycled. After GUP has the pin it must check that the PTE still
> > > points at the same page, otherwise it might have pinned a page that is
> > > already freed - and that would be a use-after-free issue.
> > 
> > I don't think a page use-after-free is really the reason - we add page
> > reference through page_ref_add_unless(page, x, 0) - i.e., it will fail for
> > already freed page. 
> 
> I mean, the page could have been freed and already reallocated with a
> positive refcount, so the add_unless check isn't protective.
>
> The add_unless prevents the page from being freed. The 2nd pte read
> ensures it wasn't already freed/reassigned before the pin.

Ah, right!

> If something drives the page refcount to zero then it is already
> synchronized with GUP fast because of the atomic add_unless, no need
> to re-check the pte for that case?? But I don't know what the DAX case
> is you mentioned.

DAX needs to make sure no new references (including through GUP-fast) can be
created for a page before truncating the page from a file and freeing it.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes
  2020-09-23 20:25                 ` Linus Torvalds
@ 2020-09-24 15:08                   ` Peter Xu
  0 siblings, 0 replies; 110+ messages in thread
From: Peter Xu @ 2020-09-24 15:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Nesterov, Linux-MM, Linux Kernel Mailing List, Michal Hocko,
	Kirill Shutemov, Jann Horn, Kirill Tkhai, Hugh Dickins,
	Leon Romanovsky, Jan Kara, John Hubbard, Christoph Hellwig,
	Andrew Morton, Jason Gunthorpe, Andrea Arcangeli

On Wed, Sep 23, 2020 at 01:25:52PM -0700, Linus Torvalds wrote:
> IOW, the third patch would be something (COMPLETELY UNTESTED) like the attached.

Thanks.  I'll rework on top.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes
  2020-09-24 11:48   ` Kirill Tkhai
@ 2020-09-24 15:16     ` Peter Xu
  0 siblings, 0 replies; 110+ messages in thread
From: Peter Xu @ 2020-09-24 15:16 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: linux-mm, linux-kernel, Linus Torvalds, Michal Hocko,
	Kirill Shutemov, Jann Horn, Oleg Nesterov, Hugh Dickins,
	Leon Romanovsky, Jan Kara, John Hubbard, Christoph Hellwig,
	Andrew Morton, Jason Gunthorpe, Andrea Arcangeli

On Thu, Sep 24, 2020 at 02:48:00PM +0300, Kirill Tkhai wrote:
> > +/*
> > + * Duplicate the page for this PTE.  Returns zero if page copied (so we need to
> > + * retry on the same PTE again to arm the copied page very soon), or negative
> > + * if error happened.  In all cases, the old page will be properly released.
> > + */
> > +static int page_duplicate(struct mm_struct *src_mm, struct vm_area_struct *vma,
> > +			  unsigned long address, struct copy_mm_data *data)
> > +{
> > +	struct page *new_page = NULL;
> > +	int ret;
> > +
> > +	/* This should have been set in change_one_pte() when reach here */
> > +	WARN_ON_ONCE(!data->cow_old_page);
> 
> Despite WARN() being preferred over BUG() in the kernel, it looks a little
> strange that we catch a WARN once here, but then still panic in put_page() later.

Do you mean "it'll panic in put_page()"?  If so, I agree; it seems this
WARN_ON_ONCE() won't help much.

> 
> > +	new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
> > +	if (!new_page) {
> > +		ret = -ENOMEM;
> > +		goto out;
> > +	}
> > +
> > +	copy_user_highpage(new_page, data->cow_old_page, address, vma);
> > +	ret = mem_cgroup_charge(new_page, src_mm, GFP_KERNEL);
> 
> All failing operations should go first, while copy_user_highpage() should go last.

Since I'll rebase onto Linus's patch, I'll move this into the critical section,
because the preallocated page can be used by any pte after that.  The spin
locks will need to be held longer for that, but I'm assuming that's not a
problem for an unlikely path.

> > @@ -859,6 +989,25 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >  			    spin_needbreak(src_ptl) || spin_needbreak(dst_ptl))
> >  				break;
> >  		}
> > +
> > +		if (unlikely(data.cow_new_page)) {
> > +			/*
> > +			 * If cow_new_page set, we must be at the 2nd round of
> > +			 * a previous COPY_MM_BREAK_COW.  Try to arm the new
> > +			 * page now.  Note that in all cases page_break_cow()
> > +			 * will properly release the objects in copy_mm_data.
> > +			 */
> > +			WARN_ON_ONCE(copy_ret != COPY_MM_BREAK_COW);
> > +			if (pte_install_copied_page(dst_mm, new, src_pte,
> > +						    dst_pte, addr, rss,
> > +						    &data)) {
> 
> It looks a little confusing that all the other helpers in this function return 0 on
> success, while pte_install_copied_page() returns true. Wouldn't it be better to have
> it return 0 and -EAGAIN instead?

IMHO it's fine as long as no real errno can leak out of the new helper.
But I have no strong opinion either; I'll see what I can do after the rebase.

Thanks for reviewing the patch even if it's going away.

-- 
Peter Xu





* Re: [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork()
  2020-09-23 16:07       ` Yang Shi
@ 2020-09-24 15:47         ` Peter Xu
  2020-09-24 17:29           ` Yang Shi
  0 siblings, 1 reply; 110+ messages in thread
From: Peter Xu @ 2020-09-24 15:47 UTC (permalink / raw)
  To: Yang Shi
  Cc: Jason Gunthorpe, Linux MM, Linux Kernel Mailing List,
	Linus Torvalds, Michal Hocko, Kirill Shutemov, Jann Horn,
	Oleg Nesterov, Kirill Tkhai, Hugh Dickins, Leon Romanovsky,
	Jan Kara, John Hubbard, Christoph Hellwig, Andrew Morton,
	Andrea Arcangeli

On Wed, Sep 23, 2020 at 09:07:49AM -0700, Yang Shi wrote:
> For a tmpfs map, the pmd split just clears the pmd entry without
> reinstalling ptes (whereas for an anonymous map it would reinstall them). It
> looks like this patch intends to copy at the pte level by splitting the pmd. But
> I'm afraid this may not work for tmpfs mappings.

IIUC that's exactly what we want.

We only want to make sure the pinned tmpfs shared pages will be kept there in
the parent.  It's not a must to copy the pages to the child, as long as they
can be faulted in later correctly.

-- 
Peter Xu




* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-24 14:35                 ` Peter Xu
@ 2020-09-24 16:51                   ` Jason Gunthorpe
  2020-09-24 17:55                     ` Peter Xu
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Gunthorpe @ 2020-09-24 16:51 UTC (permalink / raw)
  To: Peter Xu
  Cc: John Hubbard, linux-mm, linux-kernel, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Thu, Sep 24, 2020 at 10:35:17AM -0400, Peter Xu wrote:

> If so, I'd hope you won't disagree that I still move the get_page() out of the
> "if (wp)".  Not only is it a shared operation regardless of the "if (wp)"
> branch, but I'm also afraid it would confuse future readers into reading a
> special ordering between the get_page() and the wrprotect(), especially with
> the comment above.

Sure, you could add a comment before the page_maybe_dma_pinned() call noting
that it could be fused with get_page().

> Yes.  It's kind of related here on whether we can still use wrprotect() to
> guard against fast-gup, though.  So my understanding is that we should still at
> least need the other patch [1] that I proposed in the other thread to force
> break-cow for read-only gups (that patch is not only for fast-gup, of course).

Probably.  I haven't studied that patch intensively, and it should go
along with edits to some of the callers.

> But I agree that should be another bigger topic.  I hope we don't need to pick
> that patch up someday by another dma report on read-only pinned pages...

In RDMA we found long ago that read only pins don't work well, I think
most other places are likely the same - the problems are easy enough
to hit. Something like your COW break patch on read is really needed
to allow read-only GUP.

> Regarding the solution here, I think we can also cover read-only fast-gup too
> in the future - IIUC what we need to do is to make it pte_protnone() instead of
> pte_wrprotect(), then in the fault handler we should identify this special
> pte_protnone() against numa balancing (change_prot_numa()).  I think it should
> work fine too, iiuc, because I don't think we should migrate a page at all if
> it's pinned for any reason...

With your COW breaking patch the read only fast-gup should break the
COW because of the write protect, just like for the write side. Not
seeing why we need to do something more?

Jason



* Re: [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork()
  2020-09-24 15:47         ` Peter Xu
@ 2020-09-24 17:29           ` Yang Shi
  0 siblings, 0 replies; 110+ messages in thread
From: Yang Shi @ 2020-09-24 17:29 UTC (permalink / raw)
  To: Peter Xu
  Cc: Jason Gunthorpe, Linux MM, Linux Kernel Mailing List,
	Linus Torvalds, Michal Hocko, Kirill Shutemov, Jann Horn,
	Oleg Nesterov, Kirill Tkhai, Hugh Dickins, Leon Romanovsky,
	Jan Kara, John Hubbard, Christoph Hellwig, Andrew Morton,
	Andrea Arcangeli

On Thu, Sep 24, 2020 at 8:47 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Sep 23, 2020 at 09:07:49AM -0700, Yang Shi wrote:
> > For a tmpfs map, the pmd split just clears the pmd entry without
> > reinstalling ptes (whereas for an anonymous map it would reinstall them). It
> > looks like this patch intends to copy at the pte level by splitting the pmd. But
> > I'm afraid this may not work for tmpfs mappings.
>
> IIUC that's exactly what we want.
>
> We only want to make sure the pinned tmpfs shared pages will be kept there in
> the parent.  It's not a must to copy the pages to the child, as long as they
> can be faulted in later correctly.

Aha, got your point. Yes, they can be refaulted in later. This is how
the file THP pmd split was designed.

>
> --
> Peter Xu
>



* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-24 16:51                   ` Jason Gunthorpe
@ 2020-09-24 17:55                     ` Peter Xu
  2020-09-24 18:15                       ` Jason Gunthorpe
  0 siblings, 1 reply; 110+ messages in thread
From: Peter Xu @ 2020-09-24 17:55 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: John Hubbard, linux-mm, linux-kernel, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Thu, Sep 24, 2020 at 01:51:52PM -0300, Jason Gunthorpe wrote:
> > Regarding the solution here, I think we can also cover read-only fast-gup too
> > in the future - IIUC what we need to do is to make it pte_protnone() instead of
> > pte_wrprotect(), then in the fault handler we should identify this special
> > pte_protnone() against numa balancing (change_prot_numa()).  I think it should
> > work fine too, iiuc, because I don't think we should migrate a page at all if
> > it's pinned for any reason...

[1]

> 
> With your COW breaking patch the read only fast-gup should break the
> COW because of the write protect, just like for the write side. Not
> seeing why we need to do something more?

Consider this sequence of a parent process managed to fork() a child:

       buf = malloc();
       // RDONLY gup
       pin_user_pages(buf, !WRITE);
       // pte of buf duplicated on both sides
       fork();
       mprotect(buf, WRITE);
       *buf = 1;
       // buf page replaced as cow triggered

Currently when fork() we'll happily share a pinned read-only page with the
child by copying the pte directly.  However, it also means that from this
point on the child shares this pinned page with the parent.  Then if anything
somehow triggers a "page unshare"/cow, problems can occur.

In this case I'm using cow (triggered by another mprotect()).  However I'm not
sure whether this is the only way to replace the pinned page in the parent.

As a summary: imho the important thing is we should not allow any kind of
sharing of any dma page, even if it's pinned for read.

If my understanding above is correct, [1] may provide a solution for us (in
the future) when we want to block read-only fast-gup in this patch too, just
like how we do it with wrprotect().

-- 
Peter Xu




* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-24 17:55                     ` Peter Xu
@ 2020-09-24 18:15                       ` Jason Gunthorpe
  2020-09-24 18:34                         ` Peter Xu
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Gunthorpe @ 2020-09-24 18:15 UTC (permalink / raw)
  To: Peter Xu
  Cc: John Hubbard, linux-mm, linux-kernel, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Thu, Sep 24, 2020 at 01:55:31PM -0400, Peter Xu wrote:
> On Thu, Sep 24, 2020 at 01:51:52PM -0300, Jason Gunthorpe wrote:
> > > Regarding the solution here, I think we can also cover read-only fast-gup too
> > > in the future - IIUC what we need to do is to make it pte_protnone() instead of
> > > pte_wrprotect(), then in the fault handler we should identify this special
> > > pte_protnone() against numa balancing (change_prot_numa()).  I think it should
> > > work fine too, iiuc, because I don't think we should migrate a page at all if
> > > it's pinned for any reason...
> 
> [1]
> 
> > 
> > With your COW breaking patch the read only fast-gup should break the
> > COW because of the write protect, just like for the write side. Not
> > seeing why we need to do something more?
> 
> Consider this sequence of a parent process managed to fork() a child:
> 
>        buf = malloc();
>        // RDONLY gup
>        pin_user_pages(buf, !WRITE);
>        // pte of buf duplicated on both sides
>        fork();
>        mprotect(buf, WRITE);
>        *buf = 1;
>        // buf page replaced as cow triggered
> 
> Currently when fork() we'll happily share a pinned read-only page with the
> child by copying the pte directly.  

Why? This series prevents that, the page will be maybe_dma_pinned, so
fork() will copy it.

> As a summary: imho the important thing is we should not allow any kind of
> sharing of any dma page, even if it's pinned for read.

Any sharing that results in COW. MAP_SHARED is fine, for instance

My feeling is that for READ, when FOLL_PIN is used, GUP_fast will go to the
slow path any time it sees a read-only page.

The slow path will determine if it is read-only because it could be
COW'd or read-only for some other reason

Jason



* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-24 18:15                       ` Jason Gunthorpe
@ 2020-09-24 18:34                         ` Peter Xu
  2020-09-24 18:39                           ` Jason Gunthorpe
  0 siblings, 1 reply; 110+ messages in thread
From: Peter Xu @ 2020-09-24 18:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: John Hubbard, linux-mm, linux-kernel, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Thu, Sep 24, 2020 at 03:15:01PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 24, 2020 at 01:55:31PM -0400, Peter Xu wrote:
> > On Thu, Sep 24, 2020 at 01:51:52PM -0300, Jason Gunthorpe wrote:
> > > > Regarding the solution here, I think we can also cover read-only fast-gup too
> > > > in the future - IIUC what we need to do is to make it pte_protnone() instead of
> > > > pte_wrprotect(), then in the fault handler we should identify this special
> > > > pte_protnone() against numa balancing (change_prot_numa()).  I think it should
> > > > work fine too, iiuc, because I don't think we should migrate a page at all if
> > > > it's pinned for any reason...
> > 
> > [1]
> > 
> > > 
> > > With your COW breaking patch the read only fast-gup should break the
> > > COW because of the write protect, just like for the write side. Not
> > > seeing why we need to do something more?
> > 
> > Consider this sequence of a parent process managed to fork() a child:
> > 
> >        buf = malloc();

Sorry! I think I missed something like:

           mprotect(buf, !WRITE);

Here.

> >        // RDONLY gup
> >        pin_user_pages(buf, !WRITE);
> >        // pte of buf duplicated on both sides
> >        fork();
> >        mprotect(buf, WRITE);
> >        *buf = 1;
> >        // buf page replaced as cow triggered
> > 
> > Currently when fork() we'll happily share a pinned read-only page with the
> > child by copying the pte directly.  
> 
> Why? This series prevents that, the page will be maybe_dma_pinned, so
> fork() will copy it.

With the extra mprotect(!WRITE), I think we'll see a !pte_write() entry.  Then
it'll not go into maybe_dma_pinned() at all since cow==false.

> 
> > As a summary: imho the important thing is we should not allow any kind of
> > sharing of any dma page, even if it's pinned for read.
> 
> Any sharing that results in COW. MAP_SHARED is fine, for instance

Oh right, MAP_SHARED is definitely special.

Thanks,

-- 
Peter Xu




* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-24 18:34                         ` Peter Xu
@ 2020-09-24 18:39                           ` Jason Gunthorpe
  2020-09-24 21:30                             ` Peter Xu
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Gunthorpe @ 2020-09-24 18:39 UTC (permalink / raw)
  To: Peter Xu
  Cc: John Hubbard, linux-mm, linux-kernel, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Thu, Sep 24, 2020 at 02:34:18PM -0400, Peter Xu wrote:

> > >        // RDONLY gup
> > >        pin_user_pages(buf, !WRITE);
> > >        // pte of buf duplicated on both sides
> > >        fork();
> > >        mprotect(buf, WRITE);
> > >        *buf = 1;
> > >        // buf page replaced as cow triggered
> > > 
> > > Currently when fork() we'll happily share a pinned read-only page with the
> > > child by copying the pte directly.  
> > 
> > Why? This series prevents that, the page will be maybe_dma_pinned, so
> > fork() will copy it.
> 
> With the extra mprotect(!WRITE), I think we'll see a !pte_write() entry.  Then
> it'll not go into maybe_dma_pinned() at all since cow==false.

Hum that seems like a problem in this patch, we still need to do the
DMA pinned logic even if the pte is already write protected.

Jason



* Re: [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork()
  2020-09-23 20:19                 ` John Hubbard
@ 2020-09-24 18:49                   ` Peter Xu
  0 siblings, 0 replies; 110+ messages in thread
From: Peter Xu @ 2020-09-24 18:49 UTC (permalink / raw)
  To: John Hubbard
  Cc: Jan Kara, linux-mm, linux-kernel, Linus Torvalds, Michal Hocko,
	Kirill Shutemov, Jann Horn, Oleg Nesterov, Kirill Tkhai,
	Hugh Dickins, Leon Romanovsky, Christoph Hellwig, Andrew Morton,
	Jason Gunthorpe, Andrea Arcangeli

On Wed, Sep 23, 2020 at 01:19:08PM -0700, John Hubbard wrote:
> On 9/23/20 8:44 AM, Peter Xu wrote:
> > On Wed, Sep 23, 2020 at 04:01:14PM +0200, Jan Kara wrote:
> > > On Wed 23-09-20 09:50:04, Peter Xu wrote:
> ...
> > > > > But the problem is that if you apply mm->has_pinned check on file pages,
> > > > > you can get false negatives now. And that's not acceptable...
> > > > 
> > > > Do you mean the case where proc A pinned page P from a file, then proc B
> > > > mapped the same page P on the file, then fork() on proc B?
> > > 
> > > Yes.
> 
> aha, thanks for spelling out the false negative problem.
> 
> > > 
> > > > If proc B didn't explicitly pinned page P in B's address space too,
> > > > shouldn't we return "false" for page_likely_dma_pinned(P)?  Because if
> > > > proc B didn't pin the page in its own address space, I'd think it's ok to
> > > > get the page replaced at any time as long as the content keeps the same.
> > > > Or couldn't we?
> > > 
> > > So it depends on the reason why you call page_likely_dma_pinned(). For your
> > > COW purposes the check is correct but e.g. for "can filesystem safely
> > > writeback this page" the page_likely_dma_pinned() would be wrong. So I'm
> > > not objecting to the mechanism as such. I'm mainly objecting to the generic
> > > function name which suggests something else than what it really checks and
> > > thus it could be used in wrong places in the future... That's why I'd
> > > prefer to restrict the function to PageAnon pages where there's no risk of
> > > confusion what the check actually does.
> > 
> > How about I introduce the helper as John suggested, but rename it to
> > 
> >    page_maybe_dma_pinned_by_mm()
> > 
> > ?
> > 
> > Then we also don't need to judge which is more likely to happen (between
> > "maybe" and "likely", since that would confuse me if I only read those words...).
> > 
> 
> You're right, it is too subtle of a distinction after all. I agree that sticking
> with "_maybe_" avoids that confusion.
> 
> 
> > I didn't use any extra suffix like "cow" because I think it might be useful for
> > things besides cow.  Fundamentally the new helper will be mm-based, so "by_mm"
> > seems to suit me better.
> > 
> > Does that sound ok?
> > 
> 
> Actually, Jan nailed it. I just wasn't understanding his scenario, but now that
> I do, and considering your other point about wording, I think we end up with:
> 
>     anon_page_maybe_pinned()
> 
> as a pretty good name for a helper function. (We don't want "_mm" because that
> refers more to the mechanism used internally, rather than the behavior of the
> function. "anon_" adds more meaning.)

Actually that was my intention when I suggested "_by_mm", because IMHO the
new helper really means "whether this page may be pinned by _this mm_ (not
any other address space)".  IOW, the shared-page case that Jan mentioned is
covered, because although that page was pinned, it was not pinned "by this
mm" for, e.g., proc B above.

Though I have no strong opinion either; I'll start with anon_page_maybe_pinned().
To me it's more important to prepare the next spin first and see whether we'd
still like it for this release.

Thanks,

-- 
Peter Xu




* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-24 18:39                           ` Jason Gunthorpe
@ 2020-09-24 21:30                             ` Peter Xu
  2020-09-25 19:56                               ` Linus Torvalds
  0 siblings, 1 reply; 110+ messages in thread
From: Peter Xu @ 2020-09-24 21:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: John Hubbard, linux-mm, linux-kernel, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Christoph Hellwig, Andrea Arcangeli, Oleg Nesterov,
	Leon Romanovsky, Linus Torvalds, Jann Horn

On Thu, Sep 24, 2020 at 03:39:53PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 24, 2020 at 02:34:18PM -0400, Peter Xu wrote:
> 
> > > >        // RDONLY gup
> > > >        pin_user_pages(buf, !WRITE);
> > > >        // pte of buf duplicated on both sides
> > > >        fork();
> > > >        mprotect(buf, WRITE);
> > > >        *buf = 1;
> > > >        // buf page replaced as cow triggered
> > > > 
> > > > Currently when fork() we'll happily share a pinned read-only page with the
> > > > child by copying the pte directly.  
> > > 
> > > Why? This series prevents that, the page will be maybe_dma_pinned, so
> > > fork() will copy it.
> > 
> > With the extra mprotect(!WRITE), I think we'll see a !pte_write() entry.  Then
> > it'll not go into maybe_dma_pinned() at all since cow==false.
> 
> Hum that seems like a problem in this patch, we still need to do the
> DMA pinned logic even if the pte is already write protected.

Yes I agree.  I'll take care of that in the next version too.

-- 
Peter Xu




* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-24 21:30                             ` Peter Xu
@ 2020-09-25 19:56                               ` Linus Torvalds
  2020-09-25 21:06                                 ` Linus Torvalds
  2020-09-25 21:13                                 ` Peter Xu
  0 siblings, 2 replies; 110+ messages in thread
From: Linus Torvalds @ 2020-09-25 19:56 UTC (permalink / raw)
  To: Peter Xu
  Cc: Jason Gunthorpe, John Hubbard, Linux-MM,
	Linux Kernel Mailing List, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Leon Romanovsky, Jann Horn

On Thu, Sep 24, 2020 at 2:30 PM Peter Xu <peterx@redhat.com> wrote:
>
> > >
> > > With the extra mprotect(!WRITE), I think we'll see a !pte_write() entry.  Then
> > > it'll not go into maybe_dma_pinned() at all since cow==false.
> >
> > Hum that seems like a problem in this patch, we still need to do the
> > DMA pinned logic even if the pte is already write protected.
>
> Yes I agree.  I'll take care of that in the next version too.

You people seem to be worrying too much about crazy use cases.

The fact is, if people do pinning, they had better be careful
afterwards. I agree that marking things MADV_DONTFORK may not be
great, and there may be apps that do it. But honestly, if people then
do mprotect() to make a VM non-writable after pinning a page for
writing (and before the IO has completed), such an app only has itself
to blame.

So I don't think this issue is even worth worrying about.  At some
point, when apps do broken things, the kernel says "you broke it, you
get to keep both pieces". Not "Oh, you're doing unreasonable things,
let me help you".

This has dragged out a lot longer than I hoped it would, and I think
it's been over-complicated.

In fact, looking at this all, I'm starting to think that we don't
actually even need the mm_struct.has_pinned logic, because we can work
with something much simpler: the page mapping count.

A pinned page will have the page count increased by
GUP_PIN_COUNTING_BIAS, and my worry was that this would be ambiguous
with the traditional "fork a lot" UNIX style behavior. And that
traditional case is obviously one of the cases we very much don't want
to slow down.

But a pinned page has _another_ thing that is special about it: the
pinning action broke COW.

So I think we can simply add a

        if (page_mapcount(page) != 1)
                return false;

to page_maybe_dma_pinned(), and that very naturally protects against
the "is the page count perhaps elevated due to a lot of forking?"

Because pinning forces the mapcount to 1, and while it is pinned,
nothing else should possibly increase it - since the only thing that
would increase it is fork, and the whole point is that we won't be
doing that "page_dup_rmap()" for this page (which is what increases
the mapcount).

So we actually already have a very nice flag for "this page isn't
duplicated by forking".

And if we keep the existing early "ptep_set_wrprotect()", we also know
that we cannot be racing with another thread that is pinning at the
same time, because the fast-gup code won't be touching a read-only
pte.

So we'll just have to mark it writable again before we release the
page table lock, and we avoid that race too.

And honestly, since this is all getting fairly late in the rc, and it
took longer than I thought, I think we should do the GFP_ATOMIC
approach for now - not great, but since it only triggers for this case
that really should never happen anyway, I think it's probably the best
thing for 5.9, and we can improve on things later.

                 Linus



* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-25 19:56                               ` Linus Torvalds
@ 2020-09-25 21:06                                 ` Linus Torvalds
  2020-09-26  0:41                                   ` Jason Gunthorpe
  2020-09-25 21:13                                 ` Peter Xu
  1 sibling, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2020-09-25 21:06 UTC (permalink / raw)
  To: Peter Xu
  Cc: Jason Gunthorpe, John Hubbard, Linux-MM,
	Linux Kernel Mailing List, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Leon Romanovsky, Jann Horn

[-- Attachment #1: Type: text/plain, Size: 1083 bytes --]

On Fri, Sep 25, 2020 at 12:56 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> And honestly, since this is all getting fairly late in the rc, and it
> took longer than I thought, I think we should do the GFP_ATOMIC
> approach for now - not great, but since it only triggers for this case
> that really should never happen anyway, I think it's probably the best
> thing for 5.9, and we can improve on things later.

I'm not super-happy with this patch, but I'm throwing it out anyway, in case

 (a) somebody can test it - I don't have any test cases

 (b) somebody can find issues and improve on it

but it's the simplest patch I can come up with for the small-page case.

I have *NOT* tested it. I have tried to think about it, and there are
more lines of comments than there are lines of code, but that only
means that if I didn't think about some case, it's neither in the
comments nor in the code.

I'm happy to take Peter's series too, this is more of an alternative
simplified version to keep the discussion going.

Hmm? What did I miss?

                     Linus

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 5633 bytes --]

 mm/memory.c | 128 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 122 insertions(+), 6 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index f3eb55975902..49ceddd91db4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -773,7 +773,115 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	return 0;
 }
 
-static inline void
+/*
+ * Copy a single small page for fork().
+ *
+ * We have already marked it read-only in the parent if
+ * it's a COW page, and the pte passed in has also been
+ * marked read-only. So the normal thing to do is to
+ * simply increase the page count and the page mapping
+ * count, and the rss, and use the pte as-is. Done.
+ *
+ * However, there is one situation where we can't just
+ * rely on the COW behavior - if the page has been pinned
+ * for DMA in the parent, we can't just give a reference
+ * to it to the child, and say "whoever writes to it will
+ * force a COW". No, the pinned page needs to remain
+ * with the parent, and we need to give the child a copy.
+ *
+ * NOTE! This should never happen. Good pinning users
+ * will either not fork, or will mark the area they pinned
+ * as MADV_DONTFORK so that this situation never comes up.
+ * But if you don't do that...
+ *
+ * Note that if a small page has been pinned, we know the
+ * mapcount for that page should be 1, since the pinning
+ * will have done the COW at that point. So together with
+ * the elevated refcount, we have very solid heuristics
+ * for "is this page something we need to worry about"
+ */
+static int copy_normal_page(struct vm_area_struct *vma, unsigned long addr,
+		struct mm_struct *src_mm, struct mm_struct *dst_mm,
+		pte_t *src_pte, pte_t *dst_pte,
+		struct page *src_page, int *rss)
+{
+	struct page *dst_page;
+
+	if (likely(!page_maybe_dma_pinned(src_page)))
+		goto reuse_page;
+
+	if (!is_cow_mapping(vma->vm_flags))
+		goto reuse_page;
+
+	if (__page_mapcount(src_page) != 1)
+		goto reuse_page;
+
+	if (!vma->anon_vma || !pte_dirty(*src_pte))
+		goto reuse_page;
+
+	/*
+	 * We have now checked that the page count implies that
+	 * it's pinned, and that it's mapped only in this process,
+	 * and that it's dirty and we have an anonvma (so it's
+	 * an actual write pin, not some read-only one).
+	 *
+	 * That means we have to treat it specially. Nasty.
+	 */
+
+	/*
+	 * Note the wrong 'vma' - source rather than destination.
+	 * It's only used for policy, which is the same.
+	 *
+	 * The bigger issue is that we're holding the ptl lock,
+	 * so this needs to be a non-sleeping allocation.
+	 */
+	dst_page = alloc_page_vma(GFP_ATOMIC | __GFP_HIGH | __GFP_NOWARN, vma, addr);
+	if (!dst_page)
+		return -ENOMEM;
+
+	if (mem_cgroup_charge(dst_page, dst_mm, GFP_ATOMIC)) {
+		put_page(dst_page);
+		return -ENOMEM;
+	}
+	cgroup_throttle_swaprate(dst_page, GFP_ATOMIC);
+	__SetPageUptodate(dst_page);
+
+	copy_user_highpage(dst_page, src_page, addr, vma);
+	*dst_pte = mk_pte(dst_page, vma->vm_page_prot);
+
+	/*
+	 * NOTE! This uses the wrong vma again, but the only thing
+	 * that matters are the vma flags and anon_vma, which are
+	 * the same for source and destination.
+	 */
+	page_add_new_anon_rmap(dst_page, vma, addr, false);
+	lru_cache_add_inactive_or_unevictable(dst_page, vma);
+	rss[mm_counter(dst_page)]++;
+
+	/*
+	 * Final note: make the source writable again. The fact that
+	 * it was unwritable means that we didn't race with any new
+	 * PIN events using fast-GUP, and we've held on to the page
+	 * table lock the whole time so it's safe to just make it
+	 * writable again here.
+	 *
+	 * We might race with hardware walkers, but the dirty bit
+	 * was already set, so no fear of losing a race with a hw
+	 * walker that sets that.
+	 */
+	if (vma->vm_flags & VM_WRITE)
+		*src_pte = pte_mkwrite(*src_pte);
+
+	return 0;
+
+reuse_page:
+	get_page(src_page);
+	page_dup_rmap(src_page, false);
+	rss[mm_counter(src_page)]++;
+	return 0;
+}
+
+static inline int
 copy_present_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *vma,
 		unsigned long addr, int *rss)
@@ -809,12 +917,15 @@ copy_present_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
 	page = vm_normal_page(vma, addr, pte);
 	if (page) {
-		get_page(page);
-		page_dup_rmap(page, false);
-		rss[mm_counter(page)]++;
+		int error;
+
+		error = copy_normal_page(vma, addr, src_mm, dst_mm, src_pte, &pte, page, rss);
+		if (error)
+			return error;
 	}
 
 	set_pte_at(dst_mm, addr, dst_pte, pte);
+	return 0;
 }
 
 static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
@@ -824,7 +935,7 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	pte_t *orig_src_pte, *orig_dst_pte;
 	pte_t *src_pte, *dst_pte;
 	spinlock_t *src_ptl, *dst_ptl;
-	int progress = 0;
+	int progress = 0, error = 0;
 	int rss[NR_MM_COUNTERS];
 	swp_entry_t entry = (swp_entry_t){0};
 
@@ -865,8 +976,10 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			progress += 8;
 			continue;
 		}
-		copy_present_pte(dst_mm, src_mm, dst_pte, src_pte,
+		error = copy_present_pte(dst_mm, src_mm, dst_pte, src_pte,
 				 vma, addr, rss);
+		if (error)
+			break;
 		progress += 8;
 	} while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end);
 
@@ -877,6 +990,9 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	pte_unmap_unlock(orig_dst_pte, dst_ptl);
 	cond_resched();
 
+	if (error)
+		return error;
+
 	if (entry.val) {
 		if (add_swap_count_continuation(entry, GFP_KERNEL) < 0)
 			return -ENOMEM;

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-25 19:56                               ` Linus Torvalds
  2020-09-25 21:06                                 ` Linus Torvalds
@ 2020-09-25 21:13                                 ` Peter Xu
  2020-09-25 22:08                                   ` Linus Torvalds
  1 sibling, 1 reply; 110+ messages in thread
From: Peter Xu @ 2020-09-25 21:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason Gunthorpe, John Hubbard, Linux-MM,
	Linux Kernel Mailing List, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Leon Romanovsky, Jann Horn

On Fri, Sep 25, 2020 at 12:56:05PM -0700, Linus Torvalds wrote:
> So I think we can simply add a
> 
>         if (page_mapcount(page) != 1)
>                 return false;
> 
> to page_maybe_dma_pinned(), and that very naturally protects against
> the "is the page count perhaps elevated due to a lot of forking?"

How about the MAP_SHARED case where the page is pinned by some process but also
shared (so mapcount can be >1)?

> And honestly, since this is all getting fairly late in the rc, and it
> took longer than I thought, I think we should do the GFP_ATOMIC
> approach for now - not great, but since it only triggers for this case
> that really should never happen anyway, I think it's probably the best
> thing for 5.9, and we can improve on things later.

Sorry for that.  Maybe I should have moved even faster.

Would the ATOMIC version always work?  I mean, I thought it could fail anytime,
so any fork() could start to fail in the tests too.

PS. I do plan to post a GFP_KERNEL version later today, whether it ends up in
this release or the next one.

-- 
Peter Xu




* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-25 21:13                                 ` Peter Xu
@ 2020-09-25 22:08                                   ` Linus Torvalds
  0 siblings, 0 replies; 110+ messages in thread
From: Linus Torvalds @ 2020-09-25 22:08 UTC (permalink / raw)
  To: Peter Xu
  Cc: Jason Gunthorpe, John Hubbard, Linux-MM,
	Linux Kernel Mailing List, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Leon Romanovsky, Jann Horn

On Fri, Sep 25, 2020 at 2:13 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Sep 25, 2020 at 12:56:05PM -0700, Linus Torvalds wrote:
> > So I think we can simply add a
> >
> >         if (page_mapcount(page) != 1)
> >                 return false;
> >
> > to page_maybe_dma_pinned(), and that very naturally protects against
> > the "is the page count perhaps elevated due to a lot of forking?"
>
> How about the MAP_SHARED case where the page is pinned by some process but also
> shared (so mapcount can be >1)?

MAP_SHARED doesn't matter, since it's not getting COW'ed anyway, and
we keep the page around regardless.

So MAP_SHARED is the easy case. We'll never get to any of this code,
because is_cow_mapping() won't be true.

You can still screw up MAP_SHARED if you do a truncate() on the
underlying file or something like that, but that most *definitely*
falls under the "you only have yourself to blame" heading.

> Would the ATOMIC version always work?  I mean, I thought it could fail anytime,
> so any fork() can start to fail for the tests too.

Sure. I'm not really happy about GFP_ATOMIC, but I suspect it works in practice.

Honestly, if somebody first pins megabytes of memory, and then does a
fork(), they are doing some seriously odd and wrong things. So I think
this should be a "we will try to handle it gracefully, but your load
is broken" case.

I am still inclined to add some kind of warning to this case, but I'm
also a bit on the fence wrt the whole "convenience" issue - for some
very occasional use it's probably convenient to not have to worry
about this in user space.

Actually, what I'm even less happy about than the GFP_ATOMIC is how
much annoying boilerplate a simple "map anonymous page" required, with
the whole cgroup_charge, throttle, anon_rmap, lru_cache_add thing.
Looking at that patch, it all looks _fairly_ simple, but there are a lot
of details that got duplicated from the pte_none() new-page-fault case
(and that the do_cow_page() case also shares).

I understand why it happens, and there aren't *that* many cases, but it
made me go "ouch, this is a lot of small details, maybe I missed
some", and I got the feeling that I should try to re-organize a few
helper functions to avoid duplicating the same basic code over and
over again.

But I decided that I wanted to keep the patch minimal and as focused
as possible, so I didn't actually do that. But we clearly have decades
of adding rules that just makes even something as "simple" as "add a
new page to a VM" fairly complex.

Also, to avoid making the patch bigger, I skipped your "pass
destination vma around" patch. I think it's the right thing
conceptually, but everything I looked at also screamed "we don't
actually care about the differences" to me.

I dunno. I'm conflicted. This really _feels_ to me like "we're so
close to just fixing this once and for all", but then I also go "maybe
we should just revert everything and do this for 5.10".

Except "reverting everything" is sadly really really problematic too.
It will fix the rdma issue, but in this case "everything" goes all the
way back to "uhhuh, we have a security issue with COW going the wrong
way". Otherwise I'd have gone that way two weeks ago already.

               Linus



* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-25 21:06                                 ` Linus Torvalds
@ 2020-09-26  0:41                                   ` Jason Gunthorpe
  2020-09-26  1:15                                     ` Linus Torvalds
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Gunthorpe @ 2020-09-26  0:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Xu, John Hubbard, Linux-MM, Linux Kernel Mailing List,
	Andrew Morton, Jan Kara, Michal Hocko, Kirill Tkhai,
	Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Leon Romanovsky, Jann Horn

On Fri, Sep 25, 2020 at 02:06:59PM -0700, Linus Torvalds wrote:
> On Fri, Sep 25, 2020 at 12:56 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > And honestly, since this is all getting fairly late in the rc, and it
> > took longer than I thought, I think we should do the GFP_ATOMIC
> > approach for now - not great, but since it only triggers for this case
> > that really should never happen anyway, I think it's probably the best
> > thing for 5.9, and we can improve on things later.
> 
> I'm not super-happy with this patch, but I'm throwing it out anyway, in case
> 
>  (a) somebody can test it - I don't have any test cases

It looks like it will work and resolve the RDMA case that triggered
this discussion. I will send it to our testers, should hear back
around Monday.

They previously said Peter's v1 patch worked, so I expect the same here,
unless something unexpected hits the extra pre-conditions.

Though, we might hit the THP case and find it fails...

>  (b) somebody can find issues and improve on it

The THP hunks from Peter's series looked pretty straightforward, I'd
include at least the PMD one.

As a tiny optimization, the preconditions in copy_normal_page() could
order the atomics last to try and reduce the atomics done per fork.

> I'm happy to take Peter's series too, this is more of an alternative
> simplified version to keep the discussion going.

I don't completely grok the consequences of the anon_vma check. We
can exclude file backed mappings as they are broken for pinning
anyhow, so what is left that could be MAP_PRIVATE of a non-anon_vma?

Feels obscure, but probably OK. If something does break, userspace could
use MAP_SHARED and be fixed.

Otherwise, I do prefer Peter's version because of the GFP_KERNEL. To
touch on your other email..

It was my hope we could move away from the "This should never
happen". From an RDMA POV this idea was sort of manageable years ago,
but now I have folks writing data science/ML software in Python that,
deep under the libraries, uses RDMA and pins pages. It was a
Python program that detected this regression.

Having all that "just work" regardless of what foolish stuff happens
in the Python layer is very appealing.

Jason



* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-26  0:41                                   ` Jason Gunthorpe
@ 2020-09-26  1:15                                     ` Linus Torvalds
  2020-09-26 22:28                                       ` Linus Torvalds
  0 siblings, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2020-09-26  1:15 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Peter Xu, John Hubbard, Linux-MM, Linux Kernel Mailing List,
	Andrew Morton, Jan Kara, Michal Hocko, Kirill Tkhai,
	Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Leon Romanovsky, Jann Horn

On Fri, Sep 25, 2020 at 5:41 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> I don't completely grok the consequences of the anon_vma check. We
> can exclude file backed mappings as they are broken for pinning
> anyhow, so what is left that could be MAP_PRIVATE of a non-anon_vma?

It really shouldn't ever happen.

The only way a COW vma can have a NULL anon_vma should be if it has no
pages mapped at all.

Technically I think it could happen if you only ever mapped the
special zero page in there (but that shouldn't then get to the
"vm_normal_page()").

> Otherwise, I do prefer Peter's version because of the GFP_KERNEL. To
> touch on your other email..

Yeah, no, I just hadn't seen a new version, so I started getting antsy
and that's when I decided to see what a minimal patch looks like.

I think that over the weekend I'll do Peter's version but with the
"page_mapcount() == 1"  check, because I'm starting to like that
better than the mm->has_pinned.

Comments on that plan?

              Linus



* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-26  1:15                                     ` Linus Torvalds
@ 2020-09-26 22:28                                       ` Linus Torvalds
  2020-09-27  6:23                                         ` Leon Romanovsky
  0 siblings, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2020-09-26 22:28 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Peter Xu, John Hubbard, Linux-MM, Linux Kernel Mailing List,
	Andrew Morton, Jan Kara, Michal Hocko, Kirill Tkhai,
	Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Leon Romanovsky, Jann Horn

On Fri, Sep 25, 2020 at 6:15 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I think that over the weekend I'll do Peter's version but with the
> "page_mapcount() == 1"  check, because I'm starting to like that
> better than the mm->has_pinned.

Actually, after the first read-through, I feel like I'll just apply
the series as-is.

But I'll look at it some more, and do another read-through and make
the final decision tomorrow.

If anybody has any concerns about the v2 patch series from Peter, holler.

              Linus



* [mm] 698ac7610f: will-it-scale.per_thread_ops 8.2% improvement
  2020-09-21 21:17 ` [PATCH 1/5] mm: Introduce mm_struct.has_pinned Peter Xu
  2020-09-21 21:43   ` Jann Horn
  2020-09-21 23:53   ` John Hubbard
@ 2020-09-27  0:41   ` kernel test robot
  2 siblings, 0 replies; 110+ messages in thread
From: kernel test robot @ 2020-09-27  0:41 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-mm, linux-kernel, Jason Gunthorpe, Andrew Morton, Jan Kara,
	Michal Hocko, Kirill Tkhai, Kirill Shutemov, Hugh Dickins,
	Peter Xu, Christoph Hellwig, Andrea Arcangeli, John Hubbard,
	Oleg Nesterov, Leon Romanovsky, Linus Torvalds, Jann Horn,
	0day robot, lkp, ying.huang, feng.tang, zhengjun.xing


Greetings,

FYI, we noticed an 8.2% improvement of will-it-scale.per_thread_ops due to commit:


commit: 698ac7610f7928ddfa44a0736e89d776579d8b82 ("[PATCH 1/5] mm: Introduce mm_struct.has_pinned")
url: https://github.com/0day-ci/linux/commits/Peter-Xu/mm-Break-COW-for-pinned-pages-during-fork/20200922-052211
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git bcf876870b95592b52519ed4aafcf9d95999bc9c

in testcase: will-it-scale
on test machine: 96 threads Intel(R) Xeon(R) CPU @ 2.30GHz with 128G memory
with following parameters:

	nr_task: 100%
	mode: thread
	test: mmap2
	cpufreq_governor: performance
	ucode: 0x4002f01

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale





Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/thread/100%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp4/mmap2/will-it-scale/0x4002f01

commit: 
  v5.8
  698ac7610f ("mm: Introduce mm_struct.has_pinned")

            v5.8 698ac7610f7928ddfa44a0736e8 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      2003            +8.2%       2168        will-it-scale.per_thread_ops
    192350            +8.3%     208245        will-it-scale.workload
      2643 ± 33%     -36.3%       1683        meminfo.Active(file)
      3.88 ±  2%      -0.6        3.25 ±  2%  mpstat.cpu.all.idle%
      0.00 ±  3%      -0.0        0.00 ±  8%  mpstat.cpu.all.iowait%
    307629 ±  3%     +10.5%     340075 ±  3%  numa-numastat.node0.local_node
     15503 ± 60%     +60.2%      24839 ± 25%  numa-numastat.node1.other_node
    161670           -58.7%      66739 ±  4%  vmstat.system.cs
    209406            -3.9%     201176        vmstat.system.in
    364.00 ± 23%     -53.8%     168.00        slabinfo.pid_namespace.active_objs
    364.00 ± 23%     -53.8%     168.00        slabinfo.pid_namespace.num_objs
    985.50 ± 11%     +14.6%       1129 ±  8%  slabinfo.task_group.active_objs
    985.50 ± 11%     +14.6%       1129 ±  8%  slabinfo.task_group.num_objs
    660.25 ± 33%     -36.3%     420.25        proc-vmstat.nr_active_file
    302055            +2.4%     309211        proc-vmstat.nr_file_pages
    281010            +2.6%     288385        proc-vmstat.nr_unevictable
    660.25 ± 33%     -36.3%     420.25        proc-vmstat.nr_zone_active_file
    281010            +2.6%     288385        proc-vmstat.nr_zone_unevictable
  20640832 ± 16%     +40.0%   28888446 ±  6%  cpuidle.C1.time
   1743036 ±  6%     -54.5%     792376 ±  4%  cpuidle.C1.usage
 5.048e+08 ± 54%     -98.3%    8642335 ±  2%  cpuidle.C6.time
    706531 ± 51%     -94.9%      36224        cpuidle.C6.usage
  38313880           -56.5%   16666274 ±  5%  cpuidle.POLL.time
  18289550           -59.1%    7488947 ±  5%  cpuidle.POLL.usage
    302.94 ±  5%     -32.9%     203.13 ±  6%  sched_debug.cfs_rq:/.exec_clock.stddev
     31707 ±  6%     -43.6%      17867 ±  6%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.77 ± 22%     +41.1%       1.09 ±  4%  sched_debug.cfs_rq:/.nr_spread_over.avg
    163292 ± 15%     -75.8%      39543 ± 24%  sched_debug.cfs_rq:/.spread0.avg
    220569 ± 13%     -65.9%      75287 ± 16%  sched_debug.cfs_rq:/.spread0.max
     -1903         +1952.7%     -39073        sched_debug.cfs_rq:/.spread0.min
     31680 ±  6%     -43.7%      17850 ±  6%  sched_debug.cfs_rq:/.spread0.stddev
    698820 ±  2%     -28.4%     500526 ±  3%  sched_debug.cpu.avg_idle.avg
   1100275 ±  3%      -7.2%    1020875 ±  3%  sched_debug.cpu.avg_idle.max
    250869           -58.4%     104239 ±  4%  sched_debug.cpu.nr_switches.avg
    766741 ± 25%     -64.8%     269919 ± 10%  sched_debug.cpu.nr_switches.max
    111347 ± 16%     -50.8%      54786 ± 11%  sched_debug.cpu.nr_switches.min
    108077 ± 11%     -67.3%      35316 ±  8%  sched_debug.cpu.nr_switches.stddev
    262769           -59.1%     107346 ±  4%  sched_debug.cpu.sched_count.avg
    800567 ± 25%     -65.9%     272755 ± 10%  sched_debug.cpu.sched_count.max
    115870 ± 15%     -51.6%      56034 ± 11%  sched_debug.cpu.sched_count.min
    112678 ± 11%     -67.8%      36309 ±  9%  sched_debug.cpu.sched_count.stddev
    122760           -59.2%      50082 ±  4%  sched_debug.cpu.sched_goidle.avg
    372289 ± 25%     -65.9%     126854 ± 10%  sched_debug.cpu.sched_goidle.max
     53911 ± 15%     -51.7%      26040 ± 11%  sched_debug.cpu.sched_goidle.min
     52986 ± 12%     -67.7%      17092 ±  9%  sched_debug.cpu.sched_goidle.stddev
    138914           -59.1%      56816 ±  4%  sched_debug.cpu.ttwu_count.avg
    168089           -57.1%      72151 ±  4%  sched_debug.cpu.ttwu_count.max
    123046           -61.1%      47837 ±  4%  sched_debug.cpu.ttwu_count.min
     11828 ± 15%     -39.8%       7114 ±  4%  sched_debug.cpu.ttwu_count.stddev
      3050 ±  6%     -34.3%       2004 ±  6%  sched_debug.cpu.ttwu_local.max
    383.97 ±  6%     -30.1%     268.54 ± 12%  sched_debug.cpu.ttwu_local.stddev
     51.21            -0.4       50.79        perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
     51.24            -0.4       50.82        perf-profile.calltrace.cycles-pp.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.84 ±  8%      -0.1        0.76 ±  4%  perf-profile.calltrace.cycles-pp.secondary_startup_64
      0.83 ±  8%      -0.1        0.74 ±  4%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
      0.83 ±  8%      -0.1        0.74 ±  4%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
      0.83 ±  8%      -0.1        0.74 ±  4%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
     47.27            +0.5       47.79        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
     47.27            +0.5       47.79        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     47.26            +0.5       47.78        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     47.25            +0.5       47.78        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     47.28            +0.5       47.81        perf-profile.calltrace.cycles-pp.__munmap
     46.75            +0.5       47.30        perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write_killable.__vm_munmap.__x64_sys_munmap.do_syscall_64
     46.24            +0.5       46.79        perf-profile.calltrace.cycles-pp.osq_lock.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
     46.77            +0.6       47.33        perf-profile.calltrace.cycles-pp.down_write_killable.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     46.70            +0.6       47.26        perf-profile.calltrace.cycles-pp.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.__vm_munmap.__x64_sys_munmap
      0.84 ±  8%      -0.1        0.76 ±  4%  perf-profile.children.cycles-pp.secondary_startup_64
      0.84 ±  8%      -0.1        0.76 ±  4%  perf-profile.children.cycles-pp.cpu_startup_entry
      0.84 ±  8%      -0.1        0.76 ±  4%  perf-profile.children.cycles-pp.do_idle
      0.83 ±  8%      -0.1        0.74 ±  4%  perf-profile.children.cycles-pp.start_secondary
      0.11 ±  4%      -0.1        0.04 ± 57%  perf-profile.children.cycles-pp.__sched_text_start
      0.14 ±  3%      -0.1        0.08 ±  6%  perf-profile.children.cycles-pp.rwsem_wake
      0.10 ±  4%      -0.0        0.06 ±  9%  perf-profile.children.cycles-pp.wake_up_q
      0.10 ±  4%      -0.0        0.05        perf-profile.children.cycles-pp.try_to_wake_up
      0.07 ±  7%      -0.0        0.05        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      0.24 ±  2%      +0.0        0.26        perf-profile.children.cycles-pp.mmap_region
      0.42 ±  2%      +0.0        0.45        perf-profile.children.cycles-pp.do_mmap
      0.67            +0.0        0.71 ±  2%  perf-profile.children.cycles-pp.rwsem_spin_on_owner
     97.96            +0.1       98.09        perf-profile.children.cycles-pp.rwsem_down_write_slowpath
     98.02            +0.1       98.14        perf-profile.children.cycles-pp.down_write_killable
     97.86            +0.2       98.02        perf-profile.children.cycles-pp.rwsem_optimistic_spin
     96.90            +0.2       97.07        perf-profile.children.cycles-pp.osq_lock
     47.26            +0.5       47.78        perf-profile.children.cycles-pp.__x64_sys_munmap
     47.28            +0.5       47.81        perf-profile.children.cycles-pp.__munmap
     47.25            +0.5       47.78        perf-profile.children.cycles-pp.__vm_munmap
      0.24            -0.0        0.19 ±  4%  perf-profile.self.cycles-pp.rwsem_optimistic_spin
      0.06            -0.0        0.03 ±100%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.09            +0.0        0.10        perf-profile.self.cycles-pp.find_vma
      0.65            +0.0        0.70 ±  2%  perf-profile.self.cycles-pp.rwsem_spin_on_owner
     96.31            +0.2       96.53        perf-profile.self.cycles-pp.osq_lock
      0.66           -21.0%       0.52 ±  2%  perf-stat.i.MPKI
      0.11            +0.0        0.11        perf-stat.i.branch-miss-rate%
  14774053            +4.3%   15410076        perf-stat.i.branch-misses
     41.17            +0.9       42.09        perf-stat.i.cache-miss-rate%
  19999033           -18.7%   16264313 ±  3%  perf-stat.i.cache-misses
  48696879           -20.4%   38779964 ±  2%  perf-stat.i.cache-references
    162553           -58.9%      66790 ±  4%  perf-stat.i.context-switches
    157.36            -3.9%     151.20        perf-stat.i.cpu-migrations
     14271           +23.9%      17676 ±  3%  perf-stat.i.cycles-between-cache-misses
      0.00 ±  8%      +0.0        0.00 ±  4%  perf-stat.i.dTLB-load-miss-rate%
    155048 ±  5%     +46.3%     226826 ±  4%  perf-stat.i.dTLB-load-misses
      0.00 ±  2%      +0.0        0.01        perf-stat.i.dTLB-store-miss-rate%
     16761           +19.2%      19972 ±  2%  perf-stat.i.dTLB-store-misses
 4.118e+08           -15.7%   3.47e+08        perf-stat.i.dTLB-stores
     93.99            +1.4       95.42        perf-stat.i.iTLB-load-miss-rate%
   4103535 ±  3%      -6.7%    3827881        perf-stat.i.iTLB-load-misses
    263239 ±  4%     -28.6%     188037 ±  2%  perf-stat.i.iTLB-loads
     18498 ±  3%      +7.5%      19885        perf-stat.i.instructions-per-iTLB-miss
      0.31          +226.8%       1.01 ±  3%  perf-stat.i.metric.K/sec
     88.51            +1.8       90.30        perf-stat.i.node-load-miss-rate%
   8104644           -13.5%    7009101        perf-stat.i.node-load-misses
   1047577           -28.3%     751122        perf-stat.i.node-loads
   3521008           -13.1%    3058055        perf-stat.i.node-store-misses
      0.64           -20.6%       0.51 ±  2%  perf-stat.overall.MPKI
      0.10            +0.0        0.10        perf-stat.overall.branch-miss-rate%
     41.07            +0.9       41.92        perf-stat.overall.cache-miss-rate%
     14273           +23.7%      17662 ±  3%  perf-stat.overall.cycles-between-cache-misses
      0.00 ±  5%      +0.0        0.00 ±  5%  perf-stat.overall.dTLB-load-miss-rate%
      0.00 ±  2%      +0.0        0.01 ±  2%  perf-stat.overall.dTLB-store-miss-rate%
     93.95            +1.3       95.29        perf-stat.overall.iTLB-load-miss-rate%
     18507 ±  3%      +7.5%      19890        perf-stat.overall.instructions-per-iTLB-miss
     88.55            +1.8       90.32        perf-stat.overall.node-load-miss-rate%
 1.191e+08            -7.0%  1.107e+08        perf-stat.overall.path-length
  14739033            +4.2%   15364552        perf-stat.ps.branch-misses
  19930054           -18.7%   16209879 ±  3%  perf-stat.ps.cache-misses
  48535919           -20.3%   38662093 ±  2%  perf-stat.ps.cache-references
    161841           -58.9%      66477 ±  4%  perf-stat.ps.context-switches
    157.06            -3.9%     150.89        perf-stat.ps.cpu-migrations
    154906 ±  5%     +46.1%     226269 ±  5%  perf-stat.ps.dTLB-load-misses
     16730           +19.1%      19927 ±  2%  perf-stat.ps.dTLB-store-misses
 4.104e+08           -15.7%  3.459e+08        perf-stat.ps.dTLB-stores
   4089203 ±  3%      -6.7%    3814320        perf-stat.ps.iTLB-load-misses
    263134 ±  4%     -28.4%     188441 ±  2%  perf-stat.ps.iTLB-loads
   8076488           -13.5%    6984923        perf-stat.ps.node-load-misses
   1044002           -28.3%     748425        perf-stat.ps.node-loads
   3509068           -13.1%    3048190        perf-stat.ps.node-store-misses
      2270 ±170%     -98.9%      24.50 ±166%  interrupts.36:PCI-MSI.31981569-edge.i40e-eth0-TxRx-0
   3730764 ±  2%     -62.0%    1416300 ±  4%  interrupts.CAL:Function_call_interrupts
      2270 ±170%     -98.9%      24.25 ±168%  interrupts.CPU0.36:PCI-MSI.31981569-edge.i40e-eth0-TxRx-0
    111116 ± 29%     -67.4%      36227 ±  8%  interrupts.CPU0.CAL:Function_call_interrupts
     11091 ± 38%     -62.2%       4195 ± 12%  interrupts.CPU0.RES:Rescheduling_interrupts
     48001 ± 26%     -53.7%      22228 ± 14%  interrupts.CPU1.CAL:Function_call_interrupts
      4189 ± 29%     -43.2%       2378 ±  5%  interrupts.CPU1.RES:Rescheduling_interrupts
     34024 ± 32%     -44.7%      18804 ±  4%  interrupts.CPU10.CAL:Function_call_interrupts
     26171 ± 19%     -26.1%      19334 ±  3%  interrupts.CPU11.CAL:Function_call_interrupts
      2370 ± 13%     -20.8%       1876 ± 18%  interrupts.CPU11.RES:Rescheduling_interrupts
     30568 ± 23%     -45.9%      16537 ±  4%  interrupts.CPU12.CAL:Function_call_interrupts
      3094 ± 27%     -46.9%       1643 ±  8%  interrupts.CPU12.RES:Rescheduling_interrupts
    514.50 ± 10%     +35.6%     697.75 ± 16%  interrupts.CPU12.TLB:TLB_shootdowns
     29819 ± 32%     -41.7%      17393        interrupts.CPU13.CAL:Function_call_interrupts
     35364 ± 38%     -49.6%      17819 ±  9%  interrupts.CPU14.CAL:Function_call_interrupts
    694.75 ± 11%     +23.2%     856.00 ±  9%  interrupts.CPU14.TLB:TLB_shootdowns
     38361 ± 48%     -49.8%      19273 ±  9%  interrupts.CPU16.CAL:Function_call_interrupts
     30069 ± 23%     -31.6%      20559 ±  3%  interrupts.CPU17.CAL:Function_call_interrupts
    809.75 ±  7%     -19.5%     651.75 ± 13%  interrupts.CPU17.TLB:TLB_shootdowns
     28245 ± 29%     -33.1%      18894 ±  6%  interrupts.CPU18.CAL:Function_call_interrupts
     33560 ± 23%     -48.2%      17369 ±  6%  interrupts.CPU19.CAL:Function_call_interrupts
      2863 ± 19%     -40.3%       1709 ± 13%  interrupts.CPU19.RES:Rescheduling_interrupts
     47118 ± 32%     -55.7%      20868 ±  3%  interrupts.CPU2.CAL:Function_call_interrupts
      3897 ± 38%     -48.9%       1991 ±  7%  interrupts.CPU2.RES:Rescheduling_interrupts
     34735 ± 29%     -41.7%      20246 ±  9%  interrupts.CPU20.CAL:Function_call_interrupts
     37232 ± 23%     -46.6%      19883 ± 12%  interrupts.CPU21.CAL:Function_call_interrupts
     32345 ± 16%     -38.6%      19845 ±  6%  interrupts.CPU22.CAL:Function_call_interrupts
     34083 ± 22%     -43.4%      19301 ±  6%  interrupts.CPU23.CAL:Function_call_interrupts
     61308 ± 16%     -76.3%      14529 ± 13%  interrupts.CPU24.CAL:Function_call_interrupts
      6610 ± 26%     -76.3%       1568 ± 11%  interrupts.CPU24.RES:Rescheduling_interrupts
     51384 ± 32%     -75.0%      12848 ± 12%  interrupts.CPU25.CAL:Function_call_interrupts
      4643 ± 39%     -70.6%       1366 ±  9%  interrupts.CPU25.RES:Rescheduling_interrupts
     48788 ± 25%     -71.7%      13826 ± 17%  interrupts.CPU26.CAL:Function_call_interrupts
      4076 ± 32%     -70.5%       1203 ± 18%  interrupts.CPU26.RES:Rescheduling_interrupts
     45702 ± 14%     -70.7%      13369 ± 12%  interrupts.CPU27.CAL:Function_call_interrupts
      3614 ± 21%     -67.3%       1180 ± 16%  interrupts.CPU27.RES:Rescheduling_interrupts
     51216 ± 14%     -71.6%      14546 ± 15%  interrupts.CPU28.CAL:Function_call_interrupts
      4395 ± 24%     -67.9%       1410 ± 21%  interrupts.CPU28.RES:Rescheduling_interrupts
    614.25 ± 18%     +33.3%     818.75 ± 17%  interrupts.CPU28.TLB:TLB_shootdowns
     44945 ± 23%     -66.5%      15059 ± 14%  interrupts.CPU29.CAL:Function_call_interrupts
      3994 ± 34%     -68.2%       1271 ± 10%  interrupts.CPU29.RES:Rescheduling_interrupts
     39154 ± 24%     -41.6%      22857 ±  6%  interrupts.CPU3.CAL:Function_call_interrupts
     45674 ± 11%     -68.3%      14470 ±  8%  interrupts.CPU30.CAL:Function_call_interrupts
      4097 ± 23%     -68.8%       1278 ± 10%  interrupts.CPU30.RES:Rescheduling_interrupts
     51890 ± 13%     -72.6%      14227 ± 16%  interrupts.CPU31.CAL:Function_call_interrupts
      4557 ± 26%     -71.4%       1305 ± 21%  interrupts.CPU31.RES:Rescheduling_interrupts
     41324 ± 23%     -76.0%       9933 ± 11%  interrupts.CPU32.CAL:Function_call_interrupts
      3284 ± 33%     -73.4%     873.75 ± 15%  interrupts.CPU32.RES:Rescheduling_interrupts
     39758 ± 31%     -74.5%      10120 ± 17%  interrupts.CPU33.CAL:Function_call_interrupts
      3373 ± 42%     -74.2%     869.00 ± 15%  interrupts.CPU33.RES:Rescheduling_interrupts
    513.00 ± 27%     +46.0%     748.75 ± 16%  interrupts.CPU33.TLB:TLB_shootdowns
     40015 ± 14%     -72.8%      10885 ±  8%  interrupts.CPU34.CAL:Function_call_interrupts
      3402 ± 25%     -68.2%       1080 ± 13%  interrupts.CPU34.RES:Rescheduling_interrupts
    635.25 ± 22%     +49.3%     948.25 ± 13%  interrupts.CPU34.TLB:TLB_shootdowns
     45251 ± 19%     -75.2%      11204 ± 17%  interrupts.CPU35.CAL:Function_call_interrupts
      3731 ± 31%     -73.4%     992.50 ± 20%  interrupts.CPU35.RES:Rescheduling_interrupts
     43390 ± 11%     -78.3%       9434 ± 15%  interrupts.CPU36.CAL:Function_call_interrupts
      3536 ± 23%     -77.3%     803.75 ± 14%  interrupts.CPU36.RES:Rescheduling_interrupts
     39820 ± 11%     -75.9%       9613 ± 10%  interrupts.CPU37.CAL:Function_call_interrupts
      2987 ± 21%     -70.8%     873.25 ±  9%  interrupts.CPU37.RES:Rescheduling_interrupts
     42969 ± 32%     -76.6%      10068 ± 17%  interrupts.CPU38.CAL:Function_call_interrupts
      3202 ± 36%     -74.4%     818.50 ± 27%  interrupts.CPU38.RES:Rescheduling_interrupts
     35571 ± 16%     -72.4%       9822 ±  9%  interrupts.CPU39.CAL:Function_call_interrupts
      2986 ± 24%     -73.9%     778.75 ± 15%  interrupts.CPU39.RES:Rescheduling_interrupts
     45001 ± 21%     -48.2%      23317 ±  7%  interrupts.CPU4.CAL:Function_call_interrupts
      3689 ± 24%     -43.6%       2080 ±  2%  interrupts.CPU4.RES:Rescheduling_interrupts
     39302 ± 21%     -73.4%      10463 ±  8%  interrupts.CPU40.CAL:Function_call_interrupts
      2968 ± 32%     -71.4%     848.50 ± 18%  interrupts.CPU40.RES:Rescheduling_interrupts
     40826 ± 19%     -75.3%      10070 ± 10%  interrupts.CPU41.CAL:Function_call_interrupts
      3321 ± 30%     -70.9%     967.25 ± 10%  interrupts.CPU41.RES:Rescheduling_interrupts
    700.25 ± 18%     +26.2%     883.75 ± 17%  interrupts.CPU41.TLB:TLB_shootdowns
     35368 ± 11%     -73.7%       9308 ± 15%  interrupts.CPU42.CAL:Function_call_interrupts
      2839 ± 12%     -70.3%     844.50 ± 13%  interrupts.CPU42.RES:Rescheduling_interrupts
     45459 ± 25%     -78.7%       9687 ± 11%  interrupts.CPU43.CAL:Function_call_interrupts
      3703 ± 29%     -74.1%     959.50 ± 16%  interrupts.CPU43.RES:Rescheduling_interrupts
     41495 ± 15%     -77.1%       9522 ± 12%  interrupts.CPU44.CAL:Function_call_interrupts
      3153 ± 28%     -75.0%     789.75 ± 15%  interrupts.CPU44.RES:Rescheduling_interrupts
     38501 ± 26%     -72.5%      10601 ± 14%  interrupts.CPU45.CAL:Function_call_interrupts
      3024 ± 38%     -73.8%     791.00 ± 19%  interrupts.CPU45.RES:Rescheduling_interrupts
     39083 ± 35%     -73.6%      10323 ± 18%  interrupts.CPU46.CAL:Function_call_interrupts
      3173 ± 37%     -73.9%     829.75 ± 24%  interrupts.CPU46.RES:Rescheduling_interrupts
     44486 ± 20%     -75.3%      10968 ± 15%  interrupts.CPU47.CAL:Function_call_interrupts
      3773 ± 34%     -76.4%     890.25 ± 24%  interrupts.CPU47.RES:Rescheduling_interrupts
     34967 ± 42%     -51.0%      17117 ± 10%  interrupts.CPU48.CAL:Function_call_interrupts
     31969 ± 38%     -51.7%      15432 ± 12%  interrupts.CPU49.CAL:Function_call_interrupts
     33786 ± 12%     -29.2%      23918 ±  5%  interrupts.CPU5.CAL:Function_call_interrupts
      3014 ± 16%     -33.2%       2012 ±  9%  interrupts.CPU5.RES:Rescheduling_interrupts
     30514 ± 29%     -46.4%      16343 ±  6%  interrupts.CPU51.CAL:Function_call_interrupts
     34448 ± 26%     -48.7%      17686 ±  4%  interrupts.CPU52.CAL:Function_call_interrupts
      2811 ± 34%     -42.0%       1631 ±  4%  interrupts.CPU52.RES:Rescheduling_interrupts
     30848 ± 33%     -47.9%      16059 ±  3%  interrupts.CPU54.CAL:Function_call_interrupts
     31017 ± 22%     -52.7%      14676 ±  7%  interrupts.CPU55.CAL:Function_call_interrupts
      2501 ± 41%     -41.8%       1455 ± 10%  interrupts.CPU55.RES:Rescheduling_interrupts
     28249 ± 23%     -46.9%      14997 ± 10%  interrupts.CPU56.CAL:Function_call_interrupts
      2113 ± 18%     -36.0%       1352 ± 15%  interrupts.CPU56.RES:Rescheduling_interrupts
     27658 ± 20%     -49.3%      14034 ±  3%  interrupts.CPU57.CAL:Function_call_interrupts
     26559 ± 34%     -40.6%      15778 ± 11%  interrupts.CPU58.CAL:Function_call_interrupts
     27984 ± 27%     -39.9%      16815 ± 12%  interrupts.CPU59.CAL:Function_call_interrupts
     35098 ± 33%     -37.5%      21921        interrupts.CPU6.CAL:Function_call_interrupts
      3073 ± 37%     -40.6%       1825 ±  8%  interrupts.CPU6.RES:Rescheduling_interrupts
     29248 ± 38%     -48.2%      15149 ±  6%  interrupts.CPU60.CAL:Function_call_interrupts
     30880 ± 33%     -52.3%      14722 ± 10%  interrupts.CPU61.CAL:Function_call_interrupts
     31218 ± 43%     -51.5%      15152 ±  3%  interrupts.CPU62.CAL:Function_call_interrupts
     29210 ± 40%     -46.5%      15627 ±  6%  interrupts.CPU63.CAL:Function_call_interrupts
     26813 ± 15%     -39.0%      16343 ± 13%  interrupts.CPU64.CAL:Function_call_interrupts
     24791 ± 14%     -32.6%      16704 ± 10%  interrupts.CPU67.CAL:Function_call_interrupts
     29638 ± 33%     -42.9%      16914 ±  9%  interrupts.CPU68.CAL:Function_call_interrupts
     36247 ± 33%     -48.5%      18670 ±  8%  interrupts.CPU69.CAL:Function_call_interrupts
     30379 ± 24%     -30.6%      21096 ±  5%  interrupts.CPU7.CAL:Function_call_interrupts
      3027 ± 25%     -32.0%       2059 ±  3%  interrupts.CPU7.RES:Rescheduling_interrupts
     31064 ± 25%     -42.8%      17774        interrupts.CPU70.CAL:Function_call_interrupts
     52949 ± 14%     -77.0%      12198 ± 10%  interrupts.CPU72.CAL:Function_call_interrupts
      4057 ± 23%     -75.7%     985.00 ±  8%  interrupts.CPU72.RES:Rescheduling_interrupts
     42694 ± 22%     -73.6%      11281 ± 16%  interrupts.CPU73.CAL:Function_call_interrupts
      3318 ± 39%     -74.3%     851.75 ± 16%  interrupts.CPU73.RES:Rescheduling_interrupts
     49143 ± 24%     -76.1%      11756 ± 12%  interrupts.CPU74.CAL:Function_call_interrupts
      3606 ± 31%     -73.8%     946.00 ± 12%  interrupts.CPU74.RES:Rescheduling_interrupts
     50587 ± 24%     -72.5%      13930 ± 20%  interrupts.CPU75.CAL:Function_call_interrupts
      3655 ± 36%     -71.1%       1056 ± 12%  interrupts.CPU75.RES:Rescheduling_interrupts
     57791 ± 21%     -78.4%      12488 ± 11%  interrupts.CPU76.CAL:Function_call_interrupts
      5109 ± 37%     -79.7%       1037 ± 20%  interrupts.CPU76.RES:Rescheduling_interrupts
     52455 ± 26%     -75.4%      12922 ±  5%  interrupts.CPU77.CAL:Function_call_interrupts
      3997 ± 37%     -73.9%       1043 ± 14%  interrupts.CPU77.RES:Rescheduling_interrupts
     49188 ± 21%     -74.5%      12521 ± 10%  interrupts.CPU78.CAL:Function_call_interrupts
      3867 ± 42%     -74.5%     986.25 ± 18%  interrupts.CPU78.RES:Rescheduling_interrupts
     45517 ± 19%     -72.6%      12484 ± 19%  interrupts.CPU79.CAL:Function_call_interrupts
      3369 ± 34%     -71.9%     946.25 ± 20%  interrupts.CPU79.RES:Rescheduling_interrupts
     30702 ± 22%     -39.9%      18462 ±  9%  interrupts.CPU8.CAL:Function_call_interrupts
      2580 ± 28%     -39.9%       1550 ± 10%  interrupts.CPU8.RES:Rescheduling_interrupts
     35561 ± 30%     -69.8%      10728 ± 12%  interrupts.CPU80.CAL:Function_call_interrupts
      2675 ± 44%     -70.6%     787.50 ±  7%  interrupts.CPU80.RES:Rescheduling_interrupts
     38762 ± 33%     -73.3%      10349 ± 18%  interrupts.CPU81.CAL:Function_call_interrupts
      2892 ± 48%     -70.5%     853.25 ± 20%  interrupts.CPU81.RES:Rescheduling_interrupts
     46500 ± 39%     -80.2%       9203 ±  6%  interrupts.CPU82.CAL:Function_call_interrupts
      3726 ± 41%     -83.3%     622.75 ± 12%  interrupts.CPU82.RES:Rescheduling_interrupts
     42125 ± 25%     -76.0%      10103 ±  7%  interrupts.CPU83.CAL:Function_call_interrupts
      3275 ± 40%     -75.4%     804.50 ±  6%  interrupts.CPU83.RES:Rescheduling_interrupts
     37359 ± 28%     -74.7%       9436 ±  7%  interrupts.CPU84.CAL:Function_call_interrupts
      2762 ± 45%     -71.1%     797.50 ± 17%  interrupts.CPU84.RES:Rescheduling_interrupts
     38900 ± 13%     -76.2%       9272 ±  8%  interrupts.CPU85.CAL:Function_call_interrupts
      2704 ± 27%     -77.0%     622.25 ± 10%  interrupts.CPU85.RES:Rescheduling_interrupts
     40662 ± 24%     -77.2%       9274 ± 14%  interrupts.CPU86.CAL:Function_call_interrupts
      3139 ± 39%     -79.5%     643.00 ± 28%  interrupts.CPU86.RES:Rescheduling_interrupts
     33538 ± 23%     -71.7%       9484 ± 14%  interrupts.CPU87.CAL:Function_call_interrupts
      2406 ± 40%     -73.5%     638.25 ± 21%  interrupts.CPU87.RES:Rescheduling_interrupts
     36240 ± 26%     -73.8%       9499 ± 10%  interrupts.CPU88.CAL:Function_call_interrupts
      2450 ± 39%     -70.5%     721.75 ±  5%  interrupts.CPU88.RES:Rescheduling_interrupts
     41267 ± 29%     -77.1%       9463 ± 11%  interrupts.CPU89.CAL:Function_call_interrupts
      3286 ± 34%     -73.2%     879.50 ± 17%  interrupts.CPU89.RES:Rescheduling_interrupts
     36038 ± 18%     -50.6%      17796 ±  3%  interrupts.CPU9.CAL:Function_call_interrupts
      3140 ± 28%     -48.3%       1622 ±  9%  interrupts.CPU9.RES:Rescheduling_interrupts
     38534 ± 25%     -77.5%       8675 ±  9%  interrupts.CPU90.CAL:Function_call_interrupts
      3008 ± 27%     -79.1%     629.50 ± 16%  interrupts.CPU90.RES:Rescheduling_interrupts
     38422 ± 29%     -77.2%       8741 ± 14%  interrupts.CPU91.CAL:Function_call_interrupts
      3095 ± 51%     -78.7%     658.75 ± 14%  interrupts.CPU91.RES:Rescheduling_interrupts
     38120 ± 45%     -73.6%      10059 ± 10%  interrupts.CPU92.CAL:Function_call_interrupts
      2711 ± 61%     -73.3%     722.75 ± 10%  interrupts.CPU92.RES:Rescheduling_interrupts
     37155 ± 19%     -74.1%       9628 ± 12%  interrupts.CPU93.CAL:Function_call_interrupts
      2724 ± 32%     -70.4%     806.00 ± 32%  interrupts.CPU93.RES:Rescheduling_interrupts
     43458 ± 15%     -77.1%       9936 ± 10%  interrupts.CPU94.CAL:Function_call_interrupts
      2832 ± 25%     -76.8%     655.75 ± 18%  interrupts.CPU94.RES:Rescheduling_interrupts
     54226 ± 22%     -76.8%      12596 ± 17%  interrupts.CPU95.CAL:Function_call_interrupts
      4437 ± 37%     -80.6%     860.50 ± 11%  interrupts.CPU95.RES:Rescheduling_interrupts
    302853 ±  2%     -58.2%     126676 ±  2%  interrupts.RES:Rescheduling_interrupts
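(For readers unfamiliar with the lkp comparison format: each row pairs the parent-commit mean, annotated with its stddev as a percentage, against the patched kernel's mean, and the middle column is the relative change between the two. A minimal sketch of that arithmetic, using the interrupts.CPU30.CAL row above as the sample values:)

```python
# Sketch of how the lkp comparison columns relate. Assumption: the middle
# column is the plain relative change between the two sample means.

def relative_change(base_mean: float, new_mean: float) -> float:
    """Percentage change of new_mean relative to base_mean."""
    return (new_mean - base_mean) / base_mean * 100.0

# Values from the interrupts.CPU30.CAL row: 45674 -> 14470, shown as -68.3%.
delta = relative_change(45674, 14470)
print(f"{delta:+.1f}%")  # prints "-68.3%"
```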


                                                                                
                           will-it-scale.per_thread_ops                         
                                                                                
  2300 +--------------------------------------------------------------------+   
       |                                                                    |   
  2250 |-+  O    O O                   O  O                                 |   
  2200 |-+                 O O       O                                 O    |   
       |                        O           O      O  O      O  O O  O      |   
  2150 |-O    O         O         O           O         O                   |   
       |              O                          O         O              O |   
  2100 |-+    +                                                             |   
       |     : +        +              +..      .+.                         |   
  2050 |.+   :  + .+.. + :             :  +   +.   +..  +                   |   
  2000 |-+..+    +    +   :           :    + +         +                    |   
       |                  :           :     +         +                     |   
  1950 |-+                 +.+..  +..+                                      |   
       |                         +                                          |   
  1900 +--------------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


[-- Attachment #2: config-5.8.0-00001-g698ac7610f7928 --]
[-- Type: text/plain, Size: 169434 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 5.8.0 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc-9 (Debian 9.3.0-15) 9.3.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=90300
CONFIG_LD_VERSION=235000000
CONFIG_CLANG_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_WATCH_QUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_GENERIC_IRQ_INJECTION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_IRQ_MSI_IOMMU=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ_FULL=y
CONFIG_CONTEXT_TRACKING=y
# CONFIG_CONTEXT_TRACKING_FORCE is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y

#
# CPU/Task time and stats accounting
#
CONFIG_VIRT_CPU_ACCOUNTING=y
CONFIG_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_SCHED_AVG_IRQ=y
# CONFIG_SCHED_THERMAL_PRESSURE is not set
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

CONFIG_CPU_ISOLATION=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RCU=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_TASKS_TRACE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
CONFIG_RCU_NOCB_CPU=y
# end of RCU Subsystem

CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_IKHEADERS is not set
CONFIG_LOG_BUF_SHIFT=20
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y

#
# Scheduler features
#
# CONFIG_UCLAMP_TASK is not set
# end of Scheduler features

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CC_HAS_INT128=y
CONFIG_ARCH_SUPPORTS_INT128=y
CONFIG_NUMA_BALANCING=y
CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
CONFIG_CGROUPS=y
CONFIG_PAGE_COUNTER=y
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
CONFIG_MEMCG_KMEM=y
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_WRITEBACK=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_CGROUP_PIDS=y
CONFIG_CGROUP_RDMA=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_BPF=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_SOCK_CGROUP_DATA=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_TIME_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_RD_LZ4=y
# CONFIG_BOOT_CONFIG is not set
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_BPF=y
# CONFIG_EXPERT is not set
CONFIG_UID16=y
CONFIG_MULTIUSER=y
CONFIG_SGETMASK_SYSCALL=y
CONFIG_SYSFS_SYSCALL=y
CONFIG_FHANDLE=y
CONFIG_POSIX_TIMERS=y
CONFIG_PRINTK=y
CONFIG_PRINTK_NMI=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_IO_URING=y
CONFIG_ADVISE_SYSCALLS=y
CONFIG_HAVE_ARCH_USERFAULTFD_WP=y
CONFIG_MEMBARRIER=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_ABSOLUTE_PERCPU=y
CONFIG_KALLSYMS_BASE_RELATIVE=y
# CONFIG_BPF_LSM is not set
CONFIG_BPF_SYSCALL=y
CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_BPF_JIT_DEFAULT_ON=y
CONFIG_USERFAULTFD=y
CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE=y
CONFIG_RSEQ=y
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
# end of Kernel Performance Events And Counters

CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_COMPAT_BRK is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
CONFIG_SLAB_MERGE_DEFAULT=y
CONFIG_SLAB_FREELIST_RANDOM=y
# CONFIG_SLAB_FREELIST_HARDENED is not set
CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
CONFIG_SLUB_CPU_PARTIAL=y
CONFIG_SYSTEM_DATA_VERIFICATION=y
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
# end of General setup

CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_ARCH_HAS_FILTER_PGPROT=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_HAVE_INTEL_TXT=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DYNAMIC_PHYSICAL_MASK=y
CONFIG_PGTABLE_LEVELS=5
CONFIG_CC_HAS_SANE_STACKPROTECTOR=y

#
# Processor type and features
#
CONFIG_ZONE_DMA=y
CONFIG_SMP=y
CONFIG_X86_FEATURE_NAMES=y
CONFIG_X86_X2APIC=y
CONFIG_X86_MPPARSE=y
# CONFIG_GOLDFISH is not set
CONFIG_RETPOLINE=y
CONFIG_X86_CPU_RESCTRL=y
CONFIG_X86_EXTENDED_PLATFORM=y
# CONFIG_X86_NUMACHIP is not set
# CONFIG_X86_VSMP is not set
CONFIG_X86_UV=y
# CONFIG_X86_GOLDFISH is not set
# CONFIG_X86_INTEL_MID is not set
CONFIG_X86_INTEL_LPSS=y
CONFIG_X86_AMD_PLATFORM_DEVICE=y
CONFIG_IOSF_MBI=y
# CONFIG_IOSF_MBI_DEBUG is not set
CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y
# CONFIG_SCHED_OMIT_FRAME_POINTER is not set
CONFIG_HYPERVISOR_GUEST=y
CONFIG_PARAVIRT=y
# CONFIG_PARAVIRT_DEBUG is not set
CONFIG_PARAVIRT_SPINLOCKS=y
CONFIG_X86_HV_CALLBACK_VECTOR=y
CONFIG_XEN=y
# CONFIG_XEN_PV is not set
CONFIG_XEN_PVHVM=y
CONFIG_XEN_PVHVM_SMP=y
CONFIG_XEN_SAVE_RESTORE=y
# CONFIG_XEN_DEBUG_FS is not set
# CONFIG_XEN_PVH is not set
CONFIG_KVM_GUEST=y
CONFIG_ARCH_CPUIDLE_HALTPOLL=y
# CONFIG_PVH is not set
CONFIG_PARAVIRT_TIME_ACCOUNTING=y
CONFIG_PARAVIRT_CLOCK=y
# CONFIG_JAILHOUSE_GUEST is not set
# CONFIG_ACRN_GUEST is not set
# CONFIG_MK8 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_MATOM is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_INTERNODE_CACHE_SHIFT=6
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_IA32_FEAT_CTL=y
CONFIG_X86_VMX_FEATURE_NAMES=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_HYGON=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_CPU_SUP_ZHAOXIN=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
# CONFIG_GART_IOMMU is not set
CONFIG_MAXSMP=y
CONFIG_NR_CPUS_RANGE_BEGIN=8192
CONFIG_NR_CPUS_RANGE_END=8192
CONFIG_NR_CPUS_DEFAULT=8192
CONFIG_NR_CPUS=8192
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_SCHED_MC_PRIO=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y
CONFIG_X86_MCE=y
CONFIG_X86_MCELOG_LEGACY=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
CONFIG_X86_MCE_THRESHOLD=y
CONFIG_X86_MCE_INJECT=m
CONFIG_X86_THERMAL_VECTOR=y

#
# Performance monitoring
#
CONFIG_PERF_EVENTS_INTEL_UNCORE=m
CONFIG_PERF_EVENTS_INTEL_RAPL=m
CONFIG_PERF_EVENTS_INTEL_CSTATE=m
CONFIG_PERF_EVENTS_AMD_POWER=m
# end of Performance monitoring

CONFIG_X86_16BIT=y
CONFIG_X86_ESPFIX64=y
CONFIG_X86_VSYSCALL_EMULATION=y
CONFIG_X86_IOPL_IOPERM=y
CONFIG_I8K=m
CONFIG_MICROCODE=y
CONFIG_MICROCODE_INTEL=y
CONFIG_MICROCODE_AMD=y
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_X86_5LEVEL=y
CONFIG_X86_DIRECT_GBPAGES=y
# CONFIG_X86_CPA_STATISTICS is not set
CONFIG_AMD_MEM_ENCRYPT=y
# CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT is not set
CONFIG_NUMA=y
CONFIG_AMD_NUMA=y
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_NUMA_EMU=y
CONFIG_NODES_SHIFT=10
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
# CONFIG_ARCH_MEMORY_PROBE is not set
CONFIG_ARCH_PROC_KCORE_TEXT=y
CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000
CONFIG_X86_PMEM_LEGACY_DEVICE=y
CONFIG_X86_PMEM_LEGACY=m
CONFIG_X86_CHECK_BIOS_CORRUPTION=y
# CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK is not set
CONFIG_X86_RESERVE_LOW=64
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=1
CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
CONFIG_X86_PAT=y
CONFIG_ARCH_USES_PG_UNCACHED=y
CONFIG_ARCH_RANDOM=y
CONFIG_X86_SMAP=y
CONFIG_X86_UMIP=y
CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=y
CONFIG_X86_INTEL_TSX_MODE_OFF=y
# CONFIG_X86_INTEL_TSX_MODE_ON is not set
# CONFIG_X86_INTEL_TSX_MODE_AUTO is not set
CONFIG_EFI=y
CONFIG_EFI_STUB=y
CONFIG_EFI_MIXED=y
CONFIG_SECCOMP=y
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_SCHED_HRTICK=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_ARCH_HAS_KEXEC_PURGATORY=y
# CONFIG_KEXEC_SIG is not set
CONFIG_CRASH_DUMP=y
CONFIG_KEXEC_JUMP=y
CONFIG_PHYSICAL_START=0x1000000
CONFIG_RELOCATABLE=y
CONFIG_RANDOMIZE_BASE=y
CONFIG_X86_NEED_RELOCS=y
CONFIG_PHYSICAL_ALIGN=0x200000
CONFIG_DYNAMIC_MEMORY_LAYOUT=y
CONFIG_RANDOMIZE_MEMORY=y
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING=0xa
CONFIG_HOTPLUG_CPU=y
CONFIG_BOOTPARAM_HOTPLUG_CPU0=y
# CONFIG_DEBUG_HOTPLUG_CPU0 is not set
# CONFIG_COMPAT_VDSO is not set
CONFIG_LEGACY_VSYSCALL_EMULATE=y
# CONFIG_LEGACY_VSYSCALL_XONLY is not set
# CONFIG_LEGACY_VSYSCALL_NONE is not set
# CONFIG_CMDLINE_BOOL is not set
CONFIG_MODIFY_LDT_SYSCALL=y
CONFIG_HAVE_LIVEPATCH=y
CONFIG_LIVEPATCH=y
# end of Processor type and features

CONFIG_ARCH_HAS_ADD_PAGES=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
CONFIG_USE_PERCPU_NUMA_NODE_ID=y
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION=y
CONFIG_ARCH_ENABLE_THP_MIGRATION=y

#
# Power management and ACPI options
#
CONFIG_ARCH_HIBERNATION_HEADER=y
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
CONFIG_HIBERNATE_CALLBACKS=y
CONFIG_HIBERNATION=y
CONFIG_HIBERNATION_SNAPSHOT_DEV=y
CONFIG_PM_STD_PARTITION=""
CONFIG_PM_SLEEP=y
CONFIG_PM_SLEEP_SMP=y
# CONFIG_PM_AUTOSLEEP is not set
# CONFIG_PM_WAKELOCKS is not set
CONFIG_PM=y
CONFIG_PM_DEBUG=y
# CONFIG_PM_ADVANCED_DEBUG is not set
# CONFIG_PM_TEST_SUSPEND is not set
CONFIG_PM_SLEEP_DEBUG=y
# CONFIG_PM_TRACE_RTC is not set
CONFIG_PM_CLK=y
# CONFIG_WQ_POWER_EFFICIENT_DEFAULT is not set
# CONFIG_ENERGY_MODEL is not set
CONFIG_ARCH_SUPPORTS_ACPI=y
CONFIG_ACPI=y
CONFIG_ACPI_LEGACY_TABLES_LOOKUP=y
CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC=y
CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT=y
# CONFIG_ACPI_DEBUGGER is not set
CONFIG_ACPI_SPCR_TABLE=y
CONFIG_ACPI_LPIT=y
CONFIG_ACPI_SLEEP=y
# CONFIG_ACPI_PROCFS_POWER is not set
CONFIG_ACPI_REV_OVERRIDE_POSSIBLE=y
CONFIG_ACPI_EC_DEBUGFS=m
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_VIDEO=m
CONFIG_ACPI_FAN=y
CONFIG_ACPI_TAD=m
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_CPU_FREQ_PSS=y
CONFIG_ACPI_PROCESSOR_CSTATE=y
CONFIG_ACPI_PROCESSOR_IDLE=y
CONFIG_ACPI_CPPC_LIB=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_IPMI=m
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_PROCESSOR_AGGREGATOR=m
CONFIG_ACPI_THERMAL=y
CONFIG_ARCH_HAS_ACPI_TABLE_UPGRADE=y
CONFIG_ACPI_TABLE_UPGRADE=y
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_PCI_SLOT=y
CONFIG_ACPI_CONTAINER=y
CONFIG_ACPI_HOTPLUG_MEMORY=y
CONFIG_ACPI_HOTPLUG_IOAPIC=y
CONFIG_ACPI_SBS=m
CONFIG_ACPI_HED=y
# CONFIG_ACPI_CUSTOM_METHOD is not set
CONFIG_ACPI_BGRT=y
CONFIG_ACPI_NFIT=m
# CONFIG_NFIT_SECURITY_DEBUG is not set
CONFIG_ACPI_NUMA=y
# CONFIG_ACPI_HMAT is not set
CONFIG_HAVE_ACPI_APEI=y
CONFIG_HAVE_ACPI_APEI_NMI=y
CONFIG_ACPI_APEI=y
CONFIG_ACPI_APEI_GHES=y
CONFIG_ACPI_APEI_PCIEAER=y
CONFIG_ACPI_APEI_MEMORY_FAILURE=y
CONFIG_ACPI_APEI_EINJ=m
CONFIG_ACPI_APEI_ERST_DEBUG=y
CONFIG_DPTF_POWER=m
CONFIG_ACPI_WATCHDOG=y
CONFIG_ACPI_EXTLOG=m
CONFIG_ACPI_ADXL=y
CONFIG_PMIC_OPREGION=y
# CONFIG_ACPI_CONFIGFS is not set
CONFIG_X86_PM_TIMER=y
CONFIG_SFI=y

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_GOV_ATTR_SET=y
CONFIG_CPU_FREQ_GOV_COMMON=y
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
CONFIG_CPU_FREQ_GOV_SCHEDUTIL=y

#
# CPU frequency scaling drivers
#
CONFIG_X86_INTEL_PSTATE=y
# CONFIG_X86_PCC_CPUFREQ is not set
CONFIG_X86_ACPI_CPUFREQ=m
CONFIG_X86_ACPI_CPUFREQ_CPB=y
CONFIG_X86_POWERNOW_K8=m
CONFIG_X86_AMD_FREQ_SENSITIVITY=m
# CONFIG_X86_SPEEDSTEP_CENTRINO is not set
CONFIG_X86_P4_CLOCKMOD=m

#
# shared options
#
CONFIG_X86_SPEEDSTEP_LIB=m
# end of CPU Frequency scaling

#
# CPU Idle
#
CONFIG_CPU_IDLE=y
# CONFIG_CPU_IDLE_GOV_LADDER is not set
CONFIG_CPU_IDLE_GOV_MENU=y
# CONFIG_CPU_IDLE_GOV_TEO is not set
# CONFIG_CPU_IDLE_GOV_HALTPOLL is not set
CONFIG_HALTPOLL_CPUIDLE=y
# end of CPU Idle

CONFIG_INTEL_IDLE=y
# end of Power management and ACPI options

#
# Bus options (PCI etc.)
#
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_XEN=y
CONFIG_MMCONF_FAM10H=y
CONFIG_ISA_DMA_API=y
CONFIG_AMD_NB=y
# CONFIG_X86_SYSFB is not set
# end of Bus options (PCI etc.)

#
# Binary Emulations
#
CONFIG_IA32_EMULATION=y
# CONFIG_X86_X32 is not set
CONFIG_COMPAT_32=y
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_SYSVIPC_COMPAT=y
# end of Binary Emulations

#
# Firmware Drivers
#
CONFIG_EDD=m
# CONFIG_EDD_OFF is not set
CONFIG_FIRMWARE_MEMMAP=y
CONFIG_DMIID=y
CONFIG_DMI_SYSFS=y
CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y
# CONFIG_ISCSI_IBFT is not set
CONFIG_FW_CFG_SYSFS=y
# CONFIG_FW_CFG_SYSFS_CMDLINE is not set
# CONFIG_GOOGLE_FIRMWARE is not set

#
# EFI (Extensible Firmware Interface) Support
#
CONFIG_EFI_VARS=y
CONFIG_EFI_ESRT=y
CONFIG_EFI_VARS_PSTORE=y
CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE=y
CONFIG_EFI_RUNTIME_MAP=y
# CONFIG_EFI_FAKE_MEMMAP is not set
CONFIG_EFI_RUNTIME_WRAPPERS=y
CONFIG_EFI_GENERIC_STUB_INITRD_CMDLINE_LOADER=y
# CONFIG_EFI_BOOTLOADER_CONTROL is not set
# CONFIG_EFI_CAPSULE_LOADER is not set
# CONFIG_EFI_TEST is not set
CONFIG_APPLE_PROPERTIES=y
# CONFIG_RESET_ATTACK_MITIGATION is not set
# CONFIG_EFI_RCI2_TABLE is not set
# CONFIG_EFI_DISABLE_PCI_DMA is not set
# end of EFI (Extensible Firmware Interface) Support

CONFIG_UEFI_CPER=y
CONFIG_UEFI_CPER_X86=y
CONFIG_EFI_DEV_PATH_PARSER=y
CONFIG_EFI_EARLYCON=y
CONFIG_EFI_CUSTOM_SSDT_OVERLAYS=y

#
# Tegra firmware driver
#
# end of Tegra firmware driver
# end of Firmware Drivers

CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQFD=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_MMIO=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_HAVE_KVM_MSI=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM_VFIO=y
CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT=y
CONFIG_KVM_COMPAT=y
CONFIG_HAVE_KVM_IRQ_BYPASS=y
CONFIG_HAVE_KVM_NO_POLL=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
CONFIG_KVM_INTEL=m
CONFIG_KVM_AMD=m
CONFIG_KVM_AMD_SEV=y
CONFIG_KVM_MMU_AUDIT=y
CONFIG_AS_AVX512=y
CONFIG_AS_SHA1_NI=y
CONFIG_AS_SHA256_NI=y
CONFIG_AS_TPAUSE=y

#
# General architecture-dependent options
#
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_HOTPLUG_SMT=y
CONFIG_OPROFILE=m
CONFIG_OPROFILE_EVENT_MULTIPLEX=y
CONFIG_HAVE_OPROFILE=y
CONFIG_OPROFILE_NMI_TIMER=y
CONFIG_KPROBES=y
CONFIG_JUMP_LABEL=y
# CONFIG_STATIC_KEYS_SELFTEST is not set
CONFIG_OPTPROBES=y
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_UPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_KRETPROBES=y
CONFIG_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_HAVE_FUNCTION_ERROR_INJECTION=y
CONFIG_HAVE_NMI=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_CONTIGUOUS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_ARCH_HAS_FORTIFY_SOURCE=y
CONFIG_ARCH_HAS_SET_MEMORY=y
CONFIG_ARCH_HAS_SET_DIRECT_MAP=y
CONFIG_HAVE_ARCH_THREAD_STRUCT_WHITELIST=y
CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT=y
CONFIG_HAVE_ASM_MODVERSIONS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_RSEQ=y
CONFIG_HAVE_FUNCTION_ARG_ACCESS_API=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
CONFIG_HAVE_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_HAVE_ARCH_JUMP_LABEL_RELATIVE=y
CONFIG_MMU_GATHER_TABLE_FREE=y
CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y
CONFIG_HAVE_ARCH_STACKLEAK=y
CONFIG_HAVE_STACKPROTECTOR=y
CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
CONFIG_STACKPROTECTOR=y
CONFIG_STACKPROTECTOR_STRONG=y
CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES=y
CONFIG_HAVE_CONTEXT_TRACKING=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_MOVE_PMD=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=y
CONFIG_HAVE_ARCH_HUGE_VMAP=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_HAVE_ARCH_SOFT_DIRTY=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_HAVE_ARCH_MMAP_RND_BITS=y
CONFIG_HAVE_EXIT_THREAD=y
CONFIG_ARCH_MMAP_RND_BITS=28
CONFIG_HAVE_ARCH_MMAP_RND_COMPAT_BITS=y
CONFIG_ARCH_MMAP_RND_COMPAT_BITS=8
CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES=y
CONFIG_HAVE_COPY_THREAD_TLS=y
CONFIG_HAVE_STACK_VALIDATION=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_COMPAT_OLD_SIGACTION=y
CONFIG_COMPAT_32BIT_TIME=y
CONFIG_HAVE_ARCH_VMAP_STACK=y
CONFIG_VMAP_STACK=y
CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_ARCH_HAS_STRICT_MODULE_RWX=y
CONFIG_STRICT_MODULE_RWX=y
CONFIG_HAVE_ARCH_PREL32_RELOCATIONS=y
CONFIG_ARCH_USE_MEMREMAP_PROT=y
# CONFIG_LOCK_EVENT_COUNTS is not set
CONFIG_ARCH_HAS_MEM_ENCRYPT=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
# end of GCOV-based kernel profiling

CONFIG_HAVE_GCC_PLUGINS=y
# end of General architecture-dependent options

CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULE_SIG_FORMAT=y
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_MODULE_SIG=y
# CONFIG_MODULE_SIG_FORCE is not set
CONFIG_MODULE_SIG_ALL=y
# CONFIG_MODULE_SIG_SHA1 is not set
# CONFIG_MODULE_SIG_SHA224 is not set
CONFIG_MODULE_SIG_SHA256=y
# CONFIG_MODULE_SIG_SHA384 is not set
# CONFIG_MODULE_SIG_SHA512 is not set
CONFIG_MODULE_SIG_HASH="sha256"
# CONFIG_MODULE_COMPRESS is not set
# CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is not set
# CONFIG_UNUSED_SYMBOLS is not set
# CONFIG_TRIM_UNUSED_KSYMS is not set
CONFIG_MODULES_TREE_LOOKUP=y
CONFIG_BLOCK=y
CONFIG_BLK_SCSI_REQUEST=y
CONFIG_BLK_CGROUP_RWSTAT=y
CONFIG_BLK_DEV_BSG=y
CONFIG_BLK_DEV_BSGLIB=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_DEV_INTEGRITY_T10=m
CONFIG_BLK_DEV_ZONED=y
CONFIG_BLK_DEV_THROTTLING=y
# CONFIG_BLK_DEV_THROTTLING_LOW is not set
# CONFIG_BLK_CMDLINE_PARSER is not set
CONFIG_BLK_WBT=y
# CONFIG_BLK_CGROUP_IOLATENCY is not set
# CONFIG_BLK_CGROUP_IOCOST is not set
CONFIG_BLK_WBT_MQ=y
CONFIG_BLK_DEBUG_FS=y
CONFIG_BLK_DEBUG_FS_ZONED=y
# CONFIG_BLK_SED_OPAL is not set
# CONFIG_BLK_INLINE_ENCRYPTION is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_AIX_PARTITION is not set
CONFIG_OSF_PARTITION=y
CONFIG_AMIGA_PARTITION=y
# CONFIG_ATARI_PARTITION is not set
CONFIG_MAC_PARTITION=y
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_MINIX_SUBPARTITION=y
CONFIG_SOLARIS_X86_PARTITION=y
CONFIG_UNIXWARE_DISKLABEL=y
# CONFIG_LDM_PARTITION is not set
CONFIG_SGI_PARTITION=y
# CONFIG_ULTRIX_PARTITION is not set
CONFIG_SUN_PARTITION=y
CONFIG_KARMA_PARTITION=y
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
# CONFIG_CMDLINE_PARTITION is not set
# end of Partition Types

CONFIG_BLOCK_COMPAT=y
CONFIG_BLK_MQ_PCI=y
CONFIG_BLK_MQ_VIRTIO=y
CONFIG_BLK_MQ_RDMA=y
CONFIG_BLK_PM=y

#
# IO Schedulers
#
CONFIG_MQ_IOSCHED_DEADLINE=y
CONFIG_MQ_IOSCHED_KYBER=y
CONFIG_IOSCHED_BFQ=y
CONFIG_BFQ_GROUP_IOSCHED=y
# CONFIG_BFQ_CGROUP_DEBUG is not set
# end of IO Schedulers

CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_PADATA=y
CONFIG_ASN1=y
CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
CONFIG_INLINE_READ_UNLOCK=y
CONFIG_INLINE_READ_UNLOCK_IRQ=y
CONFIG_INLINE_WRITE_UNLOCK=y
CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_MUTEX_SPIN_ON_OWNER=y
CONFIG_RWSEM_SPIN_ON_OWNER=y
CONFIG_LOCK_SPIN_ON_OWNER=y
CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y
CONFIG_QUEUED_SPINLOCKS=y
CONFIG_ARCH_USE_QUEUED_RWLOCKS=y
CONFIG_QUEUED_RWLOCKS=y
CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE=y
CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE=y
CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
CONFIG_FREEZER=y

#
# Executable file formats
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
CONFIG_ELFCORE=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_BINFMT_MISC=m
CONFIG_COREDUMP=y
# end of Executable file formats

#
# Memory Management options
#
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_NEED_MULTIPLE_NODES=y
CONFIG_HAVE_MEMORY_PRESENT=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_FAST_GUP=y
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_ISOLATION=y
CONFIG_HAVE_BOOTMEM_INFO_NODE=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
# CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE is not set
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MEMORY_BALLOON=y
CONFIG_BALLOON_COMPACTION=y
CONFIG_COMPACTION=y
CONFIG_PAGE_REPORTING=y
CONFIG_MIGRATION=y
CONFIG_CONTIG_ALLOC=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_MMU_NOTIFIER=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
CONFIG_MEMORY_FAILURE=y
CONFIG_HWPOISON_INJECT=m
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
CONFIG_ARCH_WANTS_THP_SWAP=y
CONFIG_THP_SWAP=y
CONFIG_CLEANCACHE=y
CONFIG_FRONTSWAP=y
CONFIG_CMA=y
# CONFIG_CMA_DEBUG is not set
# CONFIG_CMA_DEBUGFS is not set
CONFIG_CMA_AREAS=7
CONFIG_ZSWAP=y
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_DEFLATE is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZO=y
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_842 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4HC is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_ZSTD is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT="lzo"
CONFIG_ZSWAP_ZPOOL_DEFAULT_ZBUD=y
# CONFIG_ZSWAP_ZPOOL_DEFAULT_Z3FOLD is not set
# CONFIG_ZSWAP_ZPOOL_DEFAULT_ZSMALLOC is not set
CONFIG_ZSWAP_ZPOOL_DEFAULT="zbud"
# CONFIG_ZSWAP_DEFAULT_ON is not set
CONFIG_ZPOOL=y
CONFIG_ZBUD=y
# CONFIG_Z3FOLD is not set
CONFIG_ZSMALLOC=y
# CONFIG_ZSMALLOC_PGTABLE_MAPPING is not set
CONFIG_ZSMALLOC_STAT=y
CONFIG_GENERIC_EARLY_IOREMAP=y
CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
CONFIG_IDLE_PAGE_TRACKING=y
CONFIG_ARCH_HAS_PTE_DEVMAP=y
CONFIG_ZONE_DEVICE=y
CONFIG_DEV_PAGEMAP_OPS=y
CONFIG_DEVICE_PRIVATE=y
CONFIG_FRAME_VECTOR=y
CONFIG_ARCH_USES_HIGH_VMA_FLAGS=y
CONFIG_ARCH_HAS_PKEYS=y
# CONFIG_PERCPU_STATS is not set
# CONFIG_GUP_BENCHMARK is not set
# CONFIG_READ_ONLY_THP_FOR_FS is not set
CONFIG_ARCH_HAS_PTE_SPECIAL=y
CONFIG_MAPPING_DIRTY_HELPERS=y
# end of Memory Management options

CONFIG_NET=y
CONFIG_COMPAT_NETLINK_MESSAGES=y
CONFIG_NET_INGRESS=y
CONFIG_NET_EGRESS=y
CONFIG_SKB_EXTENSIONS=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_DIAG=m
CONFIG_UNIX=y
CONFIG_UNIX_SCM=y
CONFIG_UNIX_DIAG=m
CONFIG_TLS=m
CONFIG_TLS_DEVICE=y
# CONFIG_TLS_TOE is not set
CONFIG_XFRM=y
CONFIG_XFRM_OFFLOAD=y
CONFIG_XFRM_ALGO=y
CONFIG_XFRM_USER=y
# CONFIG_XFRM_INTERFACE is not set
CONFIG_XFRM_SUB_POLICY=y
CONFIG_XFRM_MIGRATE=y
CONFIG_XFRM_STATISTICS=y
CONFIG_XFRM_AH=m
CONFIG_XFRM_ESP=m
CONFIG_XFRM_IPCOMP=m
CONFIG_NET_KEY=m
CONFIG_NET_KEY_MIGRATE=y
# CONFIG_SMC is not set
CONFIG_XDP_SOCKETS=y
# CONFIG_XDP_SOCKETS_DIAG is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_IP_FIB_TRIE_STATS=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
CONFIG_IP_ROUTE_CLASSID=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
# CONFIG_IP_PNP_BOOTP is not set
# CONFIG_IP_PNP_RARP is not set
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE_DEMUX=m
CONFIG_NET_IP_TUNNEL=m
CONFIG_NET_IPGRE=m
CONFIG_NET_IPGRE_BROADCAST=y
CONFIG_IP_MROUTE_COMMON=y
CONFIG_IP_MROUTE=y
CONFIG_IP_MROUTE_MULTIPLE_TABLES=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
CONFIG_SYN_COOKIES=y
CONFIG_NET_IPVTI=m
CONFIG_NET_UDP_TUNNEL=m
# CONFIG_NET_FOU is not set
# CONFIG_NET_FOU_IP_TUNNELS is not set
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_ESP_OFFLOAD=m
# CONFIG_INET_ESPINTCP is not set
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=m
CONFIG_INET_DIAG=m
CONFIG_INET_TCP_DIAG=m
CONFIG_INET_UDP_DIAG=m
CONFIG_INET_RAW_DIAG=m
# CONFIG_INET_DIAG_DESTROY is not set
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=m
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_WESTWOOD=m
CONFIG_TCP_CONG_HTCP=m
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=m
CONFIG_TCP_CONG_VEGAS=m
CONFIG_TCP_CONG_NV=m
CONFIG_TCP_CONG_SCALABLE=m
CONFIG_TCP_CONG_LP=m
CONFIG_TCP_CONG_VENO=m
CONFIG_TCP_CONG_YEAH=m
CONFIG_TCP_CONG_ILLINOIS=m
CONFIG_TCP_CONG_DCTCP=m
# CONFIG_TCP_CONG_CDG is not set
CONFIG_TCP_CONG_BBR=m
CONFIG_DEFAULT_CUBIC=y
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="cubic"
CONFIG_TCP_MD5SIG=y
CONFIG_IPV6=y
CONFIG_IPV6_ROUTER_PREF=y
CONFIG_IPV6_ROUTE_INFO=y
CONFIG_IPV6_OPTIMISTIC_DAD=y
CONFIG_INET6_AH=m
CONFIG_INET6_ESP=m
CONFIG_INET6_ESP_OFFLOAD=m
# CONFIG_INET6_ESPINTCP is not set
CONFIG_INET6_IPCOMP=m
CONFIG_IPV6_MIP6=m
# CONFIG_IPV6_ILA is not set
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=m
CONFIG_IPV6_VTI=m
CONFIG_IPV6_SIT=m
CONFIG_IPV6_SIT_6RD=y
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=m
CONFIG_IPV6_GRE=m
CONFIG_IPV6_MULTIPLE_TABLES=y
# CONFIG_IPV6_SUBTREES is not set
CONFIG_IPV6_MROUTE=y
CONFIG_IPV6_MROUTE_MULTIPLE_TABLES=y
CONFIG_IPV6_PIMSM_V2=y
# CONFIG_IPV6_SEG6_LWTUNNEL is not set
# CONFIG_IPV6_SEG6_HMAC is not set
# CONFIG_IPV6_RPL_LWTUNNEL is not set
CONFIG_NETLABEL=y
# CONFIG_MPTCP is not set
CONFIG_NETWORK_SECMARK=y
CONFIG_NET_PTP_CLASSIFY=y
CONFIG_NETWORK_PHY_TIMESTAMPING=y
CONFIG_NETFILTER=y
CONFIG_NETFILTER_ADVANCED=y
CONFIG_BRIDGE_NETFILTER=m

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_INGRESS=y
CONFIG_NETFILTER_NETLINK=m
CONFIG_NETFILTER_FAMILY_BRIDGE=y
CONFIG_NETFILTER_FAMILY_ARP=y
# CONFIG_NETFILTER_NETLINK_ACCT is not set
CONFIG_NETFILTER_NETLINK_QUEUE=m
CONFIG_NETFILTER_NETLINK_LOG=m
CONFIG_NETFILTER_NETLINK_OSF=m
CONFIG_NF_CONNTRACK=m
CONFIG_NF_LOG_COMMON=m
CONFIG_NF_LOG_NETDEV=m
CONFIG_NETFILTER_CONNCOUNT=m
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_CONNTRACK_SECMARK=y
CONFIG_NF_CONNTRACK_ZONES=y
CONFIG_NF_CONNTRACK_PROCFS=y
CONFIG_NF_CONNTRACK_EVENTS=y
CONFIG_NF_CONNTRACK_TIMEOUT=y
CONFIG_NF_CONNTRACK_TIMESTAMP=y
CONFIG_NF_CONNTRACK_LABELS=y
CONFIG_NF_CT_PROTO_DCCP=y
CONFIG_NF_CT_PROTO_GRE=y
CONFIG_NF_CT_PROTO_SCTP=y
CONFIG_NF_CT_PROTO_UDPLITE=y
CONFIG_NF_CONNTRACK_AMANDA=m
CONFIG_NF_CONNTRACK_FTP=m
CONFIG_NF_CONNTRACK_H323=m
CONFIG_NF_CONNTRACK_IRC=m
CONFIG_NF_CONNTRACK_BROADCAST=m
CONFIG_NF_CONNTRACK_NETBIOS_NS=m
CONFIG_NF_CONNTRACK_SNMP=m
CONFIG_NF_CONNTRACK_PPTP=m
CONFIG_NF_CONNTRACK_SANE=m
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_CT_NETLINK=m
CONFIG_NF_CT_NETLINK_TIMEOUT=m
CONFIG_NF_CT_NETLINK_HELPER=m
CONFIG_NETFILTER_NETLINK_GLUE_CT=y
CONFIG_NF_NAT=m
CONFIG_NF_NAT_AMANDA=m
CONFIG_NF_NAT_FTP=m
CONFIG_NF_NAT_IRC=m
CONFIG_NF_NAT_SIP=m
CONFIG_NF_NAT_TFTP=m
CONFIG_NF_NAT_REDIRECT=y
CONFIG_NF_NAT_MASQUERADE=y
CONFIG_NETFILTER_SYNPROXY=m
CONFIG_NF_TABLES=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
CONFIG_NFT_NUMGEN=m
CONFIG_NFT_CT=m
CONFIG_NFT_COUNTER=m
CONFIG_NFT_CONNLIMIT=m
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
CONFIG_NFT_REDIR=m
CONFIG_NFT_NAT=m
# CONFIG_NFT_TUNNEL is not set
CONFIG_NFT_OBJREF=m
CONFIG_NFT_QUEUE=m
CONFIG_NFT_QUOTA=m
CONFIG_NFT_REJECT=m
CONFIG_NFT_REJECT_INET=m
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB=m
CONFIG_NFT_FIB_INET=m
# CONFIG_NFT_XFRM is not set
CONFIG_NFT_SOCKET=m
# CONFIG_NFT_OSF is not set
# CONFIG_NFT_TPROXY is not set
# CONFIG_NFT_SYNPROXY is not set
CONFIG_NF_DUP_NETDEV=m
CONFIG_NFT_DUP_NETDEV=m
CONFIG_NFT_FWD_NETDEV=m
CONFIG_NFT_FIB_NETDEV=m
# CONFIG_NF_FLOW_TABLE is not set
CONFIG_NETFILTER_XTABLES=y

#
# Xtables combined modules
#
CONFIG_NETFILTER_XT_MARK=m
CONFIG_NETFILTER_XT_CONNMARK=m
CONFIG_NETFILTER_XT_SET=m

#
# Xtables targets
#
CONFIG_NETFILTER_XT_TARGET_AUDIT=m
CONFIG_NETFILTER_XT_TARGET_CHECKSUM=m
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
CONFIG_NETFILTER_XT_TARGET_CONNMARK=m
CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=m
CONFIG_NETFILTER_XT_TARGET_CT=m
CONFIG_NETFILTER_XT_TARGET_DSCP=m
CONFIG_NETFILTER_XT_TARGET_HL=m
CONFIG_NETFILTER_XT_TARGET_HMARK=m
CONFIG_NETFILTER_XT_TARGET_IDLETIMER=m
# CONFIG_NETFILTER_XT_TARGET_LED is not set
CONFIG_NETFILTER_XT_TARGET_LOG=m
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_NAT=m
CONFIG_NETFILTER_XT_TARGET_NETMAP=m
CONFIG_NETFILTER_XT_TARGET_NFLOG=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
CONFIG_NETFILTER_XT_TARGET_NOTRACK=m
CONFIG_NETFILTER_XT_TARGET_RATEEST=m
CONFIG_NETFILTER_XT_TARGET_REDIRECT=m
CONFIG_NETFILTER_XT_TARGET_MASQUERADE=m
CONFIG_NETFILTER_XT_TARGET_TEE=m
CONFIG_NETFILTER_XT_TARGET_TPROXY=m
CONFIG_NETFILTER_XT_TARGET_TRACE=m
CONFIG_NETFILTER_XT_TARGET_SECMARK=m
CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=m

#
# Xtables matches
#
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=m
CONFIG_NETFILTER_XT_MATCH_BPF=m
CONFIG_NETFILTER_XT_MATCH_CGROUP=m
CONFIG_NETFILTER_XT_MATCH_CLUSTER=m
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
CONFIG_NETFILTER_XT_MATCH_CONNLABEL=m
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=m
CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
CONFIG_NETFILTER_XT_MATCH_CPU=m
CONFIG_NETFILTER_XT_MATCH_DCCP=m
CONFIG_NETFILTER_XT_MATCH_DEVGROUP=m
CONFIG_NETFILTER_XT_MATCH_DSCP=m
CONFIG_NETFILTER_XT_MATCH_ECN=m
CONFIG_NETFILTER_XT_MATCH_ESP=m
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
CONFIG_NETFILTER_XT_MATCH_HELPER=m
CONFIG_NETFILTER_XT_MATCH_HL=m
# CONFIG_NETFILTER_XT_MATCH_IPCOMP is not set
CONFIG_NETFILTER_XT_MATCH_IPRANGE=m
CONFIG_NETFILTER_XT_MATCH_IPVS=m
# CONFIG_NETFILTER_XT_MATCH_L2TP is not set
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
CONFIG_NETFILTER_XT_MATCH_LIMIT=m
CONFIG_NETFILTER_XT_MATCH_MAC=m
CONFIG_NETFILTER_XT_MATCH_MARK=m
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
# CONFIG_NETFILTER_XT_MATCH_NFACCT is not set
CONFIG_NETFILTER_XT_MATCH_OSF=m
CONFIG_NETFILTER_XT_MATCH_OWNER=m
CONFIG_NETFILTER_XT_MATCH_POLICY=m
CONFIG_NETFILTER_XT_MATCH_PHYSDEV=m
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
CONFIG_NETFILTER_XT_MATCH_QUOTA=m
CONFIG_NETFILTER_XT_MATCH_RATEEST=m
CONFIG_NETFILTER_XT_MATCH_REALM=m
CONFIG_NETFILTER_XT_MATCH_RECENT=m
CONFIG_NETFILTER_XT_MATCH_SCTP=m
CONFIG_NETFILTER_XT_MATCH_SOCKET=m
CONFIG_NETFILTER_XT_MATCH_STATE=m
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
# CONFIG_NETFILTER_XT_MATCH_TIME is not set
# CONFIG_NETFILTER_XT_MATCH_U32 is not set
# end of Core Netfilter Configuration

CONFIG_IP_SET=m
CONFIG_IP_SET_MAX=256
CONFIG_IP_SET_BITMAP_IP=m
CONFIG_IP_SET_BITMAP_IPMAC=m
CONFIG_IP_SET_BITMAP_PORT=m
CONFIG_IP_SET_HASH_IP=m
CONFIG_IP_SET_HASH_IPMARK=m
CONFIG_IP_SET_HASH_IPPORT=m
CONFIG_IP_SET_HASH_IPPORTIP=m
CONFIG_IP_SET_HASH_IPPORTNET=m
CONFIG_IP_SET_HASH_IPMAC=m
CONFIG_IP_SET_HASH_MAC=m
CONFIG_IP_SET_HASH_NETPORTNET=m
CONFIG_IP_SET_HASH_NET=m
CONFIG_IP_SET_HASH_NETNET=m
CONFIG_IP_SET_HASH_NETPORT=m
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_IP_VS=m
CONFIG_IP_VS_IPV6=y
# CONFIG_IP_VS_DEBUG is not set
CONFIG_IP_VS_TAB_BITS=12

#
# IPVS transport protocol load balancing support
#
CONFIG_IP_VS_PROTO_TCP=y
CONFIG_IP_VS_PROTO_UDP=y
CONFIG_IP_VS_PROTO_AH_ESP=y
CONFIG_IP_VS_PROTO_ESP=y
CONFIG_IP_VS_PROTO_AH=y
CONFIG_IP_VS_PROTO_SCTP=y

#
# IPVS scheduler
#
CONFIG_IP_VS_RR=m
CONFIG_IP_VS_WRR=m
CONFIG_IP_VS_LC=m
CONFIG_IP_VS_WLC=m
CONFIG_IP_VS_FO=m
CONFIG_IP_VS_OVF=m
CONFIG_IP_VS_LBLC=m
CONFIG_IP_VS_LBLCR=m
CONFIG_IP_VS_DH=m
CONFIG_IP_VS_SH=m
# CONFIG_IP_VS_MH is not set
CONFIG_IP_VS_SED=m
CONFIG_IP_VS_NQ=m

#
# IPVS SH scheduler
#
CONFIG_IP_VS_SH_TAB_BITS=8

#
# IPVS MH scheduler
#
CONFIG_IP_VS_MH_TAB_INDEX=12

#
# IPVS application helper
#
CONFIG_IP_VS_FTP=m
CONFIG_IP_VS_NFCT=y
CONFIG_IP_VS_PE_SIP=m

#
# IP: Netfilter Configuration
#
CONFIG_NF_DEFRAG_IPV4=m
CONFIG_NF_SOCKET_IPV4=m
CONFIG_NF_TPROXY_IPV4=m
CONFIG_NF_TABLES_IPV4=y
CONFIG_NFT_REJECT_IPV4=m
CONFIG_NFT_DUP_IPV4=m
CONFIG_NFT_FIB_IPV4=m
CONFIG_NF_TABLES_ARP=y
CONFIG_NF_DUP_IPV4=m
CONFIG_NF_LOG_ARP=m
CONFIG_NF_LOG_IPV4=m
CONFIG_NF_REJECT_IPV4=m
CONFIG_NF_NAT_SNMP_BASIC=m
CONFIG_NF_NAT_PPTP=m
CONFIG_NF_NAT_H323=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_AH=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_RPFILTER=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_TARGET_SYNPROXY=m
CONFIG_IP_NF_NAT=m
CONFIG_IP_NF_TARGET_MASQUERADE=m
CONFIG_IP_NF_TARGET_NETMAP=m
CONFIG_IP_NF_TARGET_REDIRECT=m
CONFIG_IP_NF_MANGLE=m
# CONFIG_IP_NF_TARGET_CLUSTERIP is not set
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_TTL=m
CONFIG_IP_NF_RAW=m
CONFIG_IP_NF_SECURITY=m
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
# end of IP: Netfilter Configuration

#
# IPv6: Netfilter Configuration
#
CONFIG_NF_SOCKET_IPV6=m
CONFIG_NF_TPROXY_IPV6=m
CONFIG_NF_TABLES_IPV6=y
CONFIG_NFT_REJECT_IPV6=m
CONFIG_NFT_DUP_IPV6=m
CONFIG_NFT_FIB_IPV6=m
CONFIG_NF_DUP_IPV6=m
CONFIG_NF_REJECT_IPV6=m
CONFIG_NF_LOG_IPV6=m
CONFIG_IP6_NF_IPTABLES=m
CONFIG_IP6_NF_MATCH_AH=m
CONFIG_IP6_NF_MATCH_EUI64=m
CONFIG_IP6_NF_MATCH_FRAG=m
CONFIG_IP6_NF_MATCH_OPTS=m
CONFIG_IP6_NF_MATCH_HL=m
CONFIG_IP6_NF_MATCH_IPV6HEADER=m
CONFIG_IP6_NF_MATCH_MH=m
CONFIG_IP6_NF_MATCH_RPFILTER=m
CONFIG_IP6_NF_MATCH_RT=m
# CONFIG_IP6_NF_MATCH_SRH is not set
# CONFIG_IP6_NF_TARGET_HL is not set
CONFIG_IP6_NF_FILTER=m
CONFIG_IP6_NF_TARGET_REJECT=m
CONFIG_IP6_NF_TARGET_SYNPROXY=m
CONFIG_IP6_NF_MANGLE=m
CONFIG_IP6_NF_RAW=m
CONFIG_IP6_NF_SECURITY=m
CONFIG_IP6_NF_NAT=m
CONFIG_IP6_NF_TARGET_MASQUERADE=m
CONFIG_IP6_NF_TARGET_NPT=m
# end of IPv6: Netfilter Configuration

CONFIG_NF_DEFRAG_IPV6=m
CONFIG_NF_TABLES_BRIDGE=m
# CONFIG_NFT_BRIDGE_META is not set
CONFIG_NFT_BRIDGE_REJECT=m
CONFIG_NF_LOG_BRIDGE=m
# CONFIG_NF_CONNTRACK_BRIDGE is not set
CONFIG_BRIDGE_NF_EBTABLES=m
CONFIG_BRIDGE_EBT_BROUTE=m
CONFIG_BRIDGE_EBT_T_FILTER=m
CONFIG_BRIDGE_EBT_T_NAT=m
CONFIG_BRIDGE_EBT_802_3=m
CONFIG_BRIDGE_EBT_AMONG=m
CONFIG_BRIDGE_EBT_ARP=m
CONFIG_BRIDGE_EBT_IP=m
CONFIG_BRIDGE_EBT_IP6=m
CONFIG_BRIDGE_EBT_LIMIT=m
CONFIG_BRIDGE_EBT_MARK=m
CONFIG_BRIDGE_EBT_PKTTYPE=m
CONFIG_BRIDGE_EBT_STP=m
CONFIG_BRIDGE_EBT_VLAN=m
CONFIG_BRIDGE_EBT_ARPREPLY=m
CONFIG_BRIDGE_EBT_DNAT=m
CONFIG_BRIDGE_EBT_MARK_T=m
CONFIG_BRIDGE_EBT_REDIRECT=m
CONFIG_BRIDGE_EBT_SNAT=m
CONFIG_BRIDGE_EBT_LOG=m
CONFIG_BRIDGE_EBT_NFLOG=m
# CONFIG_BPFILTER is not set
# CONFIG_IP_DCCP is not set
CONFIG_IP_SCTP=m
# CONFIG_SCTP_DBG_OBJCNT is not set
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_MD5 is not set
CONFIG_SCTP_DEFAULT_COOKIE_HMAC_SHA1=y
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_NONE is not set
CONFIG_SCTP_COOKIE_HMAC_MD5=y
CONFIG_SCTP_COOKIE_HMAC_SHA1=y
CONFIG_INET_SCTP_DIAG=m
# CONFIG_RDS is not set
CONFIG_TIPC=m
# CONFIG_TIPC_MEDIA_IB is not set
CONFIG_TIPC_MEDIA_UDP=y
CONFIG_TIPC_CRYPTO=y
CONFIG_TIPC_DIAG=m
CONFIG_ATM=m
CONFIG_ATM_CLIP=m
# CONFIG_ATM_CLIP_NO_ICMP is not set
CONFIG_ATM_LANE=m
# CONFIG_ATM_MPOA is not set
CONFIG_ATM_BR2684=m
# CONFIG_ATM_BR2684_IPFILTER is not set
CONFIG_L2TP=m
CONFIG_L2TP_DEBUGFS=m
CONFIG_L2TP_V3=y
CONFIG_L2TP_IP=m
CONFIG_L2TP_ETH=m
CONFIG_STP=m
CONFIG_GARP=m
CONFIG_MRP=m
CONFIG_BRIDGE=m
CONFIG_BRIDGE_IGMP_SNOOPING=y
CONFIG_BRIDGE_VLAN_FILTERING=y
# CONFIG_BRIDGE_MRP is not set
CONFIG_HAVE_NET_DSA=y
# CONFIG_NET_DSA is not set
CONFIG_VLAN_8021Q=m
CONFIG_VLAN_8021Q_GVRP=y
CONFIG_VLAN_8021Q_MVRP=y
# CONFIG_DECNET is not set
CONFIG_LLC=m
# CONFIG_LLC2 is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_PHONET is not set
CONFIG_6LOWPAN=m
# CONFIG_6LOWPAN_DEBUGFS is not set
# CONFIG_6LOWPAN_NHC is not set
CONFIG_IEEE802154=m
# CONFIG_IEEE802154_NL802154_EXPERIMENTAL is not set
CONFIG_IEEE802154_SOCKET=m
CONFIG_IEEE802154_6LOWPAN=m
CONFIG_MAC802154=m
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_HFSC=m
CONFIG_NET_SCH_ATM=m
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_MULTIQ=m
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFB=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
# CONFIG_NET_SCH_CBS is not set
# CONFIG_NET_SCH_ETF is not set
# CONFIG_NET_SCH_TAPRIO is not set
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_NETEM=m
CONFIG_NET_SCH_DRR=m
CONFIG_NET_SCH_MQPRIO=m
# CONFIG_NET_SCH_SKBPRIO is not set
CONFIG_NET_SCH_CHOKE=m
CONFIG_NET_SCH_QFQ=m
CONFIG_NET_SCH_CODEL=m
CONFIG_NET_SCH_FQ_CODEL=y
# CONFIG_NET_SCH_CAKE is not set
CONFIG_NET_SCH_FQ=m
CONFIG_NET_SCH_HHF=m
CONFIG_NET_SCH_PIE=m
# CONFIG_NET_SCH_FQ_PIE is not set
CONFIG_NET_SCH_INGRESS=m
CONFIG_NET_SCH_PLUG=m
# CONFIG_NET_SCH_ETS is not set
CONFIG_NET_SCH_DEFAULT=y
# CONFIG_DEFAULT_FQ is not set
# CONFIG_DEFAULT_CODEL is not set
CONFIG_DEFAULT_FQ_CODEL=y
# CONFIG_DEFAULT_SFQ is not set
# CONFIG_DEFAULT_PFIFO_FAST is not set
CONFIG_DEFAULT_NET_SCH="fq_codel"

#
# Classification
#
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=m
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_CLS_U32_PERF=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=m
CONFIG_NET_CLS_FLOW=m
CONFIG_NET_CLS_CGROUP=y
CONFIG_NET_CLS_BPF=m
CONFIG_NET_CLS_FLOWER=m
CONFIG_NET_CLS_MATCHALL=m
CONFIG_NET_EMATCH=y
CONFIG_NET_EMATCH_STACK=32
CONFIG_NET_EMATCH_CMP=m
CONFIG_NET_EMATCH_NBYTE=m
CONFIG_NET_EMATCH_U32=m
CONFIG_NET_EMATCH_META=m
CONFIG_NET_EMATCH_TEXT=m
# CONFIG_NET_EMATCH_CANID is not set
CONFIG_NET_EMATCH_IPSET=m
# CONFIG_NET_EMATCH_IPT is not set
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_GACT_PROB=y
CONFIG_NET_ACT_MIRRED=m
CONFIG_NET_ACT_SAMPLE=m
# CONFIG_NET_ACT_IPT is not set
CONFIG_NET_ACT_NAT=m
CONFIG_NET_ACT_PEDIT=m
CONFIG_NET_ACT_SIMP=m
CONFIG_NET_ACT_SKBEDIT=m
CONFIG_NET_ACT_CSUM=m
# CONFIG_NET_ACT_MPLS is not set
CONFIG_NET_ACT_VLAN=m
CONFIG_NET_ACT_BPF=m
# CONFIG_NET_ACT_CONNMARK is not set
# CONFIG_NET_ACT_CTINFO is not set
CONFIG_NET_ACT_SKBMOD=m
# CONFIG_NET_ACT_IFE is not set
CONFIG_NET_ACT_TUNNEL_KEY=m
# CONFIG_NET_ACT_GATE is not set
# CONFIG_NET_TC_SKB_EXT is not set
CONFIG_NET_SCH_FIFO=y
CONFIG_DCB=y
CONFIG_DNS_RESOLVER=m
# CONFIG_BATMAN_ADV is not set
CONFIG_OPENVSWITCH=m
CONFIG_OPENVSWITCH_GRE=m
CONFIG_VSOCKETS=m
CONFIG_VSOCKETS_DIAG=m
CONFIG_VSOCKETS_LOOPBACK=m
CONFIG_VMWARE_VMCI_VSOCKETS=m
CONFIG_VIRTIO_VSOCKETS=m
CONFIG_VIRTIO_VSOCKETS_COMMON=m
CONFIG_HYPERV_VSOCKETS=m
CONFIG_NETLINK_DIAG=m
CONFIG_MPLS=y
CONFIG_NET_MPLS_GSO=y
CONFIG_MPLS_ROUTING=m
CONFIG_MPLS_IPTUNNEL=m
CONFIG_NET_NSH=y
# CONFIG_HSR is not set
CONFIG_NET_SWITCHDEV=y
CONFIG_NET_L3_MASTER_DEV=y
# CONFIG_QRTR is not set
# CONFIG_NET_NCSI is not set
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_XPS=y
CONFIG_CGROUP_NET_PRIO=y
CONFIG_CGROUP_NET_CLASSID=y
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y
CONFIG_BPF_JIT=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_NET_FLOW_LIMIT=y

#
# Network testing
#
CONFIG_NET_PKTGEN=m
CONFIG_NET_DROP_MONITOR=y
# end of Network testing
# end of Networking options

# CONFIG_HAMRADIO is not set
CONFIG_CAN=m
CONFIG_CAN_RAW=m
CONFIG_CAN_BCM=m
CONFIG_CAN_GW=m
# CONFIG_CAN_J1939 is not set

#
# CAN Device Drivers
#
CONFIG_CAN_VCAN=m
# CONFIG_CAN_VXCAN is not set
CONFIG_CAN_SLCAN=m
CONFIG_CAN_DEV=m
CONFIG_CAN_CALC_BITTIMING=y
# CONFIG_CAN_KVASER_PCIEFD is not set
CONFIG_CAN_C_CAN=m
CONFIG_CAN_C_CAN_PLATFORM=m
CONFIG_CAN_C_CAN_PCI=m
CONFIG_CAN_CC770=m
# CONFIG_CAN_CC770_ISA is not set
CONFIG_CAN_CC770_PLATFORM=m
# CONFIG_CAN_IFI_CANFD is not set
# CONFIG_CAN_M_CAN is not set
# CONFIG_CAN_PEAK_PCIEFD is not set
CONFIG_CAN_SJA1000=m
CONFIG_CAN_EMS_PCI=m
# CONFIG_CAN_F81601 is not set
CONFIG_CAN_KVASER_PCI=m
CONFIG_CAN_PEAK_PCI=m
CONFIG_CAN_PEAK_PCIEC=y
CONFIG_CAN_PLX_PCI=m
# CONFIG_CAN_SJA1000_ISA is not set
CONFIG_CAN_SJA1000_PLATFORM=m
CONFIG_CAN_SOFTING=m

#
# CAN SPI interfaces
#
# CONFIG_CAN_HI311X is not set
# CONFIG_CAN_MCP251X is not set
# end of CAN SPI interfaces

#
# CAN USB interfaces
#
# CONFIG_CAN_8DEV_USB is not set
# CONFIG_CAN_EMS_USB is not set
# CONFIG_CAN_ESD_USB2 is not set
# CONFIG_CAN_GS_USB is not set
# CONFIG_CAN_KVASER_USB is not set
# CONFIG_CAN_MCBA_USB is not set
# CONFIG_CAN_PEAK_USB is not set
# CONFIG_CAN_UCAN is not set
# end of CAN USB interfaces

# CONFIG_CAN_DEBUG_DEVICES is not set
# end of CAN Device Drivers

CONFIG_BT=m
CONFIG_BT_BREDR=y
CONFIG_BT_RFCOMM=m
CONFIG_BT_RFCOMM_TTY=y
CONFIG_BT_BNEP=m
CONFIG_BT_BNEP_MC_FILTER=y
CONFIG_BT_BNEP_PROTO_FILTER=y
CONFIG_BT_HIDP=m
CONFIG_BT_HS=y
CONFIG_BT_LE=y
# CONFIG_BT_6LOWPAN is not set
# CONFIG_BT_LEDS is not set
# CONFIG_BT_MSFTEXT is not set
CONFIG_BT_DEBUGFS=y
# CONFIG_BT_SELFTEST is not set

#
# Bluetooth device drivers
#
# CONFIG_BT_HCIBTUSB is not set
# CONFIG_BT_HCIBTSDIO is not set
CONFIG_BT_HCIUART=m
CONFIG_BT_HCIUART_H4=y
CONFIG_BT_HCIUART_BCSP=y
CONFIG_BT_HCIUART_ATH3K=y
# CONFIG_BT_HCIUART_INTEL is not set
# CONFIG_BT_HCIUART_AG6XX is not set
# CONFIG_BT_HCIBCM203X is not set
# CONFIG_BT_HCIBPA10X is not set
# CONFIG_BT_HCIBFUSB is not set
CONFIG_BT_HCIVHCI=m
CONFIG_BT_MRVL=m
# CONFIG_BT_MRVL_SDIO is not set
# CONFIG_BT_MTKSDIO is not set
# end of Bluetooth device drivers

# CONFIG_AF_RXRPC is not set
# CONFIG_AF_KCM is not set
CONFIG_STREAM_PARSER=y
CONFIG_FIB_RULES=y
CONFIG_WIRELESS=y
CONFIG_WEXT_CORE=y
CONFIG_WEXT_PROC=y
CONFIG_CFG80211=m
# CONFIG_NL80211_TESTMODE is not set
# CONFIG_CFG80211_DEVELOPER_WARNINGS is not set
CONFIG_CFG80211_REQUIRE_SIGNED_REGDB=y
CONFIG_CFG80211_USE_KERNEL_REGDB_KEYS=y
CONFIG_CFG80211_DEFAULT_PS=y
# CONFIG_CFG80211_DEBUGFS is not set
CONFIG_CFG80211_CRDA_SUPPORT=y
CONFIG_CFG80211_WEXT=y
CONFIG_MAC80211=m
CONFIG_MAC80211_HAS_RC=y
CONFIG_MAC80211_RC_MINSTREL=y
CONFIG_MAC80211_RC_DEFAULT_MINSTREL=y
CONFIG_MAC80211_RC_DEFAULT="minstrel_ht"
CONFIG_MAC80211_MESH=y
CONFIG_MAC80211_LEDS=y
CONFIG_MAC80211_DEBUGFS=y
# CONFIG_MAC80211_MESSAGE_TRACING is not set
# CONFIG_MAC80211_DEBUG_MENU is not set
CONFIG_MAC80211_STA_HASH_MAX_SIZE=0
# CONFIG_WIMAX is not set
CONFIG_RFKILL=m
CONFIG_RFKILL_LEDS=y
CONFIG_RFKILL_INPUT=y
# CONFIG_RFKILL_GPIO is not set
CONFIG_NET_9P=y
CONFIG_NET_9P_VIRTIO=y
# CONFIG_NET_9P_XEN is not set
# CONFIG_NET_9P_RDMA is not set
# CONFIG_NET_9P_DEBUG is not set
# CONFIG_CAIF is not set
CONFIG_CEPH_LIB=m
# CONFIG_CEPH_LIB_PRETTYDEBUG is not set
CONFIG_CEPH_LIB_USE_DNS_RESOLVER=y
# CONFIG_NFC is not set
CONFIG_PSAMPLE=m
# CONFIG_NET_IFE is not set
CONFIG_LWTUNNEL=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_DST_CACHE=y
CONFIG_GRO_CELLS=y
CONFIG_SOCK_VALIDATE_XMIT=y
CONFIG_NET_SOCK_MSG=y
CONFIG_NET_DEVLINK=y
CONFIG_FAILOVER=m
CONFIG_ETHTOOL_NETLINK=y
CONFIG_HAVE_EBPF_JIT=y

#
# Device Drivers
#
CONFIG_HAVE_EISA=y
# CONFIG_EISA is not set
CONFIG_HAVE_PCI=y
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
CONFIG_HOTPLUG_PCI_PCIE=y
CONFIG_PCIEAER=y
CONFIG_PCIEAER_INJECT=m
CONFIG_PCIE_ECRC=y
CONFIG_PCIEASPM=y
CONFIG_PCIEASPM_DEFAULT=y
# CONFIG_PCIEASPM_POWERSAVE is not set
# CONFIG_PCIEASPM_POWER_SUPERSAVE is not set
# CONFIG_PCIEASPM_PERFORMANCE is not set
CONFIG_PCIE_PME=y
CONFIG_PCIE_DPC=y
# CONFIG_PCIE_PTM is not set
# CONFIG_PCIE_BW is not set
# CONFIG_PCIE_EDR is not set
CONFIG_PCI_MSI=y
CONFIG_PCI_MSI_IRQ_DOMAIN=y
CONFIG_PCI_QUIRKS=y
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_REALLOC_ENABLE_AUTO is not set
CONFIG_PCI_STUB=y
CONFIG_PCI_PF_STUB=m
# CONFIG_XEN_PCIDEV_FRONTEND is not set
CONFIG_PCI_ATS=y
CONFIG_PCI_LOCKLESS_CONFIG=y
CONFIG_PCI_IOV=y
CONFIG_PCI_PRI=y
CONFIG_PCI_PASID=y
# CONFIG_PCI_P2PDMA is not set
CONFIG_PCI_LABEL=y
CONFIG_PCI_HYPERV=m
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_ACPI=y
CONFIG_HOTPLUG_PCI_ACPI_IBM=m
# CONFIG_HOTPLUG_PCI_CPCI is not set
CONFIG_HOTPLUG_PCI_SHPC=y

#
# PCI controller drivers
#
CONFIG_VMD=y
CONFIG_PCI_HYPERV_INTERFACE=m

#
# DesignWare PCI Core Support
#
# CONFIG_PCIE_DW_PLAT_HOST is not set
# CONFIG_PCI_MESON is not set
# end of DesignWare PCI Core Support

#
# Mobiveil PCIe Core Support
#
# end of Mobiveil PCIe Core Support

#
# Cadence PCIe controllers support
#
# end of Cadence PCIe controllers support
# end of PCI controller drivers

#
# PCI Endpoint
#
# CONFIG_PCI_ENDPOINT is not set
# end of PCI Endpoint

#
# PCI switch controller drivers
#
# CONFIG_PCI_SW_SWITCHTEC is not set
# end of PCI switch controller drivers

# CONFIG_PCCARD is not set
# CONFIG_RAPIDIO is not set

#
# Generic Driver Options
#
# CONFIG_UEVENT_HELPER is not set
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y

#
# Firmware loader
#
CONFIG_FW_LOADER=y
CONFIG_FW_LOADER_PAGED_BUF=y
CONFIG_EXTRA_FIRMWARE=""
CONFIG_FW_LOADER_USER_HELPER=y
# CONFIG_FW_LOADER_USER_HELPER_FALLBACK is not set
# CONFIG_FW_LOADER_COMPRESS is not set
CONFIG_FW_CACHE=y
# end of Firmware loader

CONFIG_ALLOW_DEV_COREDUMP=y
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_DEBUG_TEST_DRIVER_REMOVE is not set
# CONFIG_PM_QOS_KUNIT_TEST is not set
# CONFIG_TEST_ASYNC_DRIVER_PROBE is not set
CONFIG_KUNIT_DRIVER_PE_TEST=y
CONFIG_SYS_HYPERVISOR=y
CONFIG_GENERIC_CPU_AUTOPROBE=y
CONFIG_GENERIC_CPU_VULNERABILITIES=y
CONFIG_REGMAP=y
CONFIG_REGMAP_I2C=m
CONFIG_REGMAP_SPI=m
CONFIG_DMA_SHARED_BUFFER=y
# CONFIG_DMA_FENCE_TRACE is not set
# end of Generic Driver Options

#
# Bus devices
#
# CONFIG_MHI_BUS is not set
# end of Bus devices

CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y
# CONFIG_GNSS is not set
# CONFIG_MTD is not set
# CONFIG_OF is not set
CONFIG_ARCH_MIGHT_HAVE_PC_PARPORT=y
CONFIG_PARPORT=m
CONFIG_PARPORT_PC=m
CONFIG_PARPORT_SERIAL=m
# CONFIG_PARPORT_PC_FIFO is not set
# CONFIG_PARPORT_PC_SUPERIO is not set
# CONFIG_PARPORT_AX88796 is not set
CONFIG_PARPORT_1284=y
CONFIG_PNP=y
# CONFIG_PNP_DEBUG_MESSAGES is not set

#
# Protocols
#
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_NULL_BLK=m
CONFIG_BLK_DEV_NULL_BLK_FAULT_INJECTION=y
# CONFIG_BLK_DEV_FD is not set
CONFIG_CDROM=m
# CONFIG_PARIDE is not set
# CONFIG_BLK_DEV_PCIESSD_MTIP32XX is not set
# CONFIG_ZRAM is not set
# CONFIG_BLK_DEV_UMEM is not set
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_LOOP_MIN_COUNT=0
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
# CONFIG_BLK_DEV_DRBD is not set
CONFIG_BLK_DEV_NBD=m
# CONFIG_BLK_DEV_SKD is not set
# CONFIG_BLK_DEV_SX8 is not set
CONFIG_BLK_DEV_RAM=m
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=16384
CONFIG_CDROM_PKTCDVD=m
CONFIG_CDROM_PKTCDVD_BUFFERS=8
# CONFIG_CDROM_PKTCDVD_WCACHE is not set
# CONFIG_ATA_OVER_ETH is not set
CONFIG_XEN_BLKDEV_FRONTEND=m
CONFIG_VIRTIO_BLK=y
CONFIG_BLK_DEV_RBD=m
# CONFIG_BLK_DEV_RSXX is not set

#
# NVME Support
#
CONFIG_NVME_CORE=m
CONFIG_BLK_DEV_NVME=m
CONFIG_NVME_MULTIPATH=y
# CONFIG_NVME_HWMON is not set
CONFIG_NVME_FABRICS=m
# CONFIG_NVME_RDMA is not set
CONFIG_NVME_FC=m
# CONFIG_NVME_TCP is not set
CONFIG_NVME_TARGET=m
CONFIG_NVME_TARGET_LOOP=m
# CONFIG_NVME_TARGET_RDMA is not set
CONFIG_NVME_TARGET_FC=m
CONFIG_NVME_TARGET_FCLOOP=m
# CONFIG_NVME_TARGET_TCP is not set
# end of NVME Support

#
# Misc devices
#
CONFIG_SENSORS_LIS3LV02D=m
# CONFIG_AD525X_DPOT is not set
# CONFIG_DUMMY_IRQ is not set
# CONFIG_IBM_ASM is not set
# CONFIG_PHANTOM is not set
CONFIG_TIFM_CORE=m
CONFIG_TIFM_7XX1=m
# CONFIG_ICS932S401 is not set
CONFIG_ENCLOSURE_SERVICES=m
CONFIG_SGI_XP=m
CONFIG_HP_ILO=m
CONFIG_SGI_GRU=m
# CONFIG_SGI_GRU_DEBUG is not set
CONFIG_APDS9802ALS=m
CONFIG_ISL29003=m
CONFIG_ISL29020=m
CONFIG_SENSORS_TSL2550=m
CONFIG_SENSORS_BH1770=m
CONFIG_SENSORS_APDS990X=m
# CONFIG_HMC6352 is not set
# CONFIG_DS1682 is not set
CONFIG_VMWARE_BALLOON=m
# CONFIG_LATTICE_ECP3_CONFIG is not set
# CONFIG_SRAM is not set
# CONFIG_PCI_ENDPOINT_TEST is not set
# CONFIG_XILINX_SDFEC is not set
CONFIG_MISC_RTSX=m
CONFIG_PVPANIC=y
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_AT24 is not set
# CONFIG_EEPROM_AT25 is not set
CONFIG_EEPROM_LEGACY=m
CONFIG_EEPROM_MAX6875=m
CONFIG_EEPROM_93CX6=m
# CONFIG_EEPROM_93XX46 is not set
# CONFIG_EEPROM_IDT_89HPESX is not set
# CONFIG_EEPROM_EE1004 is not set
# end of EEPROM support

CONFIG_CB710_CORE=m
# CONFIG_CB710_DEBUG is not set
CONFIG_CB710_DEBUG_ASSUMPTIONS=y

#
# Texas Instruments shared transport line discipline
#
# CONFIG_TI_ST is not set
# end of Texas Instruments shared transport line discipline

CONFIG_SENSORS_LIS3_I2C=m
CONFIG_ALTERA_STAPL=m
CONFIG_INTEL_MEI=m
CONFIG_INTEL_MEI_ME=m
# CONFIG_INTEL_MEI_TXE is not set
# CONFIG_INTEL_MEI_HDCP is not set
CONFIG_VMWARE_VMCI=m

#
# Intel MIC & related support
#
# CONFIG_INTEL_MIC_BUS is not set
# CONFIG_SCIF_BUS is not set
# CONFIG_VOP_BUS is not set
# end of Intel MIC & related support

# CONFIG_GENWQE is not set
# CONFIG_ECHO is not set
# CONFIG_MISC_ALCOR_PCI is not set
CONFIG_MISC_RTSX_PCI=m
# CONFIG_MISC_RTSX_USB is not set
# CONFIG_HABANA_AI is not set
# CONFIG_UACCE is not set
# end of Misc devices

CONFIG_HAVE_IDE=y
# CONFIG_IDE is not set

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
CONFIG_RAID_ATTRS=m
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=m
CONFIG_CHR_DEV_ST=m
CONFIG_BLK_DEV_SR=m
CONFIG_CHR_DEV_SG=m
CONFIG_CHR_DEV_SCH=m
CONFIG_SCSI_ENCLOSURE=m
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
CONFIG_SCSI_SCAN_ASYNC=y

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=m
CONFIG_SCSI_FC_ATTRS=m
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
# CONFIG_SCSI_SAS_ATA is not set
CONFIG_SCSI_SAS_HOST_SMP=y
CONFIG_SCSI_SRP_ATTRS=m
# end of SCSI Transports

CONFIG_SCSI_LOWLEVEL=y
# CONFIG_ISCSI_TCP is not set
# CONFIG_ISCSI_BOOT_SYSFS is not set
# CONFIG_SCSI_CXGB3_ISCSI is not set
# CONFIG_SCSI_CXGB4_ISCSI is not set
# CONFIG_SCSI_BNX2_ISCSI is not set
# CONFIG_BE2ISCSI is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_HPSA is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_3W_SAS is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_MVUMI is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_SCSI_ESAS2R is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
CONFIG_SCSI_MPT3SAS=m
CONFIG_SCSI_MPT2SAS_MAX_SGE=128
CONFIG_SCSI_MPT3SAS_MAX_SGE=128
# CONFIG_SCSI_MPT2SAS is not set
# CONFIG_SCSI_SMARTPQI is not set
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_SCSI_MYRB is not set
# CONFIG_SCSI_MYRS is not set
# CONFIG_VMWARE_PVSCSI is not set
# CONFIG_XEN_SCSI_FRONTEND is not set
CONFIG_HYPERV_STORAGE=m
# CONFIG_LIBFC is not set
# CONFIG_SCSI_SNIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_FDOMAIN_PCI is not set
# CONFIG_SCSI_GDTH is not set
# CONFIG_SCSI_ISCI is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_PPA is not set
# CONFIG_SCSI_IMM is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_AM53C974 is not set
# CONFIG_SCSI_WD719X is not set
CONFIG_SCSI_DEBUG=m
# CONFIG_SCSI_PMCRAID is not set
# CONFIG_SCSI_PM8001 is not set
# CONFIG_SCSI_BFA_FC is not set
# CONFIG_SCSI_VIRTIO is not set
# CONFIG_SCSI_CHELSIO_FCOE is not set
CONFIG_SCSI_DH=y
CONFIG_SCSI_DH_RDAC=y
CONFIG_SCSI_DH_HP_SW=y
CONFIG_SCSI_DH_EMC=y
CONFIG_SCSI_DH_ALUA=y
# end of SCSI device support

CONFIG_ATA=m
CONFIG_SATA_HOST=y
CONFIG_PATA_TIMINGS=y
CONFIG_ATA_VERBOSE_ERROR=y
CONFIG_ATA_FORCE=y
CONFIG_ATA_ACPI=y
# CONFIG_SATA_ZPODD is not set
CONFIG_SATA_PMP=y

#
# Controllers with non-SFF native interface
#
CONFIG_SATA_AHCI=m
CONFIG_SATA_MOBILE_LPM_POLICY=0
CONFIG_SATA_AHCI_PLATFORM=m
# CONFIG_SATA_INIC162X is not set
# CONFIG_SATA_ACARD_AHCI is not set
# CONFIG_SATA_SIL24 is not set
CONFIG_ATA_SFF=y

#
# SFF controllers with custom DMA interface
#
# CONFIG_PDC_ADMA is not set
# CONFIG_SATA_QSTOR is not set
# CONFIG_SATA_SX4 is not set
CONFIG_ATA_BMDMA=y

#
# SATA SFF controllers with BMDMA
#
CONFIG_ATA_PIIX=m
# CONFIG_SATA_DWC is not set
# CONFIG_SATA_MV is not set
# CONFIG_SATA_NV is not set
# CONFIG_SATA_PROMISE is not set
# CONFIG_SATA_SIL is not set
# CONFIG_SATA_SIS is not set
# CONFIG_SATA_SVW is not set
# CONFIG_SATA_ULI is not set
# CONFIG_SATA_VIA is not set
# CONFIG_SATA_VITESSE is not set

#
# PATA SFF controllers with BMDMA
#
# CONFIG_PATA_ALI is not set
# CONFIG_PATA_AMD is not set
# CONFIG_PATA_ARTOP is not set
# CONFIG_PATA_ATIIXP is not set
# CONFIG_PATA_ATP867X is not set
# CONFIG_PATA_CMD64X is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
# CONFIG_PATA_HPT366 is not set
# CONFIG_PATA_HPT37X is not set
# CONFIG_PATA_HPT3X2N is not set
# CONFIG_PATA_HPT3X3 is not set
# CONFIG_PATA_IT8213 is not set
# CONFIG_PATA_IT821X is not set
# CONFIG_PATA_JMICRON is not set
# CONFIG_PATA_MARVELL is not set
# CONFIG_PATA_NETCELL is not set
# CONFIG_PATA_NINJA32 is not set
# CONFIG_PATA_NS87415 is not set
# CONFIG_PATA_OLDPIIX is not set
# CONFIG_PATA_OPTIDMA is not set
# CONFIG_PATA_PDC2027X is not set
# CONFIG_PATA_PDC_OLD is not set
# CONFIG_PATA_RADISYS is not set
# CONFIG_PATA_RDC is not set
# CONFIG_PATA_SCH is not set
# CONFIG_PATA_SERVERWORKS is not set
# CONFIG_PATA_SIL680 is not set
# CONFIG_PATA_SIS is not set
# CONFIG_PATA_TOSHIBA is not set
# CONFIG_PATA_TRIFLEX is not set
# CONFIG_PATA_VIA is not set
# CONFIG_PATA_WINBOND is not set

#
# PIO-only SFF controllers
#
# CONFIG_PATA_CMD640_PCI is not set
# CONFIG_PATA_MPIIX is not set
# CONFIG_PATA_NS87410 is not set
# CONFIG_PATA_OPTI is not set
# CONFIG_PATA_RZ1000 is not set

#
# Generic fallback / legacy drivers
#
# CONFIG_PATA_ACPI is not set
CONFIG_ATA_GENERIC=m
# CONFIG_PATA_LEGACY is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_AUTODETECT=y
CONFIG_MD_LINEAR=m
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
CONFIG_MD_RAID10=m
CONFIG_MD_RAID456=m
CONFIG_MD_MULTIPATH=m
CONFIG_MD_FAULTY=m
CONFIG_MD_CLUSTER=m
# CONFIG_BCACHE is not set
CONFIG_BLK_DEV_DM_BUILTIN=y
CONFIG_BLK_DEV_DM=m
CONFIG_DM_DEBUG=y
CONFIG_DM_BUFIO=m
# CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING is not set
CONFIG_DM_BIO_PRISON=m
CONFIG_DM_PERSISTENT_DATA=m
# CONFIG_DM_UNSTRIPED is not set
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
CONFIG_DM_CACHE=m
CONFIG_DM_CACHE_SMQ=m
CONFIG_DM_WRITECACHE=m
# CONFIG_DM_EBS is not set
CONFIG_DM_ERA=m
# CONFIG_DM_CLONE is not set
CONFIG_DM_MIRROR=m
CONFIG_DM_LOG_USERSPACE=m
CONFIG_DM_RAID=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
CONFIG_DM_MULTIPATH_QL=m
CONFIG_DM_MULTIPATH_ST=m
# CONFIG_DM_MULTIPATH_HST is not set
CONFIG_DM_DELAY=m
# CONFIG_DM_DUST is not set
CONFIG_DM_UEVENT=y
CONFIG_DM_FLAKEY=m
CONFIG_DM_VERITY=m
# CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG is not set
# CONFIG_DM_VERITY_FEC is not set
CONFIG_DM_SWITCH=m
CONFIG_DM_LOG_WRITES=m
CONFIG_DM_INTEGRITY=m
# CONFIG_DM_ZONED is not set
CONFIG_TARGET_CORE=m
CONFIG_TCM_IBLOCK=m
CONFIG_TCM_FILEIO=m
CONFIG_TCM_PSCSI=m
CONFIG_TCM_USER2=m
CONFIG_LOOPBACK_TARGET=m
CONFIG_ISCSI_TARGET=m
# CONFIG_SBP_TARGET is not set
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#
CONFIG_FIREWIRE=m
CONFIG_FIREWIRE_OHCI=m
CONFIG_FIREWIRE_SBP2=m
CONFIG_FIREWIRE_NET=m
# CONFIG_FIREWIRE_NOSY is not set
# end of IEEE 1394 (FireWire) support

CONFIG_MACINTOSH_DRIVERS=y
CONFIG_MAC_EMUMOUSEBTN=y
CONFIG_NETDEVICES=y
CONFIG_MII=y
CONFIG_NET_CORE=y
# CONFIG_BONDING is not set
# CONFIG_DUMMY is not set
# CONFIG_WIREGUARD is not set
# CONFIG_EQUALIZER is not set
# CONFIG_NET_FC is not set
# CONFIG_IFB is not set
# CONFIG_NET_TEAM is not set
# CONFIG_MACVLAN is not set
# CONFIG_IPVLAN is not set
# CONFIG_VXLAN is not set
# CONFIG_GENEVE is not set
# CONFIG_BAREUDP is not set
# CONFIG_GTP is not set
# CONFIG_MACSEC is not set
CONFIG_NETCONSOLE=m
CONFIG_NETCONSOLE_DYNAMIC=y
CONFIG_NETPOLL=y
CONFIG_NET_POLL_CONTROLLER=y
# CONFIG_TUN is not set
# CONFIG_TUN_VNET_CROSS_LE is not set
CONFIG_VETH=m
CONFIG_VIRTIO_NET=m
# CONFIG_NLMON is not set
# CONFIG_NET_VRF is not set
# CONFIG_VSOCKMON is not set
# CONFIG_ARCNET is not set
CONFIG_ATM_DRIVERS=y
# CONFIG_ATM_DUMMY is not set
# CONFIG_ATM_TCP is not set
# CONFIG_ATM_LANAI is not set
# CONFIG_ATM_ENI is not set
# CONFIG_ATM_FIRESTREAM is not set
# CONFIG_ATM_ZATM is not set
# CONFIG_ATM_NICSTAR is not set
# CONFIG_ATM_IDT77252 is not set
# CONFIG_ATM_AMBASSADOR is not set
# CONFIG_ATM_HORIZON is not set
# CONFIG_ATM_IA is not set
# CONFIG_ATM_FORE200E is not set
# CONFIG_ATM_HE is not set
# CONFIG_ATM_SOLOS is not set

#
# Distributed Switch Architecture drivers
#
# end of Distributed Switch Architecture drivers

CONFIG_ETHERNET=y
CONFIG_MDIO=y
CONFIG_NET_VENDOR_3COM=y
# CONFIG_VORTEX is not set
# CONFIG_TYPHOON is not set
CONFIG_NET_VENDOR_ADAPTEC=y
# CONFIG_ADAPTEC_STARFIRE is not set
CONFIG_NET_VENDOR_AGERE=y
# CONFIG_ET131X is not set
CONFIG_NET_VENDOR_ALACRITECH=y
# CONFIG_SLICOSS is not set
CONFIG_NET_VENDOR_ALTEON=y
# CONFIG_ACENIC is not set
# CONFIG_ALTERA_TSE is not set
CONFIG_NET_VENDOR_AMAZON=y
# CONFIG_ENA_ETHERNET is not set
CONFIG_NET_VENDOR_AMD=y
# CONFIG_AMD8111_ETH is not set
# CONFIG_PCNET32 is not set
# CONFIG_AMD_XGBE is not set
CONFIG_NET_VENDOR_AQUANTIA=y
# CONFIG_AQTION is not set
CONFIG_NET_VENDOR_ARC=y
CONFIG_NET_VENDOR_ATHEROS=y
# CONFIG_ATL2 is not set
# CONFIG_ATL1 is not set
# CONFIG_ATL1E is not set
# CONFIG_ATL1C is not set
# CONFIG_ALX is not set
# CONFIG_NET_VENDOR_AURORA is not set
CONFIG_NET_VENDOR_BROADCOM=y
# CONFIG_B44 is not set
# CONFIG_BCMGENET is not set
# CONFIG_BNX2 is not set
# CONFIG_CNIC is not set
CONFIG_TIGON3=y
CONFIG_TIGON3_HWMON=y
# CONFIG_BNX2X is not set
# CONFIG_SYSTEMPORT is not set
# CONFIG_BNXT is not set
CONFIG_NET_VENDOR_BROCADE=y
# CONFIG_BNA is not set
CONFIG_NET_VENDOR_CADENCE=y
# CONFIG_MACB is not set
CONFIG_NET_VENDOR_CAVIUM=y
# CONFIG_THUNDER_NIC_PF is not set
# CONFIG_THUNDER_NIC_VF is not set
# CONFIG_THUNDER_NIC_BGX is not set
# CONFIG_THUNDER_NIC_RGX is not set
CONFIG_CAVIUM_PTP=y
# CONFIG_LIQUIDIO is not set
# CONFIG_LIQUIDIO_VF is not set
CONFIG_NET_VENDOR_CHELSIO=y
# CONFIG_CHELSIO_T1 is not set
# CONFIG_CHELSIO_T3 is not set
# CONFIG_CHELSIO_T4 is not set
# CONFIG_CHELSIO_T4VF is not set
CONFIG_NET_VENDOR_CISCO=y
# CONFIG_ENIC is not set
CONFIG_NET_VENDOR_CORTINA=y
# CONFIG_CX_ECAT is not set
# CONFIG_DNET is not set
CONFIG_NET_VENDOR_DEC=y
# CONFIG_NET_TULIP is not set
CONFIG_NET_VENDOR_DLINK=y
# CONFIG_DL2K is not set
# CONFIG_SUNDANCE is not set
CONFIG_NET_VENDOR_EMULEX=y
# CONFIG_BE2NET is not set
CONFIG_NET_VENDOR_EZCHIP=y
CONFIG_NET_VENDOR_GOOGLE=y
# CONFIG_GVE is not set
CONFIG_NET_VENDOR_HUAWEI=y
# CONFIG_HINIC is not set
CONFIG_NET_VENDOR_I825XX=y
CONFIG_NET_VENDOR_INTEL=y
# CONFIG_E100 is not set
CONFIG_E1000=y
CONFIG_E1000E=y
CONFIG_E1000E_HWTS=y
CONFIG_IGB=y
CONFIG_IGB_HWMON=y
# CONFIG_IGBVF is not set
# CONFIG_IXGB is not set
CONFIG_IXGBE=y
CONFIG_IXGBE_HWMON=y
# CONFIG_IXGBE_DCB is not set
CONFIG_IXGBE_IPSEC=y
# CONFIG_IXGBEVF is not set
CONFIG_I40E=y
# CONFIG_I40E_DCB is not set
# CONFIG_I40EVF is not set
# CONFIG_ICE is not set
# CONFIG_FM10K is not set
# CONFIG_IGC is not set
# CONFIG_JME is not set
CONFIG_NET_VENDOR_MARVELL=y
# CONFIG_MVMDIO is not set
CONFIG_SKGE=y
# CONFIG_SKGE_DEBUG is not set
# CONFIG_SKGE_GENESIS is not set
# CONFIG_SKY2 is not set
CONFIG_NET_VENDOR_MELLANOX=y
# CONFIG_MLX4_EN is not set
# CONFIG_MLX5_CORE is not set
# CONFIG_MLXSW_CORE is not set
# CONFIG_MLXFW is not set
CONFIG_NET_VENDOR_MICREL=y
# CONFIG_KS8842 is not set
# CONFIG_KS8851 is not set
# CONFIG_KS8851_MLL is not set
# CONFIG_KSZ884X_PCI is not set
CONFIG_NET_VENDOR_MICROCHIP=y
# CONFIG_ENC28J60 is not set
# CONFIG_ENCX24J600 is not set
# CONFIG_LAN743X is not set
CONFIG_NET_VENDOR_MICROSEMI=y
# CONFIG_MSCC_OCELOT_SWITCH is not set
CONFIG_NET_VENDOR_MYRI=y
# CONFIG_MYRI10GE is not set
# CONFIG_FEALNX is not set
CONFIG_NET_VENDOR_NATSEMI=y
# CONFIG_NATSEMI is not set
# CONFIG_NS83820 is not set
CONFIG_NET_VENDOR_NETERION=y
# CONFIG_S2IO is not set
# CONFIG_VXGE is not set
CONFIG_NET_VENDOR_NETRONOME=y
# CONFIG_NFP is not set
CONFIG_NET_VENDOR_NI=y
# CONFIG_NI_XGE_MANAGEMENT_ENET is not set
CONFIG_NET_VENDOR_8390=y
# CONFIG_NE2K_PCI is not set
CONFIG_NET_VENDOR_NVIDIA=y
# CONFIG_FORCEDETH is not set
CONFIG_NET_VENDOR_OKI=y
# CONFIG_ETHOC is not set
CONFIG_NET_VENDOR_PACKET_ENGINES=y
# CONFIG_HAMACHI is not set
CONFIG_YELLOWFIN=m
CONFIG_NET_VENDOR_PENSANDO=y
# CONFIG_IONIC is not set
CONFIG_NET_VENDOR_QLOGIC=y
# CONFIG_QLA3XXX is not set
# CONFIG_QLCNIC is not set
# CONFIG_NETXEN_NIC is not set
# CONFIG_QED is not set
CONFIG_NET_VENDOR_QUALCOMM=y
# CONFIG_QCOM_EMAC is not set
# CONFIG_RMNET is not set
CONFIG_NET_VENDOR_RDC=y
# CONFIG_R6040 is not set
CONFIG_NET_VENDOR_REALTEK=y
# CONFIG_ATP is not set
CONFIG_8139CP=y
CONFIG_8139TOO=y
CONFIG_8139TOO_PIO=y
# CONFIG_8139TOO_TUNE_TWISTER is not set
# CONFIG_8139TOO_8129 is not set
# CONFIG_8139_OLD_RX_RESET is not set
CONFIG_R8169=y
CONFIG_NET_VENDOR_RENESAS=y
CONFIG_NET_VENDOR_ROCKER=y
# CONFIG_ROCKER is not set
CONFIG_NET_VENDOR_SAMSUNG=y
# CONFIG_SXGBE_ETH is not set
CONFIG_NET_VENDOR_SEEQ=y
CONFIG_NET_VENDOR_SOLARFLARE=y
# CONFIG_SFC is not set
# CONFIG_SFC_FALCON is not set
CONFIG_NET_VENDOR_SILAN=y
# CONFIG_SC92031 is not set
CONFIG_NET_VENDOR_SIS=y
# CONFIG_SIS900 is not set
# CONFIG_SIS190 is not set
CONFIG_NET_VENDOR_SMSC=y
# CONFIG_EPIC100 is not set
# CONFIG_SMSC911X is not set
# CONFIG_SMSC9420 is not set
CONFIG_NET_VENDOR_SOCIONEXT=y
CONFIG_NET_VENDOR_STMICRO=y
# CONFIG_STMMAC_ETH is not set
CONFIG_NET_VENDOR_SUN=y
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
# CONFIG_CASSINI is not set
# CONFIG_NIU is not set
CONFIG_NET_VENDOR_SYNOPSYS=y
# CONFIG_DWC_XLGMAC is not set
CONFIG_NET_VENDOR_TEHUTI=y
# CONFIG_TEHUTI is not set
CONFIG_NET_VENDOR_TI=y
# CONFIG_TI_CPSW_PHY_SEL is not set
# CONFIG_TLAN is not set
CONFIG_NET_VENDOR_VIA=y
# CONFIG_VIA_RHINE is not set
# CONFIG_VIA_VELOCITY is not set
CONFIG_NET_VENDOR_WIZNET=y
# CONFIG_WIZNET_W5100 is not set
# CONFIG_WIZNET_W5300 is not set
CONFIG_NET_VENDOR_XILINX=y
# CONFIG_XILINX_AXI_EMAC is not set
# CONFIG_XILINX_LL_TEMAC is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_NET_SB1000 is not set
CONFIG_MDIO_DEVICE=y
CONFIG_MDIO_BUS=y
# CONFIG_MDIO_BCM_UNIMAC is not set
# CONFIG_MDIO_BITBANG is not set
# CONFIG_MDIO_MSCC_MIIM is not set
# CONFIG_MDIO_MVUSB is not set
# CONFIG_MDIO_THUNDER is not set
# CONFIG_MDIO_XPCS is not set
CONFIG_PHYLIB=y
# CONFIG_LED_TRIGGER_PHY is not set

#
# MII PHY device drivers
#
# CONFIG_ADIN_PHY is not set
# CONFIG_AMD_PHY is not set
# CONFIG_AQUANTIA_PHY is not set
# CONFIG_AX88796B_PHY is not set
# CONFIG_BCM7XXX_PHY is not set
# CONFIG_BCM87XX_PHY is not set
# CONFIG_BROADCOM_PHY is not set
# CONFIG_BCM54140_PHY is not set
# CONFIG_BCM84881_PHY is not set
# CONFIG_CICADA_PHY is not set
# CONFIG_CORTINA_PHY is not set
# CONFIG_DAVICOM_PHY is not set
# CONFIG_DP83822_PHY is not set
# CONFIG_DP83TC811_PHY is not set
# CONFIG_DP83848_PHY is not set
# CONFIG_DP83867_PHY is not set
# CONFIG_DP83869_PHY is not set
# CONFIG_FIXED_PHY is not set
# CONFIG_ICPLUS_PHY is not set
# CONFIG_INTEL_XWAY_PHY is not set
# CONFIG_LSI_ET1011C_PHY is not set
# CONFIG_LXT_PHY is not set
# CONFIG_MARVELL_PHY is not set
# CONFIG_MARVELL_10G_PHY is not set
# CONFIG_MICREL_PHY is not set
# CONFIG_MICROCHIP_PHY is not set
# CONFIG_MICROCHIP_T1_PHY is not set
# CONFIG_MICROSEMI_PHY is not set
# CONFIG_NATIONAL_PHY is not set
# CONFIG_NXP_TJA11XX_PHY is not set
# CONFIG_QSEMI_PHY is not set
CONFIG_REALTEK_PHY=y
# CONFIG_RENESAS_PHY is not set
# CONFIG_ROCKCHIP_PHY is not set
# CONFIG_SMSC_PHY is not set
# CONFIG_STE10XP is not set
# CONFIG_TERANETICS_PHY is not set
# CONFIG_VITESSE_PHY is not set
# CONFIG_XILINX_GMII2RGMII is not set
# CONFIG_MICREL_KS8995MA is not set
# CONFIG_PLIP is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
CONFIG_USB_NET_DRIVERS=y
CONFIG_USB_CATC=y
CONFIG_USB_KAWETH=y
CONFIG_USB_PEGASUS=y
CONFIG_USB_RTL8150=y
# CONFIG_USB_RTL8152 is not set
# CONFIG_USB_LAN78XX is not set
CONFIG_USB_USBNET=y
CONFIG_USB_NET_AX8817X=y
CONFIG_USB_NET_AX88179_178A=y
CONFIG_USB_NET_CDCETHER=y
CONFIG_USB_NET_CDC_EEM=y
CONFIG_USB_NET_CDC_NCM=y
# CONFIG_USB_NET_HUAWEI_CDC_NCM is not set
# CONFIG_USB_NET_CDC_MBIM is not set
CONFIG_USB_NET_DM9601=y
# CONFIG_USB_NET_SR9700 is not set
# CONFIG_USB_NET_SR9800 is not set
CONFIG_USB_NET_SMSC75XX=y
CONFIG_USB_NET_SMSC95XX=y
CONFIG_USB_NET_GL620A=y
CONFIG_USB_NET_NET1080=y
CONFIG_USB_NET_PLUSB=y
CONFIG_USB_NET_MCS7830=y
CONFIG_USB_NET_RNDIS_HOST=y
CONFIG_USB_NET_CDC_SUBSET_ENABLE=y
CONFIG_USB_NET_CDC_SUBSET=y
# CONFIG_USB_ALI_M5632 is not set
# CONFIG_USB_AN2720 is not set
CONFIG_USB_BELKIN=y
CONFIG_USB_ARMLINUX=y
# CONFIG_USB_EPSON2888 is not set
# CONFIG_USB_KC2190 is not set
CONFIG_USB_NET_ZAURUS=y
# CONFIG_USB_NET_CX82310_ETH is not set
# CONFIG_USB_NET_KALMIA is not set
# CONFIG_USB_NET_QMI_WWAN is not set
# CONFIG_USB_HSO is not set
CONFIG_USB_NET_INT51X1=y
CONFIG_USB_IPHETH=y
CONFIG_USB_SIERRA_NET=y
# CONFIG_USB_VL600 is not set
# CONFIG_USB_NET_CH9200 is not set
# CONFIG_USB_NET_AQC111 is not set
CONFIG_WLAN=y
CONFIG_WLAN_VENDOR_ADMTEK=y
# CONFIG_ADM8211 is not set
CONFIG_WLAN_VENDOR_ATH=y
# CONFIG_ATH_DEBUG is not set
# CONFIG_ATH5K is not set
# CONFIG_ATH5K_PCI is not set
# CONFIG_ATH9K is not set
# CONFIG_ATH9K_HTC is not set
# CONFIG_CARL9170 is not set
# CONFIG_ATH6KL is not set
# CONFIG_AR5523 is not set
# CONFIG_WIL6210 is not set
# CONFIG_ATH10K is not set
# CONFIG_WCN36XX is not set
CONFIG_WLAN_VENDOR_ATMEL=y
# CONFIG_ATMEL is not set
# CONFIG_AT76C50X_USB is not set
CONFIG_WLAN_VENDOR_BROADCOM=y
# CONFIG_B43 is not set
# CONFIG_B43LEGACY is not set
# CONFIG_BRCMSMAC is not set
# CONFIG_BRCMFMAC is not set
CONFIG_WLAN_VENDOR_CISCO=y
# CONFIG_AIRO is not set
CONFIG_WLAN_VENDOR_INTEL=y
# CONFIG_IPW2100 is not set
# CONFIG_IPW2200 is not set
# CONFIG_IWL4965 is not set
# CONFIG_IWL3945 is not set
# CONFIG_IWLWIFI is not set
CONFIG_WLAN_VENDOR_INTERSIL=y
# CONFIG_HOSTAP is not set
# CONFIG_HERMES is not set
# CONFIG_P54_COMMON is not set
# CONFIG_PRISM54 is not set
CONFIG_WLAN_VENDOR_MARVELL=y
# CONFIG_LIBERTAS is not set
# CONFIG_LIBERTAS_THINFIRM is not set
# CONFIG_MWIFIEX is not set
# CONFIG_MWL8K is not set
CONFIG_WLAN_VENDOR_MEDIATEK=y
# CONFIG_MT7601U is not set
# CONFIG_MT76x0U is not set
# CONFIG_MT76x0E is not set
# CONFIG_MT76x2E is not set
# CONFIG_MT76x2U is not set
# CONFIG_MT7603E is not set
# CONFIG_MT7615E is not set
# CONFIG_MT7663U is not set
# CONFIG_MT7915E is not set
CONFIG_WLAN_VENDOR_RALINK=y
# CONFIG_RT2X00 is not set
CONFIG_WLAN_VENDOR_REALTEK=y
# CONFIG_RTL8180 is not set
# CONFIG_RTL8187 is not set
CONFIG_RTL_CARDS=m
# CONFIG_RTL8192CE is not set
# CONFIG_RTL8192SE is not set
# CONFIG_RTL8192DE is not set
# CONFIG_RTL8723AE is not set
# CONFIG_RTL8723BE is not set
# CONFIG_RTL8188EE is not set
# CONFIG_RTL8192EE is not set
# CONFIG_RTL8821AE is not set
# CONFIG_RTL8192CU is not set
# CONFIG_RTL8XXXU is not set
# CONFIG_RTW88 is not set
CONFIG_WLAN_VENDOR_RSI=y
# CONFIG_RSI_91X is not set
CONFIG_WLAN_VENDOR_ST=y
# CONFIG_CW1200 is not set
CONFIG_WLAN_VENDOR_TI=y
# CONFIG_WL1251 is not set
# CONFIG_WL12XX is not set
# CONFIG_WL18XX is not set
# CONFIG_WLCORE is not set
CONFIG_WLAN_VENDOR_ZYDAS=y
# CONFIG_USB_ZD1201 is not set
# CONFIG_ZD1211RW is not set
CONFIG_WLAN_VENDOR_QUANTENNA=y
# CONFIG_QTNFMAC_PCIE is not set
CONFIG_MAC80211_HWSIM=m
# CONFIG_USB_NET_RNDIS_WLAN is not set
# CONFIG_VIRT_WIFI is not set

#
# Enable WiMAX (Networking options) to see the WiMAX drivers
#
# CONFIG_WAN is not set
CONFIG_IEEE802154_DRIVERS=m
# CONFIG_IEEE802154_FAKELB is not set
# CONFIG_IEEE802154_AT86RF230 is not set
# CONFIG_IEEE802154_MRF24J40 is not set
# CONFIG_IEEE802154_CC2520 is not set
# CONFIG_IEEE802154_ATUSB is not set
# CONFIG_IEEE802154_ADF7242 is not set
# CONFIG_IEEE802154_CA8210 is not set
# CONFIG_IEEE802154_MCR20A is not set
# CONFIG_IEEE802154_HWSIM is not set
CONFIG_XEN_NETDEV_FRONTEND=y
# CONFIG_VMXNET3 is not set
# CONFIG_FUJITSU_ES is not set
# CONFIG_HYPERV_NET is not set
CONFIG_NETDEVSIM=m
CONFIG_NET_FAILOVER=m
# CONFIG_ISDN is not set
CONFIG_NVM=y
# CONFIG_NVM_PBLK is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_LEDS=y
CONFIG_INPUT_FF_MEMLESS=m
CONFIG_INPUT_POLLDEV=m
CONFIG_INPUT_SPARSEKMAP=m
# CONFIG_INPUT_MATRIXKMAP is not set

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_INPUT_JOYDEV=m
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
# CONFIG_KEYBOARD_ADP5588 is not set
# CONFIG_KEYBOARD_ADP5589 is not set
# CONFIG_KEYBOARD_APPLESPI is not set
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_QT1050 is not set
# CONFIG_KEYBOARD_QT1070 is not set
# CONFIG_KEYBOARD_QT2160 is not set
# CONFIG_KEYBOARD_DLINK_DIR685 is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_GPIO is not set
# CONFIG_KEYBOARD_GPIO_POLLED is not set
# CONFIG_KEYBOARD_TCA6416 is not set
# CONFIG_KEYBOARD_TCA8418 is not set
# CONFIG_KEYBOARD_MATRIX is not set
# CONFIG_KEYBOARD_LM8323 is not set
# CONFIG_KEYBOARD_LM8333 is not set
# CONFIG_KEYBOARD_MAX7359 is not set
# CONFIG_KEYBOARD_MCS is not set
# CONFIG_KEYBOARD_MPR121 is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_OPENCORES is not set
# CONFIG_KEYBOARD_SAMSUNG is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_TM2_TOUCHKEY is not set
# CONFIG_KEYBOARD_XTKBD is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_BYD=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_SYNAPTICS_SMBUS=y
CONFIG_MOUSE_PS2_CYPRESS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
CONFIG_MOUSE_PS2_ELANTECH=y
CONFIG_MOUSE_PS2_ELANTECH_SMBUS=y
CONFIG_MOUSE_PS2_SENTELIC=y
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
CONFIG_MOUSE_PS2_FOCALTECH=y
CONFIG_MOUSE_PS2_VMMOUSE=y
CONFIG_MOUSE_PS2_SMBUS=y
CONFIG_MOUSE_SERIAL=m
# CONFIG_MOUSE_APPLETOUCH is not set
# CONFIG_MOUSE_BCM5974 is not set
CONFIG_MOUSE_CYAPA=m
CONFIG_MOUSE_ELAN_I2C=m
CONFIG_MOUSE_ELAN_I2C_I2C=y
CONFIG_MOUSE_ELAN_I2C_SMBUS=y
CONFIG_MOUSE_VSXXXAA=m
# CONFIG_MOUSE_GPIO is not set
CONFIG_MOUSE_SYNAPTICS_I2C=m
# CONFIG_MOUSE_SYNAPTICS_USB is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set
CONFIG_RMI4_CORE=m
CONFIG_RMI4_I2C=m
CONFIG_RMI4_SPI=m
CONFIG_RMI4_SMB=m
CONFIG_RMI4_F03=y
CONFIG_RMI4_F03_SERIO=m
CONFIG_RMI4_2D_SENSOR=y
CONFIG_RMI4_F11=y
CONFIG_RMI4_F12=y
CONFIG_RMI4_F30=y
CONFIG_RMI4_F34=y
# CONFIG_RMI4_F54 is not set
CONFIG_RMI4_F55=y

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_ARCH_MIGHT_HAVE_PC_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PARKBD is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
CONFIG_SERIO_RAW=m
CONFIG_SERIO_ALTERA_PS2=m
# CONFIG_SERIO_PS2MULT is not set
CONFIG_SERIO_ARC_PS2=m
CONFIG_HYPERV_KEYBOARD=m
# CONFIG_SERIO_GPIO_PS2 is not set
# CONFIG_USERIO is not set
# CONFIG_GAMEPORT is not set
# end of Hardware I/O ports
# end of Input device support

#
# Character devices
#
CONFIG_TTY=y
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_VT_CONSOLE_SLEEP=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_UNIX98_PTYS=y
# CONFIG_LEGACY_PTYS is not set
CONFIG_LDISC_AUTOLOAD=y

#
# Serial drivers
#
CONFIG_SERIAL_EARLYCON=y
CONFIG_SERIAL_8250=y
# CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set
CONFIG_SERIAL_8250_PNP=y
# CONFIG_SERIAL_8250_16550A_VARIANTS is not set
# CONFIG_SERIAL_8250_FINTEK is not set
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_DMA=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_EXAR=y
CONFIG_SERIAL_8250_NR_UARTS=64
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
# CONFIG_SERIAL_8250_DETECT_IRQ is not set
CONFIG_SERIAL_8250_RSA=y
CONFIG_SERIAL_8250_DWLIB=y
CONFIG_SERIAL_8250_DW=y
# CONFIG_SERIAL_8250_RT288X is not set
CONFIG_SERIAL_8250_LPSS=y
CONFIG_SERIAL_8250_MID=y

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_MAX3100 is not set
# CONFIG_SERIAL_MAX310X is not set
# CONFIG_SERIAL_UARTLITE is not set
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_SERIAL_JSM=m
# CONFIG_SERIAL_LANTIQ is not set
# CONFIG_SERIAL_SCCNXP is not set
# CONFIG_SERIAL_SC16IS7XX is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
# CONFIG_SERIAL_IFX6X60 is not set
CONFIG_SERIAL_ARC=m
CONFIG_SERIAL_ARC_NR_PORTS=1
# CONFIG_SERIAL_RP2 is not set
# CONFIG_SERIAL_FSL_LPUART is not set
# CONFIG_SERIAL_FSL_LINFLEXUART is not set
# CONFIG_SERIAL_SPRD is not set
# end of Serial drivers

CONFIG_SERIAL_MCTRL_GPIO=y
CONFIG_SERIAL_NONSTANDARD=y
# CONFIG_ROCKETPORT is not set
CONFIG_CYCLADES=m
# CONFIG_CYZ_INTR is not set
# CONFIG_MOXA_INTELLIO is not set
# CONFIG_MOXA_SMARTIO is not set
CONFIG_SYNCLINK=m
CONFIG_SYNCLINKMP=m
CONFIG_SYNCLINK_GT=m
# CONFIG_ISI is not set
CONFIG_N_HDLC=m
CONFIG_N_GSM=m
CONFIG_NOZOMI=m
# CONFIG_NULL_TTY is not set
# CONFIG_TRACE_SINK is not set
CONFIG_HVC_DRIVER=y
CONFIG_HVC_IRQ=y
CONFIG_HVC_XEN=y
CONFIG_HVC_XEN_FRONTEND=y
# CONFIG_SERIAL_DEV_BUS is not set
CONFIG_PRINTER=m
# CONFIG_LP_CONSOLE is not set
CONFIG_PPDEV=m
CONFIG_VIRTIO_CONSOLE=y
CONFIG_IPMI_HANDLER=m
CONFIG_IPMI_DMI_DECODE=y
CONFIG_IPMI_PLAT_DATA=y
CONFIG_IPMI_PANIC_EVENT=y
CONFIG_IPMI_PANIC_STRING=y
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_SI=m
CONFIG_IPMI_SSIF=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=m
CONFIG_HW_RANDOM=y
CONFIG_HW_RANDOM_TIMERIOMEM=m
CONFIG_HW_RANDOM_INTEL=m
CONFIG_HW_RANDOM_AMD=m
CONFIG_HW_RANDOM_VIA=m
CONFIG_HW_RANDOM_VIRTIO=y
# CONFIG_APPLICOM is not set
# CONFIG_MWAVE is not set
CONFIG_DEVMEM=y
# CONFIG_DEVKMEM is not set
CONFIG_NVRAM=y
CONFIG_RAW_DRIVER=y
CONFIG_MAX_RAW_DEVS=8192
CONFIG_DEVPORT=y
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
# CONFIG_HPET_MMAP_DEFAULT is not set
CONFIG_HANGCHECK_TIMER=m
CONFIG_UV_MMTIMER=m
CONFIG_TCG_TPM=y
CONFIG_HW_RANDOM_TPM=y
CONFIG_TCG_TIS_CORE=y
CONFIG_TCG_TIS=y
# CONFIG_TCG_TIS_SPI is not set
CONFIG_TCG_TIS_I2C_ATMEL=m
CONFIG_TCG_TIS_I2C_INFINEON=m
CONFIG_TCG_TIS_I2C_NUVOTON=m
CONFIG_TCG_NSC=m
CONFIG_TCG_ATMEL=m
CONFIG_TCG_INFINEON=m
# CONFIG_TCG_XEN is not set
CONFIG_TCG_CRB=y
# CONFIG_TCG_VTPM_PROXY is not set
CONFIG_TCG_TIS_ST33ZP24=m
CONFIG_TCG_TIS_ST33ZP24_I2C=m
# CONFIG_TCG_TIS_ST33ZP24_SPI is not set
CONFIG_TELCLOCK=m
# CONFIG_XILLYBUS is not set
# end of Character devices

# CONFIG_RANDOM_TRUST_CPU is not set
# CONFIG_RANDOM_TRUST_BOOTLOADER is not set

#
# I2C support
#
CONFIG_I2C=y
CONFIG_ACPI_I2C_OPREGION=y
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_COMPAT=y
CONFIG_I2C_CHARDEV=m
CONFIG_I2C_MUX=m

#
# Multiplexer I2C Chip support
#
# CONFIG_I2C_MUX_GPIO is not set
# CONFIG_I2C_MUX_LTC4306 is not set
# CONFIG_I2C_MUX_PCA9541 is not set
# CONFIG_I2C_MUX_PCA954x is not set
# CONFIG_I2C_MUX_REG is not set
CONFIG_I2C_MUX_MLXCPLD=m
# end of Multiplexer I2C Chip support

CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_SMBUS=y
CONFIG_I2C_ALGOBIT=y
CONFIG_I2C_ALGOPCA=m

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
CONFIG_I2C_AMD756=m
CONFIG_I2C_AMD756_S4882=m
CONFIG_I2C_AMD8111=m
# CONFIG_I2C_AMD_MP2 is not set
CONFIG_I2C_I801=y
CONFIG_I2C_ISCH=m
CONFIG_I2C_ISMT=m
CONFIG_I2C_PIIX4=m
CONFIG_I2C_NFORCE2=m
CONFIG_I2C_NFORCE2_S4985=m
# CONFIG_I2C_NVIDIA_GPU is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
CONFIG_I2C_SIS96X=m
CONFIG_I2C_VIA=m
CONFIG_I2C_VIAPRO=m

#
# ACPI drivers
#
CONFIG_I2C_SCMI=m

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
# CONFIG_I2C_CBUS_GPIO is not set
CONFIG_I2C_DESIGNWARE_CORE=m
# CONFIG_I2C_DESIGNWARE_SLAVE is not set
CONFIG_I2C_DESIGNWARE_PLATFORM=m
CONFIG_I2C_DESIGNWARE_BAYTRAIL=y
# CONFIG_I2C_DESIGNWARE_PCI is not set
# CONFIG_I2C_EMEV2 is not set
# CONFIG_I2C_GPIO is not set
# CONFIG_I2C_OCORES is not set
CONFIG_I2C_PCA_PLATFORM=m
CONFIG_I2C_SIMTEC=m
# CONFIG_I2C_XILINX is not set

#
# External I2C/SMBus adapter drivers
#
# CONFIG_I2C_DIOLAN_U2C is not set
CONFIG_I2C_PARPORT=m
# CONFIG_I2C_ROBOTFUZZ_OSIF is not set
# CONFIG_I2C_TAOS_EVM is not set
# CONFIG_I2C_TINY_USB is not set

#
# Other I2C/SMBus bus drivers
#
CONFIG_I2C_MLXCPLD=m
# end of I2C Hardware Bus support

CONFIG_I2C_STUB=m
# CONFIG_I2C_SLAVE is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# end of I2C support

# CONFIG_I3C is not set
CONFIG_SPI=y
# CONFIG_SPI_DEBUG is not set
CONFIG_SPI_MASTER=y
# CONFIG_SPI_MEM is not set

#
# SPI Master Controller Drivers
#
# CONFIG_SPI_ALTERA is not set
# CONFIG_SPI_AXI_SPI_ENGINE is not set
# CONFIG_SPI_BITBANG is not set
# CONFIG_SPI_BUTTERFLY is not set
# CONFIG_SPI_CADENCE is not set
# CONFIG_SPI_DESIGNWARE is not set
# CONFIG_SPI_NXP_FLEXSPI is not set
# CONFIG_SPI_GPIO is not set
# CONFIG_SPI_LM70_LLP is not set
# CONFIG_SPI_OC_TINY is not set
# CONFIG_SPI_PXA2XX is not set
# CONFIG_SPI_ROCKCHIP is not set
# CONFIG_SPI_SC18IS602 is not set
# CONFIG_SPI_SIFIVE is not set
# CONFIG_SPI_MXIC is not set
# CONFIG_SPI_XCOMM is not set
# CONFIG_SPI_XILINX is not set
# CONFIG_SPI_ZYNQMP_GQSPI is not set
# CONFIG_SPI_AMD is not set

#
# SPI Multiplexer support
#
# CONFIG_SPI_MUX is not set

#
# SPI Protocol Masters
#
# CONFIG_SPI_SPIDEV is not set
# CONFIG_SPI_LOOPBACK_TEST is not set
# CONFIG_SPI_TLE62X0 is not set
# CONFIG_SPI_SLAVE is not set
# CONFIG_SPMI is not set
# CONFIG_HSI is not set
CONFIG_PPS=y
# CONFIG_PPS_DEBUG is not set

#
# PPS clients support
#
# CONFIG_PPS_CLIENT_KTIMER is not set
CONFIG_PPS_CLIENT_LDISC=m
CONFIG_PPS_CLIENT_PARPORT=m
CONFIG_PPS_CLIENT_GPIO=m

#
# PPS generators support
#

#
# PTP clock support
#
CONFIG_PTP_1588_CLOCK=y
# CONFIG_DP83640_PHY is not set
# CONFIG_PTP_1588_CLOCK_INES is not set
CONFIG_PTP_1588_CLOCK_KVM=m
# CONFIG_PTP_1588_CLOCK_IDT82P33 is not set
# CONFIG_PTP_1588_CLOCK_IDTCM is not set
# CONFIG_PTP_1588_CLOCK_VMW is not set
# end of PTP clock support

CONFIG_PINCTRL=y
CONFIG_PINMUX=y
CONFIG_PINCONF=y
CONFIG_GENERIC_PINCONF=y
# CONFIG_DEBUG_PINCTRL is not set
CONFIG_PINCTRL_AMD=m
# CONFIG_PINCTRL_MCP23S08 is not set
# CONFIG_PINCTRL_SX150X is not set
CONFIG_PINCTRL_BAYTRAIL=y
# CONFIG_PINCTRL_CHERRYVIEW is not set
# CONFIG_PINCTRL_LYNXPOINT is not set
CONFIG_PINCTRL_INTEL=m
CONFIG_PINCTRL_BROXTON=m
CONFIG_PINCTRL_CANNONLAKE=m
CONFIG_PINCTRL_CEDARFORK=m
CONFIG_PINCTRL_DENVERTON=m
CONFIG_PINCTRL_GEMINILAKE=m
# CONFIG_PINCTRL_ICELAKE is not set
# CONFIG_PINCTRL_JASPERLAKE is not set
CONFIG_PINCTRL_LEWISBURG=m
CONFIG_PINCTRL_SUNRISEPOINT=m
# CONFIG_PINCTRL_TIGERLAKE is not set
CONFIG_GPIOLIB=y
CONFIG_GPIOLIB_FASTPATH_LIMIT=512
CONFIG_GPIO_ACPI=y
CONFIG_GPIOLIB_IRQCHIP=y
# CONFIG_DEBUG_GPIO is not set
CONFIG_GPIO_SYSFS=y
CONFIG_GPIO_GENERIC=m

#
# Memory mapped GPIO drivers
#
CONFIG_GPIO_AMDPT=m
# CONFIG_GPIO_DWAPB is not set
# CONFIG_GPIO_EXAR is not set
# CONFIG_GPIO_GENERIC_PLATFORM is not set
CONFIG_GPIO_ICH=m
# CONFIG_GPIO_MB86S7X is not set
# CONFIG_GPIO_VX855 is not set
# CONFIG_GPIO_XILINX is not set
# CONFIG_GPIO_AMD_FCH is not set
# end of Memory mapped GPIO drivers

#
# Port-mapped I/O GPIO drivers
#
# CONFIG_GPIO_F7188X is not set
# CONFIG_GPIO_IT87 is not set
# CONFIG_GPIO_SCH is not set
# CONFIG_GPIO_SCH311X is not set
# CONFIG_GPIO_WINBOND is not set
# CONFIG_GPIO_WS16C48 is not set
# end of Port-mapped I/O GPIO drivers

#
# I2C GPIO expanders
#
# CONFIG_GPIO_ADP5588 is not set
# CONFIG_GPIO_MAX7300 is not set
# CONFIG_GPIO_MAX732X is not set
# CONFIG_GPIO_PCA953X is not set
# CONFIG_GPIO_PCF857X is not set
# CONFIG_GPIO_TPIC2810 is not set
# end of I2C GPIO expanders

#
# MFD GPIO expanders
#
# end of MFD GPIO expanders

#
# PCI GPIO expanders
#
# CONFIG_GPIO_AMD8111 is not set
# CONFIG_GPIO_BT8XX is not set
# CONFIG_GPIO_ML_IOH is not set
# CONFIG_GPIO_PCI_IDIO_16 is not set
# CONFIG_GPIO_PCIE_IDIO_24 is not set
# CONFIG_GPIO_RDC321X is not set
# end of PCI GPIO expanders

#
# SPI GPIO expanders
#
# CONFIG_GPIO_MAX3191X is not set
# CONFIG_GPIO_MAX7301 is not set
# CONFIG_GPIO_MC33880 is not set
# CONFIG_GPIO_PISOSR is not set
# CONFIG_GPIO_XRA1403 is not set
# end of SPI GPIO expanders

#
# USB GPIO expanders
#
# end of USB GPIO expanders

# CONFIG_GPIO_AGGREGATOR is not set
# CONFIG_GPIO_MOCKUP is not set
# CONFIG_W1 is not set
# CONFIG_POWER_AVS is not set
CONFIG_POWER_RESET=y
# CONFIG_POWER_RESET_RESTART is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
CONFIG_POWER_SUPPLY_HWMON=y
# CONFIG_PDA_POWER is not set
# CONFIG_TEST_POWER is not set
# CONFIG_CHARGER_ADP5061 is not set
# CONFIG_BATTERY_CW2015 is not set
# CONFIG_BATTERY_DS2780 is not set
# CONFIG_BATTERY_DS2781 is not set
# CONFIG_BATTERY_DS2782 is not set
# CONFIG_BATTERY_SBS is not set
# CONFIG_CHARGER_SBS is not set
# CONFIG_MANAGER_SBS is not set
# CONFIG_BATTERY_BQ27XXX is not set
# CONFIG_BATTERY_MAX17040 is not set
# CONFIG_BATTERY_MAX17042 is not set
# CONFIG_CHARGER_MAX8903 is not set
# CONFIG_CHARGER_LP8727 is not set
# CONFIG_CHARGER_GPIO is not set
# CONFIG_CHARGER_LT3651 is not set
# CONFIG_CHARGER_BQ2415X is not set
# CONFIG_CHARGER_BQ24257 is not set
# CONFIG_CHARGER_BQ24735 is not set
# CONFIG_CHARGER_BQ25890 is not set
CONFIG_CHARGER_SMB347=m
# CONFIG_BATTERY_GAUGE_LTC2941 is not set
# CONFIG_CHARGER_RT9455 is not set
# CONFIG_CHARGER_BD99954 is not set
CONFIG_HWMON=y
CONFIG_HWMON_VID=m
# CONFIG_HWMON_DEBUG_CHIP is not set

#
# Native drivers
#
CONFIG_SENSORS_ABITUGURU=m
CONFIG_SENSORS_ABITUGURU3=m
# CONFIG_SENSORS_AD7314 is not set
CONFIG_SENSORS_AD7414=m
CONFIG_SENSORS_AD7418=m
CONFIG_SENSORS_ADM1021=m
CONFIG_SENSORS_ADM1025=m
CONFIG_SENSORS_ADM1026=m
CONFIG_SENSORS_ADM1029=m
CONFIG_SENSORS_ADM1031=m
# CONFIG_SENSORS_ADM1177 is not set
CONFIG_SENSORS_ADM9240=m
CONFIG_SENSORS_ADT7X10=m
# CONFIG_SENSORS_ADT7310 is not set
CONFIG_SENSORS_ADT7410=m
CONFIG_SENSORS_ADT7411=m
CONFIG_SENSORS_ADT7462=m
CONFIG_SENSORS_ADT7470=m
CONFIG_SENSORS_ADT7475=m
# CONFIG_SENSORS_AS370 is not set
CONFIG_SENSORS_ASC7621=m
# CONFIG_SENSORS_AXI_FAN_CONTROL is not set
CONFIG_SENSORS_K8TEMP=m
CONFIG_SENSORS_K10TEMP=m
CONFIG_SENSORS_FAM15H_POWER=m
# CONFIG_SENSORS_AMD_ENERGY is not set
CONFIG_SENSORS_APPLESMC=m
CONFIG_SENSORS_ASB100=m
# CONFIG_SENSORS_ASPEED is not set
CONFIG_SENSORS_ATXP1=m
# CONFIG_SENSORS_DRIVETEMP is not set
CONFIG_SENSORS_DS620=m
CONFIG_SENSORS_DS1621=m
CONFIG_SENSORS_DELL_SMM=m
CONFIG_SENSORS_I5K_AMB=m
CONFIG_SENSORS_F71805F=m
CONFIG_SENSORS_F71882FG=m
CONFIG_SENSORS_F75375S=m
CONFIG_SENSORS_FSCHMD=m
# CONFIG_SENSORS_FTSTEUTATES is not set
CONFIG_SENSORS_GL518SM=m
CONFIG_SENSORS_GL520SM=m
CONFIG_SENSORS_G760A=m
# CONFIG_SENSORS_G762 is not set
# CONFIG_SENSORS_HIH6130 is not set
CONFIG_SENSORS_IBMAEM=m
CONFIG_SENSORS_IBMPEX=m
CONFIG_SENSORS_I5500=m
CONFIG_SENSORS_CORETEMP=m
CONFIG_SENSORS_IT87=m
CONFIG_SENSORS_JC42=m
# CONFIG_SENSORS_POWR1220 is not set
CONFIG_SENSORS_LINEAGE=m
# CONFIG_SENSORS_LTC2945 is not set
# CONFIG_SENSORS_LTC2947_I2C is not set
# CONFIG_SENSORS_LTC2947_SPI is not set
# CONFIG_SENSORS_LTC2990 is not set
CONFIG_SENSORS_LTC4151=m
CONFIG_SENSORS_LTC4215=m
# CONFIG_SENSORS_LTC4222 is not set
CONFIG_SENSORS_LTC4245=m
# CONFIG_SENSORS_LTC4260 is not set
CONFIG_SENSORS_LTC4261=m
# CONFIG_SENSORS_MAX1111 is not set
CONFIG_SENSORS_MAX16065=m
CONFIG_SENSORS_MAX1619=m
CONFIG_SENSORS_MAX1668=m
CONFIG_SENSORS_MAX197=m
# CONFIG_SENSORS_MAX31722 is not set
# CONFIG_SENSORS_MAX31730 is not set
# CONFIG_SENSORS_MAX6621 is not set
CONFIG_SENSORS_MAX6639=m
CONFIG_SENSORS_MAX6642=m
CONFIG_SENSORS_MAX6650=m
CONFIG_SENSORS_MAX6697=m
# CONFIG_SENSORS_MAX31790 is not set
CONFIG_SENSORS_MCP3021=m
# CONFIG_SENSORS_MLXREG_FAN is not set
# CONFIG_SENSORS_TC654 is not set
# CONFIG_SENSORS_ADCXX is not set
CONFIG_SENSORS_LM63=m
# CONFIG_SENSORS_LM70 is not set
CONFIG_SENSORS_LM73=m
CONFIG_SENSORS_LM75=m
CONFIG_SENSORS_LM77=m
CONFIG_SENSORS_LM78=m
CONFIG_SENSORS_LM80=m
CONFIG_SENSORS_LM83=m
CONFIG_SENSORS_LM85=m
CONFIG_SENSORS_LM87=m
CONFIG_SENSORS_LM90=m
CONFIG_SENSORS_LM92=m
CONFIG_SENSORS_LM93=m
CONFIG_SENSORS_LM95234=m
CONFIG_SENSORS_LM95241=m
CONFIG_SENSORS_LM95245=m
CONFIG_SENSORS_PC87360=m
CONFIG_SENSORS_PC87427=m
CONFIG_SENSORS_NTC_THERMISTOR=m
# CONFIG_SENSORS_NCT6683 is not set
CONFIG_SENSORS_NCT6775=m
# CONFIG_SENSORS_NCT7802 is not set
# CONFIG_SENSORS_NCT7904 is not set
# CONFIG_SENSORS_NPCM7XX is not set
CONFIG_SENSORS_PCF8591=m
CONFIG_PMBUS=m
CONFIG_SENSORS_PMBUS=m
CONFIG_SENSORS_ADM1275=m
# CONFIG_SENSORS_BEL_PFE is not set
# CONFIG_SENSORS_IBM_CFFPS is not set
# CONFIG_SENSORS_INSPUR_IPSPS is not set
# CONFIG_SENSORS_IR35221 is not set
# CONFIG_SENSORS_IR38064 is not set
# CONFIG_SENSORS_IRPS5401 is not set
# CONFIG_SENSORS_ISL68137 is not set
CONFIG_SENSORS_LM25066=m
CONFIG_SENSORS_LTC2978=m
# CONFIG_SENSORS_LTC3815 is not set
CONFIG_SENSORS_MAX16064=m
# CONFIG_SENSORS_MAX16601 is not set
# CONFIG_SENSORS_MAX20730 is not set
# CONFIG_SENSORS_MAX20751 is not set
# CONFIG_SENSORS_MAX31785 is not set
CONFIG_SENSORS_MAX34440=m
CONFIG_SENSORS_MAX8688=m
# CONFIG_SENSORS_PXE1610 is not set
# CONFIG_SENSORS_TPS40422 is not set
# CONFIG_SENSORS_TPS53679 is not set
CONFIG_SENSORS_UCD9000=m
CONFIG_SENSORS_UCD9200=m
# CONFIG_SENSORS_XDPE122 is not set
CONFIG_SENSORS_ZL6100=m
CONFIG_SENSORS_SHT15=m
CONFIG_SENSORS_SHT21=m
# CONFIG_SENSORS_SHT3x is not set
# CONFIG_SENSORS_SHTC1 is not set
CONFIG_SENSORS_SIS5595=m
CONFIG_SENSORS_DME1737=m
CONFIG_SENSORS_EMC1403=m
# CONFIG_SENSORS_EMC2103 is not set
CONFIG_SENSORS_EMC6W201=m
CONFIG_SENSORS_SMSC47M1=m
CONFIG_SENSORS_SMSC47M192=m
CONFIG_SENSORS_SMSC47B397=m
CONFIG_SENSORS_SCH56XX_COMMON=m
CONFIG_SENSORS_SCH5627=m
CONFIG_SENSORS_SCH5636=m
# CONFIG_SENSORS_STTS751 is not set
# CONFIG_SENSORS_SMM665 is not set
# CONFIG_SENSORS_ADC128D818 is not set
CONFIG_SENSORS_ADS7828=m
# CONFIG_SENSORS_ADS7871 is not set
CONFIG_SENSORS_AMC6821=m
CONFIG_SENSORS_INA209=m
CONFIG_SENSORS_INA2XX=m
# CONFIG_SENSORS_INA3221 is not set
# CONFIG_SENSORS_TC74 is not set
CONFIG_SENSORS_THMC50=m
CONFIG_SENSORS_TMP102=m
# CONFIG_SENSORS_TMP103 is not set
# CONFIG_SENSORS_TMP108 is not set
CONFIG_SENSORS_TMP401=m
CONFIG_SENSORS_TMP421=m
# CONFIG_SENSORS_TMP513 is not set
CONFIG_SENSORS_VIA_CPUTEMP=m
CONFIG_SENSORS_VIA686A=m
CONFIG_SENSORS_VT1211=m
CONFIG_SENSORS_VT8231=m
# CONFIG_SENSORS_W83773G is not set
CONFIG_SENSORS_W83781D=m
CONFIG_SENSORS_W83791D=m
CONFIG_SENSORS_W83792D=m
CONFIG_SENSORS_W83793=m
CONFIG_SENSORS_W83795=m
# CONFIG_SENSORS_W83795_FANCTRL is not set
CONFIG_SENSORS_W83L785TS=m
CONFIG_SENSORS_W83L786NG=m
CONFIG_SENSORS_W83627HF=m
CONFIG_SENSORS_W83627EHF=m
# CONFIG_SENSORS_XGENE is not set

#
# ACPI drivers
#
CONFIG_SENSORS_ACPI_POWER=m
CONFIG_SENSORS_ATK0110=m
CONFIG_THERMAL=y
# CONFIG_THERMAL_STATISTICS is not set
CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS=0
CONFIG_THERMAL_HWMON=y
CONFIG_THERMAL_WRITABLE_TRIPS=y
CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE=y
# CONFIG_THERMAL_DEFAULT_GOV_FAIR_SHARE is not set
# CONFIG_THERMAL_DEFAULT_GOV_USER_SPACE is not set
CONFIG_THERMAL_GOV_FAIR_SHARE=y
CONFIG_THERMAL_GOV_STEP_WISE=y
CONFIG_THERMAL_GOV_BANG_BANG=y
CONFIG_THERMAL_GOV_USER_SPACE=y
# CONFIG_THERMAL_EMULATION is not set

#
# Intel thermal drivers
#
CONFIG_INTEL_POWERCLAMP=m
CONFIG_X86_PKG_TEMP_THERMAL=m
CONFIG_INTEL_SOC_DTS_IOSF_CORE=m
# CONFIG_INTEL_SOC_DTS_THERMAL is not set

#
# ACPI INT340X thermal drivers
#
CONFIG_INT340X_THERMAL=m
CONFIG_ACPI_THERMAL_REL=m
# CONFIG_INT3406_THERMAL is not set
CONFIG_PROC_THERMAL_MMIO_RAPL=y
# end of ACPI INT340X thermal drivers

CONFIG_INTEL_PCH_THERMAL=m
# end of Intel thermal drivers

CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_CORE=y
# CONFIG_WATCHDOG_NOWAYOUT is not set
CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED=y
CONFIG_WATCHDOG_OPEN_TIMEOUT=0
CONFIG_WATCHDOG_SYSFS=y

#
# Watchdog Pretimeout Governors
#
# CONFIG_WATCHDOG_PRETIMEOUT_GOV is not set

#
# Watchdog Device Drivers
#
CONFIG_SOFT_WATCHDOG=m
CONFIG_WDAT_WDT=m
# CONFIG_XILINX_WATCHDOG is not set
# CONFIG_ZIIRAVE_WATCHDOG is not set
# CONFIG_MLX_WDT is not set
# CONFIG_CADENCE_WATCHDOG is not set
# CONFIG_DW_WATCHDOG is not set
# CONFIG_MAX63XX_WATCHDOG is not set
# CONFIG_ACQUIRE_WDT is not set
# CONFIG_ADVANTECH_WDT is not set
CONFIG_ALIM1535_WDT=m
CONFIG_ALIM7101_WDT=m
# CONFIG_EBC_C384_WDT is not set
CONFIG_F71808E_WDT=m
CONFIG_SP5100_TCO=m
CONFIG_SBC_FITPC2_WATCHDOG=m
# CONFIG_EUROTECH_WDT is not set
CONFIG_IB700_WDT=m
CONFIG_IBMASR=m
# CONFIG_WAFER_WDT is not set
CONFIG_I6300ESB_WDT=y
CONFIG_IE6XX_WDT=m
CONFIG_ITCO_WDT=y
CONFIG_ITCO_VENDOR_SUPPORT=y
CONFIG_IT8712F_WDT=m
CONFIG_IT87_WDT=m
CONFIG_HP_WATCHDOG=m
CONFIG_HPWDT_NMI_DECODING=y
# CONFIG_SC1200_WDT is not set
# CONFIG_PC87413_WDT is not set
CONFIG_NV_TCO=m
# CONFIG_60XX_WDT is not set
# CONFIG_CPU5_WDT is not set
CONFIG_SMSC_SCH311X_WDT=m
# CONFIG_SMSC37B787_WDT is not set
# CONFIG_TQMX86_WDT is not set
CONFIG_VIA_WDT=m
CONFIG_W83627HF_WDT=m
CONFIG_W83877F_WDT=m
CONFIG_W83977F_WDT=m
CONFIG_MACHZ_WDT=m
# CONFIG_SBC_EPX_C3_WATCHDOG is not set
CONFIG_INTEL_MEI_WDT=m
# CONFIG_NI903X_WDT is not set
# CONFIG_NIC7018_WDT is not set
# CONFIG_MEN_A21_WDT is not set
CONFIG_XEN_WDT=m

#
# PCI-based Watchdog Cards
#
CONFIG_PCIPCWATCHDOG=m
CONFIG_WDTPCI=m

#
# USB-based Watchdog Cards
#
# CONFIG_USBPCWATCHDOG is not set
CONFIG_SSB_POSSIBLE=y
# CONFIG_SSB is not set
CONFIG_BCMA_POSSIBLE=y
CONFIG_BCMA=m
CONFIG_BCMA_HOST_PCI_POSSIBLE=y
CONFIG_BCMA_HOST_PCI=y
# CONFIG_BCMA_HOST_SOC is not set
CONFIG_BCMA_DRIVER_PCI=y
CONFIG_BCMA_DRIVER_GMAC_CMN=y
CONFIG_BCMA_DRIVER_GPIO=y
# CONFIG_BCMA_DEBUG is not set

#
# Multifunction device drivers
#
CONFIG_MFD_CORE=y
# CONFIG_MFD_AS3711 is not set
# CONFIG_PMIC_ADP5520 is not set
# CONFIG_MFD_AAT2870_CORE is not set
# CONFIG_MFD_BCM590XX is not set
# CONFIG_MFD_BD9571MWV is not set
# CONFIG_MFD_AXP20X_I2C is not set
# CONFIG_MFD_MADERA is not set
# CONFIG_PMIC_DA903X is not set
# CONFIG_MFD_DA9052_SPI is not set
# CONFIG_MFD_DA9052_I2C is not set
# CONFIG_MFD_DA9055 is not set
# CONFIG_MFD_DA9062 is not set
# CONFIG_MFD_DA9063 is not set
# CONFIG_MFD_DA9150 is not set
# CONFIG_MFD_DLN2 is not set
# CONFIG_MFD_MC13XXX_SPI is not set
# CONFIG_MFD_MC13XXX_I2C is not set
# CONFIG_MFD_MP2629 is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_HTC_I2CPLD is not set
# CONFIG_MFD_INTEL_QUARK_I2C_GPIO is not set
CONFIG_LPC_ICH=y
CONFIG_LPC_SCH=m
# CONFIG_INTEL_SOC_PMIC_CHTDC_TI is not set
CONFIG_MFD_INTEL_LPSS=y
CONFIG_MFD_INTEL_LPSS_ACPI=y
CONFIG_MFD_INTEL_LPSS_PCI=y
# CONFIG_MFD_INTEL_PMC_BXT is not set
# CONFIG_MFD_IQS62X is not set
# CONFIG_MFD_JANZ_CMODIO is not set
# CONFIG_MFD_KEMPLD is not set
# CONFIG_MFD_88PM800 is not set
# CONFIG_MFD_88PM805 is not set
# CONFIG_MFD_88PM860X is not set
# CONFIG_MFD_MAX14577 is not set
# CONFIG_MFD_MAX77693 is not set
# CONFIG_MFD_MAX77843 is not set
# CONFIG_MFD_MAX8907 is not set
# CONFIG_MFD_MAX8925 is not set
# CONFIG_MFD_MAX8997 is not set
# CONFIG_MFD_MAX8998 is not set
# CONFIG_MFD_MT6360 is not set
# CONFIG_MFD_MT6397 is not set
# CONFIG_MFD_MENF21BMC is not set
# CONFIG_EZX_PCAP is not set
# CONFIG_MFD_VIPERBOARD is not set
# CONFIG_MFD_RETU is not set
# CONFIG_MFD_PCF50633 is not set
# CONFIG_MFD_RDC321X is not set
# CONFIG_MFD_RT5033 is not set
# CONFIG_MFD_RC5T583 is not set
# CONFIG_MFD_SEC_CORE is not set
# CONFIG_MFD_SI476X_CORE is not set
CONFIG_MFD_SM501=m
CONFIG_MFD_SM501_GPIO=y
# CONFIG_MFD_SKY81452 is not set
# CONFIG_MFD_SMSC is not set
# CONFIG_ABX500_CORE is not set
# CONFIG_MFD_SYSCON is not set
# CONFIG_MFD_TI_AM335X_TSCADC is not set
# CONFIG_MFD_LP3943 is not set
# CONFIG_MFD_LP8788 is not set
# CONFIG_MFD_TI_LMU is not set
# CONFIG_MFD_PALMAS is not set
# CONFIG_TPS6105X is not set
# CONFIG_TPS65010 is not set
# CONFIG_TPS6507X is not set
# CONFIG_MFD_TPS65086 is not set
# CONFIG_MFD_TPS65090 is not set
# CONFIG_MFD_TI_LP873X is not set
# CONFIG_MFD_TPS6586X is not set
# CONFIG_MFD_TPS65910 is not set
# CONFIG_MFD_TPS65912_I2C is not set
# CONFIG_MFD_TPS65912_SPI is not set
# CONFIG_MFD_TPS80031 is not set
# CONFIG_TWL4030_CORE is not set
# CONFIG_TWL6040_CORE is not set
# CONFIG_MFD_WL1273_CORE is not set
# CONFIG_MFD_LM3533 is not set
# CONFIG_MFD_TQMX86 is not set
CONFIG_MFD_VX855=m
# CONFIG_MFD_ARIZONA_I2C is not set
# CONFIG_MFD_ARIZONA_SPI is not set
# CONFIG_MFD_WM8400 is not set
# CONFIG_MFD_WM831X_I2C is not set
# CONFIG_MFD_WM831X_SPI is not set
# CONFIG_MFD_WM8350_I2C is not set
# CONFIG_MFD_WM8994 is not set
# end of Multifunction device drivers

# CONFIG_REGULATOR is not set
CONFIG_RC_CORE=m
CONFIG_RC_MAP=m
CONFIG_LIRC=y
CONFIG_RC_DECODERS=y
CONFIG_IR_NEC_DECODER=m
CONFIG_IR_RC5_DECODER=m
CONFIG_IR_RC6_DECODER=m
CONFIG_IR_JVC_DECODER=m
CONFIG_IR_SONY_DECODER=m
CONFIG_IR_SANYO_DECODER=m
# CONFIG_IR_SHARP_DECODER is not set
CONFIG_IR_MCE_KBD_DECODER=m
# CONFIG_IR_XMP_DECODER is not set
CONFIG_IR_IMON_DECODER=m
# CONFIG_IR_RCMM_DECODER is not set
CONFIG_RC_DEVICES=y
# CONFIG_RC_ATI_REMOTE is not set
CONFIG_IR_ENE=m
# CONFIG_IR_IMON is not set
# CONFIG_IR_IMON_RAW is not set
# CONFIG_IR_MCEUSB is not set
CONFIG_IR_ITE_CIR=m
CONFIG_IR_FINTEK=m
CONFIG_IR_NUVOTON=m
# CONFIG_IR_REDRAT3 is not set
# CONFIG_IR_STREAMZAP is not set
CONFIG_IR_WINBOND_CIR=m
# CONFIG_IR_IGORPLUGUSB is not set
# CONFIG_IR_IGUANA is not set
# CONFIG_IR_TTUSBIR is not set
# CONFIG_RC_LOOPBACK is not set
CONFIG_IR_SERIAL=m
CONFIG_IR_SERIAL_TRANSMITTER=y
CONFIG_IR_SIR=m
# CONFIG_RC_XBOX_DVD is not set
CONFIG_MEDIA_CEC_SUPPORT=y
# CONFIG_CEC_SECO is not set
# CONFIG_USB_PULSE8_CEC is not set
# CONFIG_USB_RAINSHADOW_CEC is not set
CONFIG_MEDIA_SUPPORT=m
# CONFIG_MEDIA_SUPPORT_FILTER is not set
# CONFIG_MEDIA_SUBDRV_AUTOSELECT is not set

#
# Media device types
#
CONFIG_MEDIA_CAMERA_SUPPORT=y
CONFIG_MEDIA_ANALOG_TV_SUPPORT=y
CONFIG_MEDIA_DIGITAL_TV_SUPPORT=y
CONFIG_MEDIA_RADIO_SUPPORT=y
CONFIG_MEDIA_SDR_SUPPORT=y
CONFIG_MEDIA_PLATFORM_SUPPORT=y
CONFIG_MEDIA_TEST_SUPPORT=y
# end of Media device types

#
# Media core support
#
CONFIG_VIDEO_DEV=m
CONFIG_MEDIA_CONTROLLER=y
CONFIG_DVB_CORE=m
# end of Media core support

#
# Video4Linux options
#
CONFIG_VIDEO_V4L2=m
CONFIG_VIDEO_V4L2_I2C=y
CONFIG_VIDEO_V4L2_SUBDEV_API=y
# CONFIG_VIDEO_ADV_DEBUG is not set
# CONFIG_VIDEO_FIXED_MINOR_RANGES is not set
# end of Video4Linux options

#
# Media controller options
#
# CONFIG_MEDIA_CONTROLLER_DVB is not set
# end of Media controller options

#
# Digital TV options
#
# CONFIG_DVB_MMAP is not set
CONFIG_DVB_NET=y
CONFIG_DVB_MAX_ADAPTERS=16
CONFIG_DVB_DYNAMIC_MINORS=y
# CONFIG_DVB_DEMUX_SECTION_LOSS_LOG is not set
# CONFIG_DVB_ULE_DEBUG is not set
# end of Digital TV options

#
# Media drivers
#
# CONFIG_MEDIA_USB_SUPPORT is not set
# CONFIG_MEDIA_PCI_SUPPORT is not set
CONFIG_RADIO_ADAPTERS=y
# CONFIG_RADIO_SI470X is not set
# CONFIG_RADIO_SI4713 is not set
# CONFIG_USB_MR800 is not set
# CONFIG_USB_DSBR is not set
# CONFIG_RADIO_MAXIRADIO is not set
# CONFIG_RADIO_SHARK is not set
# CONFIG_RADIO_SHARK2 is not set
# CONFIG_USB_KEENE is not set
# CONFIG_USB_RAREMONO is not set
# CONFIG_USB_MA901 is not set
# CONFIG_RADIO_TEA5764 is not set
# CONFIG_RADIO_SAA7706H is not set
# CONFIG_RADIO_TEF6862 is not set
# CONFIG_RADIO_WL1273 is not set
CONFIG_VIDEOBUF2_CORE=m
CONFIG_VIDEOBUF2_V4L2=m
CONFIG_VIDEOBUF2_MEMOPS=m
CONFIG_VIDEOBUF2_VMALLOC=m
# CONFIG_V4L_PLATFORM_DRIVERS is not set
# CONFIG_V4L_MEM2MEM_DRIVERS is not set
# CONFIG_DVB_PLATFORM_DRIVERS is not set
# CONFIG_SDR_PLATFORM_DRIVERS is not set

#
# MMC/SDIO DVB adapters
#
# CONFIG_SMS_SDIO_DRV is not set
# CONFIG_V4L_TEST_DRIVERS is not set

#
# FireWire (IEEE 1394) Adapters
#
# CONFIG_DVB_FIREDTV is not set
# end of Media drivers

#
# Media ancillary drivers
#
CONFIG_MEDIA_ATTACH=y
CONFIG_VIDEO_IR_I2C=m

#
# Audio decoders, processors and mixers
#
# CONFIG_VIDEO_TVAUDIO is not set
# CONFIG_VIDEO_TDA7432 is not set
# CONFIG_VIDEO_TDA9840 is not set
# CONFIG_VIDEO_TEA6415C is not set
# CONFIG_VIDEO_TEA6420 is not set
# CONFIG_VIDEO_MSP3400 is not set
# CONFIG_VIDEO_CS3308 is not set
# CONFIG_VIDEO_CS5345 is not set
# CONFIG_VIDEO_CS53L32A is not set
# CONFIG_VIDEO_TLV320AIC23B is not set
# CONFIG_VIDEO_UDA1342 is not set
# CONFIG_VIDEO_WM8775 is not set
# CONFIG_VIDEO_WM8739 is not set
# CONFIG_VIDEO_VP27SMPX is not set
# CONFIG_VIDEO_SONY_BTF_MPX is not set
# end of Audio decoders, processors and mixers

#
# RDS decoders
#
# CONFIG_VIDEO_SAA6588 is not set
# end of RDS decoders

#
# Video decoders
#
# CONFIG_VIDEO_ADV7180 is not set
# CONFIG_VIDEO_ADV7183 is not set
# CONFIG_VIDEO_ADV7604 is not set
# CONFIG_VIDEO_ADV7842 is not set
# CONFIG_VIDEO_BT819 is not set
# CONFIG_VIDEO_BT856 is not set
# CONFIG_VIDEO_BT866 is not set
# CONFIG_VIDEO_KS0127 is not set
# CONFIG_VIDEO_ML86V7667 is not set
# CONFIG_VIDEO_SAA7110 is not set
# CONFIG_VIDEO_SAA711X is not set
# CONFIG_VIDEO_TC358743 is not set
# CONFIG_VIDEO_TVP514X is not set
# CONFIG_VIDEO_TVP5150 is not set
# CONFIG_VIDEO_TVP7002 is not set
# CONFIG_VIDEO_TW2804 is not set
# CONFIG_VIDEO_TW9903 is not set
# CONFIG_VIDEO_TW9906 is not set
# CONFIG_VIDEO_TW9910 is not set
# CONFIG_VIDEO_VPX3220 is not set

#
# Video and audio decoders
#
# CONFIG_VIDEO_SAA717X is not set
# CONFIG_VIDEO_CX25840 is not set
# end of Video decoders

#
# Video encoders
#
# CONFIG_VIDEO_SAA7127 is not set
# CONFIG_VIDEO_SAA7185 is not set
# CONFIG_VIDEO_ADV7170 is not set
# CONFIG_VIDEO_ADV7175 is not set
# CONFIG_VIDEO_ADV7343 is not set
# CONFIG_VIDEO_ADV7393 is not set
# CONFIG_VIDEO_ADV7511 is not set
# CONFIG_VIDEO_AD9389B is not set
# CONFIG_VIDEO_AK881X is not set
# CONFIG_VIDEO_THS8200 is not set
# end of Video encoders

#
# Video improvement chips
#
# CONFIG_VIDEO_UPD64031A is not set
# CONFIG_VIDEO_UPD64083 is not set
# end of Video improvement chips

#
# Audio/Video compression chips
#
# CONFIG_VIDEO_SAA6752HS is not set
# end of Audio/Video compression chips

#
# SDR tuner chips
#
# CONFIG_SDR_MAX2175 is not set
# end of SDR tuner chips

#
# Miscellaneous helper chips
#
# CONFIG_VIDEO_THS7303 is not set
# CONFIG_VIDEO_M52790 is not set
# CONFIG_VIDEO_I2C is not set
# CONFIG_VIDEO_ST_MIPID02 is not set
# end of Miscellaneous helper chips

#
# Camera sensor devices
#
# CONFIG_VIDEO_HI556 is not set
# CONFIG_VIDEO_IMX219 is not set
# CONFIG_VIDEO_IMX258 is not set
# CONFIG_VIDEO_IMX274 is not set
# CONFIG_VIDEO_IMX290 is not set
# CONFIG_VIDEO_IMX319 is not set
# CONFIG_VIDEO_IMX355 is not set
# CONFIG_VIDEO_OV2640 is not set
# CONFIG_VIDEO_OV2659 is not set
# CONFIG_VIDEO_OV2680 is not set
# CONFIG_VIDEO_OV2685 is not set
# CONFIG_VIDEO_OV2740 is not set
# CONFIG_VIDEO_OV5647 is not set
# CONFIG_VIDEO_OV6650 is not set
# CONFIG_VIDEO_OV5670 is not set
# CONFIG_VIDEO_OV5675 is not set
# CONFIG_VIDEO_OV5695 is not set
# CONFIG_VIDEO_OV7251 is not set
# CONFIG_VIDEO_OV772X is not set
# CONFIG_VIDEO_OV7640 is not set
# CONFIG_VIDEO_OV7670 is not set
# CONFIG_VIDEO_OV7740 is not set
# CONFIG_VIDEO_OV8856 is not set
# CONFIG_VIDEO_OV9640 is not set
# CONFIG_VIDEO_OV9650 is not set
# CONFIG_VIDEO_OV13858 is not set
# CONFIG_VIDEO_VS6624 is not set
# CONFIG_VIDEO_MT9M001 is not set
# CONFIG_VIDEO_MT9M032 is not set
# CONFIG_VIDEO_MT9M111 is not set
# CONFIG_VIDEO_MT9P031 is not set
# CONFIG_VIDEO_MT9T001 is not set
# CONFIG_VIDEO_MT9T112 is not set
# CONFIG_VIDEO_MT9V011 is not set
# CONFIG_VIDEO_MT9V032 is not set
# CONFIG_VIDEO_MT9V111 is not set
# CONFIG_VIDEO_SR030PC30 is not set
# CONFIG_VIDEO_NOON010PC30 is not set
# CONFIG_VIDEO_M5MOLS is not set
# CONFIG_VIDEO_RJ54N1 is not set
# CONFIG_VIDEO_S5K6AA is not set
# CONFIG_VIDEO_S5K6A3 is not set
# CONFIG_VIDEO_S5K4ECGX is not set
# CONFIG_VIDEO_S5K5BAF is not set
# CONFIG_VIDEO_SMIAPP is not set
# CONFIG_VIDEO_ET8EK8 is not set
# CONFIG_VIDEO_S5C73M3 is not set
# end of Camera sensor devices

#
# Lens drivers
#
# CONFIG_VIDEO_AD5820 is not set
# CONFIG_VIDEO_AK7375 is not set
# CONFIG_VIDEO_DW9714 is not set
# CONFIG_VIDEO_DW9807_VCM is not set
# end of Lens drivers

#
# Flash devices
#
# CONFIG_VIDEO_ADP1653 is not set
# CONFIG_VIDEO_LM3560 is not set
# CONFIG_VIDEO_LM3646 is not set
# end of Flash devices

#
# SPI helper chips
#
# CONFIG_VIDEO_GS1662 is not set
# end of SPI helper chips

#
# Media SPI Adapters
#
CONFIG_CXD2880_SPI_DRV=m
# end of Media SPI Adapters

CONFIG_MEDIA_TUNER=m

#
# Customize TV tuners
#
CONFIG_MEDIA_TUNER_SIMPLE=m
CONFIG_MEDIA_TUNER_TDA18250=m
CONFIG_MEDIA_TUNER_TDA8290=m
CONFIG_MEDIA_TUNER_TDA827X=m
CONFIG_MEDIA_TUNER_TDA18271=m
CONFIG_MEDIA_TUNER_TDA9887=m
CONFIG_MEDIA_TUNER_TEA5761=m
CONFIG_MEDIA_TUNER_TEA5767=m
CONFIG_MEDIA_TUNER_MSI001=m
CONFIG_MEDIA_TUNER_MT20XX=m
CONFIG_MEDIA_TUNER_MT2060=m
CONFIG_MEDIA_TUNER_MT2063=m
CONFIG_MEDIA_TUNER_MT2266=m
CONFIG_MEDIA_TUNER_MT2131=m
CONFIG_MEDIA_TUNER_QT1010=m
CONFIG_MEDIA_TUNER_XC2028=m
CONFIG_MEDIA_TUNER_XC5000=m
CONFIG_MEDIA_TUNER_XC4000=m
CONFIG_MEDIA_TUNER_MXL5005S=m
CONFIG_MEDIA_TUNER_MXL5007T=m
CONFIG_MEDIA_TUNER_MC44S803=m
CONFIG_MEDIA_TUNER_MAX2165=m
CONFIG_MEDIA_TUNER_TDA18218=m
CONFIG_MEDIA_TUNER_FC0011=m
CONFIG_MEDIA_TUNER_FC0012=m
CONFIG_MEDIA_TUNER_FC0013=m
CONFIG_MEDIA_TUNER_TDA18212=m
CONFIG_MEDIA_TUNER_E4000=m
CONFIG_MEDIA_TUNER_FC2580=m
CONFIG_MEDIA_TUNER_M88RS6000T=m
CONFIG_MEDIA_TUNER_TUA9001=m
CONFIG_MEDIA_TUNER_SI2157=m
CONFIG_MEDIA_TUNER_IT913X=m
CONFIG_MEDIA_TUNER_R820T=m
CONFIG_MEDIA_TUNER_MXL301RF=m
CONFIG_MEDIA_TUNER_QM1D1C0042=m
CONFIG_MEDIA_TUNER_QM1D1B0004=m
# end of Customize TV tuners

#
# Customise DVB Frontends
#

#
# Multistandard (satellite) frontends
#
CONFIG_DVB_STB0899=m
CONFIG_DVB_STB6100=m
CONFIG_DVB_STV090x=m
CONFIG_DVB_STV0910=m
CONFIG_DVB_STV6110x=m
CONFIG_DVB_STV6111=m
CONFIG_DVB_MXL5XX=m
CONFIG_DVB_M88DS3103=m

#
# Multistandard (cable + terrestrial) frontends
#
CONFIG_DVB_DRXK=m
CONFIG_DVB_TDA18271C2DD=m
CONFIG_DVB_SI2165=m
CONFIG_DVB_MN88472=m
CONFIG_DVB_MN88473=m

#
# DVB-S (satellite) frontends
#
CONFIG_DVB_CX24110=m
CONFIG_DVB_CX24123=m
CONFIG_DVB_MT312=m
CONFIG_DVB_ZL10036=m
CONFIG_DVB_ZL10039=m
CONFIG_DVB_S5H1420=m
CONFIG_DVB_STV0288=m
CONFIG_DVB_STB6000=m
CONFIG_DVB_STV0299=m
CONFIG_DVB_STV6110=m
CONFIG_DVB_STV0900=m
CONFIG_DVB_TDA8083=m
CONFIG_DVB_TDA10086=m
CONFIG_DVB_TDA8261=m
CONFIG_DVB_VES1X93=m
CONFIG_DVB_TUNER_ITD1000=m
CONFIG_DVB_TUNER_CX24113=m
CONFIG_DVB_TDA826X=m
CONFIG_DVB_TUA6100=m
CONFIG_DVB_CX24116=m
CONFIG_DVB_CX24117=m
CONFIG_DVB_CX24120=m
CONFIG_DVB_SI21XX=m
CONFIG_DVB_TS2020=m
CONFIG_DVB_DS3000=m
CONFIG_DVB_MB86A16=m
CONFIG_DVB_TDA10071=m

#
# DVB-T (terrestrial) frontends
#
CONFIG_DVB_SP8870=m
CONFIG_DVB_SP887X=m
CONFIG_DVB_CX22700=m
CONFIG_DVB_CX22702=m
CONFIG_DVB_S5H1432=m
CONFIG_DVB_DRXD=m
CONFIG_DVB_L64781=m
CONFIG_DVB_TDA1004X=m
CONFIG_DVB_NXT6000=m
CONFIG_DVB_MT352=m
CONFIG_DVB_ZL10353=m
CONFIG_DVB_DIB3000MB=m
CONFIG_DVB_DIB3000MC=m
CONFIG_DVB_DIB7000M=m
CONFIG_DVB_DIB7000P=m
CONFIG_DVB_DIB9000=m
CONFIG_DVB_TDA10048=m
CONFIG_DVB_AF9013=m
CONFIG_DVB_EC100=m
CONFIG_DVB_STV0367=m
CONFIG_DVB_CXD2820R=m
CONFIG_DVB_CXD2841ER=m
CONFIG_DVB_RTL2830=m
CONFIG_DVB_RTL2832=m
CONFIG_DVB_RTL2832_SDR=m
CONFIG_DVB_SI2168=m
CONFIG_DVB_ZD1301_DEMOD=m
CONFIG_DVB_CXD2880=m

#
# DVB-C (cable) frontends
#
CONFIG_DVB_VES1820=m
CONFIG_DVB_TDA10021=m
CONFIG_DVB_TDA10023=m
CONFIG_DVB_STV0297=m

#
# ATSC (North American/Korean Terrestrial/Cable DTV) frontends
#
CONFIG_DVB_NXT200X=m
CONFIG_DVB_OR51211=m
CONFIG_DVB_OR51132=m
CONFIG_DVB_BCM3510=m
CONFIG_DVB_LGDT330X=m
CONFIG_DVB_LGDT3305=m
CONFIG_DVB_LGDT3306A=m
CONFIG_DVB_LG2160=m
CONFIG_DVB_S5H1409=m
CONFIG_DVB_AU8522=m
CONFIG_DVB_AU8522_DTV=m
CONFIG_DVB_AU8522_V4L=m
CONFIG_DVB_S5H1411=m

#
# ISDB-T (terrestrial) frontends
#
CONFIG_DVB_S921=m
CONFIG_DVB_DIB8000=m
CONFIG_DVB_MB86A20S=m

#
# ISDB-S (satellite) & ISDB-T (terrestrial) frontends
#
CONFIG_DVB_TC90522=m
CONFIG_DVB_MN88443X=m

#
# Digital terrestrial only tuners/PLL
#
CONFIG_DVB_PLL=m
CONFIG_DVB_TUNER_DIB0070=m
CONFIG_DVB_TUNER_DIB0090=m

#
# SEC control devices for DVB-S
#
CONFIG_DVB_DRX39XYJ=m
CONFIG_DVB_LNBH25=m
CONFIG_DVB_LNBH29=m
CONFIG_DVB_LNBP21=m
CONFIG_DVB_LNBP22=m
CONFIG_DVB_ISL6405=m
CONFIG_DVB_ISL6421=m
CONFIG_DVB_ISL6423=m
CONFIG_DVB_A8293=m
CONFIG_DVB_LGS8GL5=m
CONFIG_DVB_LGS8GXX=m
CONFIG_DVB_ATBM8830=m
CONFIG_DVB_TDA665x=m
CONFIG_DVB_IX2505V=m
CONFIG_DVB_M88RS2000=m
CONFIG_DVB_AF9033=m
CONFIG_DVB_HORUS3A=m
CONFIG_DVB_ASCOT2E=m
CONFIG_DVB_HELENE=m

#
# Common Interface (EN50221) controller drivers
#
CONFIG_DVB_CXD2099=m
CONFIG_DVB_SP2=m
# end of Customise DVB Frontends

#
# Tools to develop new frontends
#
# CONFIG_DVB_DUMMY_FE is not set
# end of Media ancillary drivers

#
# Graphics support
#
# CONFIG_AGP is not set
CONFIG_INTEL_GTT=m
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=64
CONFIG_VGA_SWITCHEROO=y
CONFIG_DRM=m
CONFIG_DRM_MIPI_DSI=y
CONFIG_DRM_DP_AUX_CHARDEV=y
# CONFIG_DRM_DEBUG_SELFTEST is not set
CONFIG_DRM_KMS_HELPER=m
CONFIG_DRM_KMS_FB_HELPER=y
CONFIG_DRM_FBDEV_EMULATION=y
CONFIG_DRM_FBDEV_OVERALLOC=100
CONFIG_DRM_LOAD_EDID_FIRMWARE=y
# CONFIG_DRM_DP_CEC is not set
CONFIG_DRM_TTM=m
CONFIG_DRM_TTM_DMA_PAGE_POOL=y
CONFIG_DRM_VRAM_HELPER=m
CONFIG_DRM_TTM_HELPER=m
CONFIG_DRM_GEM_SHMEM_HELPER=y

#
# I2C encoder or helper chips
#
CONFIG_DRM_I2C_CH7006=m
CONFIG_DRM_I2C_SIL164=m
# CONFIG_DRM_I2C_NXP_TDA998X is not set
# CONFIG_DRM_I2C_NXP_TDA9950 is not set
# end of I2C encoder or helper chips

#
# ARM devices
#
# end of ARM devices

# CONFIG_DRM_RADEON is not set
# CONFIG_DRM_AMDGPU is not set
# CONFIG_DRM_NOUVEAU is not set
CONFIG_DRM_I915=m
CONFIG_DRM_I915_FORCE_PROBE=""
CONFIG_DRM_I915_CAPTURE_ERROR=y
CONFIG_DRM_I915_COMPRESS_ERROR=y
CONFIG_DRM_I915_USERPTR=y
CONFIG_DRM_I915_GVT=y
CONFIG_DRM_I915_GVT_KVMGT=m
CONFIG_DRM_I915_FENCE_TIMEOUT=10000
CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND=250
CONFIG_DRM_I915_HEARTBEAT_INTERVAL=2500
CONFIG_DRM_I915_PREEMPT_TIMEOUT=640
CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT=8000
CONFIG_DRM_I915_STOP_TIMEOUT=100
CONFIG_DRM_I915_TIMESLICE_DURATION=1
CONFIG_DRM_VGEM=m
# CONFIG_DRM_VKMS is not set
CONFIG_DRM_VMWGFX=m
CONFIG_DRM_VMWGFX_FBCON=y
CONFIG_DRM_GMA500=m
CONFIG_DRM_GMA600=y
CONFIG_DRM_GMA3600=y
# CONFIG_DRM_UDL is not set
CONFIG_DRM_AST=m
CONFIG_DRM_MGAG200=m
CONFIG_DRM_QXL=m
CONFIG_DRM_BOCHS=m
CONFIG_DRM_VIRTIO_GPU=m
CONFIG_DRM_PANEL=y

#
# Display Panels
#
# CONFIG_DRM_PANEL_RASPBERRYPI_TOUCHSCREEN is not set
# end of Display Panels

CONFIG_DRM_BRIDGE=y
CONFIG_DRM_PANEL_BRIDGE=y

#
# Display Interface Bridges
#
# CONFIG_DRM_ANALOGIX_ANX78XX is not set
# end of Display Interface Bridges

# CONFIG_DRM_ETNAVIV is not set
CONFIG_DRM_CIRRUS_QEMU=m
# CONFIG_DRM_GM12U320 is not set
# CONFIG_TINYDRM_HX8357D is not set
# CONFIG_TINYDRM_ILI9225 is not set
# CONFIG_TINYDRM_ILI9341 is not set
# CONFIG_TINYDRM_ILI9486 is not set
# CONFIG_TINYDRM_MI0283QT is not set
# CONFIG_TINYDRM_REPAPER is not set
# CONFIG_TINYDRM_ST7586 is not set
# CONFIG_TINYDRM_ST7735R is not set
# CONFIG_DRM_XEN is not set
# CONFIG_DRM_VBOXVIDEO is not set
# CONFIG_DRM_LEGACY is not set
CONFIG_DRM_PANEL_ORIENTATION_QUIRKS=y

#
# Frame buffer Devices
#
CONFIG_FB_CMDLINE=y
CONFIG_FB_NOTIFY=y
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB_BOOT_VESA_SUPPORT=y
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
CONFIG_FB_SYS_FILLRECT=m
CONFIG_FB_SYS_COPYAREA=m
CONFIG_FB_SYS_IMAGEBLIT=m
# CONFIG_FB_FOREIGN_ENDIAN is not set
CONFIG_FB_SYS_FOPS=m
CONFIG_FB_DEFERRED_IO=y
# CONFIG_FB_MODE_HELPERS is not set
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ARC is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_VGA16 is not set
# CONFIG_FB_UVESA is not set
CONFIG_FB_VESA=y
CONFIG_FB_EFI=y
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_OPENCORES is not set
# CONFIG_FB_S1D13XXX is not set
# CONFIG_FB_NVIDIA is not set
# CONFIG_FB_RIVA is not set
# CONFIG_FB_I740 is not set
# CONFIG_FB_LE80578 is not set
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_S3 is not set
# CONFIG_FB_SAVAGE is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_VIA is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_SM501 is not set
# CONFIG_FB_SMSCUFX is not set
# CONFIG_FB_UDL is not set
# CONFIG_FB_IBM_GXT4500 is not set
# CONFIG_FB_VIRTUAL is not set
# CONFIG_XEN_FBDEV_FRONTEND is not set
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
CONFIG_FB_HYPERV=m
# CONFIG_FB_SIMPLE is not set
# CONFIG_FB_SM712 is not set
# end of Frame buffer Devices

#
# Backlight & LCD device support
#
CONFIG_LCD_CLASS_DEVICE=m
# CONFIG_LCD_L4F00242T03 is not set
# CONFIG_LCD_LMS283GF05 is not set
# CONFIG_LCD_LTV350QV is not set
# CONFIG_LCD_ILI922X is not set
# CONFIG_LCD_ILI9320 is not set
# CONFIG_LCD_TDO24M is not set
# CONFIG_LCD_VGG2432A4 is not set
CONFIG_LCD_PLATFORM=m
# CONFIG_LCD_AMS369FG06 is not set
# CONFIG_LCD_LMS501KF03 is not set
# CONFIG_LCD_HX8357 is not set
# CONFIG_LCD_OTM3225A is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
# CONFIG_BACKLIGHT_GENERIC is not set
# CONFIG_BACKLIGHT_PWM is not set
CONFIG_BACKLIGHT_APPLE=m
# CONFIG_BACKLIGHT_QCOM_WLED is not set
# CONFIG_BACKLIGHT_SAHARA is not set
# CONFIG_BACKLIGHT_ADP8860 is not set
# CONFIG_BACKLIGHT_ADP8870 is not set
# CONFIG_BACKLIGHT_LM3630A is not set
# CONFIG_BACKLIGHT_LM3639 is not set
CONFIG_BACKLIGHT_LP855X=m
# CONFIG_BACKLIGHT_GPIO is not set
# CONFIG_BACKLIGHT_LV5207LP is not set
# CONFIG_BACKLIGHT_BD6107 is not set
# CONFIG_BACKLIGHT_ARCXCNN is not set
# end of Backlight & LCD device support

CONFIG_HDMI=y

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=64
# CONFIG_VGACON_SOFT_SCROLLBACK_PERSISTENT_ENABLE_BY_DEFAULT is not set
CONFIG_DUMMY_CONSOLE=y
CONFIG_DUMMY_CONSOLE_COLUMNS=80
CONFIG_DUMMY_CONSOLE_ROWS=25
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
# CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER is not set
# end of Console display driver support

CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
CONFIG_LOGO_LINUX_CLUT224=y
# end of Graphics support

# CONFIG_SOUND is not set

#
# HID support
#
CONFIG_HID=y
CONFIG_HID_BATTERY_STRENGTH=y
CONFIG_HIDRAW=y
CONFIG_UHID=m
CONFIG_HID_GENERIC=y

#
# Special HID drivers
#
CONFIG_HID_A4TECH=m
# CONFIG_HID_ACCUTOUCH is not set
CONFIG_HID_ACRUX=m
# CONFIG_HID_ACRUX_FF is not set
CONFIG_HID_APPLE=m
# CONFIG_HID_APPLEIR is not set
CONFIG_HID_ASUS=m
CONFIG_HID_AUREAL=m
CONFIG_HID_BELKIN=m
# CONFIG_HID_BETOP_FF is not set
# CONFIG_HID_BIGBEN_FF is not set
CONFIG_HID_CHERRY=m
CONFIG_HID_CHICONY=m
# CONFIG_HID_CORSAIR is not set
# CONFIG_HID_COUGAR is not set
# CONFIG_HID_MACALLY is not set
CONFIG_HID_CMEDIA=m
# CONFIG_HID_CP2112 is not set
# CONFIG_HID_CREATIVE_SB0540 is not set
CONFIG_HID_CYPRESS=m
CONFIG_HID_DRAGONRISE=m
# CONFIG_DRAGONRISE_FF is not set
# CONFIG_HID_EMS_FF is not set
# CONFIG_HID_ELAN is not set
CONFIG_HID_ELECOM=m
# CONFIG_HID_ELO is not set
CONFIG_HID_EZKEY=m
CONFIG_HID_GEMBIRD=m
CONFIG_HID_GFRM=m
# CONFIG_HID_GLORIOUS is not set
# CONFIG_HID_HOLTEK is not set
# CONFIG_HID_GT683R is not set
CONFIG_HID_KEYTOUCH=m
CONFIG_HID_KYE=m
# CONFIG_HID_UCLOGIC is not set
CONFIG_HID_WALTOP=m
# CONFIG_HID_VIEWSONIC is not set
CONFIG_HID_GYRATION=m
CONFIG_HID_ICADE=m
CONFIG_HID_ITE=m
CONFIG_HID_JABRA=m
CONFIG_HID_TWINHAN=m
CONFIG_HID_KENSINGTON=m
CONFIG_HID_LCPOWER=m
CONFIG_HID_LED=m
CONFIG_HID_LENOVO=m
CONFIG_HID_LOGITECH=m
CONFIG_HID_LOGITECH_DJ=m
CONFIG_HID_LOGITECH_HIDPP=m
# CONFIG_LOGITECH_FF is not set
# CONFIG_LOGIRUMBLEPAD2_FF is not set
# CONFIG_LOGIG940_FF is not set
# CONFIG_LOGIWHEELS_FF is not set
CONFIG_HID_MAGICMOUSE=y
# CONFIG_HID_MALTRON is not set
# CONFIG_HID_MAYFLASH is not set
# CONFIG_HID_REDRAGON is not set
CONFIG_HID_MICROSOFT=m
CONFIG_HID_MONTEREY=m
CONFIG_HID_MULTITOUCH=m
CONFIG_HID_NTI=m
# CONFIG_HID_NTRIG is not set
CONFIG_HID_ORTEK=m
CONFIG_HID_PANTHERLORD=m
# CONFIG_PANTHERLORD_FF is not set
# CONFIG_HID_PENMOUNT is not set
CONFIG_HID_PETALYNX=m
CONFIG_HID_PICOLCD=m
CONFIG_HID_PICOLCD_FB=y
CONFIG_HID_PICOLCD_BACKLIGHT=y
CONFIG_HID_PICOLCD_LCD=y
CONFIG_HID_PICOLCD_LEDS=y
CONFIG_HID_PICOLCD_CIR=y
CONFIG_HID_PLANTRONICS=m
CONFIG_HID_PRIMAX=m
# CONFIG_HID_RETRODE is not set
# CONFIG_HID_ROCCAT is not set
CONFIG_HID_SAITEK=m
CONFIG_HID_SAMSUNG=m
# CONFIG_HID_SONY is not set
CONFIG_HID_SPEEDLINK=m
# CONFIG_HID_STEAM is not set
CONFIG_HID_STEELSERIES=m
CONFIG_HID_SUNPLUS=m
CONFIG_HID_RMI=m
CONFIG_HID_GREENASIA=m
# CONFIG_GREENASIA_FF is not set
CONFIG_HID_HYPERV_MOUSE=m
CONFIG_HID_SMARTJOYPLUS=m
# CONFIG_SMARTJOYPLUS_FF is not set
CONFIG_HID_TIVO=m
CONFIG_HID_TOPSEED=m
CONFIG_HID_THINGM=m
CONFIG_HID_THRUSTMASTER=m
# CONFIG_THRUSTMASTER_FF is not set
# CONFIG_HID_UDRAW_PS3 is not set
# CONFIG_HID_U2FZERO is not set
# CONFIG_HID_WACOM is not set
CONFIG_HID_WIIMOTE=m
CONFIG_HID_XINMO=m
CONFIG_HID_ZEROPLUS=m
# CONFIG_ZEROPLUS_FF is not set
CONFIG_HID_ZYDACRON=m
CONFIG_HID_SENSOR_HUB=y
CONFIG_HID_SENSOR_CUSTOM_SENSOR=m
CONFIG_HID_ALPS=m
# CONFIG_HID_MCP2221 is not set
# end of Special HID drivers

#
# USB HID support
#
CONFIG_USB_HID=y
# CONFIG_HID_PID is not set
# CONFIG_USB_HIDDEV is not set
# end of USB HID support

#
# I2C HID support
#
CONFIG_I2C_HID=m
# end of I2C HID support

#
# Intel ISH HID support
#
CONFIG_INTEL_ISH_HID=m
# CONFIG_INTEL_ISH_FIRMWARE_DOWNLOADER is not set
# end of Intel ISH HID support
# end of HID support

CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_SUPPORT=y
CONFIG_USB_COMMON=y
# CONFIG_USB_LED_TRIG is not set
# CONFIG_USB_ULPI_BUS is not set
# CONFIG_USB_CONN_GPIO is not set
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB=y
CONFIG_USB_PCI=y
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y

#
# Miscellaneous USB options
#
CONFIG_USB_DEFAULT_PERSIST=y
# CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_OTG is not set
# CONFIG_USB_OTG_WHITELIST is not set
CONFIG_USB_LEDS_TRIGGER_USBPORT=y
CONFIG_USB_AUTOSUSPEND_DELAY=2
CONFIG_USB_MON=y

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
CONFIG_USB_XHCI_HCD=y
# CONFIG_USB_XHCI_DBGCAP is not set
CONFIG_USB_XHCI_PCI=y
# CONFIG_USB_XHCI_PCI_RENESAS is not set
# CONFIG_USB_XHCI_PLATFORM is not set
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_EHCI_TT_NEWSCHED=y
CONFIG_USB_EHCI_PCI=y
# CONFIG_USB_EHCI_FSL is not set
# CONFIG_USB_EHCI_HCD_PLATFORM is not set
# CONFIG_USB_OXU210HP_HCD is not set
# CONFIG_USB_ISP116X_HCD is not set
# CONFIG_USB_FOTG210_HCD is not set
# CONFIG_USB_MAX3421_HCD is not set
CONFIG_USB_OHCI_HCD=y
CONFIG_USB_OHCI_HCD_PCI=y
# CONFIG_USB_OHCI_HCD_PLATFORM is not set
CONFIG_USB_UHCI_HCD=y
# CONFIG_USB_SL811_HCD is not set
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_HCD_BCMA is not set
# CONFIG_USB_HCD_TEST_MODE is not set

#
# USB Device Class drivers
#
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set
# CONFIG_USB_WDM is not set
# CONFIG_USB_TMC is not set

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
#

#
# also be needed; see USB_STORAGE Help for more info
#
CONFIG_USB_STORAGE=m
# CONFIG_USB_STORAGE_DEBUG is not set
# CONFIG_USB_STORAGE_REALTEK is not set
# CONFIG_USB_STORAGE_DATAFAB is not set
# CONFIG_USB_STORAGE_FREECOM is not set
# CONFIG_USB_STORAGE_ISD200 is not set
# CONFIG_USB_STORAGE_USBAT is not set
# CONFIG_USB_STORAGE_SDDR09 is not set
# CONFIG_USB_STORAGE_SDDR55 is not set
# CONFIG_USB_STORAGE_JUMPSHOT is not set
# CONFIG_USB_STORAGE_ALAUDA is not set
# CONFIG_USB_STORAGE_ONETOUCH is not set
# CONFIG_USB_STORAGE_KARMA is not set
# CONFIG_USB_STORAGE_CYPRESS_ATACB is not set
# CONFIG_USB_STORAGE_ENE_UB6250 is not set
# CONFIG_USB_UAS is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set
# CONFIG_USBIP_CORE is not set
# CONFIG_USB_CDNS3 is not set
# CONFIG_USB_MUSB_HDRC is not set
# CONFIG_USB_DWC3 is not set
# CONFIG_USB_DWC2 is not set
# CONFIG_USB_CHIPIDEA is not set
# CONFIG_USB_ISP1760 is not set

#
# USB port drivers
#
# CONFIG_USB_USS720 is not set
CONFIG_USB_SERIAL=m
CONFIG_USB_SERIAL_GENERIC=y
# CONFIG_USB_SERIAL_SIMPLE is not set
# CONFIG_USB_SERIAL_AIRCABLE is not set
# CONFIG_USB_SERIAL_ARK3116 is not set
# CONFIG_USB_SERIAL_BELKIN is not set
# CONFIG_USB_SERIAL_CH341 is not set
# CONFIG_USB_SERIAL_WHITEHEAT is not set
# CONFIG_USB_SERIAL_DIGI_ACCELEPORT is not set
# CONFIG_USB_SERIAL_CP210X is not set
# CONFIG_USB_SERIAL_CYPRESS_M8 is not set
# CONFIG_USB_SERIAL_EMPEG is not set
# CONFIG_USB_SERIAL_FTDI_SIO is not set
# CONFIG_USB_SERIAL_VISOR is not set
# CONFIG_USB_SERIAL_IPAQ is not set
# CONFIG_USB_SERIAL_IR is not set
# CONFIG_USB_SERIAL_EDGEPORT is not set
# CONFIG_USB_SERIAL_EDGEPORT_TI is not set
# CONFIG_USB_SERIAL_F81232 is not set
# CONFIG_USB_SERIAL_F8153X is not set
# CONFIG_USB_SERIAL_GARMIN is not set
# CONFIG_USB_SERIAL_IPW is not set
# CONFIG_USB_SERIAL_IUU is not set
# CONFIG_USB_SERIAL_KEYSPAN_PDA is not set
# CONFIG_USB_SERIAL_KEYSPAN is not set
# CONFIG_USB_SERIAL_KLSI is not set
# CONFIG_USB_SERIAL_KOBIL_SCT is not set
# CONFIG_USB_SERIAL_MCT_U232 is not set
# CONFIG_USB_SERIAL_METRO is not set
# CONFIG_USB_SERIAL_MOS7720 is not set
# CONFIG_USB_SERIAL_MOS7840 is not set
# CONFIG_USB_SERIAL_MXUPORT is not set
# CONFIG_USB_SERIAL_NAVMAN is not set
# CONFIG_USB_SERIAL_PL2303 is not set
# CONFIG_USB_SERIAL_OTI6858 is not set
# CONFIG_USB_SERIAL_QCAUX is not set
# CONFIG_USB_SERIAL_QUALCOMM is not set
# CONFIG_USB_SERIAL_SPCP8X5 is not set
# CONFIG_USB_SERIAL_SAFE is not set
# CONFIG_USB_SERIAL_SIERRAWIRELESS is not set
# CONFIG_USB_SERIAL_SYMBOL is not set
# CONFIG_USB_SERIAL_TI is not set
# CONFIG_USB_SERIAL_CYBERJACK is not set
# CONFIG_USB_SERIAL_XIRCOM is not set
# CONFIG_USB_SERIAL_OPTION is not set
# CONFIG_USB_SERIAL_OMNINET is not set
# CONFIG_USB_SERIAL_OPTICON is not set
# CONFIG_USB_SERIAL_XSENS_MT is not set
# CONFIG_USB_SERIAL_WISHBONE is not set
# CONFIG_USB_SERIAL_SSU100 is not set
# CONFIG_USB_SERIAL_QT2 is not set
# CONFIG_USB_SERIAL_UPD78F0730 is not set
CONFIG_USB_SERIAL_DEBUG=m

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_ADUTUX is not set
# CONFIG_USB_SEVSEG is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_CYPRESS_CY7C63 is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_IDMOUSE is not set
# CONFIG_USB_FTDI_ELAN is not set
# CONFIG_USB_APPLEDISPLAY is not set
# CONFIG_APPLE_MFI_FASTCHARGE is not set
# CONFIG_USB_SISUSBVGA is not set
# CONFIG_USB_LD is not set
# CONFIG_USB_TRANCEVIBRATOR is not set
# CONFIG_USB_IOWARRIOR is not set
# CONFIG_USB_TEST is not set
# CONFIG_USB_EHSET_TEST_FIXTURE is not set
# CONFIG_USB_ISIGHTFW is not set
# CONFIG_USB_YUREX is not set
# CONFIG_USB_EZUSB_FX2 is not set
# CONFIG_USB_HUB_USB251XB is not set
# CONFIG_USB_HSIC_USB3503 is not set
# CONFIG_USB_HSIC_USB4604 is not set
# CONFIG_USB_LINK_LAYER_TEST is not set
# CONFIG_USB_CHAOSKEY is not set
# CONFIG_USB_ATM is not set

#
# USB Physical Layer drivers
#
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_USB_GPIO_VBUS is not set
# CONFIG_USB_ISP1301 is not set
# end of USB Physical Layer drivers

# CONFIG_USB_GADGET is not set
CONFIG_TYPEC=y
# CONFIG_TYPEC_TCPM is not set
CONFIG_TYPEC_UCSI=y
# CONFIG_UCSI_CCG is not set
CONFIG_UCSI_ACPI=y
# CONFIG_TYPEC_TPS6598X is not set

#
# USB Type-C Multiplexer/DeMultiplexer Switch support
#
# CONFIG_TYPEC_MUX_PI3USB30532 is not set
# end of USB Type-C Multiplexer/DeMultiplexer Switch support

#
# USB Type-C Alternate Mode drivers
#
# CONFIG_TYPEC_DP_ALTMODE is not set
# end of USB Type-C Alternate Mode drivers

# CONFIG_USB_ROLE_SWITCH is not set
CONFIG_MMC=m
CONFIG_MMC_BLOCK=m
CONFIG_MMC_BLOCK_MINORS=8
CONFIG_SDIO_UART=m
# CONFIG_MMC_TEST is not set

#
# MMC/SD/SDIO Host Controller Drivers
#
# CONFIG_MMC_DEBUG is not set
CONFIG_MMC_SDHCI=m
CONFIG_MMC_SDHCI_IO_ACCESSORS=y
CONFIG_MMC_SDHCI_PCI=m
CONFIG_MMC_RICOH_MMC=y
CONFIG_MMC_SDHCI_ACPI=m
CONFIG_MMC_SDHCI_PLTFM=m
# CONFIG_MMC_SDHCI_F_SDH30 is not set
# CONFIG_MMC_WBSD is not set
# CONFIG_MMC_TIFM_SD is not set
# CONFIG_MMC_SPI is not set
# CONFIG_MMC_CB710 is not set
# CONFIG_MMC_VIA_SDMMC is not set
# CONFIG_MMC_VUB300 is not set
# CONFIG_MMC_USHC is not set
# CONFIG_MMC_USDHI6ROL0 is not set
# CONFIG_MMC_REALTEK_PCI is not set
CONFIG_MMC_CQHCI=m
# CONFIG_MMC_HSQ is not set
# CONFIG_MMC_TOSHIBA_PCI is not set
# CONFIG_MMC_MTK is not set
# CONFIG_MMC_SDHCI_XENON is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
# CONFIG_LEDS_CLASS_FLASH is not set
# CONFIG_LEDS_BRIGHTNESS_HW_CHANGED is not set

#
# LED drivers
#
# CONFIG_LEDS_APU is not set
CONFIG_LEDS_LM3530=m
# CONFIG_LEDS_LM3532 is not set
# CONFIG_LEDS_LM3642 is not set
# CONFIG_LEDS_PCA9532 is not set
# CONFIG_LEDS_GPIO is not set
CONFIG_LEDS_LP3944=m
# CONFIG_LEDS_LP3952 is not set
CONFIG_LEDS_LP55XX_COMMON=m
CONFIG_LEDS_LP5521=m
CONFIG_LEDS_LP5523=m
CONFIG_LEDS_LP5562=m
# CONFIG_LEDS_LP8501 is not set
CONFIG_LEDS_CLEVO_MAIL=m
# CONFIG_LEDS_PCA955X is not set
# CONFIG_LEDS_PCA963X is not set
# CONFIG_LEDS_DAC124S085 is not set
# CONFIG_LEDS_PWM is not set
# CONFIG_LEDS_BD2802 is not set
CONFIG_LEDS_INTEL_SS4200=m
# CONFIG_LEDS_TCA6507 is not set
# CONFIG_LEDS_TLC591XX is not set
# CONFIG_LEDS_LM355x is not set

#
# LED driver for blink(1) USB RGB LED is under Special HID drivers (HID_THINGM)
#
CONFIG_LEDS_BLINKM=m
CONFIG_LEDS_MLXCPLD=m
# CONFIG_LEDS_MLXREG is not set
# CONFIG_LEDS_USER is not set
# CONFIG_LEDS_NIC78BX is not set
# CONFIG_LEDS_TI_LMU_COMMON is not set

#
# LED Triggers
#
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=m
CONFIG_LEDS_TRIGGER_ONESHOT=m
# CONFIG_LEDS_TRIGGER_DISK is not set
CONFIG_LEDS_TRIGGER_HEARTBEAT=m
CONFIG_LEDS_TRIGGER_BACKLIGHT=m
# CONFIG_LEDS_TRIGGER_CPU is not set
# CONFIG_LEDS_TRIGGER_ACTIVITY is not set
CONFIG_LEDS_TRIGGER_GPIO=m
CONFIG_LEDS_TRIGGER_DEFAULT_ON=m

#
# iptables trigger is under Netfilter config (LED target)
#
CONFIG_LEDS_TRIGGER_TRANSIENT=m
CONFIG_LEDS_TRIGGER_CAMERA=m
# CONFIG_LEDS_TRIGGER_PANIC is not set
# CONFIG_LEDS_TRIGGER_NETDEV is not set
# CONFIG_LEDS_TRIGGER_PATTERN is not set
CONFIG_LEDS_TRIGGER_AUDIO=m
# CONFIG_ACCESSIBILITY is not set
CONFIG_INFINIBAND=m
CONFIG_INFINIBAND_USER_MAD=m
CONFIG_INFINIBAND_USER_ACCESS=m
# CONFIG_INFINIBAND_EXP_LEGACY_VERBS_NEW_UAPI is not set
CONFIG_INFINIBAND_USER_MEM=y
CONFIG_INFINIBAND_ON_DEMAND_PAGING=y
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_ADDR_TRANS_CONFIGFS=y
# CONFIG_INFINIBAND_MTHCA is not set
# CONFIG_INFINIBAND_EFA is not set
# CONFIG_INFINIBAND_I40IW is not set
# CONFIG_MLX4_INFINIBAND is not set
# CONFIG_INFINIBAND_OCRDMA is not set
# CONFIG_INFINIBAND_USNIC is not set
# CONFIG_INFINIBAND_BNXT_RE is not set
# CONFIG_INFINIBAND_RDMAVT is not set
CONFIG_RDMA_RXE=m
CONFIG_RDMA_SIW=m
CONFIG_INFINIBAND_IPOIB=m
# CONFIG_INFINIBAND_IPOIB_CM is not set
CONFIG_INFINIBAND_IPOIB_DEBUG=y
# CONFIG_INFINIBAND_IPOIB_DEBUG_DATA is not set
CONFIG_INFINIBAND_SRP=m
CONFIG_INFINIBAND_SRPT=m
# CONFIG_INFINIBAND_ISER is not set
# CONFIG_INFINIBAND_ISERT is not set
# CONFIG_INFINIBAND_RTRS_CLIENT is not set
# CONFIG_INFINIBAND_RTRS_SERVER is not set
# CONFIG_INFINIBAND_OPA_VNIC is not set
CONFIG_EDAC_ATOMIC_SCRUB=y
CONFIG_EDAC_SUPPORT=y
CONFIG_EDAC=y
CONFIG_EDAC_LEGACY_SYSFS=y
# CONFIG_EDAC_DEBUG is not set
CONFIG_EDAC_DECODE_MCE=m
CONFIG_EDAC_GHES=y
CONFIG_EDAC_AMD64=m
# CONFIG_EDAC_AMD64_ERROR_INJECTION is not set
CONFIG_EDAC_E752X=m
CONFIG_EDAC_I82975X=m
CONFIG_EDAC_I3000=m
CONFIG_EDAC_I3200=m
CONFIG_EDAC_IE31200=m
CONFIG_EDAC_X38=m
CONFIG_EDAC_I5400=m
CONFIG_EDAC_I7CORE=m
CONFIG_EDAC_I5000=m
CONFIG_EDAC_I5100=m
CONFIG_EDAC_I7300=m
CONFIG_EDAC_SBRIDGE=m
CONFIG_EDAC_SKX=m
# CONFIG_EDAC_I10NM is not set
CONFIG_EDAC_PND2=m
CONFIG_RTC_LIB=y
CONFIG_RTC_MC146818_LIB=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_HCTOSYS=y
CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
# CONFIG_RTC_SYSTOHC is not set
# CONFIG_RTC_DEBUG is not set
CONFIG_RTC_NVMEM=y

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
# CONFIG_RTC_DRV_TEST is not set

#
# I2C RTC drivers
#
# CONFIG_RTC_DRV_ABB5ZES3 is not set
# CONFIG_RTC_DRV_ABEOZ9 is not set
# CONFIG_RTC_DRV_ABX80X is not set
CONFIG_RTC_DRV_DS1307=m
# CONFIG_RTC_DRV_DS1307_CENTURY is not set
CONFIG_RTC_DRV_DS1374=m
# CONFIG_RTC_DRV_DS1374_WDT is not set
CONFIG_RTC_DRV_DS1672=m
CONFIG_RTC_DRV_MAX6900=m
CONFIG_RTC_DRV_RS5C372=m
CONFIG_RTC_DRV_ISL1208=m
CONFIG_RTC_DRV_ISL12022=m
CONFIG_RTC_DRV_X1205=m
CONFIG_RTC_DRV_PCF8523=m
# CONFIG_RTC_DRV_PCF85063 is not set
# CONFIG_RTC_DRV_PCF85363 is not set
CONFIG_RTC_DRV_PCF8563=m
CONFIG_RTC_DRV_PCF8583=m
CONFIG_RTC_DRV_M41T80=m
CONFIG_RTC_DRV_M41T80_WDT=y
CONFIG_RTC_DRV_BQ32K=m
# CONFIG_RTC_DRV_S35390A is not set
CONFIG_RTC_DRV_FM3130=m
# CONFIG_RTC_DRV_RX8010 is not set
CONFIG_RTC_DRV_RX8581=m
CONFIG_RTC_DRV_RX8025=m
CONFIG_RTC_DRV_EM3027=m
# CONFIG_RTC_DRV_RV3028 is not set
# CONFIG_RTC_DRV_RV8803 is not set
# CONFIG_RTC_DRV_SD3078 is not set

#
# SPI RTC drivers
#
# CONFIG_RTC_DRV_M41T93 is not set
# CONFIG_RTC_DRV_M41T94 is not set
# CONFIG_RTC_DRV_DS1302 is not set
# CONFIG_RTC_DRV_DS1305 is not set
# CONFIG_RTC_DRV_DS1343 is not set
# CONFIG_RTC_DRV_DS1347 is not set
# CONFIG_RTC_DRV_DS1390 is not set
# CONFIG_RTC_DRV_MAX6916 is not set
# CONFIG_RTC_DRV_R9701 is not set
CONFIG_RTC_DRV_RX4581=m
# CONFIG_RTC_DRV_RX6110 is not set
# CONFIG_RTC_DRV_RS5C348 is not set
# CONFIG_RTC_DRV_MAX6902 is not set
# CONFIG_RTC_DRV_PCF2123 is not set
# CONFIG_RTC_DRV_MCP795 is not set
CONFIG_RTC_I2C_AND_SPI=y

#
# SPI and I2C RTC drivers
#
CONFIG_RTC_DRV_DS3232=m
CONFIG_RTC_DRV_DS3232_HWMON=y
# CONFIG_RTC_DRV_PCF2127 is not set
CONFIG_RTC_DRV_RV3029C2=m
# CONFIG_RTC_DRV_RV3029_HWMON is not set

#
# Platform RTC drivers
#
CONFIG_RTC_DRV_CMOS=y
CONFIG_RTC_DRV_DS1286=m
CONFIG_RTC_DRV_DS1511=m
CONFIG_RTC_DRV_DS1553=m
# CONFIG_RTC_DRV_DS1685_FAMILY is not set
CONFIG_RTC_DRV_DS1742=m
CONFIG_RTC_DRV_DS2404=m
CONFIG_RTC_DRV_STK17TA8=m
# CONFIG_RTC_DRV_M48T86 is not set
CONFIG_RTC_DRV_M48T35=m
CONFIG_RTC_DRV_M48T59=m
CONFIG_RTC_DRV_MSM6242=m
CONFIG_RTC_DRV_BQ4802=m
CONFIG_RTC_DRV_RP5C01=m
CONFIG_RTC_DRV_V3020=m

#
# on-CPU RTC drivers
#
# CONFIG_RTC_DRV_FTRTC010 is not set

#
# HID Sensor RTC drivers
#
CONFIG_DMADEVICES=y
# CONFIG_DMADEVICES_DEBUG is not set

#
# DMA Devices
#
CONFIG_DMA_ENGINE=y
CONFIG_DMA_VIRTUAL_CHANNELS=y
CONFIG_DMA_ACPI=y
# CONFIG_ALTERA_MSGDMA is not set
CONFIG_INTEL_IDMA64=m
# CONFIG_INTEL_IDXD is not set
CONFIG_INTEL_IOATDMA=m
# CONFIG_PLX_DMA is not set
# CONFIG_QCOM_HIDMA_MGMT is not set
# CONFIG_QCOM_HIDMA is not set
CONFIG_DW_DMAC_CORE=y
CONFIG_DW_DMAC=m
CONFIG_DW_DMAC_PCI=y
# CONFIG_DW_EDMA is not set
# CONFIG_DW_EDMA_PCIE is not set
CONFIG_HSU_DMA=y
# CONFIG_SF_PDMA is not set

#
# DMA Clients
#
CONFIG_ASYNC_TX_DMA=y
CONFIG_DMATEST=m
CONFIG_DMA_ENGINE_RAID=y

#
# DMABUF options
#
CONFIG_SYNC_FILE=y
# CONFIG_SW_SYNC is not set
# CONFIG_UDMABUF is not set
# CONFIG_DMABUF_MOVE_NOTIFY is not set
# CONFIG_DMABUF_SELFTESTS is not set
# CONFIG_DMABUF_HEAPS is not set
# end of DMABUF options

CONFIG_DCA=m
# CONFIG_AUXDISPLAY is not set
# CONFIG_PANEL is not set
CONFIG_UIO=m
CONFIG_UIO_CIF=m
CONFIG_UIO_PDRV_GENIRQ=m
# CONFIG_UIO_DMEM_GENIRQ is not set
CONFIG_UIO_AEC=m
CONFIG_UIO_SERCOS3=m
CONFIG_UIO_PCI_GENERIC=m
# CONFIG_UIO_NETX is not set
# CONFIG_UIO_PRUSS is not set
# CONFIG_UIO_MF624 is not set
CONFIG_UIO_HV_GENERIC=m
CONFIG_VFIO_IOMMU_TYPE1=m
CONFIG_VFIO_VIRQFD=m
CONFIG_VFIO=m
CONFIG_VFIO_NOIOMMU=y
CONFIG_VFIO_PCI=m
# CONFIG_VFIO_PCI_VGA is not set
CONFIG_VFIO_PCI_MMAP=y
CONFIG_VFIO_PCI_INTX=y
# CONFIG_VFIO_PCI_IGD is not set
CONFIG_VFIO_MDEV=m
CONFIG_VFIO_MDEV_DEVICE=m
CONFIG_IRQ_BYPASS_MANAGER=m
# CONFIG_VIRT_DRIVERS is not set
CONFIG_VIRTIO=y
CONFIG_VIRTIO_MENU=y
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_PCI_LEGACY=y
# CONFIG_VIRTIO_PMEM is not set
CONFIG_VIRTIO_BALLOON=y
CONFIG_VIRTIO_MEM=m
CONFIG_VIRTIO_INPUT=m
# CONFIG_VIRTIO_MMIO is not set
# CONFIG_VDPA is not set
CONFIG_VHOST_IOTLB=m
CONFIG_VHOST=m
CONFIG_VHOST_MENU=y
CONFIG_VHOST_NET=m
# CONFIG_VHOST_SCSI is not set
CONFIG_VHOST_VSOCK=m
# CONFIG_VHOST_CROSS_ENDIAN_LEGACY is not set

#
# Microsoft Hyper-V guest support
#
CONFIG_HYPERV=m
CONFIG_HYPERV_TIMER=y
CONFIG_HYPERV_UTILS=m
CONFIG_HYPERV_BALLOON=m
# end of Microsoft Hyper-V guest support

#
# Xen driver support
#
# CONFIG_XEN_BALLOON is not set
CONFIG_XEN_DEV_EVTCHN=m
# CONFIG_XEN_BACKEND is not set
CONFIG_XENFS=m
CONFIG_XEN_COMPAT_XENFS=y
CONFIG_XEN_SYS_HYPERVISOR=y
CONFIG_XEN_XENBUS_FRONTEND=y
# CONFIG_XEN_GNTDEV is not set
# CONFIG_XEN_GRANT_DEV_ALLOC is not set
# CONFIG_XEN_GRANT_DMA_ALLOC is not set
CONFIG_SWIOTLB_XEN=y
# CONFIG_XEN_PVCALLS_FRONTEND is not set
CONFIG_XEN_PRIVCMD=m
CONFIG_XEN_EFI=y
CONFIG_XEN_AUTO_XLATE=y
CONFIG_XEN_ACPI=y
# end of Xen driver support

# CONFIG_GREYBUS is not set
# CONFIG_STAGING is not set
CONFIG_X86_PLATFORM_DEVICES=y
CONFIG_ACPI_WMI=m
CONFIG_WMI_BMOF=m
# CONFIG_ALIENWARE_WMI is not set
# CONFIG_HUAWEI_WMI is not set
# CONFIG_INTEL_WMI_SBL_FW_UPDATE is not set
CONFIG_INTEL_WMI_THUNDERBOLT=m
CONFIG_MXM_WMI=m
# CONFIG_PEAQ_WMI is not set
# CONFIG_XIAOMI_WMI is not set
CONFIG_ACERHDF=m
# CONFIG_ACER_WIRELESS is not set
CONFIG_ACER_WMI=m
CONFIG_APPLE_GMUX=m
CONFIG_ASUS_LAPTOP=m
# CONFIG_ASUS_WIRELESS is not set
CONFIG_ASUS_WMI=m
CONFIG_ASUS_NB_WMI=m
CONFIG_EEEPC_LAPTOP=m
CONFIG_EEEPC_WMI=m
CONFIG_DCDBAS=m
CONFIG_DELL_SMBIOS=m
CONFIG_DELL_SMBIOS_WMI=y
# CONFIG_DELL_SMBIOS_SMM is not set
CONFIG_DELL_LAPTOP=m
CONFIG_DELL_RBTN=m
CONFIG_DELL_RBU=m
CONFIG_DELL_SMO8800=m
CONFIG_DELL_WMI=m
CONFIG_DELL_WMI_DESCRIPTOR=m
CONFIG_DELL_WMI_AIO=m
CONFIG_DELL_WMI_LED=m
CONFIG_AMILO_RFKILL=m
CONFIG_FUJITSU_LAPTOP=m
CONFIG_FUJITSU_TABLET=m
# CONFIG_GPD_POCKET_FAN is not set
CONFIG_HP_ACCEL=m
CONFIG_HP_WIRELESS=m
CONFIG_HP_WMI=m
# CONFIG_IBM_RTL is not set
CONFIG_IDEAPAD_LAPTOP=m
CONFIG_SENSORS_HDAPS=m
CONFIG_THINKPAD_ACPI=m
# CONFIG_THINKPAD_ACPI_DEBUGFACILITIES is not set
# CONFIG_THINKPAD_ACPI_DEBUG is not set
# CONFIG_THINKPAD_ACPI_UNSAFE_LEDS is not set
CONFIG_THINKPAD_ACPI_VIDEO=y
CONFIG_THINKPAD_ACPI_HOTKEY_POLL=y
# CONFIG_INTEL_ATOMISP2_PM is not set
CONFIG_INTEL_HID_EVENT=m
# CONFIG_INTEL_INT0002_VGPIO is not set
# CONFIG_INTEL_MENLOW is not set
CONFIG_INTEL_OAKTRAIL=m
CONFIG_INTEL_VBTN=m
# CONFIG_SURFACE3_WMI is not set
# CONFIG_SURFACE_3_POWER_OPREGION is not set
# CONFIG_SURFACE_PRO3_BUTTON is not set
CONFIG_MSI_LAPTOP=m
CONFIG_MSI_WMI=m
# CONFIG_PCENGINES_APU2 is not set
CONFIG_SAMSUNG_LAPTOP=m
CONFIG_SAMSUNG_Q10=m
CONFIG_TOSHIBA_BT_RFKILL=m
# CONFIG_TOSHIBA_HAPS is not set
# CONFIG_TOSHIBA_WMI is not set
CONFIG_ACPI_CMPC=m
CONFIG_COMPAL_LAPTOP=m
# CONFIG_LG_LAPTOP is not set
CONFIG_PANASONIC_LAPTOP=m
CONFIG_SONY_LAPTOP=m
CONFIG_SONYPI_COMPAT=y
# CONFIG_SYSTEM76_ACPI is not set
CONFIG_TOPSTAR_LAPTOP=m
# CONFIG_I2C_MULTI_INSTANTIATE is not set
CONFIG_MLX_PLATFORM=m
CONFIG_INTEL_IPS=m
CONFIG_INTEL_RST=m
# CONFIG_INTEL_SMARTCONNECT is not set

#
# Intel Speed Select Technology interface support
#
# CONFIG_INTEL_SPEED_SELECT_INTERFACE is not set
# end of Intel Speed Select Technology interface support

CONFIG_INTEL_TURBO_MAX_3=y
# CONFIG_INTEL_UNCORE_FREQ_CONTROL is not set
CONFIG_INTEL_PMC_CORE=m
# CONFIG_INTEL_PUNIT_IPC is not set
# CONFIG_INTEL_SCU_PCI is not set
# CONFIG_INTEL_SCU_PLATFORM is not set
CONFIG_PMC_ATOM=y
# CONFIG_MFD_CROS_EC is not set
# CONFIG_CHROME_PLATFORMS is not set
CONFIG_MELLANOX_PLATFORM=y
CONFIG_MLXREG_HOTPLUG=m
# CONFIG_MLXREG_IO is not set
CONFIG_HAVE_CLK=y
CONFIG_CLKDEV_LOOKUP=y
CONFIG_HAVE_CLK_PREPARE=y
CONFIG_COMMON_CLK=y
# CONFIG_COMMON_CLK_MAX9485 is not set
# CONFIG_COMMON_CLK_SI5341 is not set
# CONFIG_COMMON_CLK_SI5351 is not set
# CONFIG_COMMON_CLK_SI544 is not set
# CONFIG_COMMON_CLK_CDCE706 is not set
# CONFIG_COMMON_CLK_CS2000_CP is not set
# CONFIG_COMMON_CLK_PWM is not set
CONFIG_HWSPINLOCK=y

#
# Clock Source drivers
#
CONFIG_CLKEVT_I8253=y
CONFIG_I8253_LOCK=y
CONFIG_CLKBLD_I8253=y
# end of Clock Source drivers

CONFIG_MAILBOX=y
CONFIG_PCC=y
# CONFIG_ALTERA_MBOX is not set
CONFIG_IOMMU_IOVA=y
CONFIG_IOASID=y
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y

#
# Generic IOMMU Pagetable Support
#
# end of Generic IOMMU Pagetable Support

# CONFIG_IOMMU_DEBUGFS is not set
# CONFIG_IOMMU_DEFAULT_PASSTHROUGH is not set
CONFIG_IOMMU_DMA=y
CONFIG_AMD_IOMMU=y
CONFIG_AMD_IOMMU_V2=m
CONFIG_DMAR_TABLE=y
CONFIG_INTEL_IOMMU=y
# CONFIG_INTEL_IOMMU_SVM is not set
# CONFIG_INTEL_IOMMU_DEFAULT_ON is not set
CONFIG_INTEL_IOMMU_FLOPPY_WA=y
# CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON is not set
CONFIG_IRQ_REMAP=y
CONFIG_HYPERV_IOMMU=y

#
# Remoteproc drivers
#
# CONFIG_REMOTEPROC is not set
# end of Remoteproc drivers

#
# Rpmsg drivers
#
# CONFIG_RPMSG_QCOM_GLINK_RPM is not set
# CONFIG_RPMSG_VIRTIO is not set
# end of Rpmsg drivers

# CONFIG_SOUNDWIRE is not set

#
# SOC (System On Chip) specific Drivers
#

#
# Amlogic SoC drivers
#
# end of Amlogic SoC drivers

#
# Aspeed SoC drivers
#
# end of Aspeed SoC drivers

#
# Broadcom SoC drivers
#
# end of Broadcom SoC drivers

#
# NXP/Freescale QorIQ SoC drivers
#
# end of NXP/Freescale QorIQ SoC drivers

#
# i.MX SoC drivers
#
# end of i.MX SoC drivers

#
# Qualcomm SoC drivers
#
# end of Qualcomm SoC drivers

# CONFIG_SOC_TI is not set

#
# Xilinx SoC drivers
#
# CONFIG_XILINX_VCU is not set
# end of Xilinx SoC drivers
# end of SOC (System On Chip) specific Drivers

# CONFIG_PM_DEVFREQ is not set
# CONFIG_EXTCON is not set
# CONFIG_MEMORY is not set
# CONFIG_IIO is not set
CONFIG_NTB=m
# CONFIG_NTB_MSI is not set
# CONFIG_NTB_AMD is not set
# CONFIG_NTB_IDT is not set
# CONFIG_NTB_INTEL is not set
# CONFIG_NTB_SWITCHTEC is not set
# CONFIG_NTB_PINGPONG is not set
# CONFIG_NTB_TOOL is not set
# CONFIG_NTB_PERF is not set
# CONFIG_NTB_TRANSPORT is not set
# CONFIG_VME_BUS is not set
CONFIG_PWM=y
CONFIG_PWM_SYSFS=y
# CONFIG_PWM_DEBUG is not set
CONFIG_PWM_LPSS=m
CONFIG_PWM_LPSS_PCI=m
CONFIG_PWM_LPSS_PLATFORM=m
# CONFIG_PWM_PCA9685 is not set

#
# IRQ chip support
#
# end of IRQ chip support

# CONFIG_IPACK_BUS is not set
# CONFIG_RESET_CONTROLLER is not set

#
# PHY Subsystem
#
# CONFIG_GENERIC_PHY is not set
# CONFIG_BCM_KONA_USB2_PHY is not set
# CONFIG_PHY_PXA_28NM_HSIC is not set
# CONFIG_PHY_PXA_28NM_USB2 is not set
# CONFIG_PHY_INTEL_EMMC is not set
# end of PHY Subsystem

CONFIG_POWERCAP=y
CONFIG_INTEL_RAPL_CORE=m
CONFIG_INTEL_RAPL=m
# CONFIG_IDLE_INJECT is not set
# CONFIG_MCB is not set

#
# Performance monitor support
#
# end of Performance monitor support

CONFIG_RAS=y
# CONFIG_RAS_CEC is not set
# CONFIG_USB4 is not set

#
# Android
#
# CONFIG_ANDROID is not set
# end of Android

CONFIG_LIBNVDIMM=m
CONFIG_BLK_DEV_PMEM=m
CONFIG_ND_BLK=m
CONFIG_ND_CLAIM=y
CONFIG_ND_BTT=m
CONFIG_BTT=y
CONFIG_ND_PFN=m
CONFIG_NVDIMM_PFN=y
CONFIG_NVDIMM_DAX=y
CONFIG_NVDIMM_KEYS=y
CONFIG_DAX_DRIVER=y
CONFIG_DAX=y
CONFIG_DEV_DAX=m
CONFIG_DEV_DAX_PMEM=m
CONFIG_DEV_DAX_KMEM=m
CONFIG_DEV_DAX_PMEM_COMPAT=m
CONFIG_NVMEM=y
CONFIG_NVMEM_SYSFS=y

#
# HW tracing support
#
CONFIG_STM=m
# CONFIG_STM_PROTO_BASIC is not set
# CONFIG_STM_PROTO_SYS_T is not set
CONFIG_STM_DUMMY=m
CONFIG_STM_SOURCE_CONSOLE=m
CONFIG_STM_SOURCE_HEARTBEAT=m
CONFIG_STM_SOURCE_FTRACE=m
CONFIG_INTEL_TH=m
CONFIG_INTEL_TH_PCI=m
CONFIG_INTEL_TH_ACPI=m
CONFIG_INTEL_TH_GTH=m
CONFIG_INTEL_TH_STH=m
CONFIG_INTEL_TH_MSU=m
CONFIG_INTEL_TH_PTI=m
# CONFIG_INTEL_TH_DEBUG is not set
# end of HW tracing support

# CONFIG_FPGA is not set
# CONFIG_TEE is not set
# CONFIG_UNISYS_VISORBUS is not set
# CONFIG_SIOX is not set
# CONFIG_SLIMBUS is not set
# CONFIG_INTERCONNECT is not set
# CONFIG_COUNTER is not set
# CONFIG_MOST is not set
# end of Device Drivers

#
# File systems
#
CONFIG_DCACHE_WORD_ACCESS=y
# CONFIG_VALIDATE_FS_PARSER is not set
CONFIG_FS_IOMAP=y
# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
# CONFIG_EXT4_DEBUG is not set
CONFIG_EXT4_KUNIT_TESTS=m
CONFIG_JBD2=y
# CONFIG_JBD2_DEBUG is not set
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
CONFIG_XFS_FS=m
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_RT=y
CONFIG_XFS_ONLINE_SCRUB=y
CONFIG_XFS_ONLINE_REPAIR=y
CONFIG_XFS_DEBUG=y
CONFIG_XFS_ASSERT_FATAL=y
CONFIG_GFS2_FS=m
CONFIG_GFS2_FS_LOCKING_DLM=y
CONFIG_OCFS2_FS=m
CONFIG_OCFS2_FS_O2CB=m
CONFIG_OCFS2_FS_USERSPACE_CLUSTER=m
CONFIG_OCFS2_FS_STATS=y
CONFIG_OCFS2_DEBUG_MASKLOG=y
# CONFIG_OCFS2_DEBUG_FS is not set
CONFIG_BTRFS_FS=m
CONFIG_BTRFS_FS_POSIX_ACL=y
# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
# CONFIG_BTRFS_DEBUG is not set
# CONFIG_BTRFS_ASSERT is not set
# CONFIG_BTRFS_FS_REF_VERIFY is not set
# CONFIG_NILFS2_FS is not set
CONFIG_F2FS_FS=m
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_FS_SECURITY=y
# CONFIG_F2FS_CHECK_FS is not set
# CONFIG_F2FS_IO_TRACE is not set
# CONFIG_F2FS_FAULT_INJECTION is not set
# CONFIG_F2FS_FS_COMPRESSION is not set
# CONFIG_ZONEFS_FS is not set
CONFIG_FS_DAX=y
CONFIG_FS_DAX_PMD=y
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_EXPORTFS_BLOCK_OPS=y
CONFIG_FILE_LOCKING=y
CONFIG_MANDATORY_FILE_LOCKING=y
CONFIG_FS_ENCRYPTION=y
CONFIG_FS_ENCRYPTION_ALGS=y
# CONFIG_FS_VERITY is not set
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_FANOTIFY=y
CONFIG_FANOTIFY_ACCESS_PERMISSIONS=y
CONFIG_QUOTA=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_PRINT_QUOTA_WARNING=y
# CONFIG_QUOTA_DEBUG is not set
CONFIG_QUOTA_TREE=y
# CONFIG_QFMT_V1 is not set
CONFIG_QFMT_V2=y
CONFIG_QUOTACTL=y
CONFIG_QUOTACTL_COMPAT=y
CONFIG_AUTOFS4_FS=y
CONFIG_AUTOFS_FS=y
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
# CONFIG_VIRTIO_FS is not set
CONFIG_OVERLAY_FS=m
# CONFIG_OVERLAY_FS_REDIRECT_DIR is not set
# CONFIG_OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW is not set
# CONFIG_OVERLAY_FS_INDEX is not set
# CONFIG_OVERLAY_FS_XINO_AUTO is not set
# CONFIG_OVERLAY_FS_METACOPY is not set

#
# Caches
#
CONFIG_FSCACHE=m
CONFIG_FSCACHE_STATS=y
# CONFIG_FSCACHE_HISTOGRAM is not set
# CONFIG_FSCACHE_DEBUG is not set
# CONFIG_FSCACHE_OBJECT_LIST is not set
CONFIG_CACHEFILES=m
# CONFIG_CACHEFILES_DEBUG is not set
# CONFIG_CACHEFILES_HISTOGRAM is not set
# end of Caches

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=m
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=m
# end of CD-ROM/DVD Filesystems

#
# DOS/FAT/EXFAT/NT Filesystems
#
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="ascii"
# CONFIG_FAT_DEFAULT_UTF8 is not set
# CONFIG_EXFAT_FS is not set
# CONFIG_NTFS_FS is not set
# end of DOS/FAT/EXFAT/NT Filesystems

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
CONFIG_PROC_VMCORE_DEVICE_DUMP=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_PROC_CHILDREN=y
CONFIG_PROC_PID_ARCH_STATUS=y
CONFIG_PROC_CPU_RESCTRL=y
CONFIG_KERNFS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_MEMFD_CREATE=y
CONFIG_ARCH_HAS_GIGANTIC_PAGE=y
CONFIG_CONFIGFS_FS=y
CONFIG_EFIVAR_FS=y
# end of Pseudo filesystems

CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ORANGEFS_FS is not set
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_ECRYPT_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
CONFIG_CRAMFS=m
CONFIG_CRAMFS_BLOCKDEV=y
CONFIG_SQUASHFS=m
# CONFIG_SQUASHFS_FILE_CACHE is not set
CONFIG_SQUASHFS_FILE_DIRECT=y
# CONFIG_SQUASHFS_DECOMP_SINGLE is not set
# CONFIG_SQUASHFS_DECOMP_MULTI is not set
CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU=y
CONFIG_SQUASHFS_XATTR=y
CONFIG_SQUASHFS_ZLIB=y
# CONFIG_SQUASHFS_LZ4 is not set
CONFIG_SQUASHFS_LZO=y
CONFIG_SQUASHFS_XZ=y
# CONFIG_SQUASHFS_ZSTD is not set
# CONFIG_SQUASHFS_4K_DEVBLK_SIZE is not set
# CONFIG_SQUASHFS_EMBEDDED is not set
CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3
# CONFIG_VXFS_FS is not set
CONFIG_MINIX_FS=m
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_QNX6FS_FS is not set
# CONFIG_ROMFS_FS is not set
CONFIG_PSTORE=y
CONFIG_PSTORE_DEFLATE_COMPRESS=y
# CONFIG_PSTORE_LZO_COMPRESS is not set
# CONFIG_PSTORE_LZ4_COMPRESS is not set
# CONFIG_PSTORE_LZ4HC_COMPRESS is not set
# CONFIG_PSTORE_842_COMPRESS is not set
# CONFIG_PSTORE_ZSTD_COMPRESS is not set
CONFIG_PSTORE_COMPRESS=y
CONFIG_PSTORE_DEFLATE_COMPRESS_DEFAULT=y
CONFIG_PSTORE_COMPRESS_DEFAULT="deflate"
# CONFIG_PSTORE_CONSOLE is not set
# CONFIG_PSTORE_PMSG is not set
# CONFIG_PSTORE_FTRACE is not set
CONFIG_PSTORE_RAM=m
# CONFIG_PSTORE_BLK is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
# CONFIG_EROFS_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=y
# CONFIG_NFS_V2 is not set
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=m
# CONFIG_NFS_SWAP is not set
CONFIG_NFS_V4_1=y
CONFIG_NFS_V4_2=y
CONFIG_PNFS_FILE_LAYOUT=m
CONFIG_PNFS_BLOCK=m
CONFIG_PNFS_FLEXFILE_LAYOUT=m
CONFIG_NFS_V4_1_IMPLEMENTATION_ID_DOMAIN="kernel.org"
# CONFIG_NFS_V4_1_MIGRATION is not set
CONFIG_NFS_V4_SECURITY_LABEL=y
CONFIG_ROOT_NFS=y
# CONFIG_NFS_USE_LEGACY_DNS is not set
CONFIG_NFS_USE_KERNEL_DNS=y
CONFIG_NFS_DEBUG=y
CONFIG_NFS_DISABLE_UDP_SUPPORT=y
CONFIG_NFSD=m
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
CONFIG_NFSD_PNFS=y
# CONFIG_NFSD_BLOCKLAYOUT is not set
CONFIG_NFSD_SCSILAYOUT=y
# CONFIG_NFSD_FLEXFILELAYOUT is not set
# CONFIG_NFSD_V4_2_INTER_SSC is not set
CONFIG_NFSD_V4_SECURITY_LABEL=y
CONFIG_GRACE_PERIOD=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_NFS_ACL_SUPPORT=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
CONFIG_SUNRPC_GSS=m
CONFIG_SUNRPC_BACKCHANNEL=y
CONFIG_RPCSEC_GSS_KRB5=m
# CONFIG_SUNRPC_DISABLE_INSECURE_ENCTYPES is not set
CONFIG_SUNRPC_DEBUG=y
CONFIG_SUNRPC_XPRT_RDMA=m
CONFIG_CEPH_FS=m
# CONFIG_CEPH_FSCACHE is not set
CONFIG_CEPH_FS_POSIX_ACL=y
# CONFIG_CEPH_FS_SECURITY_LABEL is not set
CONFIG_CIFS=m
# CONFIG_CIFS_STATS2 is not set
CONFIG_CIFS_ALLOW_INSECURE_LEGACY=y
CONFIG_CIFS_WEAK_PW_HASH=y
CONFIG_CIFS_UPCALL=y
CONFIG_CIFS_XATTR=y
CONFIG_CIFS_POSIX=y
CONFIG_CIFS_DEBUG=y
# CONFIG_CIFS_DEBUG2 is not set
# CONFIG_CIFS_DEBUG_DUMP_KEYS is not set
CONFIG_CIFS_DFS_UPCALL=y
# CONFIG_CIFS_SMB_DIRECT is not set
# CONFIG_CIFS_FSCACHE is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
CONFIG_9P_FS=y
CONFIG_9P_FS_POSIX_ACL=y
# CONFIG_9P_FS_SECURITY is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_CODEPAGE_737=m
CONFIG_NLS_CODEPAGE_775=m
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_CODEPAGE_852=m
CONFIG_NLS_CODEPAGE_855=m
CONFIG_NLS_CODEPAGE_857=m
CONFIG_NLS_CODEPAGE_860=m
CONFIG_NLS_CODEPAGE_861=m
CONFIG_NLS_CODEPAGE_862=m
CONFIG_NLS_CODEPAGE_863=m
CONFIG_NLS_CODEPAGE_864=m
CONFIG_NLS_CODEPAGE_865=m
CONFIG_NLS_CODEPAGE_866=m
CONFIG_NLS_CODEPAGE_869=m
CONFIG_NLS_CODEPAGE_936=m
CONFIG_NLS_CODEPAGE_950=m
CONFIG_NLS_CODEPAGE_932=m
CONFIG_NLS_CODEPAGE_949=m
CONFIG_NLS_CODEPAGE_874=m
CONFIG_NLS_ISO8859_8=m
CONFIG_NLS_CODEPAGE_1250=m
CONFIG_NLS_CODEPAGE_1251=m
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_ISO8859_2=m
CONFIG_NLS_ISO8859_3=m
CONFIG_NLS_ISO8859_4=m
CONFIG_NLS_ISO8859_5=m
CONFIG_NLS_ISO8859_6=m
CONFIG_NLS_ISO8859_7=m
CONFIG_NLS_ISO8859_9=m
CONFIG_NLS_ISO8859_13=m
CONFIG_NLS_ISO8859_14=m
CONFIG_NLS_ISO8859_15=m
CONFIG_NLS_KOI8_R=m
CONFIG_NLS_KOI8_U=m
CONFIG_NLS_MAC_ROMAN=m
CONFIG_NLS_MAC_CELTIC=m
CONFIG_NLS_MAC_CENTEURO=m
CONFIG_NLS_MAC_CROATIAN=m
CONFIG_NLS_MAC_CYRILLIC=m
CONFIG_NLS_MAC_GAELIC=m
CONFIG_NLS_MAC_GREEK=m
CONFIG_NLS_MAC_ICELAND=m
CONFIG_NLS_MAC_INUIT=m
CONFIG_NLS_MAC_ROMANIAN=m
CONFIG_NLS_MAC_TURKISH=m
CONFIG_NLS_UTF8=m
CONFIG_DLM=m
CONFIG_DLM_DEBUG=y
# CONFIG_UNICODE is not set
CONFIG_IO_WQ=y
# end of File systems

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_KEYS_REQUEST_CACHE is not set
CONFIG_PERSISTENT_KEYRINGS=y
CONFIG_TRUSTED_KEYS=y
CONFIG_ENCRYPTED_KEYS=y
# CONFIG_KEY_DH_OPERATIONS is not set
# CONFIG_SECURITY_DMESG_RESTRICT is not set
CONFIG_SECURITY=y
CONFIG_SECURITY_WRITABLE_HOOKS=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
CONFIG_PAGE_TABLE_ISOLATION=y
# CONFIG_SECURITY_INFINIBAND is not set
CONFIG_SECURITY_NETWORK_XFRM=y
CONFIG_SECURITY_PATH=y
CONFIG_INTEL_TXT=y
CONFIG_LSM_MMAP_MIN_ADDR=65535
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
CONFIG_HARDENED_USERCOPY=y
CONFIG_HARDENED_USERCOPY_FALLBACK=y
CONFIG_FORTIFY_SOURCE=y
# CONFIG_STATIC_USERMODEHELPER is not set
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_DISABLE=y
CONFIG_SECURITY_SELINUX_DEVELOP=y
CONFIG_SECURITY_SELINUX_AVC_STATS=y
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=1
CONFIG_SECURITY_SELINUX_SIDTAB_HASH_BITS=9
CONFIG_SECURITY_SELINUX_SID2STR_CACHE_SIZE=256
# CONFIG_SECURITY_SMACK is not set
# CONFIG_SECURITY_TOMOYO is not set
CONFIG_SECURITY_APPARMOR=y
CONFIG_SECURITY_APPARMOR_HASH=y
CONFIG_SECURITY_APPARMOR_HASH_DEFAULT=y
# CONFIG_SECURITY_APPARMOR_DEBUG is not set
# CONFIG_SECURITY_APPARMOR_KUNIT_TEST is not set
# CONFIG_SECURITY_LOADPIN is not set
CONFIG_SECURITY_YAMA=y
# CONFIG_SECURITY_SAFESETID is not set
# CONFIG_SECURITY_LOCKDOWN_LSM is not set
CONFIG_INTEGRITY=y
CONFIG_INTEGRITY_SIGNATURE=y
CONFIG_INTEGRITY_ASYMMETRIC_KEYS=y
CONFIG_INTEGRITY_TRUSTED_KEYRING=y
# CONFIG_INTEGRITY_PLATFORM_KEYRING is not set
CONFIG_INTEGRITY_AUDIT=y
CONFIG_IMA=y
CONFIG_IMA_MEASURE_PCR_IDX=10
CONFIG_IMA_LSM_RULES=y
# CONFIG_IMA_TEMPLATE is not set
CONFIG_IMA_NG_TEMPLATE=y
# CONFIG_IMA_SIG_TEMPLATE is not set
CONFIG_IMA_DEFAULT_TEMPLATE="ima-ng"
CONFIG_IMA_DEFAULT_HASH_SHA1=y
# CONFIG_IMA_DEFAULT_HASH_SHA256 is not set
# CONFIG_IMA_DEFAULT_HASH_SHA512 is not set
CONFIG_IMA_DEFAULT_HASH="sha1"
# CONFIG_IMA_WRITE_POLICY is not set
# CONFIG_IMA_READ_POLICY is not set
CONFIG_IMA_APPRAISE=y
# CONFIG_IMA_ARCH_POLICY is not set
# CONFIG_IMA_APPRAISE_BUILD_POLICY is not set
CONFIG_IMA_APPRAISE_BOOTPARAM=y
# CONFIG_IMA_APPRAISE_MODSIG is not set
CONFIG_IMA_TRUSTED_KEYRING=y
# CONFIG_IMA_BLACKLIST_KEYRING is not set
# CONFIG_IMA_LOAD_X509 is not set
CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS=y
CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS=y
# CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT is not set
CONFIG_EVM=y
CONFIG_EVM_ATTR_FSUUID=y
# CONFIG_EVM_ADD_XATTRS is not set
# CONFIG_EVM_LOAD_X509 is not set
CONFIG_DEFAULT_SECURITY_SELINUX=y
# CONFIG_DEFAULT_SECURITY_APPARMOR is not set
# CONFIG_DEFAULT_SECURITY_DAC is not set
CONFIG_LSM="lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"

#
# Kernel hardening options
#

#
# Memory initialization
#
CONFIG_INIT_STACK_NONE=y
# CONFIG_INIT_ON_ALLOC_DEFAULT_ON is not set
# CONFIG_INIT_ON_FREE_DEFAULT_ON is not set
# end of Memory initialization
# end of Kernel hardening options
# end of Security options

CONFIG_XOR_BLOCKS=m
CONFIG_ASYNC_CORE=m
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=m
CONFIG_ASYNC_PQ=m
CONFIG_ASYNC_RAID6_RECOV=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_SKCIPHER=y
CONFIG_CRYPTO_SKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_RNG_DEFAULT=y
CONFIG_CRYPTO_AKCIPHER2=y
CONFIG_CRYPTO_AKCIPHER=y
CONFIG_CRYPTO_KPP2=y
CONFIG_CRYPTO_KPP=m
CONFIG_CRYPTO_ACOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
CONFIG_CRYPTO_GF128MUL=y
CONFIG_CRYPTO_NULL=y
CONFIG_CRYPTO_NULL2=y
CONFIG_CRYPTO_PCRYPT=m
CONFIG_CRYPTO_CRYPTD=y
CONFIG_CRYPTO_AUTHENC=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_SIMD=y
CONFIG_CRYPTO_GLUE_HELPER_X86=y

#
# Public-key cryptography
#
CONFIG_CRYPTO_RSA=y
CONFIG_CRYPTO_DH=m
CONFIG_CRYPTO_ECC=m
CONFIG_CRYPTO_ECDH=m
# CONFIG_CRYPTO_ECRDSA is not set
# CONFIG_CRYPTO_CURVE25519 is not set
# CONFIG_CRYPTO_CURVE25519_X86 is not set

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_CHACHA20POLY1305=m
# CONFIG_CRYPTO_AEGIS128 is not set
# CONFIG_CRYPTO_AEGIS128_AESNI_SSE2 is not set
CONFIG_CRYPTO_SEQIV=y
CONFIG_CRYPTO_ECHAINIV=m

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CFB=y
CONFIG_CRYPTO_CTR=y
CONFIG_CRYPTO_CTS=y
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_LRW=m
# CONFIG_CRYPTO_OFB is not set
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_XTS=y
# CONFIG_CRYPTO_KEYWRAP is not set
# CONFIG_CRYPTO_NHPOLY1305_SSE2 is not set
# CONFIG_CRYPTO_NHPOLY1305_AVX2 is not set
# CONFIG_CRYPTO_ADIANTUM is not set
CONFIG_CRYPTO_ESSIV=m

#
# Hash modes
#
CONFIG_CRYPTO_CMAC=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=m
CONFIG_CRYPTO_VMAC=m

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32C_INTEL=m
CONFIG_CRYPTO_CRC32=m
CONFIG_CRYPTO_CRC32_PCLMUL=m
CONFIG_CRYPTO_XXHASH=m
CONFIG_CRYPTO_BLAKE2B=m
# CONFIG_CRYPTO_BLAKE2S is not set
# CONFIG_CRYPTO_BLAKE2S_X86 is not set
CONFIG_CRYPTO_CRCT10DIF=y
CONFIG_CRYPTO_CRCT10DIF_PCLMUL=m
CONFIG_CRYPTO_GHASH=y
CONFIG_CRYPTO_POLY1305=m
CONFIG_CRYPTO_POLY1305_X86_64=m
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_RMD128=m
CONFIG_CRYPTO_RMD160=m
CONFIG_CRYPTO_RMD256=m
CONFIG_CRYPTO_RMD320=m
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA1_SSSE3=y
CONFIG_CRYPTO_SHA256_SSSE3=y
CONFIG_CRYPTO_SHA512_SSSE3=m
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=y
CONFIG_CRYPTO_SHA3=m
# CONFIG_CRYPTO_SM3 is not set
# CONFIG_CRYPTO_STREEBOG is not set
CONFIG_CRYPTO_TGR192=m
CONFIG_CRYPTO_WP512=m
CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
# CONFIG_CRYPTO_AES_TI is not set
CONFIG_CRYPTO_AES_NI_INTEL=y
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_BLOWFISH_COMMON=m
CONFIG_CRYPTO_BLOWFISH_X86_64=m
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAMELLIA_X86_64=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64=m
CONFIG_CRYPTO_CAST_COMMON=m
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST5_AVX_X86_64=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_CAST6_AVX_X86_64=m
CONFIG_CRYPTO_DES=m
CONFIG_CRYPTO_DES3_EDE_X86_64=m
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_KHAZAD=m
CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_CHACHA20=m
CONFIG_CRYPTO_CHACHA20_X86_64=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_SERPENT_SSE2_X86_64=m
CONFIG_CRYPTO_SERPENT_AVX_X86_64=m
CONFIG_CRYPTO_SERPENT_AVX2_X86_64=m
# CONFIG_CRYPTO_SM4 is not set
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m
CONFIG_CRYPTO_TWOFISH_X86_64=m
CONFIG_CRYPTO_TWOFISH_X86_64_3WAY=m
CONFIG_CRYPTO_TWOFISH_AVX_X86_64=m

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=y
CONFIG_CRYPTO_LZO=y
# CONFIG_CRYPTO_842 is not set
# CONFIG_CRYPTO_LZ4 is not set
# CONFIG_CRYPTO_LZ4HC is not set
# CONFIG_CRYPTO_ZSTD is not set

#
# Random Number Generation
#
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_MENU=y
CONFIG_CRYPTO_DRBG_HMAC=y
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
CONFIG_CRYPTO_DRBG=y
CONFIG_CRYPTO_JITTERENTROPY=y
CONFIG_CRYPTO_USER_API=y
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_CRYPTO_USER_API_SKCIPHER=y
CONFIG_CRYPTO_USER_API_RNG=y
CONFIG_CRYPTO_USER_API_AEAD=y
# CONFIG_CRYPTO_STATS is not set
CONFIG_CRYPTO_HASH_INFO=y

#
# Crypto library routines
#
CONFIG_CRYPTO_LIB_AES=y
CONFIG_CRYPTO_LIB_ARC4=m
# CONFIG_CRYPTO_LIB_BLAKE2S is not set
CONFIG_CRYPTO_ARCH_HAVE_LIB_CHACHA=m
CONFIG_CRYPTO_LIB_CHACHA_GENERIC=m
# CONFIG_CRYPTO_LIB_CHACHA is not set
# CONFIG_CRYPTO_LIB_CURVE25519 is not set
CONFIG_CRYPTO_LIB_DES=m
CONFIG_CRYPTO_LIB_POLY1305_RSIZE=11
CONFIG_CRYPTO_ARCH_HAVE_LIB_POLY1305=m
CONFIG_CRYPTO_LIB_POLY1305_GENERIC=m
# CONFIG_CRYPTO_LIB_POLY1305 is not set
# CONFIG_CRYPTO_LIB_CHACHA20POLY1305 is not set
CONFIG_CRYPTO_LIB_SHA256=y
CONFIG_CRYPTO_HW=y
CONFIG_CRYPTO_DEV_PADLOCK=m
CONFIG_CRYPTO_DEV_PADLOCK_AES=m
CONFIG_CRYPTO_DEV_PADLOCK_SHA=m
# CONFIG_CRYPTO_DEV_ATMEL_ECC is not set
# CONFIG_CRYPTO_DEV_ATMEL_SHA204A is not set
CONFIG_CRYPTO_DEV_CCP=y
CONFIG_CRYPTO_DEV_CCP_DD=y
CONFIG_CRYPTO_DEV_SP_CCP=y
CONFIG_CRYPTO_DEV_CCP_CRYPTO=m
CONFIG_CRYPTO_DEV_SP_PSP=y
# CONFIG_CRYPTO_DEV_CCP_DEBUGFS is not set
CONFIG_CRYPTO_DEV_QAT=m
CONFIG_CRYPTO_DEV_QAT_DH895xCC=m
CONFIG_CRYPTO_DEV_QAT_C3XXX=m
CONFIG_CRYPTO_DEV_QAT_C62X=m
CONFIG_CRYPTO_DEV_QAT_DH895xCCVF=m
CONFIG_CRYPTO_DEV_QAT_C3XXXVF=m
CONFIG_CRYPTO_DEV_QAT_C62XVF=m
CONFIG_CRYPTO_DEV_NITROX=m
CONFIG_CRYPTO_DEV_NITROX_CNN55XX=m
# CONFIG_CRYPTO_DEV_VIRTIO is not set
# CONFIG_CRYPTO_DEV_SAFEXCEL is not set
# CONFIG_CRYPTO_DEV_AMLOGIC_GXL is not set
CONFIG_ASYMMETRIC_KEY_TYPE=y
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
# CONFIG_ASYMMETRIC_TPM_KEY_SUBTYPE is not set
CONFIG_X509_CERTIFICATE_PARSER=y
# CONFIG_PKCS8_PRIVATE_KEY_PARSER is not set
CONFIG_PKCS7_MESSAGE_PARSER=y
# CONFIG_PKCS7_TEST_KEY is not set
CONFIG_SIGNED_PE_FILE_VERIFICATION=y

#
# Certificates for signature checking
#
CONFIG_MODULE_SIG_KEY="certs/signing_key.pem"
CONFIG_SYSTEM_TRUSTED_KEYRING=y
CONFIG_SYSTEM_TRUSTED_KEYS=""
# CONFIG_SYSTEM_EXTRA_CERTIFICATE is not set
# CONFIG_SECONDARY_TRUSTED_KEYRING is not set
CONFIG_SYSTEM_BLACKLIST_KEYRING=y
CONFIG_SYSTEM_BLACKLIST_HASH_LIST=""
# end of Certificates for signature checking

CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_RAID6_PQ=m
CONFIG_RAID6_PQ_BENCHMARK=y
# CONFIG_PACKING is not set
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_NET_UTILS=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_CORDIC=m
# CONFIG_PRIME_NUMBERS is not set
CONFIG_RATIONAL=y
CONFIG_GENERIC_PCI_IOMAP=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
CONFIG_ARCH_HAS_FAST_MULTIPLIER=y
CONFIG_ARCH_USE_SYM_ANNOTATIONS=y
CONFIG_CRC_CCITT=y
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=y
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
# CONFIG_CRC64 is not set
# CONFIG_CRC4 is not set
CONFIG_CRC7=m
CONFIG_LIBCRC32C=m
CONFIG_CRC8=m
CONFIG_XXHASH=y
# CONFIG_RANDOM32_SELFTEST is not set
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4_DECOMPRESS=y
CONFIG_ZSTD_COMPRESS=m
CONFIG_ZSTD_DECOMPRESS=m
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
CONFIG_XZ_DEC_POWERPC=y
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
CONFIG_XZ_DEC_BCJ=y
# CONFIG_XZ_DEC_TEST is not set
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_DECOMPRESS_LZ4=y
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_REED_SOLOMON=m
CONFIG_REED_SOLOMON_ENC8=y
CONFIG_REED_SOLOMON_DEC8=y
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=m
CONFIG_TEXTSEARCH_BM=m
CONFIG_TEXTSEARCH_FSM=m
CONFIG_INTERVAL_TREE=y
CONFIG_XARRAY_MULTI=y
CONFIG_ASSOCIATIVE_ARRAY=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT_MAP=y
CONFIG_HAS_DMA=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED=y
CONFIG_DMA_VIRT_OPS=y
CONFIG_SWIOTLB=y
CONFIG_DMA_COHERENT_POOL=y
CONFIG_DMA_CMA=y

#
# Default contiguous memory area size:
#
CONFIG_CMA_SIZE_MBYTES=200
CONFIG_CMA_SIZE_SEL_MBYTES=y
# CONFIG_CMA_SIZE_SEL_PERCENTAGE is not set
# CONFIG_CMA_SIZE_SEL_MIN is not set
# CONFIG_CMA_SIZE_SEL_MAX is not set
CONFIG_CMA_ALIGNMENT=8
# CONFIG_DMA_API_DEBUG is not set
CONFIG_SGL_ALLOC=y
CONFIG_CHECK_SIGNATURE=y
CONFIG_CPUMASK_OFFSTACK=y
CONFIG_CPU_RMAP=y
CONFIG_DQL=y
CONFIG_GLOB=y
# CONFIG_GLOB_SELFTEST is not set
CONFIG_NLATTR=y
CONFIG_CLZ_TAB=y
CONFIG_IRQ_POLL=y
CONFIG_MPILIB=y
CONFIG_SIGNATURE=y
CONFIG_DIMLIB=y
CONFIG_OID_REGISTRY=y
CONFIG_UCS2_STRING=y
CONFIG_HAVE_GENERIC_VDSO=y
CONFIG_GENERIC_GETTIMEOFDAY=y
CONFIG_GENERIC_VDSO_TIME_NS=y
CONFIG_FONT_SUPPORT=y
# CONFIG_FONTS is not set
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_SG_POOL=y
CONFIG_ARCH_HAS_PMEM_API=y
CONFIG_MEMREGION=y
CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE=y
CONFIG_ARCH_HAS_UACCESS_MCSAFE=y
CONFIG_ARCH_STACKWALK=y
CONFIG_SBITMAP=y
# CONFIG_STRING_SELFTEST is not set
# end of Library routines

#
# Kernel hacking
#

#
# printk and dmesg options
#
CONFIG_PRINTK_TIME=y
# CONFIG_PRINTK_CALLER is not set
CONFIG_CONSOLE_LOGLEVEL_DEFAULT=7
CONFIG_CONSOLE_LOGLEVEL_QUIET=4
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
CONFIG_BOOT_PRINTK_DELAY=y
CONFIG_DYNAMIC_DEBUG=y
CONFIG_DYNAMIC_DEBUG_CORE=y
CONFIG_SYMBOLIC_ERRNAME=y
CONFIG_DEBUG_BUGVERBOSE=y
# end of printk and dmesg options

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_REDUCED=y
# CONFIG_DEBUG_INFO_COMPRESSED is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
CONFIG_DEBUG_INFO_DWARF4=y
# CONFIG_GDB_SCRIPTS is not set
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=2048
CONFIG_STRIP_ASM_SYMS=y
# CONFIG_READABLE_ASM is not set
# CONFIG_HEADERS_INSTALL is not set
CONFIG_DEBUG_SECTION_MISMATCH=y
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
CONFIG_STACK_VALIDATION=y
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
# end of Compile-time checks and compiler options

#
# Generic Kernel Debugging Instruments
#
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
CONFIG_MAGIC_SYSRQ_SERIAL=y
CONFIG_MAGIC_SYSRQ_SERIAL_SEQUENCE=""
CONFIG_DEBUG_FS=y
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
# CONFIG_UBSAN is not set
# end of Generic Kernel Debugging Instruments

CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_MISC=y

#
# Memory Debugging
#
# CONFIG_PAGE_EXTENSION is not set
# CONFIG_DEBUG_PAGEALLOC is not set
# CONFIG_PAGE_OWNER is not set
# CONFIG_PAGE_POISONING is not set
# CONFIG_DEBUG_PAGE_REF is not set
# CONFIG_DEBUG_RODATA_TEST is not set
CONFIG_ARCH_HAS_DEBUG_WX=y
# CONFIG_DEBUG_WX is not set
CONFIG_GENERIC_PTDUMP=y
# CONFIG_PTDUMP_DEBUGFS is not set
# CONFIG_DEBUG_OBJECTS is not set
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_SLUB_STATS is not set
CONFIG_HAVE_DEBUG_KMEMLEAK=y
# CONFIG_DEBUG_KMEMLEAK is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_SCHED_STACK_END_CHECK is not set
CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_VM_PGTABLE is not set
CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
# CONFIG_DEBUG_VIRTUAL is not set
CONFIG_DEBUG_MEMORY_INIT=y
# CONFIG_DEBUG_PER_CPU_MAPS is not set
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
# CONFIG_KASAN is not set
CONFIG_KASAN_STACK=1
# end of Memory Debugging

CONFIG_DEBUG_SHIRQ=y

#
# Debug Oops, Lockups and Hangs
#
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_ON_OOPS_VALUE=1
CONFIG_PANIC_TIMEOUT=0
CONFIG_LOCKUP_DETECTOR=y
CONFIG_SOFTLOCKUP_DETECTOR=y
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
CONFIG_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HARDLOCKUP_CHECK_TIMESTAMP=y
CONFIG_HARDLOCKUP_DETECTOR=y
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE=1
# CONFIG_DETECT_HUNG_TASK is not set
# CONFIG_WQ_WATCHDOG is not set
# CONFIG_TEST_LOCKUP is not set
# end of Debug Oops, Lockups and Hangs

#
# Scheduler Debugging
#
CONFIG_SCHED_DEBUG=y
CONFIG_SCHED_INFO=y
CONFIG_SCHEDSTATS=y
# end of Scheduler Debugging

# CONFIG_DEBUG_TIMEKEEPING is not set

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_WW_MUTEX_SLOWPATH is not set
# CONFIG_DEBUG_RWSEMS is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
CONFIG_DEBUG_ATOMIC_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_LOCK_TORTURE_TEST=m
# CONFIG_WW_MUTEX_SELFTEST is not set
# end of Lock Debugging (spinlocks, mutexes, etc...)

CONFIG_STACKTRACE=y
# CONFIG_WARN_ALL_UNSEEDED_RANDOM is not set
# CONFIG_DEBUG_KOBJECT is not set

#
# Debug kernel data structures
#
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_PLIST is not set
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
CONFIG_BUG_ON_DATA_CORRUPTION=y
# end of Debug kernel data structures

# CONFIG_DEBUG_CREDENTIALS is not set

#
# RCU Debugging
#
CONFIG_TORTURE_TEST=m
CONFIG_RCU_PERF_TEST=m
CONFIG_RCU_TORTURE_TEST=m
CONFIG_RCU_CPU_STALL_TIMEOUT=60
# CONFIG_RCU_TRACE is not set
# CONFIG_RCU_EQS_DEBUG is not set
# end of RCU Debugging

# CONFIG_DEBUG_WQ_FORCE_RR_CPU is not set
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
# CONFIG_CPU_HOTPLUG_STATE_CONTROL is not set
CONFIG_LATENCYTOP=y
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_FENTRY=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_RING_BUFFER_ALLOW_SWAP=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
# CONFIG_BOOTTIME_TRACING is not set
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_FUNCTION_PROFILER=y
CONFIG_STACK_TRACER=y
# CONFIG_IRQSOFF_TRACER is not set
CONFIG_SCHED_TRACER=y
CONFIG_HWLAT_TRACER=y
# CONFIG_MMIOTRACE is not set
CONFIG_FTRACE_SYSCALLS=y
CONFIG_TRACER_SNAPSHOT=y
# CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP is not set
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_KPROBE_EVENTS=y
# CONFIG_KPROBE_EVENTS_ON_NOTRACE is not set
CONFIG_UPROBE_EVENTS=y
CONFIG_BPF_EVENTS=y
CONFIG_DYNAMIC_EVENTS=y
CONFIG_PROBE_EVENTS=y
# CONFIG_BPF_KPROBE_OVERRIDE is not set
CONFIG_FTRACE_MCOUNT_RECORD=y
CONFIG_TRACING_MAP=y
CONFIG_SYNTH_EVENTS=y
CONFIG_HIST_TRIGGERS=y
# CONFIG_TRACE_EVENT_INJECT is not set
# CONFIG_TRACEPOINT_BENCHMARK is not set
CONFIG_RING_BUFFER_BENCHMARK=m
# CONFIG_TRACE_EVAL_MAP_FILE is not set
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_RING_BUFFER_STARTUP_TEST is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set
# CONFIG_SYNTH_EVENT_GEN_TEST is not set
# CONFIG_KPROBE_EVENT_GEN_TEST is not set
# CONFIG_HIST_TRIGGERS_DEBUG is not set
CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
# CONFIG_SAMPLES is not set
CONFIG_HAVE_ARCH_KCSAN=y
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
CONFIG_STRICT_DEVMEM=y
# CONFIG_IO_STRICT_DEVMEM is not set

#
# x86 Debugging
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_EARLY_PRINTK_USB=y
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_PRINTK_DBGP=y
CONFIG_EARLY_PRINTK_USB_XDBC=y
# CONFIG_EFI_PGT_DUMP is not set
# CONFIG_DEBUG_TLBFLUSH is not set
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_X86_DECODER_SELFTEST=y
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEBUG_BOOT_PARAMS=y
# CONFIG_CPA_DEBUG is not set
# CONFIG_DEBUG_ENTRY is not set
# CONFIG_DEBUG_NMI_SELFTEST is not set
# CONFIG_X86_DEBUG_FPU is not set
# CONFIG_PUNIT_ATOM_DEBUG is not set
CONFIG_UNWINDER_ORC=y
# CONFIG_UNWINDER_FRAME_POINTER is not set
# end of x86 Debugging

#
# Kernel Testing and Coverage
#
CONFIG_KUNIT=y
# CONFIG_KUNIT_DEBUGFS is not set
CONFIG_KUNIT_TEST=m
CONFIG_KUNIT_EXAMPLE_TEST=m
# CONFIG_KUNIT_ALL_TESTS is not set
# CONFIG_NOTIFIER_ERROR_INJECTION is not set
CONFIG_FUNCTION_ERROR_INJECTION=y
CONFIG_FAULT_INJECTION=y
# CONFIG_FAILSLAB is not set
# CONFIG_FAIL_PAGE_ALLOC is not set
CONFIG_FAIL_MAKE_REQUEST=y
# CONFIG_FAIL_IO_TIMEOUT is not set
# CONFIG_FAIL_FUTEX is not set
CONFIG_FAULT_INJECTION_DEBUG_FS=y
# CONFIG_FAIL_FUNCTION is not set
# CONFIG_FAIL_MMC_REQUEST is not set
CONFIG_ARCH_HAS_KCOV=y
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
# CONFIG_KCOV is not set
CONFIG_RUNTIME_TESTING_MENU=y
# CONFIG_LKDTM is not set
# CONFIG_TEST_LIST_SORT is not set
# CONFIG_TEST_MIN_HEAP is not set
# CONFIG_TEST_SORT is not set
# CONFIG_KPROBES_SANITY_TEST is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_RBTREE_TEST is not set
# CONFIG_REED_SOLOMON_TEST is not set
# CONFIG_INTERVAL_TREE_TEST is not set
# CONFIG_PERCPU_TEST is not set
CONFIG_ATOMIC64_SELFTEST=y
# CONFIG_ASYNC_RAID6_TEST is not set
# CONFIG_TEST_HEXDUMP is not set
# CONFIG_TEST_STRING_HELPERS is not set
# CONFIG_TEST_STRSCPY is not set
# CONFIG_TEST_KSTRTOX is not set
# CONFIG_TEST_PRINTF is not set
# CONFIG_TEST_BITMAP is not set
# CONFIG_TEST_BITFIELD is not set
# CONFIG_TEST_UUID is not set
# CONFIG_TEST_XARRAY is not set
# CONFIG_TEST_OVERFLOW is not set
# CONFIG_TEST_RHASHTABLE is not set
# CONFIG_TEST_HASH is not set
# CONFIG_TEST_IDA is not set
# CONFIG_TEST_LKM is not set
# CONFIG_TEST_BITOPS is not set
# CONFIG_TEST_VMALLOC is not set
# CONFIG_TEST_USER_COPY is not set
CONFIG_TEST_BPF=m
# CONFIG_TEST_BLACKHOLE_DEV is not set
# CONFIG_FIND_BIT_BENCHMARK is not set
# CONFIG_TEST_FIRMWARE is not set
# CONFIG_TEST_SYSCTL is not set
CONFIG_SYSCTL_KUNIT_TEST=m
CONFIG_LIST_KUNIT_TEST=m
# CONFIG_LINEAR_RANGES_TEST is not set
# CONFIG_TEST_UDELAY is not set
# CONFIG_TEST_STATIC_KEYS is not set
# CONFIG_TEST_KMOD is not set
# CONFIG_TEST_MEMCAT_P is not set
# CONFIG_TEST_LIVEPATCH is not set
# CONFIG_TEST_STACKINIT is not set
# CONFIG_TEST_MEMINIT is not set
# CONFIG_TEST_HMM is not set
# CONFIG_MEMTEST is not set
# CONFIG_HYPERV_TESTING is not set
# end of Kernel Testing and Coverage
# end of Kernel hacking

[-- Attachment #3: job-script --]
[-- Type: text/plain, Size: 7989 bytes --]

#!/bin/sh

export_top_env()
{
	export suite='will-it-scale'
	export testcase='will-it-scale'
	export category='benchmark'
	export nr_task=96
	export job_origin='/lkp-src/allot/cyclic:p1:linux-devel:devel-hourly/lkp-csl-2sp4/will-it-scale-100.yaml'
	export queue_cmdline_keys='branch
commit
queue_at_least_once'
	export queue='validate'
	export testbox='lkp-csl-2sp4'
	export tbox_group='lkp-csl-2sp4'
	export kconfig='x86_64-rhel-8.3'
	export submit_id='5f6df14ceac1a12caa79218b'
	export job_file='/lkp/jobs/scheduled/lkp-csl-2sp4/will-it-scale-performance-thread-100%-mmap2-ucode=0x4002f01-monitor=a122c70f-debian-10.4-x86_64-20200603.cgz-698ac7610f7928ddfa4-20200925-11434-193ok2k-2.yaml'
	export id='a026c1b517533af77c30af8e82de0249787812c5'
	export queuer_version='/lkp-src'
	export model='Cascade Lake'
	export nr_node=2
	export nr_cpu=96
	export memory='128G'
	export nr_hdd_partitions=1
	export hdd_partitions='/dev/disk/by-id/ata-ST9500620NS_9XF26E30-part1'
	export nr_ssd_partitions=2
	export ssd_partitions='/dev/disk/by-id/ata-INTEL_SSDSC2BG012T4_BTHC427503001P2KGN-part1
/dev/disk/by-id/ata-INTEL_SSDSC2BG012T4_BTHC427503001P2KGN-part2'
	export swap_partitions=
	export rootfs_partition='/dev/disk/by-id/ata-INTEL_SSDSC2BB800G4_CVWL3426000V800RGN-part1'
	export brand='Intel(R) Xeon(R) CPU @ 2.30GHz'
	export commit='698ac7610f7928ddfa44a0736e89d776579d8b82'
	export ucode='0x4002f01'
	export need_kconfig_hw='CONFIG_I40E=y
CONFIG_SATA_AHCI
CONFIG_BLK_DEV_NVME'
	export enqueue_time='2020-09-25 21:31:56 +0800'
	export _id='5f6df14ceac1a12caa79218b'
	export _rt='/result/will-it-scale/performance-thread-100%-mmap2-ucode=0x4002f01-monitor=a122c70f/lkp-csl-2sp4/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/698ac7610f7928ddfa44a0736e89d776579d8b82'
	export user='lkp'
	export compiler='gcc-9'
	export head_commit='ced6027df965d69d06d53cea5ba8c53782439e6e'
	export base_commit='ba4f184e126b751d1bffad5897f263108befc780'
	export branch='linux-review/Peter-Xu/mm-Break-COW-for-pinned-pages-during-fork/20200922-052211'
	export rootfs='debian-10.4-x86_64-20200603.cgz'
	export monitor_sha='a122c70f'
	export result_root='/result/will-it-scale/performance-thread-100%-mmap2-ucode=0x4002f01-monitor=a122c70f/lkp-csl-2sp4/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/698ac7610f7928ddfa44a0736e89d776579d8b82/3'
	export scheduler_version='/lkp/lkp/.src-20200925-111930'
	export LKP_SERVER='inn'
	export arch='x86_64'
	export max_uptime=3600
	export initrd='/osimage/debian/debian-10.4-x86_64-20200603.cgz'
	export bootloader_append='root=/dev/ram0
user=lkp
job=/lkp/jobs/scheduled/lkp-csl-2sp4/will-it-scale-performance-thread-100%-mmap2-ucode=0x4002f01-monitor=a122c70f-debian-10.4-x86_64-20200603.cgz-698ac7610f7928ddfa4-20200925-11434-193ok2k-2.yaml
ARCH=x86_64
kconfig=x86_64-rhel-8.3
branch=linux-review/Peter-Xu/mm-Break-COW-for-pinned-pages-during-fork/20200922-052211
commit=698ac7610f7928ddfa44a0736e89d776579d8b82
BOOT_IMAGE=/pkg/linux/x86_64-rhel-8.3/gcc-9/698ac7610f7928ddfa44a0736e89d776579d8b82/vmlinuz-5.8.0-00001-g698ac7610f7928
max_uptime=3600
RESULT_ROOT=/result/will-it-scale/performance-thread-100%-mmap2-ucode=0x4002f01-monitor=a122c70f/lkp-csl-2sp4/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/698ac7610f7928ddfa44a0736e89d776579d8b82/3
LKP_SERVER=inn
nokaslr
selinux=0
debug
apic=debug
sysrq_always_enabled
rcupdate.rcu_cpu_stall_timeout=100
net.ifnames=0
printk.devkmsg=on
panic=-1
softlockup_panic=1
nmi_watchdog=panic
oops=panic
load_ramdisk=2
prompt_ramdisk=0
drbd.minor_count=8
systemd.log_level=err
ignore_loglevel
console=tty0
earlyprintk=ttyS0,115200
console=ttyS0,115200
vga=normal
rw'
	export modules_initrd='/pkg/linux/x86_64-rhel-8.3/gcc-9/698ac7610f7928ddfa44a0736e89d776579d8b82/modules.cgz'
	export bm_initrd='/osimage/deps/debian-10.4-x86_64-20200603.cgz/run-ipconfig_20200608.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/lkp_20200709.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/rsync-rootfs_20200608.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/will-it-scale_20200619.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/will-it-scale-x86_64-0f26364-1_20200619.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/mpstat_20200714.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/perf_20200723.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/perf-x86_64-34d4ddd359db-1_20200909.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/sar-x86_64-34c92ae-1_20200702.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/hw_20200715.cgz'
	export ucode_initrd='/osimage/ucode/intel-ucode-20200610.cgz'
	export lkp_initrd='/osimage/user/lkp/lkp-x86_64.cgz'
	export site='inn'
	export LKP_CGI_PORT=80
	export LKP_CIFS_PORT=139
	export last_kernel='5.9.0-rc6'
	export repeat_to=4
	export schedule_notify_address=
	export queue_at_least_once=1
	export kernel='/pkg/linux/x86_64-rhel-8.3/gcc-9/698ac7610f7928ddfa44a0736e89d776579d8b82/vmlinuz-5.8.0-00001-g698ac7610f7928'
	export dequeue_time='2020-09-25 21:41:39 +0800'
	export job_initrd='/lkp/jobs/scheduled/lkp-csl-2sp4/will-it-scale-performance-thread-100%-mmap2-ucode=0x4002f01-monitor=a122c70f-debian-10.4-x86_64-20200603.cgz-698ac7610f7928ddfa4-20200925-11434-193ok2k-2.cgz'

	[ -n "$LKP_SRC" ] ||
	export LKP_SRC=/lkp/${user:-lkp}/src
}

run_job()
{
	echo $$ > $TMP/run-job.pid

	. $LKP_SRC/lib/http.sh
	. $LKP_SRC/lib/job.sh
	. $LKP_SRC/lib/env.sh

	export_top_env

	run_setup $LKP_SRC/setup/cpufreq_governor 'performance'

	run_monitor $LKP_SRC/monitors/wrapper kmsg
	run_monitor $LKP_SRC/monitors/no-stdout/wrapper boot-time
	run_monitor $LKP_SRC/monitors/wrapper uptime
	run_monitor $LKP_SRC/monitors/wrapper iostat
	run_monitor $LKP_SRC/monitors/wrapper heartbeat
	run_monitor $LKP_SRC/monitors/wrapper vmstat
	run_monitor $LKP_SRC/monitors/wrapper numa-numastat
	run_monitor $LKP_SRC/monitors/wrapper numa-vmstat
	run_monitor $LKP_SRC/monitors/wrapper numa-meminfo
	run_monitor $LKP_SRC/monitors/wrapper proc-vmstat
	run_monitor $LKP_SRC/monitors/wrapper proc-stat
	run_monitor $LKP_SRC/monitors/wrapper meminfo
	run_monitor $LKP_SRC/monitors/wrapper slabinfo
	run_monitor $LKP_SRC/monitors/wrapper interrupts
	run_monitor $LKP_SRC/monitors/wrapper lock_stat
	run_monitor $LKP_SRC/monitors/wrapper latency_stats
	run_monitor $LKP_SRC/monitors/wrapper softirqs
	run_monitor $LKP_SRC/monitors/one-shot/wrapper bdi_dev_mapping
	run_monitor $LKP_SRC/monitors/wrapper diskstats
	run_monitor $LKP_SRC/monitors/wrapper nfsstat
	run_monitor $LKP_SRC/monitors/wrapper cpuidle
	run_monitor $LKP_SRC/monitors/wrapper cpufreq-stats
	run_monitor $LKP_SRC/monitors/wrapper sched_debug
	run_monitor $LKP_SRC/monitors/wrapper perf-stat
	run_monitor $LKP_SRC/monitors/wrapper mpstat
	run_monitor $LKP_SRC/monitors/no-stdout/wrapper perf-profile
	run_monitor $LKP_SRC/monitors/wrapper oom-killer
	run_monitor $LKP_SRC/monitors/plain/watchdog

	run_test mode='thread' test='mmap2' $LKP_SRC/tests/wrapper will-it-scale
}

extract_stats()
{
	export stats_part_begin=
	export stats_part_end=

	$LKP_SRC/stats/wrapper will-it-scale
	$LKP_SRC/stats/wrapper kmsg
	$LKP_SRC/stats/wrapper boot-time
	$LKP_SRC/stats/wrapper uptime
	$LKP_SRC/stats/wrapper iostat
	$LKP_SRC/stats/wrapper vmstat
	$LKP_SRC/stats/wrapper numa-numastat
	$LKP_SRC/stats/wrapper numa-vmstat
	$LKP_SRC/stats/wrapper numa-meminfo
	$LKP_SRC/stats/wrapper proc-vmstat
	$LKP_SRC/stats/wrapper meminfo
	$LKP_SRC/stats/wrapper slabinfo
	$LKP_SRC/stats/wrapper interrupts
	$LKP_SRC/stats/wrapper lock_stat
	$LKP_SRC/stats/wrapper latency_stats
	$LKP_SRC/stats/wrapper softirqs
	$LKP_SRC/stats/wrapper diskstats
	$LKP_SRC/stats/wrapper nfsstat
	$LKP_SRC/stats/wrapper cpuidle
	$LKP_SRC/stats/wrapper sched_debug
	$LKP_SRC/stats/wrapper perf-stat
	$LKP_SRC/stats/wrapper mpstat
	$LKP_SRC/stats/wrapper perf-profile

	$LKP_SRC/stats/wrapper time will-it-scale.time
	$LKP_SRC/stats/wrapper dmesg
	$LKP_SRC/stats/wrapper kmsg
	$LKP_SRC/stats/wrapper last_state
	$LKP_SRC/stats/wrapper stderr
	$LKP_SRC/stats/wrapper time
}

"$@"

[-- Attachment #4: job.yaml --]
[-- Type: text/plain, Size: 5431 bytes --]

---

#! jobs/will-it-scale-100.yaml
suite: will-it-scale
testcase: will-it-scale
category: benchmark
nr_task: 100%
will-it-scale:
  mode: thread
  test: mmap2
job_origin: "/lkp-src/allot/cyclic:p1:linux-devel:devel-hourly/lkp-csl-2sp4/will-it-scale-100.yaml"

#! queue options
queue_cmdline_keys:
- branch
- commit
- queue_at_least_once
queue: bisect
testbox: lkp-csl-2sp4
tbox_group: lkp-csl-2sp4
kconfig: x86_64-rhel-8.3
submit_id: 5f6dd55c47820a581a57ea97
job_file: "/lkp/jobs/scheduled/lkp-csl-2sp4/will-it-scale-performance-thread-100%-mmap2-ucode=0x4002f01-monitor=a122c70f-debian-10.4-x86_64-20200603.cgz-698ac7610f7928ddfa4-20200925-22554-e9sw9j-1.yaml"
id: 4fc76c958f13376d2789d58088356147e7c751bc
queuer_version: "/lkp-src"

#! hosts/lkp-csl-2sp4
model: Cascade Lake
nr_node: 2
nr_cpu: 96
memory: 128G
nr_hdd_partitions: 1
hdd_partitions:
- "/dev/disk/by-id/ata-ST9500620NS_9XF26E30-part1"
nr_ssd_partitions: 2
ssd_partitions:
- "/dev/disk/by-id/ata-INTEL_SSDSC2BG012T4_BTHC427503001P2KGN-part1"
- "/dev/disk/by-id/ata-INTEL_SSDSC2BG012T4_BTHC427503001P2KGN-part2"
swap_partitions: 
rootfs_partition: "/dev/disk/by-id/ata-INTEL_SSDSC2BB800G4_CVWL3426000V800RGN-part1"
brand: Intel(R) Xeon(R) CPU @ 2.30GHz

#! include/category/benchmark
kmsg: 
boot-time: 
uptime: 
iostat: 
heartbeat: 
vmstat: 
numa-numastat: 
numa-vmstat: 
numa-meminfo: 
proc-vmstat: 
proc-stat: 
meminfo: 
slabinfo: 
interrupts: 
lock_stat: 
latency_stats: 
softirqs: 
bdi_dev_mapping: 
diskstats: 
nfsstat: 
cpuidle: 
cpufreq-stats: 
sched_debug: 
perf-stat: 
mpstat: 
perf-profile: 

#! include/category/ALL
cpufreq_governor: performance

#! include/queue/cyclic
commit: 698ac7610f7928ddfa44a0736e89d776579d8b82

#! include/testbox/lkp-csl-2sp4
ucode: '0x4002f01'
need_kconfig_hw:
- CONFIG_I40E=y
- CONFIG_SATA_AHCI
- CONFIG_BLK_DEV_NVME
enqueue_time: 2020-09-25 19:32:45.098013876 +08:00
_id: 5f6de9d147820a581a57ea98
_rt: "/result/will-it-scale/performance-thread-100%-mmap2-ucode=0x4002f01-monitor=a122c70f/lkp-csl-2sp4/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/698ac7610f7928ddfa44a0736e89d776579d8b82"

#! schedule options
user: lkp
compiler: gcc-9
head_commit: ced6027df965d69d06d53cea5ba8c53782439e6e
base_commit: ba4f184e126b751d1bffad5897f263108befc780
branch: linux-devel/devel-hourly-2020092504
rootfs: debian-10.4-x86_64-20200603.cgz
monitor_sha: a122c70f
result_root: "/result/will-it-scale/performance-thread-100%-mmap2-ucode=0x4002f01-monitor=a122c70f/lkp-csl-2sp4/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/698ac7610f7928ddfa44a0736e89d776579d8b82/0"
scheduler_version: "/lkp/lkp/.src-20200925-111930"
LKP_SERVER: inn
arch: x86_64
max_uptime: 3600
initrd: "/osimage/debian/debian-10.4-x86_64-20200603.cgz"
bootloader_append:
- root=/dev/ram0
- user=lkp
- job=/lkp/jobs/scheduled/lkp-csl-2sp4/will-it-scale-performance-thread-100%-mmap2-ucode=0x4002f01-monitor=a122c70f-debian-10.4-x86_64-20200603.cgz-698ac7610f7928ddfa4-20200925-22554-e9sw9j-1.yaml
- ARCH=x86_64
- kconfig=x86_64-rhel-8.3
- branch=linux-devel/devel-hourly-2020092504
- commit=698ac7610f7928ddfa44a0736e89d776579d8b82
- BOOT_IMAGE=/pkg/linux/x86_64-rhel-8.3/gcc-9/698ac7610f7928ddfa44a0736e89d776579d8b82/vmlinuz-5.8.0-00001-g698ac7610f7928
- max_uptime=3600
- RESULT_ROOT=/result/will-it-scale/performance-thread-100%-mmap2-ucode=0x4002f01-monitor=a122c70f/lkp-csl-2sp4/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/698ac7610f7928ddfa44a0736e89d776579d8b82/0
- LKP_SERVER=inn
- nokaslr
- selinux=0
- debug
- apic=debug
- sysrq_always_enabled
- rcupdate.rcu_cpu_stall_timeout=100
- net.ifnames=0
- printk.devkmsg=on
- panic=-1
- softlockup_panic=1
- nmi_watchdog=panic
- oops=panic
- load_ramdisk=2
- prompt_ramdisk=0
- drbd.minor_count=8
- systemd.log_level=err
- ignore_loglevel
- console=tty0
- earlyprintk=ttyS0,115200
- console=ttyS0,115200
- vga=normal
- rw
modules_initrd: "/pkg/linux/x86_64-rhel-8.3/gcc-9/698ac7610f7928ddfa44a0736e89d776579d8b82/modules.cgz"
bm_initrd: "/osimage/deps/debian-10.4-x86_64-20200603.cgz/run-ipconfig_20200608.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/lkp_20200709.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/rsync-rootfs_20200608.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/will-it-scale_20200619.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/will-it-scale-x86_64-0f26364-1_20200619.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/mpstat_20200714.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/perf_20200723.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/perf-x86_64-34d4ddd359db-1_20200909.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/sar-x86_64-34c92ae-1_20200702.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/hw_20200715.cgz"
ucode_initrd: "/osimage/ucode/intel-ucode-20200610.cgz"
lkp_initrd: "/osimage/user/lkp/lkp-x86_64.cgz"
site: inn

#! /lkp/lkp/.src-20200925-111930/include/site/inn
LKP_CGI_PORT: 80
LKP_CIFS_PORT: 139
oom-killer: 
watchdog: 

#! runtime status
last_kernel: 5.9.0-rc6-intel-next-01500-gab7b1d5de7576
repeat_to: 2
schedule_notify_address: 

#! user overrides
queue_at_least_once: 0
kernel: "/pkg/linux/x86_64-rhel-8.3/gcc-9/698ac7610f7928ddfa44a0736e89d776579d8b82/vmlinuz-5.8.0-00001-g698ac7610f7928"
dequeue_time: 2020-09-25 21:05:46.909492781 +08:00
job_state: finished
loadavg: 74.47 58.11 26.24 1/728 10342
start_time: '1601039195'
end_time: '1601039498'
version: "/lkp/lkp/.src-20200925-112002:62884c4b-dirty:1569b47e2-dirty"

[-- Attachment #5: reproduce --]
[-- Type: text/plain, Size: 335 bytes --]


for cpu_dir in /sys/devices/system/cpu/cpu[0-9]*
do
	online_file="$cpu_dir"/online
	[ -f "$online_file" ] && [ "$(cat "$online_file")" -eq 0 ] && continue

	file="$cpu_dir"/cpufreq/scaling_governor
	[ -f "$file" ] && echo "performance" > "$file"
done

 "/lkp/benchmarks/python3/bin/python3" "./runtest.py" "mmap2" "295" "thread" "96"

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-26 22:28                                       ` Linus Torvalds
@ 2020-09-27  6:23                                         ` Leon Romanovsky
  2020-09-27 18:16                                           ` Linus Torvalds
  0 siblings, 1 reply; 110+ messages in thread
From: Leon Romanovsky @ 2020-09-27  6:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason Gunthorpe, Peter Xu, John Hubbard, Linux-MM,
	Linux Kernel Mailing List, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Jann Horn

On Sat, Sep 26, 2020 at 03:28:32PM -0700, Linus Torvalds wrote:
> On Fri, Sep 25, 2020 at 6:15 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I think that over the weekend I'll do Peter's version but with the
> > "page_mapcount() == 1"  check, because I'm starting to like that
> > better than the mm->has_pinned.
>
> Actually, after the first read-through, I feel like I'll just apply
> the series as-is.
>
> But I'll look at it some more, and do another read-through and make
> the final decision tomorrow.
>
> If anybody has any concerns about the v2 patch series from Peter, holler.

We won't be able to test the series till Tuesday due to religious holidays.

Thanks

>
>               Linus



* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-27  6:23                                         ` Leon Romanovsky
@ 2020-09-27 18:16                                           ` Linus Torvalds
  2020-09-27 18:45                                             ` Linus Torvalds
  2020-09-28 17:13                                             ` Peter Xu
  0 siblings, 2 replies; 110+ messages in thread
From: Linus Torvalds @ 2020-09-27 18:16 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jason Gunthorpe, Peter Xu, John Hubbard, Linux-MM,
	Linux Kernel Mailing List, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Jann Horn

On Sat, Sep 26, 2020 at 11:24 PM Leon Romanovsky <leonro@nvidia.com> wrote:
>
> We won't be able to test the series till Tuesday due to religious holidays.

That's fine. I've merged the series up, and it will be in rc7 later
today, and with an rc8 next week we'll have two weeks to find any
issues.

I did edit Peter's patch 3/4 quite a bit:

 - remove the pte_mkyoung(), because there's no _access_ at fork()
time (so it's very different in that respect to the fault case)

 - added the lru_cache_add_inactive_or_unevictable() call that I think
is needed and Peter's patch was missing

 - split up the "copy page" into its own function kind of like I had
done for my minimal patch

 - changed the comments around a bit (mostly due to that split-up
changing some of the flow)

but it should be otherwise the same and I think Peter will still
recognize it as his patch (and I left it with his authorship and
sign-off).

Anyway, it's merged locally in my tree, I'll do some local testing
(and I have a few other pull requests), but I'll push out the end
result soonish assuming nothing shows up (and considering that I don't
have any pinning loads, I seriously doubt it will, since I won't see
any of the interesting cases).

So if we can get the actual rdma cases tested early next week, we'll
be in good shape, I think.

Btw, I'm not convinced about the whole "turn the pte read-only and
then back". If the fork races with another thread doing a pinning
fast-GUP on another CPU, there are memory ordering issues etc too.
That's not necessarily visible on x86 (the "turn read-only" being a
locked op will force serialization), but it all looks dodgy as heck.

I'm also not convinced we really want to support that kind of insane
racy load, but whatever. I left the code in place, but it's something
where we might want to rethink things.

              Linus



* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-27 18:16                                           ` Linus Torvalds
@ 2020-09-27 18:45                                             ` Linus Torvalds
  2020-09-28 12:49                                               ` Jason Gunthorpe
  2020-09-28 17:13                                             ` Peter Xu
  1 sibling, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2020-09-27 18:45 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jason Gunthorpe, Peter Xu, John Hubbard, Linux-MM,
	Linux Kernel Mailing List, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Jann Horn

On Sun, Sep 27, 2020 at 11:16 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Btw, I'm not convinced about the whole "turn the pte read-only and
> then back". If the fork races with another thread doing a pinning
> fast-GUP on another CPU, there are memory ordering issues etc too.
> That's not necessarily visible on x86 (the "turn read-only" being a
> locked op will force serialization), but it all looks dodgy as heck.

.. looking at it more, I also think it could possibly lose the dirty
bit for the case where another CPU did a HW dirty/accessed bit update
in between the original read of the pte and us writing back the
writable pte again.

Us holding the page table lock means that no _software_ accesses will
happen to the PTE, but dirty/accessed bits can be modified by hardware
despite the lock.

That is, of course, a completely crazy case, and I think that since we
only do this for a COW mapping, and only do the PTE changes if the pte
was writable, the pte will always have been dirty already.

So I don't think it's an _actual_ bug, but it's another "this looks
dodgy as heck" marker. It may _work_, but it sure ain't pretty.
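For illustration, that window can be modeled in plain C (the bit layout and helper names below are made-up stand-ins, not the kernel's; real PTE layouts are architecture-specific): a restore that writes back the value read before the wrprotect clobbers a dirty bit the hardware set in between, unless the restore forces the dirty/accessed bits back on.

```c
#define PTE_WRITE    (1UL << 1)
#define PTE_ACCESSED (1UL << 5)
#define PTE_DIRTY    (1UL << 6)

/* Stand-in for ptep_set_wrprotect(): clear the write bit. */
static unsigned long pte_wrprotect(unsigned long pte) { return pte & ~PTE_WRITE; }

/* Naive restore: write back the value read before the wrprotect. */
static unsigned long pte_restore_naive(unsigned long saved) { return saved; }

/* Defensive restore: additionally force dirty+accessed, so a concurrent
 * hardware update between the read and the write-back cannot be lost. */
static unsigned long pte_restore_forced(unsigned long saved)
{
	return saved | PTE_DIRTY | PTE_ACCESSED;
}

/* Simulate the race; returns the final pte value. */
static unsigned long race_outcome(int forced)
{
	unsigned long pte = PTE_WRITE | PTE_ACCESSED;  /* clean, writable COW pte */
	unsigned long saved = pte;                     /* value read under the PT lock */

	pte = pte_wrprotect(pte);       /* ptep_set_wrprotect(src_mm, ...) */
	pte |= PTE_DIRTY;               /* hardware marks it dirty concurrently */

	/* set_pte_at() writes back a value derived from 'saved', clobbering
	 * whatever the hardware did to 'pte' in the meantime: */
	pte = forced ? pte_restore_forced(saved) : pte_restore_naive(saved);
	return pte;
}
```

In the COW case discussed above, the saved pte would normally already have the dirty bit set, which is why this is a theoretical rather than a practical bug.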

But despite having looked at this quite a bit, I don't see anything
that looks actively wrong, so I think the series is fine. This is more
of a note for people to perhaps think about.

                Linus



* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-27 18:45                                             ` Linus Torvalds
@ 2020-09-28 12:49                                               ` Jason Gunthorpe
  2020-09-28 16:17                                                 ` Linus Torvalds
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Gunthorpe @ 2020-09-28 12:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Leon Romanovsky, Peter Xu, John Hubbard, Linux-MM,
	Linux Kernel Mailing List, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Jann Horn

On Sun, Sep 27, 2020 at 11:45:30AM -0700, Linus Torvalds wrote:
> On Sun, Sep 27, 2020 at 11:16 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Btw, I'm not convinced about the whole "turn the pte read-only and
> > then back". If the fork races with another thread doing a pinning
> > fast-GUP on another CPU, there are memory ordering issues etc too.
> > That's not necessarily visible on x86 (the "turn read-only" being a
> > locked op will force serialization), but it all looks dodgy as heck.

Oh. Yes, looking again, the atomics in the final arrangement of
copy_present_page() aren't going to be strong enough to order this.

Sorry for missing that; I wasn't able to look very deeply during the weekend.

Not seeing an obvious option besides adding a smp_mb() before
page_maybe_dma_pinned() as Peter once suggested.

> .. looking at it more, I also think it could possibly lose the dirty
> bit for the case where another CPU did a HW dirty/accessed bit update
> in between the original read of the pte, and then us writing back the
> writable pte again.

Ah, I see:

               set_pte_at(src_mm, addr, src_pte, pte);

wants to be some arch-specific, single-bit ptep_clear_wrprotect()...

Thanks,
Jason



* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-28 12:49                                               ` Jason Gunthorpe
@ 2020-09-28 16:17                                                 ` Linus Torvalds
  2020-09-28 17:22                                                   ` Peter Xu
  0 siblings, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2020-09-28 16:17 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Peter Xu, John Hubbard, Linux-MM,
	Linux Kernel Mailing List, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Jann Horn

On Mon, Sep 28, 2020 at 5:49 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> Not seeing an obvious option besides adding a smp_mb() before
> page_maybe_dma_pinned() as Peter once suggested.

That is going to be prohibitively expensive - needing it for each pte
whether it's pinned or not.

I really think the better option is a "don't do that then". This has
_never_ worked before either except by pure luck.

I also doubt anybody does it. forking with threads is a bad idea to
begin with. Doing so while pinning pages even more so.

               Linus



* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-27 18:16                                           ` Linus Torvalds
  2020-09-27 18:45                                             ` Linus Torvalds
@ 2020-09-28 17:13                                             ` Peter Xu
  1 sibling, 0 replies; 110+ messages in thread
From: Peter Xu @ 2020-09-28 17:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Leon Romanovsky, Jason Gunthorpe, John Hubbard, Linux-MM,
	Linux Kernel Mailing List, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Jann Horn

On Sun, Sep 27, 2020 at 11:16:34AM -0700, Linus Torvalds wrote:
>  - split up the "copy page" into its own function kind of like I had
> done for my minimal patch

I didn't do that, mainly because of the wrprotect() (for the fast-gup race),
which can potentially be kept if it's a normal COW.  Maybe we'd add one more
comment above the caller of copy_present_page() (even though we have a "NOTE!"
section inside the helper already), since it'll change *src_pte and that's
hard to notice from the function name "copy_present_page()".  Not that
important, though.

Thanks for doing the changes.  I went over the whole patch and it looks indeed
cleaner than before (I also didn't spot anything wrong either).

Regarding the other, even rarer "hardware race on the dirty/accessed bits" -
maybe we can simply always set both of those bits whenever the page copy
happens and we want to recover the ptes?  I also agree it's trivial enough
that maybe we don't even need to care about it.

-- 
Peter Xu




* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-28 16:17                                                 ` Linus Torvalds
@ 2020-09-28 17:22                                                   ` Peter Xu
  2020-09-28 17:54                                                     ` Linus Torvalds
  0 siblings, 1 reply; 110+ messages in thread
From: Peter Xu @ 2020-09-28 17:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason Gunthorpe, Leon Romanovsky, John Hubbard, Linux-MM,
	Linux Kernel Mailing List, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Jann Horn

On Mon, Sep 28, 2020 at 09:17:09AM -0700, Linus Torvalds wrote:
> On Mon, Sep 28, 2020 at 5:49 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > Not seeing an obvious option besides adding a smp_mb() before
> > page_maybe_dma_pinned() as Peter once suggested.
> 
> That is going to be prohibitively expensive - needing it for each pte
> whether it's pinned or not.
> 
> I really think the better option is a "don't do that then". This has
> _never_ worked before either except by pure luck.

Yes...  Actually I am also thinking about the complete solution to cover
read-only fast-gups too, but now I start to doubt this, at least for the fork()
path.  E.g. if we'd finally like to use pte_protnone() to replace the current
pte_wrprotect(), we'll be able to also block the read gups, but we'll suffer
the same degradation on normal fork()s, or even more.  Seems unacceptable.

The other question is whether we should emphasize and document somewhere that
MADV_DONTFORK is still (and should always be) the preferred way, because
changes like this series can potentially encourage the other way.

-- 
Peter Xu




* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-28 17:22                                                   ` Peter Xu
@ 2020-09-28 17:54                                                     ` Linus Torvalds
  2020-09-28 18:39                                                       ` Jason Gunthorpe
  0 siblings, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2020-09-28 17:54 UTC (permalink / raw)
  To: Peter Xu
  Cc: Jason Gunthorpe, Leon Romanovsky, John Hubbard, Linux-MM,
	Linux Kernel Mailing List, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Jann Horn

On Mon, Sep 28, 2020 at 10:23 AM Peter Xu <peterx@redhat.com> wrote:
>
> Yes...  Actually I am also thinking about the complete solution to cover
> read-only fast-gups too, but now I start to doubt this, at least for the fork()
> path.  E.g. if we'd finally like to use pte_protnone() to replace the current
> pte_wrprotect(), we'll be able to also block the read gups, but we'll suffer
> the same degradation on normal fork()s, or even more.  Seems unacceptable.

So I think the real question about pinned read gups is what semantics
they should have.

Because honestly, I think we have two options:

 - the current "it gets a shared copy from the page tables"

 - the "this is an exclusive pin, and it _will_ follow the source VM
changes, and never break"

because honestly, if we get a shared copy at the time of the pinning
(like we do now), then "fork()" is entirely immaterial. The fork() can
have happened ages ago, that page is shared with other processes, and
any process writing to it - including very much the pinning one -
will cause a copy-on-write and get a copy of the page.

IOW, the current - and past - semantics for read pinning is that you
get a copy of the page, but any changes made by the pinning process
may OR MAY NOT show up in your pinned copy.

Again: doing a concurrent fork() is entirely immaterial, because the
page can have been made a read-only COW page by _previous_ fork()
calls (or KSM logic or whatever).

In other words: read pinning gets a page efficiently, but there is
zero guarantee of any future coherence with the process doing
subsequent writes.

That has always been the semantics, and FOLL_PIN didn't change that at
all. You may have had things that worked almost by accident (ie you
had made the page private by writing to it after the fork, so the read
pinning _effectively_ gave you a page that was coherent), but even
that was always accidental rather than anything else. Afaik it could
easily be broken by KSM, for example.

In other words, a read pin isn't really any different from a read GUP.
You get a reference to a page that is valid at the time of the page
lookup, and absolutely nothing more.

Now, the alternative is to make a read pin have the same guarantees as
a write pin, and say "this will stay attached to this MM until unmap
or unpin".

But honestly, that is largely going to _be_ the same as a write pin,
because it absolutely needs to do a page COW at the time of the
pinning to get that initial exclusive guarantee in the first place.
Without that initial exclusivity, you cannot avoid future COW events
breaking the wrong way.

So I think the "you get a reference to the page at the time of the
pin, and the page _may_ or may not change under you if the original
process writes to it" are really the only relevant semantics. Because
if you need those exclusive semantics, you might as well just use a
write pin.

The downside of a write pin is that it not only makes that page
exclusive, it also (a) marks it dirty and (b) requires write access.
That can matter particularly for shared mappings. So if you know
you're doing the pin on a shared mmap, then a read pin is the right
thing, because the page will stay around - not because of the VM it
happens in, but because of the underlying file mapping!

See the difference?

> The other question is, whether we should emphasize and document somewhere that
> MADV_DONTFORK is still (and should always be) the preferred way, because
> changes like this series can potentially encourage the other way.

I really suspect that the concurrent fork() case is fundamentally hard
to handle.

Is it impossible? No. Even without any real locking, we could change
the code to do a seqcount_t, for example. The fastgup code wouldn't
take a lock, but it would just fail and fall back to the slow code if
the sequence count fails.

So the copy_page_range() code would do a write count around the copy:

    write_seqcount_begin(&mm->seq);
    .. do the copy ..
    write_seqcount_end(&mm->seq);

and the fast-gup code would do a

    seq = raw_read_seqcount(&mm->seq);
    if (seq & 1)
        return -EAGAIN;

at the top, and do a

    if (__read_seqcount_t_retry(&mm->seq, seq) {
       .. Uhhuh, that failed, drop the ref to the page again ..
        return -EAGAIN;
    }

after getting the pin reference.
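That protocol can be sketched as a single-threaded userspace model with C11 atomics (mm_seq, fork_copy() and gup_fast() are illustrative names only; the kernel's seqcount_t additionally provides memory-ordering and lockdep machinery that this model glosses over):

```c
#include <stdatomic.h>

static _Atomic unsigned int mm_seq;     /* stand-in for the proposed mm->seq */
static int pte_val = 42;                /* stand-in for the state fork() copies */

/* copy_page_range() side: the sequence count is odd for the whole copy,
 * so lockless readers either see it odd or see it change. */
static void fork_copy(int new_val)
{
	atomic_fetch_add(&mm_seq, 1);   /* write_seqcount_begin(): seq goes odd */
	pte_val = new_val;              /* .. do the copy .. */
	atomic_fetch_add(&mm_seq, 1);   /* write_seqcount_end(): seq goes even */
}

/* gup_fast side: returns -1 (the -EAGAIN case: fall back to the slow,
 * lock-taking path) whenever a fork is or was concurrently running. */
static int gup_fast(int *out)
{
	unsigned int seq = atomic_load(&mm_seq);
	if (seq & 1)
		return -1;              /* fork in progress */
	*out = pte_val;                 /* the lockless walk + pin */
	if (atomic_load(&mm_seq) != seq)
		return -1;              /* fork ran underneath us: drop the pin */
	return 0;
}
```

The writer never blocks; the whole cost lands on a fast-GUP that races with fork(), which just bails out to the slow path and retries.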

We could make this conditional on FOLL_PIN, or maybe even a new flag
("FOLL_FORK_CONSISTENT").

So I think we can serialize with fork() without serializing each and every PTE.

If we want to and really need to.

Hmm?

               Linus



* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-28 17:54                                                     ` Linus Torvalds
@ 2020-09-28 18:39                                                       ` Jason Gunthorpe
  2020-09-28 19:29                                                         ` Linus Torvalds
  2020-09-28 19:36                                                         ` Linus Torvalds
  0 siblings, 2 replies; 110+ messages in thread
From: Jason Gunthorpe @ 2020-09-28 18:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Xu, Leon Romanovsky, John Hubbard, Linux-MM,
	Linux Kernel Mailing List, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Jann Horn

On Mon, Sep 28, 2020 at 10:54:28AM -0700, Linus Torvalds wrote:
> On Mon, Sep 28, 2020 at 10:23 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > Yes...  Actually I am also thinking about the complete solution to cover
> > read-only fast-gups too, but now I start to doubt this, at least for the fork()
> > path.  E.g. if we'd finally like to use pte_protnone() to replace the current
> > pte_wrprotect(), we'll be able to also block the read gups, but we'll suffer
> > the same degradation on normal fork()s, or even more.  Seems unacceptable.
> 
> So I think the real question about pinned read gups is what semantics
> they should have.
> 
> Because honestly, I think we have two options:
> 
>  - the current "it gets a shared copy from the page tables"
>
>  - the "this is an exclusive pin, and it _will_ follow the source VM
> changes, and never break"

The requirement depends on what the driver is doing. Consider a simple
ring-to-device scheme:

   ring = mmap()
   pin_user_pages(FOLL_LONGTERM)
   ring[0] = [..]
   trigger_kernel()

Sort of like iouring. Here the kernel will pin the zero page and will
never see any user modifications to the buffer. We must do #2.

While something like read(O_DIRECT), which only needs the buffer to be
stable during a system call, would be fine with #1 (and data
incoherence in general).

> In other words, a read pin isn't really any different from a read GUP.
> You get a reference to a page that is valid at the time of the page
> lookup, and absolutely nothing more.

Yes, so far all this new pin stuff has really focused on the write
side - the motivating issue was the set_page_dirty() oops.
 
> But honestly, that is largely going to _be_ the same as a write pin,
> because it absolutely needs to do a page COW at the time of the
> pinning to get that initial exclusive guarantee in the first place.
> Without that initial exclusivity, you cannot avoid future COW events
> breaking the wrong way.

Right, I see places using FOLL_WRITE when they only need read. It is a
bit ugly, IMHO.

> The downside of a write pin is that it not only makes that page
> exclusive, it also (a) marks it dirty and (b) requires write access.

RDMA adds FOLL_FORCE because people complained they couldn't do
read-only transfers from .rodata - uglier still :\

> So the copy_page_range() code would do a write count around the copy:
> 
>     write_seqcount_begin(&mm->seq);
>     .. do the copy ..
>     write_seqcount_end(&mm->seq);

All of gup_fast and copy_mm could be wrapped in a seqcount so that
gup_fast always goes to the slow path if a fork is concurrent.

That doesn't sound too expensive and avoids all the problems you
pointed with the WP scheme.

As you say, fork & pin & threads is already questionable, so an
unconditional slow path on race should be OK.

> If we want to and really need to.

I would like to see some reasonable definition for the
read-side. Having drivers do FOLL_FORCE | FOLL_WRITE for read is just
confusing and weird for a driver facing API.

It may also help focus the remaining discussion for solving
set_page_dirty() if pin_user_pages() had a solid definition.

I prefer the version where read pin and write pin are symmetric. The
PTE in the MM should not change once pinned.

This is useful and if something only needs the original GUP semantics
then GUP is still there.

Jason




* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-28 18:39                                                       ` Jason Gunthorpe
@ 2020-09-28 19:29                                                         ` Linus Torvalds
  2020-09-28 23:57                                                           ` Jason Gunthorpe
  2020-09-28 19:36                                                         ` Linus Torvalds
  1 sibling, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2020-09-28 19:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Peter Xu, Leon Romanovsky, John Hubbard, Linux-MM,
	Linux Kernel Mailing List, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Jann Horn

On Mon, Sep 28, 2020 at 11:39 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> I prefer the version where read pin and write pin are symmetric. The
> PTE in the MM should not change once pinned.

The thing is, I don't really see how to do that.

Right now the write pin fastpath part depends on the PTE being
writable. That implies "this VM has access to this page".

For a read pin there simply is no other way to do it.

So we'd basically say "fast read pin only works on writable pages",
and then we'd have to go to the slow path if it isn't dirty and
writable.

And the slow path would then do whatever COW is required, but it
wouldn't mark the result dirty (and in the case of a shared mapping,
couldn't mark it writable).

So a read pin action would basically never work for the fast-path for
a few cases, notably a shared read-only mapping - because we could
never mark it in the page tables as "fast pin accessible".

See the problem? A read-only pin is fundamentally different from a
write one, because a write one has that fundamental mark of "I have
private access to this page" in ways a read one simply does not.

So we could make the requirement be that a pinned page is either

 (a) from a shared mapping (so the pinning depends on the page cache
association). But we can't test this in the fast path.

or

 (b) for a private mapping we require page_mapcount() == 1 and that
it's writable.

but since (a) requires the mapping type, we can't check in the fast
path - we only have the PTE and the page. So the fast-path can only
"emulate" it by that "writable", which is a proper subset of (a) or
(b), but it's not something that is in any way guaranteed.

End result: FOLL_PIN would really only work on private pages, and only
if you don't want to share with the page cache.

And it would basically have no advantages over a writable FOLL_PIN. It
would break the association with any backing store for private pages,
because otherwise it can't follow future writes.

                   Linus



* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-28 18:39                                                       ` Jason Gunthorpe
  2020-09-28 19:29                                                         ` Linus Torvalds
@ 2020-09-28 19:36                                                         ` Linus Torvalds
  2020-09-28 19:50                                                           ` Linus Torvalds
  1 sibling, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2020-09-28 19:36 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Peter Xu, Leon Romanovsky, John Hubbard, Linux-MM,
	Linux Kernel Mailing List, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Jann Horn

On Mon, Sep 28, 2020 at 11:39 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> All of gup_fast and copy_mm could be wrapped in a seqcount so that
> gup_fast always goes to the slow path if a fork is concurrent.
>
> That doesn't sound too expensive and avoids all the problems you
> pointed with the WP scheme.

Ok, I'll start by just removing the "write protect early trick". It
really doesn't work reliably anyway due to memory ordering, and while
I think the dirty bit is ok (and we could probably also set it
unconditionally to make _sure_ it's not dropped, like Peter says), it
just makes me convinced it's the wrong approach.

Fixing it at a per-pte level is too expensive, and yeah, if we really
really care about the fork consistency, the sequence count approach
should be much simpler and more obvious.

So I'll do the pte wrprotect/restore removal. Anybody willing to do
and test the sequence count approach?

            Linus



* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-28 19:36                                                         ` Linus Torvalds
@ 2020-09-28 19:50                                                           ` Linus Torvalds
  2020-09-28 22:51                                                             ` Jason Gunthorpe
  2020-10-08  5:49                                                             ` Leon Romanovsky
  0 siblings, 2 replies; 110+ messages in thread
From: Linus Torvalds @ 2020-09-28 19:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Peter Xu, Leon Romanovsky, John Hubbard, Linux-MM,
	Linux Kernel Mailing List, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Jann Horn

[-- Attachment #1: Type: text/plain, Size: 1203 bytes --]

On Mon, Sep 28, 2020 at 12:36 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So I'll do the pte wrprotect/restore removal. Anybody willing to do
> and test the sequence count approach?

So the wrprotect removal is trivial, with most of it being about the comments.

However, when I look at this, I am - once again - tempted to just add a

        if (__page_mapcount(page) > 1)
                return 1;

there too. Because we know it's a private mapping (shared mappings we
checked for with the "is_cow_mapping()" earlier), and the only case we
really care about is the one where the page is only mapped in the
current mm (because that's what a write pinning will have done, and as
mentioned, a read pinning doesn't do anything wrt fork() right now
anyway).

So if it's mapped in another mm, the COW clearly hasn't been broken by
a pin, and a read pinned page had already gone through a fork.
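Putting the pieces together, the decision being discussed can be sketched as follows (struct page_model and copy_by_reference() are userspace stand-ins; the kernel's page_maybe_dma_pinned() also has special handling for compound pages):

```c
#define GUP_PIN_COUNTING_BIAS 1024  /* each FOLL_PIN adds this to the refcount */

struct page_model {                 /* illustrative stand-in for struct page */
	int refcount;
	int mapcount;
};

/* Inexact check, as in the kernel: a page with a refcount at least this
 * high *may* be pinned; false positives just cause an extra copy. */
static int page_maybe_dma_pinned(const struct page_model *p)
{
	return p->refcount >= GUP_PIN_COUNTING_BIAS;
}

/* Decision sketched in the mail: 1 = copy by reference, 0 = copy the page. */
static int copy_by_reference(int mm_has_pinned, const struct page_model *p)
{
	if (!mm_has_pinned)
		return 1;               /* this mm never pinned anything */
	if (p->mapcount > 1)
		return 1;               /* mapped elsewhere: not an exclusive pin */
	if (!page_maybe_dma_pinned(p))
		return 1;               /* refcount too low to be a pin */
	return 0;                       /* might be pinned: copy it eagerly */
}
```

False positives - an ordinary page whose refcount happens to exceed the bias - only cost an unnecessary copy, never correctness.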

But the more I look at this code, the more I go "ok, I want somebody
to actually test this with the rdma case".

So I'll attach my suggested patch, but I won't actually commit it. I'd
really like to have this tested, possibly _together_ with the sequence
count addition..

               Linus

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 2530 bytes --]

 mm/memory.c | 46 ++++++++++------------------------------------
 1 file changed, 10 insertions(+), 36 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index fcfc4ca36eba..4a7e89d35ecf 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -806,8 +806,6 @@ copy_present_page(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		return 1;
 
 	/*
-	 * The trick starts.
-	 *
 	 * What we want to do is to check whether this page may
 	 * have been pinned by the parent process.  If so,
 	 * instead of wrprotect the pte on both sides, we copy
@@ -815,46 +813,22 @@ copy_present_page(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	 * the pinned page won't be randomly replaced in the
 	 * future.
 	 *
-	 * To achieve this, we do the following:
-	 *
-	 * 1. Write-protect the pte if it's writable.  This is
-	 *    to protect concurrent write fast-gup with
-	 *    FOLL_PIN, so that we'll fail the fast-gup with
-	 *    the write bit removed.
-	 *
-	 * 2. Check page_maybe_dma_pinned() to see whether this
-	 *    page may have been pinned.
+	 * The page pinning checks are just "has this mm ever
+	 * seen pinning", along with the (inexact) check of
+	 * the page count. That might give false positives
+	 * for pinning, but it will work correctly.
 	 *
-	 * The order of these steps is important to serialize
-	 * against the fast-gup code (gup_pte_range()) on the
-	 * pte check and try_grab_compound_head(), so that
-	 * we'll make sure either we'll capture that fast-gup
-	 * so we'll copy the pinned page here, or we'll fail
-	 * that fast-gup.
-	 *
-	 * NOTE! Even if we don't end up copying the page,
-	 * we won't undo this wrprotect(), because the normal
-	 * reference copy will need it anyway.
-	 */
-	if (pte_write(pte))
-		ptep_set_wrprotect(src_mm, addr, src_pte);
-
-	/*
-	 * These are the "normally we can just copy by reference"
-	 * checks.
+	 * Another heuristic is to just check the mapcount for
+	 * this page. If it is mapped elsewhere, it already is
+	 * not an exclusively pinned page, and doing another
+	 * "copy by reference" isn't going to matter.
 	 */
 	if (likely(!atomic_read(&src_mm->has_pinned)))
 		return 1;
 	if (likely(!page_maybe_dma_pinned(page)))
 		return 1;
-
-	/*
-	 * Uhhuh. It looks like the page might be a pinned page,
-	 * and we actually need to copy it. Now we can set the
-	 * source pte back to being writable.
-	 */
-	if (pte_write(pte))
-		set_pte_at(src_mm, addr, src_pte, pte);
+	if (__page_mapcount(page) > 1)
+		return 1;
 
 	new_page = *prealloc;
 	if (!new_page)

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-28 19:50                                                           ` Linus Torvalds
@ 2020-09-28 22:51                                                             ` Jason Gunthorpe
  2020-09-29  0:30                                                               ` Peter Xu
  2020-10-08  5:49                                                             ` Leon Romanovsky
  1 sibling, 1 reply; 110+ messages in thread
From: Jason Gunthorpe @ 2020-09-28 22:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Xu, Leon Romanovsky, John Hubbard, Linux-MM,
	Linux Kernel Mailing List, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Jann Horn

On Mon, Sep 28, 2020 at 12:50:03PM -0700, Linus Torvalds wrote:

> But the more I look at this code, the more I go "ok, I want somebody
> to actually test this with the rdma case".

I suspect it will be OK as our immediate test suite that touches this
will be using malloc memory.

It seems some mmap trick will be needed to get higher map counts?

> So I'll attach my suggested patch, but I won't actually commit it. I'd
> really like to have this tested, possibly _together_ with the sequence
> count addition..

Ok, we will do this once the holidays are over. If Peter doesn't take
it I can look at the seqcount too.

Thanks,
Jason


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-28 19:29                                                         ` Linus Torvalds
@ 2020-09-28 23:57                                                           ` Jason Gunthorpe
  2020-09-29  0:18                                                             ` John Hubbard
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Gunthorpe @ 2020-09-28 23:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Xu, Leon Romanovsky, John Hubbard, Linux-MM,
	Linux Kernel Mailing List, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Jann Horn

On Mon, Sep 28, 2020 at 12:29:55PM -0700, Linus Torvalds wrote:
> So a read pin action would basically never work for the fast-path for
> a few cases, notably a shared read-only mapping - because we could
> never mark it in the page tables as "fast pin accessible"

Agree, I was assuming we'd lose more of the fast path to create this
thing. It would only still be fast if the pages are already writable.

I strongly suspect the case of DMA'ing actual read-only data is the
minority here, the usual case is probably filling a writable buffer
with something interesting and then triggering the DMA. The DMA just
happens to be read from the driver view so the driver doesn't set
FOLL_WRITE.

Looking at the FOLL_LONGTERM users, which should be the banner usecase
for this, there are very few that do a read pin and use fast.

> And it would basically have no advantages over a writable FOLL_PIN. It
> would break the association with any backing store for private pages,
> because otherwise it can't follow future writes.

Yes, I wasn't clear enough, I'm looking at this from a driver API
perspective. We have this API

  pin_user_pages(FOLL_LONGTERM | FOLL_WRITE)

Which now has no decoherence issues with the MM. If the driver
naturally wants to do read-only access it might be tempted to do:

  pin_user_pages(FOLL_LONGTERM)

Which is now NOT the same thing and brings all these really surprising
mm coherence issues back.

The driver author might discover this in testing, then be tempted to
hardwire 'FOLL_LONGTERM | FOLL_WRITE'. Now their uAPI is broken for
things that are actually read-only like .rodata.

If they discover this then they add a FOLL_FORCE to the mix.

When someone comes along to read this later it is a big leap to see
  pin_user_pages(FOLL_LONGTERM | FOLL_FORCE | FOLL_WRITE)

and realize this is code for "read only mapping". At least it took me
a while to decipher it the first time I saw it.

I think this is really hard to use and ugly. My thinking has been to
just stick:

   if (flags & FOLL_LONGTERM)
       flags |= FOLL_FORCE | FOLL_WRITE

In pin_user_pages(). It would make the driver API cleaner. If we can
do a bit better somehow by not COW'ing for certain VMA's as you
explained then all the better, but not my primary goal..

Basically, I think if a driver is using FOLL_LONGTERM | FOLL_PIN we
should guarantee that driver a consistent MM and take the gup_fast
performance hit to do it.

AFAICT the giant whack of other cases not using FOLL_LONGTERM really
shouldn't care about read-decoherence. For those cases the user should
really not be racing writes with data under a read-only pin, and the
new COW logic looks like it solves the other issues with this.

I know Jann/John have been careful to not have special behaviors for
the DMA case, but I think it makes sense here. It is actually different.

Jason


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-28 23:57                                                           ` Jason Gunthorpe
@ 2020-09-29  0:18                                                             ` John Hubbard
  0 siblings, 0 replies; 110+ messages in thread
From: John Hubbard @ 2020-09-29  0:18 UTC (permalink / raw)
  To: Jason Gunthorpe, Linus Torvalds
  Cc: Peter Xu, Leon Romanovsky, Linux-MM, Linux Kernel Mailing List,
	Andrew Morton, Jan Kara, Michal Hocko, Kirill Tkhai,
	Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Jann Horn

On 9/28/20 4:57 PM, Jason Gunthorpe wrote:
> On Mon, Sep 28, 2020 at 12:29:55PM -0700, Linus Torvalds wrote:
...
> I think this is really hard to use and ugly. My thinking has been to
> just stick:
> 
>     if (flags & FOLL_LONGTERM)
>         flags |= FOLL_FORCE | FOLL_WRITE
> 
> In pin_user_pages(). It would make the driver API cleaner. If we can

+1, yes. The other choices so far are, as you say, really difficult to figure
out.

> do a bit better somehow by not COW'ing for certain VMA's as you
> explained then all the better, but not my primary goal..
> 
> Basically, I think if a driver is using FOLL_LONGTERM | FOLL_PIN we
> should guarantee that driver a consistent MM and take the gup_fast
> performance hit to do it.
> 
> AFAICT the giant whack of other cases not using FOLL_LONGTERM really
> shouldn't care about read-decoherence. For those cases the user should
> really not be racing writes with data under a read-only pin, and the
> new COW logic looks like it solves the other issues with this.

I hope this doesn't kill the seqcount() idea, though. That was my favorite
part of the discussion, because it neatly separates out the two racing domains
(fork, gup/pup) and allows easy reasoning about them--without really impacting
performance.

Truly elegant. We should go there.

> 
> I know Jann/John have been careful to not have special behaviors for
> the DMA case, but I think it makes sense here. It is actually different.
> 

I think that makes sense. Everyone knew that DMA/FOLL_LONGTERM call sites
were at least potentially special, despite the spirited debates in at least
two conferences about the meaning and implications of "long term". :)

And here we are seeing an example of such a special case, which I think is
natural enough.


thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-28 22:51                                                             ` Jason Gunthorpe
@ 2020-09-29  0:30                                                               ` Peter Xu
  0 siblings, 0 replies; 110+ messages in thread
From: Peter Xu @ 2020-09-29  0:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Linus Torvalds, Leon Romanovsky, John Hubbard, Linux-MM,
	Linux Kernel Mailing List, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Jann Horn

On Mon, Sep 28, 2020 at 07:51:07PM -0300, Jason Gunthorpe wrote:
> > So I'll attach my suggested patch, but I won't actually commit it. I'd
> > really like to have this tested, possibly _together_ with the sequence
> > count addition..
> 
> Ok, we will do this once the holidays are over. If Peter doesn't take
> it I can look at the seqcount too.

Please feel free to.  I can definitely test against the vfio test I used to
run, though I assume the rdma test would be more ideal.  Just let me know if
there's anything else I can help with.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
  2020-09-28 19:50                                                           ` Linus Torvalds
  2020-09-28 22:51                                                             ` Jason Gunthorpe
@ 2020-10-08  5:49                                                             ` Leon Romanovsky
  1 sibling, 0 replies; 110+ messages in thread
From: Leon Romanovsky @ 2020-10-08  5:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason Gunthorpe, Peter Xu, John Hubbard, Linux-MM,
	Linux Kernel Mailing List, Andrew Morton, Jan Kara, Michal Hocko,
	Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
	Andrea Arcangeli, Oleg Nesterov, Jann Horn

On Mon, Sep 28, 2020 at 12:50:03PM -0700, Linus Torvalds wrote:
> On Mon, Sep 28, 2020 at 12:36 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > So I'll do the pte wrprotect/restore removal. Anybody willing to do
> > and test the sequence count approach?
>
> So the wrprotect removal is trivial, with most of it being about the comments.
>
> However, when I look at this, I am - once again - tempted to just add a
>
>         if (__page_mapcount(page) > 1)
>                 return 1;
>
> there too. Because we know it's a private mapping (shared mappings we
> checked for with the "is_cow_mapping()" earlier), and the only case we
> really care about is the one where the page is only mapped in the
> current mm (because that's what a write pinning will have done, and as
> mentioned, a read pinning doesn't do anything wrt fork() right now
> anyway).
>
> So if it's mapped in another mm, the COW clearly hasn't been broken by
> a pin, and a read pinned page had already gone through a fork.
>
> But the more I look at this code, the more I go "ok, I want somebody
> to actually test this with the rdma case".
>
> So I'll attach my suggested patch, but I won't actually commit it. I'd
> really like to have this tested, possibly _together_ with the sequence
> count addition..

Hi Linus,

We tested the suggested patch for the last two weeks in our nightly
regressions and didn't experience any new failures. It looks safe to use,
but it would be better to take the patch during/after the merge window to
minimize the risk of delaying v5.9.

Thanks

>
>                Linus




^ permalink raw reply	[flat|nested] 110+ messages in thread

end of thread, other threads:[~2020-10-08  5:49 UTC | newest]

Thread overview: 110+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-21 21:17 [PATCH 0/5] mm: Break COW for pinned pages during fork() Peter Xu
2020-09-21 21:17 ` [PATCH 1/5] mm: Introduce mm_struct.has_pinned Peter Xu
2020-09-21 21:43   ` Jann Horn
2020-09-21 22:30     ` Peter Xu
2020-09-21 22:47       ` Jann Horn
2020-09-22 11:54         ` Jason Gunthorpe
2020-09-22 14:28           ` Peter Xu
2020-09-22 15:56             ` Jason Gunthorpe
2020-09-22 16:25               ` Linus Torvalds
2020-09-21 23:53   ` John Hubbard
2020-09-22  0:01     ` John Hubbard
2020-09-22 15:17     ` Peter Xu
2020-09-22 16:10       ` Jason Gunthorpe
2020-09-22 17:54         ` Peter Xu
2020-09-22 19:11           ` Jason Gunthorpe
2020-09-23  0:27             ` Peter Xu
2020-09-23 13:10               ` Peter Xu
2020-09-23 14:20                 ` Jan Kara
2020-09-23 17:12                   ` Jason Gunthorpe
2020-09-24  7:44                     ` Jan Kara
2020-09-24 14:02                       ` Jason Gunthorpe
2020-09-24 14:45                         ` Jan Kara
2020-09-23 17:07               ` Jason Gunthorpe
2020-09-24 14:35                 ` Peter Xu
2020-09-24 16:51                   ` Jason Gunthorpe
2020-09-24 17:55                     ` Peter Xu
2020-09-24 18:15                       ` Jason Gunthorpe
2020-09-24 18:34                         ` Peter Xu
2020-09-24 18:39                           ` Jason Gunthorpe
2020-09-24 21:30                             ` Peter Xu
2020-09-25 19:56                               ` Linus Torvalds
2020-09-25 21:06                                 ` Linus Torvalds
2020-09-26  0:41                                   ` Jason Gunthorpe
2020-09-26  1:15                                     ` Linus Torvalds
2020-09-26 22:28                                       ` Linus Torvalds
2020-09-27  6:23                                         ` Leon Romanovsky
2020-09-27 18:16                                           ` Linus Torvalds
2020-09-27 18:45                                             ` Linus Torvalds
2020-09-28 12:49                                               ` Jason Gunthorpe
2020-09-28 16:17                                                 ` Linus Torvalds
2020-09-28 17:22                                                   ` Peter Xu
2020-09-28 17:54                                                     ` Linus Torvalds
2020-09-28 18:39                                                       ` Jason Gunthorpe
2020-09-28 19:29                                                         ` Linus Torvalds
2020-09-28 23:57                                                           ` Jason Gunthorpe
2020-09-29  0:18                                                             ` John Hubbard
2020-09-28 19:36                                                         ` Linus Torvalds
2020-09-28 19:50                                                           ` Linus Torvalds
2020-09-28 22:51                                                             ` Jason Gunthorpe
2020-09-29  0:30                                                               ` Peter Xu
2020-10-08  5:49                                                             ` Leon Romanovsky
2020-09-28 17:13                                             ` Peter Xu
2020-09-25 21:13                                 ` Peter Xu
2020-09-25 22:08                                   ` Linus Torvalds
2020-09-22 18:02       ` John Hubbard
2020-09-22 18:15         ` Peter Xu
2020-09-22 19:11       ` John Hubbard
2020-09-27  0:41   ` [mm] 698ac7610f: will-it-scale.per_thread_ops 8.2% improvement kernel test robot
2020-09-21 21:17 ` [PATCH 2/5] mm/fork: Pass new vma pointer into copy_page_range() Peter Xu
2020-09-21 21:17 ` [PATCH 3/5] mm: Rework return value for copy_one_pte() Peter Xu
2020-09-22  7:11   ` John Hubbard
2020-09-22 15:29     ` Peter Xu
2020-09-22 10:08   ` Oleg Nesterov
2020-09-22 10:18     ` Oleg Nesterov
2020-09-22 15:36       ` Peter Xu
2020-09-22 15:48         ` Oleg Nesterov
2020-09-22 16:03           ` Peter Xu
2020-09-22 16:53             ` Oleg Nesterov
2020-09-22 18:13               ` Peter Xu
2020-09-22 18:23                 ` Oleg Nesterov
2020-09-22 18:49                   ` Peter Xu
2020-09-23  6:52                     ` Oleg Nesterov
2020-09-23 17:16   ` Linus Torvalds
2020-09-23 21:24     ` Linus Torvalds
2020-09-21 21:20 ` [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes Peter Xu
2020-09-21 21:55   ` Jann Horn
2020-09-21 22:18     ` John Hubbard
2020-09-21 22:27       ` Jann Horn
2020-09-22  0:08         ` John Hubbard
2020-09-21 22:27     ` Peter Xu
2020-09-22 11:48   ` Oleg Nesterov
2020-09-22 12:40     ` Oleg Nesterov
2020-09-22 15:58       ` Peter Xu
2020-09-22 16:52         ` Oleg Nesterov
2020-09-22 18:34           ` Peter Xu
2020-09-22 18:44             ` Oleg Nesterov
2020-09-23  1:03               ` Peter Xu
2020-09-23 20:25                 ` Linus Torvalds
2020-09-24 15:08                   ` Peter Xu
2020-09-24 11:48   ` Kirill Tkhai
2020-09-24 15:16     ` Peter Xu
2020-09-21 21:20 ` [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork() Peter Xu
2020-09-22  6:41   ` John Hubbard
2020-09-22 10:33     ` Jan Kara
2020-09-22 20:01       ` John Hubbard
2020-09-23  9:22         ` Jan Kara
2020-09-23 13:50           ` Peter Xu
2020-09-23 14:01             ` Jan Kara
2020-09-23 15:44               ` Peter Xu
2020-09-23 20:19                 ` John Hubbard
2020-09-24 18:49                   ` Peter Xu
2020-09-23 16:06     ` Peter Xu
2020-09-22 12:05   ` Jason Gunthorpe
2020-09-23 15:24     ` Peter Xu
2020-09-23 16:07       ` Yang Shi
2020-09-24 15:47         ` Peter Xu
2020-09-24 17:29           ` Yang Shi
2020-09-23 17:17       ` Jason Gunthorpe
2020-09-23 10:21 ` [PATCH 0/5] mm: Break COW for pinned pages during fork() Leon Romanovsky
2020-09-23 15:37   ` Peter Xu
