* [PATCH AUTOSEL 01/14] mm: fix invalid page pointer returned with FOLL_PIN gups
@ 2022-04-28 15:42 Greg Kroah-Hartman
  2022-04-28 15:42 ` [PATCH AUTOSEL 02/14] mm: fix missing cache flush for all tail pages of compound page Greg Kroah-Hartman
                   ` (12 more replies)
  0 siblings, 13 replies; 21+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-28 15:42 UTC (permalink / raw)
  To: stable, linux-kernel
  Cc: Peter Xu, John Hubbard, Claudio Imbrenda, Alex Williamson,
	Christoph Hellwig, Jan Kara, Andrea Arcangeli,
	Kirill A . Shutemov, Jason Gunthorpe, David Hildenbrand,
	Lukas Bulwahn, Matthew Wilcox, Jason Gunthorpe, Andrew Morton,
	Linus Torvalds, Greg Kroah-Hartman

From: Peter Xu <peterx@redhat.com>

commit 7196040e19ad634293acd3eff7083149d7669031 upstream.

Patch series "mm/gup: some cleanups", v5.

This patch (of 5):

Alex reported invalid page pointer returned with pin_user_pages_remote()
from vfio after upstream commit 4b6c33b32296 ("vfio/type1: Prepare for
batched pinning with struct vfio_batch").

It turns out that it's not the fault of the vfio commit; however, after
vfio switched to a full page buffer to store the page pointers, the
problem became easier to expose.

The problem is that for VM_PFNMAP vmas we should normally fail with
-EFAULT, after which vfio carries on to handle the MMIO regions.  However,
when the bug triggered, follow_page_mask() returned -EEXIST for such a
page, which makes GUP skip over the current page and leave that entry in
**pages untouched.  The caller is not aware of this, so it references
the entry as usual even though the pointer can hold anything.

We have had that -EEXIST logic since commit 1027e4436b6a ("mm: make GUP
handle pfn mapping unless FOLL_GET is requested"), which seems very
reasonable.  It could be that when we reworked GUP with FOLL_PIN we
overlooked that special path in commit 3faa52c03f44 ("mm/gup: track
FOLL_PIN pages"), even though that commit rightfully touched up
follow_devmap_pud() to check FOLL_PIN when it needs to return
-EEXIST.

Attaching the Fixes to the FOLL_PIN rework commit, as it happened later
than 1027e4436b6a.

[jhubbard@nvidia.com: added some tags, removed a reference to an out of tree module.]

Link: https://lkml.kernel.org/r/20220207062213.235127-1-jhubbard@nvidia.com
Link: https://lkml.kernel.org/r/20220204020010.68930-1-jhubbard@nvidia.com
Link: https://lkml.kernel.org/r/20220204020010.68930-2-jhubbard@nvidia.com
Fixes: 3faa52c03f44 ("mm/gup: track FOLL_PIN pages")
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Debugged-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/gup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index 7bc1ba9ce440..41da0bd61bec 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -465,7 +465,7 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
 		pte_t *pte, unsigned int flags)
 {
 	/* No page to get reference */
-	if (flags & FOLL_GET)
+	if (flags & (FOLL_GET | FOLL_PIN))
 		return -EFAULT;
 
 	if (flags & FOLL_TOUCH) {
-- 
2.36.0
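
The failure mode described in the changelog above can be illustrated
with a small userspace model.  This is only a sketch under stated
assumptions: the MODEL_* flag values, model_follow_pfn_pte(),
model_gup() and model_page are hypothetical names invented here, and
the loop is a simplification of the real GUP caller, not the kernel
code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag values, used by this userspace model only. */
#define MODEL_FOLL_GET 0x1u
#define MODEL_FOLL_PIN 0x2u

static int model_page;	/* stands in for a real struct page */

/* Models follow_pfn_pte(): a PFN mapping has no struct page to reference. */
static int model_follow_pfn_pte(unsigned int flags, int fixed)
{
	unsigned int need_page = fixed ? (MODEL_FOLL_GET | MODEL_FOLL_PIN)
				       : MODEL_FOLL_GET;

	if (flags & need_page)
		return -EFAULT;	/* caller stops and reports the failure */
	return -EEXIST;		/* caller skips this entry */
}

/*
 * Models the caller's loop over n addresses.  Returns how many entries
 * the caller believes are valid, or a negative errno if none are.
 */
static long model_gup(unsigned int flags, int fixed, const int *is_pfn,
		      const void **pages, long n)
{
	long i;

	for (i = 0; i < n; i++) {
		int ret = is_pfn[i] ? model_follow_pfn_pte(flags, fixed) : 0;

		if (ret == -EEXIST)
			continue;		/* pages[i] is left untouched */
		if (ret < 0)
			return i ? i : ret;	/* stop at the first failure */
		pages[i] = &model_page;
	}
	return i;
}
```

With fixed == 0 and MODEL_FOLL_PIN set, the model reports every entry
as valid while the PFN-mapped slot still holds whatever was there
before.  With fixed == 1 it stops at that slot instead.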



* [PATCH AUTOSEL 02/14] mm: fix missing cache flush for all tail pages of compound page
  2022-04-28 15:42 [PATCH AUTOSEL 01/14] mm: fix invalid page pointer returned with FOLL_PIN gups Greg Kroah-Hartman
@ 2022-04-28 15:42 ` Greg Kroah-Hartman
  2022-04-28 15:42 ` [PATCH AUTOSEL 03/14] mm: hugetlb: fix missing cache flush in copy_huge_page_from_user() Greg Kroah-Hartman
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-28 15:42 UTC (permalink / raw)
  To: stable, linux-kernel
  Cc: Muchun Song, Zi Yan, Axel Rasmussen, David Rientjes, Fam Zheng,
	Kirill A . Shutemov, Lars Persson, Mike Kravetz, Peter Xu,
	Xiongchun Duan, Andrew Morton, Linus Torvalds,
	Greg Kroah-Hartman

From: Muchun Song <songmuchun@bytedance.com>

commit 2771739a7162782c0aa6424b2e3dd874e884a15d upstream.

The D-cache maintenance inside move_to_new_page() only considers one
page; there is still a D-cache maintenance issue for the tail pages of a
compound page (e.g. THP or HugeTLB).

THP migration is only enabled on x86_64, arm64 and powerpc.  Of these,
arm64 and powerpc need to maintain consistency between the I-cache and
D-cache, which depends on flush_dcache_page().

But there are no issues on arm64 and powerpc, since they already consider
compound pages in their icache flush functions.  HugeTLB migration is
enabled on arm, arm64, mips, parisc, powerpc, riscv, s390 and sh.  arm
handles the compound page cache flush in flush_dcache_page(), but most of
the others do not.

In theory, the issue exists on many architectures.  Fix this by calling
flush_dcache_page() on every subpage instead of using
flush_dcache_folio(), which is not backportable.

Link: https://lkml.kernel.org/r/20220210123058.79206-3-songmuchun@bytedance.com
Fixes: 290408d4a250 ("hugetlb: hugepage migration core")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lars Persson <lars.persson@axis.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/migrate.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 086a36637467..fc0e14ecd42a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -916,9 +916,12 @@ static int move_to_new_page(struct page *newpage, struct page *page,
 		if (!PageMappingFlags(page))
 			page->mapping = NULL;
 
-		if (likely(!is_zone_device_page(newpage)))
-			flush_dcache_page(newpage);
+		if (likely(!is_zone_device_page(newpage))) {
+			int i, nr = compound_nr(newpage);
 
+			for (i = 0; i < nr; i++)
+				flush_dcache_page(newpage + i);
+		}
 	}
 out:
 	return rc;
-- 
2.36.0



* [PATCH AUTOSEL 03/14] mm: hugetlb: fix missing cache flush in copy_huge_page_from_user()
  2022-04-28 15:42 [PATCH AUTOSEL 01/14] mm: fix invalid page pointer returned with FOLL_PIN gups Greg Kroah-Hartman
  2022-04-28 15:42 ` [PATCH AUTOSEL 02/14] mm: fix missing cache flush for all tail pages of compound page Greg Kroah-Hartman
@ 2022-04-28 15:42 ` Greg Kroah-Hartman
  2022-04-28 15:42 ` [PATCH AUTOSEL 04/14] mm: hugetlb: fix missing cache flush in hugetlb_mcopy_atomic_pte() Greg Kroah-Hartman
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-28 15:42 UTC (permalink / raw)
  To: stable, linux-kernel
  Cc: Muchun Song, Mike Kravetz, Axel Rasmussen, David Rientjes,
	Fam Zheng, Kirill A . Shutemov, Lars Persson, Peter Xu,
	Xiongchun Duan, Zi Yan, Andrew Morton, Linus Torvalds,
	Greg Kroah-Hartman

From: Muchun Song <songmuchun@bytedance.com>

commit e763243cc6cb1fcc720ec58cfd6e7c35ae90a479 upstream.

userfaultfd calls copy_huge_page_from_user(), which does not do any cache
flushing for the target page.  The target page is then mapped into user
space at a different address (the user address), which might alias with
the kernel address used to copy the data in from user space.

Fix this issue by flushing dcache in copy_huge_page_from_user().

Link: https://lkml.kernel.org/r/20220210123058.79206-4-songmuchun@bytedance.com
Fixes: fa4d75c1de13 ("userfaultfd: hugetlbfs: add copy_huge_page_from_user for hugetlb userfaultfd support")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lars Persson <lars.persson@axis.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/memory.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index b69afe3dd597..886925d97759 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5475,6 +5475,8 @@ long copy_huge_page_from_user(struct page *dst_page,
 		if (rc)
 			break;
 
+		flush_dcache_page(subpage);
+
 		cond_resched();
 	}
 	return ret_val;
-- 
2.36.0



* [PATCH AUTOSEL 04/14] mm: hugetlb: fix missing cache flush in hugetlb_mcopy_atomic_pte()
  2022-04-28 15:42 [PATCH AUTOSEL 01/14] mm: fix invalid page pointer returned with FOLL_PIN gups Greg Kroah-Hartman
  2022-04-28 15:42 ` [PATCH AUTOSEL 02/14] mm: fix missing cache flush for all tail pages of compound page Greg Kroah-Hartman
  2022-04-28 15:42 ` [PATCH AUTOSEL 03/14] mm: hugetlb: fix missing cache flush in copy_huge_page_from_user() Greg Kroah-Hartman
@ 2022-04-28 15:42 ` Greg Kroah-Hartman
  2022-04-28 15:42 ` [PATCH AUTOSEL 05/14] mm: shmem: fix missing cache flush in shmem_mfill_atomic_pte() Greg Kroah-Hartman
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-28 15:42 UTC (permalink / raw)
  To: stable, linux-kernel
  Cc: Muchun Song, Mike Kravetz, Axel Rasmussen, David Rientjes,
	Fam Zheng, Kirill A . Shutemov, Lars Persson, Peter Xu,
	Xiongchun Duan, Zi Yan, Andrew Morton, Linus Torvalds,
	Greg Kroah-Hartman

From: Muchun Song <songmuchun@bytedance.com>

commit 348923665a0e50ad9fc0b3bb8127d3cb976691cc upstream.

folio_copy() copies the data from one page to the target page, and the
target page is then mapped at a user space address, which might alias
with the kernel address used for the copy.  There are two ways to fix
this issue.

 1) insert flush_dcache_page() after folio_copy().

 2) replace folio_copy() with copy_user_huge_page() which already
    considers the cache maintenance.

We chose option 2) since architectures can optimize this situation.  It
also makes backports easier.

Link: https://lkml.kernel.org/r/20220210123058.79206-5-songmuchun@bytedance.com
Fixes: 8cc5fcbb5be8 ("mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lars Persson <lars.persson@axis.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/hugetlb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a1da8757cc9c..e2dc190c6725 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5820,7 +5820,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 			*pagep = NULL;
 			goto out;
 		}
-		folio_copy(page_folio(page), page_folio(*pagep));
+		copy_user_huge_page(page, *pagep, dst_addr, dst_vma,
+				    pages_per_huge_page(h));
 		put_page(*pagep);
 		*pagep = NULL;
 	}
-- 
2.36.0



* [PATCH AUTOSEL 05/14] mm: shmem: fix missing cache flush in shmem_mfill_atomic_pte()
  2022-04-28 15:42 [PATCH AUTOSEL 01/14] mm: fix invalid page pointer returned with FOLL_PIN gups Greg Kroah-Hartman
                   ` (2 preceding siblings ...)
  2022-04-28 15:42 ` [PATCH AUTOSEL 04/14] mm: hugetlb: fix missing cache flush in hugetlb_mcopy_atomic_pte() Greg Kroah-Hartman
@ 2022-04-28 15:42 ` Greg Kroah-Hartman
  2022-04-28 15:42 ` [PATCH AUTOSEL 06/14] mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic() Greg Kroah-Hartman
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-28 15:42 UTC (permalink / raw)
  To: stable, linux-kernel
  Cc: Muchun Song, Mike Kravetz, Axel Rasmussen, David Rientjes,
	Fam Zheng, Kirill A . Shutemov, Lars Persson, Peter Xu,
	Xiongchun Duan, Zi Yan, Andrew Morton, Linus Torvalds,
	Greg Kroah-Hartman

From: Muchun Song <songmuchun@bytedance.com>

commit 19b482c29b6f3805f1d8e93015847b89e2f7f3b1 upstream.

userfaultfd calls shmem_mfill_atomic_pte(), which does not do any cache
flushing for the target page.  The target page is then mapped into user
space at a different address (the user address), which might alias with
the kernel address used to copy the data in from user space.  Insert
flush_dcache_page() in the non-zero-page case, and replace
clear_highpage() with clear_user_highpage(), which already considers
cache maintenance.

Link: https://lkml.kernel.org/r/20220210123058.79206-6-songmuchun@bytedance.com
Fixes: 8d1039634206 ("userfaultfd: shmem: add shmem_mfill_zeropage_pte for userfaultfd support")
Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lars Persson <lars.persson@axis.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/shmem.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index a09b29ec2b45..7a46419d331d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2357,8 +2357,10 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 				/* don't free the page */
 				goto out_unacct_blocks;
 			}
+
+			flush_dcache_page(page);
 		} else {		/* ZEROPAGE */
-			clear_highpage(page);
+			clear_user_highpage(page, dst_addr);
 		}
 	} else {
 		page = *pagep;
-- 
2.36.0



* [PATCH AUTOSEL 06/14] mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic()
  2022-04-28 15:42 [PATCH AUTOSEL 01/14] mm: fix invalid page pointer returned with FOLL_PIN gups Greg Kroah-Hartman
                   ` (3 preceding siblings ...)
  2022-04-28 15:42 ` [PATCH AUTOSEL 05/14] mm: shmem: fix missing cache flush in shmem_mfill_atomic_pte() Greg Kroah-Hartman
@ 2022-04-28 15:42 ` Greg Kroah-Hartman
  2022-04-28 15:42 ` [PATCH AUTOSEL 07/14] mm/page_alloc: fetch the correct pcp buddy during bulk free Greg Kroah-Hartman
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-28 15:42 UTC (permalink / raw)
  To: stable, linux-kernel
  Cc: Muchun Song, Axel Rasmussen, David Rientjes, Fam Zheng,
	Kirill A . Shutemov, Lars Persson, Mike Kravetz, Peter Xu,
	Xiongchun Duan, Zi Yan, Andrew Morton, Linus Torvalds,
	Greg Kroah-Hartman

From: Muchun Song <songmuchun@bytedance.com>

commit 7c25a0b89a487878b0691e6524fb5a8827322194 upstream.

userfaultfd calls mcopy_atomic_pte() and __mcopy_atomic(), which do not
do any cache flushing for the target page.  The target page is then
mapped into user space at a different address (the user address), which
might alias with the kernel address used to copy the data in from user
space.  Fix this by inserting flush_dcache_page() after copy_from_user()
succeeds.

Link: https://lkml.kernel.org/r/20220210123058.79206-7-songmuchun@bytedance.com
Fixes: b6ebaedb4cb1 ("userfaultfd: avoid mmap_sem read recursion in mcopy_atomic")
Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lars Persson <lars.persson@axis.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/userfaultfd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 885e5adb0168..7259f96faaa0 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -153,6 +153,8 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
 			/* don't free the page */
 			goto out;
 		}
+
+		flush_dcache_page(page);
 	} else {
 		page = *pagep;
 		*pagep = NULL;
@@ -628,6 +630,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 				err = -EFAULT;
 				goto out;
 			}
+			flush_dcache_page(page);
 			goto retry;
 		} else
 			BUG_ON(page);
-- 
2.36.0



* [PATCH AUTOSEL 07/14] mm/page_alloc: fetch the correct pcp buddy during bulk free
  2022-04-28 15:42 [PATCH AUTOSEL 01/14] mm: fix invalid page pointer returned with FOLL_PIN gups Greg Kroah-Hartman
                   ` (4 preceding siblings ...)
  2022-04-28 15:42 ` [PATCH AUTOSEL 06/14] mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic() Greg Kroah-Hartman
@ 2022-04-28 15:42 ` Greg Kroah-Hartman
  2022-04-28 15:42 ` [PATCH AUTOSEL 08/14] mm/page_alloc: check high-order pages for corruption during PCP operations Greg Kroah-Hartman
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-28 15:42 UTC (permalink / raw)
  To: stable, linux-kernel
  Cc: Mel Gorman, Vlastimil Babka, Aaron Lu, Dave Hansen, Michal Hocko,
	Jesper Dangaard Brouer, Andrew Morton, Linus Torvalds,
	Greg Kroah-Hartman

From: Mel Gorman <mgorman@techsingularity.net>

commit ca7b59b1de72450b3e696bada3506a519ac5455c upstream.

Patch series "Follow-up on high-order PCP caching", v2.

Commit 44042b449872 ("mm/page_alloc: allow high-order pages to be stored
on the per-cpu lists") was primarily aimed at reducing the cost of SLUB
cache refills of high-order pages in two ways.  Firstly, zone lock
acquisitions were reduced and secondly, there were fewer buddy list
modifications.  This is a follow-up series fixing some issues that
became apparent after merging.

Patch 1 is a functional fix.  The bug it fixes is harmless but
inefficient.

Patches 2-5 reduce the overhead of bulk freeing of PCP pages.  While the
overhead is small, it's cumulative and noticeable when truncating large
files.  The changelog for patch 4 includes results of a microbench that
deletes large sparse files with data in page cache.  Sparse files were
used to eliminate filesystem overhead.

Patch 6 addresses issues with high-order PCP pages being stored on PCP
lists for too long.  Pages freed on a CPU potentially may not be quickly
reused and in some cases this can increase cache miss rates.  Details
are included in the changelog.

This patch (of 6):

free_pcppages_bulk() prefetches buddies about to be freed but the order
must also be passed in as PCP lists store multiple orders.

Link: https://lkml.kernel.org/r/20220217002227.5739-1-mgorman@techsingularity.net
Link: https://lkml.kernel.org/r/20220217002227.5739-2-mgorman@techsingularity.net
Fixes: 44042b449872 ("mm/page_alloc: allow high-order pages to be stored on the per-cpu lists")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Aaron Lu <aaron.lu@intel.com>
Tested-by: Aaron Lu <aaron.lu@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/page_alloc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e6f211dcf82e..b2ef0e75fd29 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1432,10 +1432,10 @@ static bool bulkfree_pcp_prepare(struct page *page)
 }
 #endif /* CONFIG_DEBUG_VM */
 
-static inline void prefetch_buddy(struct page *page)
+static inline void prefetch_buddy(struct page *page, unsigned int order)
 {
 	unsigned long pfn = page_to_pfn(page);
-	unsigned long buddy_pfn = __find_buddy_pfn(pfn, 0);
+	unsigned long buddy_pfn = __find_buddy_pfn(pfn, order);
 	struct page *buddy = page + (buddy_pfn - pfn);
 
 	prefetch(buddy);
@@ -1512,7 +1512,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			 * prefetch buddy for the first pcp->batch nr of pages.
 			 */
 			if (prefetch_nr) {
-				prefetch_buddy(page);
+				prefetch_buddy(page, order);
 				prefetch_nr--;
 			}
 		} while (count > 0 && --batch_free && !list_empty(list));
-- 
2.36.0
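
To see why the order matters for the prefetch in the patch above, here
is a minimal userspace sketch.  It assumes the buddy of a block is found
by flipping bit "order" of its pfn, which is my understanding of what
__find_buddy_pfn() computes; model_find_buddy_pfn() is a hypothetical
name used only here.

```c
#include <assert.h>

/*
 * Model of the buddy lookup: the buddy of a 2^order block differs from
 * it only in bit "order" of the page frame number.
 */
static unsigned long model_find_buddy_pfn(unsigned long pfn,
					  unsigned int order)
{
	return pfn ^ (1UL << order);
}
```

For an order-3 block at pfn 8, passing order 0 would prefetch pfn 9,
which lies inside the block itself, whereas passing order 3 yields
pfn 0, the start of the neighbouring block.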



* [PATCH AUTOSEL 08/14] mm/page_alloc: check high-order pages for corruption during PCP operations
  2022-04-28 15:42 [PATCH AUTOSEL 01/14] mm: fix invalid page pointer returned with FOLL_PIN gups Greg Kroah-Hartman
                   ` (5 preceding siblings ...)
  2022-04-28 15:42 ` [PATCH AUTOSEL 07/14] mm/page_alloc: fetch the correct pcp buddy during bulk free Greg Kroah-Hartman
@ 2022-04-28 15:42 ` Greg Kroah-Hartman
  2022-04-28 15:42 ` [PATCH AUTOSEL 09/14] mm/hwpoison: fix error page recovered but reported "not recovered" Greg Kroah-Hartman
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-28 15:42 UTC (permalink / raw)
  To: stable, linux-kernel
  Cc: Mel Gorman, Eric Dumazet, Shakeel Butt, Vlastimil Babka,
	David Rientjes, Michal Hocko, Wei Xu, Greg Thelen, Hugh Dickins,
	Andrew Morton, Linus Torvalds, Greg Kroah-Hartman

From: Mel Gorman <mgorman@techsingularity.net>

commit 77fe7f136a7312954b1b8b7eeb4bc91fc3c14a3f upstream.

Eric Dumazet pointed out that commit 44042b449872 ("mm/page_alloc: allow
high-order pages to be stored on the per-cpu lists") only checks the
head page during PCP refill and allocation operations.  This was an
oversight and all pages should be checked.  This will incur a small
performance penalty but it's necessary for correctness.

Link: https://lkml.kernel.org/r/20220310092456.GJ15701@techsingularity.net
Fixes: 44042b449872 ("mm/page_alloc: allow high-order pages to be stored on the per-cpu lists")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Wei Xu <weixugc@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/page_alloc.c | 46 +++++++++++++++++++++++-----------------------
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b2ef0e75fd29..adceee44adf6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2342,23 +2342,36 @@ static inline int check_new_page(struct page *page)
 	return 1;
 }
 
+static bool check_new_pages(struct page *page, unsigned int order)
+{
+	int i;
+	for (i = 0; i < (1 << order); i++) {
+		struct page *p = page + i;
+
+		if (unlikely(check_new_page(p)))
+			return true;
+	}
+
+	return false;
+}
+
 #ifdef CONFIG_DEBUG_VM
 /*
  * With DEBUG_VM enabled, order-0 pages are checked for expected state when
  * being allocated from pcp lists. With debug_pagealloc also enabled, they are
  * also checked when pcp lists are refilled from the free lists.
  */
-static inline bool check_pcp_refill(struct page *page)
+static inline bool check_pcp_refill(struct page *page, unsigned int order)
 {
 	if (debug_pagealloc_enabled_static())
-		return check_new_page(page);
+		return check_new_pages(page, order);
 	else
 		return false;
 }
 
-static inline bool check_new_pcp(struct page *page)
+static inline bool check_new_pcp(struct page *page, unsigned int order)
 {
-	return check_new_page(page);
+	return check_new_pages(page, order);
 }
 #else
 /*
@@ -2366,32 +2379,19 @@ static inline bool check_new_pcp(struct page *page)
  * when pcp lists are being refilled from the free lists. With debug_pagealloc
  * enabled, they are also checked when being allocated from the pcp lists.
  */
-static inline bool check_pcp_refill(struct page *page)
+static inline bool check_pcp_refill(struct page *page, unsigned int order)
 {
-	return check_new_page(page);
+	return check_new_pages(page, order);
 }
-static inline bool check_new_pcp(struct page *page)
+static inline bool check_new_pcp(struct page *page, unsigned int order)
 {
 	if (debug_pagealloc_enabled_static())
-		return check_new_page(page);
+		return check_new_pages(page, order);
 	else
 		return false;
 }
 #endif /* CONFIG_DEBUG_VM */
 
-static bool check_new_pages(struct page *page, unsigned int order)
-{
-	int i;
-	for (i = 0; i < (1 << order); i++) {
-		struct page *p = page + i;
-
-		if (unlikely(check_new_page(p)))
-			return true;
-	}
-
-	return false;
-}
-
 inline void post_alloc_hook(struct page *page, unsigned int order,
 				gfp_t gfp_flags)
 {
@@ -3037,7 +3037,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 		if (unlikely(page == NULL))
 			break;
 
-		if (unlikely(check_pcp_refill(page)))
+		if (unlikely(check_pcp_refill(page, order)))
 			continue;
 
 		/*
@@ -3641,7 +3641,7 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
 		page = list_first_entry(list, struct page, lru);
 		list_del(&page->lru);
 		pcp->count -= 1 << order;
-	} while (check_new_pcp(page));
+	} while (check_new_pcp(page, order));
 
 	return page;
 }
-- 
2.36.0
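
The difference between checking only the head page and checking every
subpage, as the patch above does, can be sketched in userspace as
follows.  struct model_page and both helpers are hypothetical stand-ins
invented for this illustration, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; "bad" marks a corrupted page. */
struct model_page {
	bool bad;
};

/* Old behaviour: only the head page is examined. */
static bool model_check_head_only(const struct model_page *page)
{
	return page->bad;
}

/* New behaviour: each of the 1 << order subpages is examined. */
static bool model_check_all(const struct model_page *page,
			    unsigned int order)
{
	unsigned int i;

	for (i = 0; i < (1u << order); i++) {
		if (page[i].bad)
			return true;
	}
	return false;
}
```

A corrupted tail page goes unnoticed by the head-only check but is
caught once the loop covers the whole high-order block.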



* [PATCH AUTOSEL 09/14] mm/hwpoison: fix error page recovered but reported "not recovered"
  2022-04-28 15:42 [PATCH AUTOSEL 01/14] mm: fix invalid page pointer returned with FOLL_PIN gups Greg Kroah-Hartman
                   ` (6 preceding siblings ...)
  2022-04-28 15:42 ` [PATCH AUTOSEL 08/14] mm/page_alloc: check high-order pages for corruption during PCP operations Greg Kroah-Hartman
@ 2022-04-28 15:42 ` Greg Kroah-Hartman
  2022-04-28 15:42 ` [PATCH AUTOSEL 10/14] mm/mlock: fix potential imbalanced rlimit ucounts adjustment Greg Kroah-Hartman
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-28 15:42 UTC (permalink / raw)
  To: stable, linux-kernel
  Cc: Naoya Horiguchi, Youquan Song, Tony Luck, Andrew Morton,
	Linus Torvalds, Greg Kroah-Hartman

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

commit 046545a661af2beec21de7b90ca0e35f05088a81 upstream.

When an uncorrected memory error is consumed there is a race between the
CMCI from the memory controller reporting an uncorrected error with a
UCNA signature, and the core reporting an SRAR signature machine check
when the data is about to be consumed.

If the CMCI wins that race, the page is marked poisoned when
uc_decode_notifier() calls memory_failure() and the machine check
processing code finds the page already poisoned.  It calls
kill_accessing_process() to make sure a SIGBUS is sent, but that function
returns the wrong error code.

Console log looks like this:

  mce: Uncorrected hardware memory error in user-access at 3710b3400
  Memory failure: 0x3710b3: recovery action for dirty LRU page: Recovered
  Memory failure: 0x3710b3: already hardware poisoned
  Memory failure: 0x3710b3: Sending SIGBUS to einj_mem_uc:361438 due to hardware memory corruption
  mce: Memory error not recovered

kill_accessing_process() is supposed to return -EHWPOISON to notify that
SIGBUS has already been sent to the process and kill_me_maybe() doesn't
have to send it again.  But the current code simply fails to do this, so
fix it to work as intended.  This change avoids the noisy message
"Memory error not recovered" and skips duplicate SIGBUSs.

[tony.luck@intel.com: reword some parts of commit message]

Link: https://lkml.kernel.org/r/20220113231117.1021405-1-naoya.horiguchi@linux.dev
Fixes: a3f5d80ea401 ("mm,hwpoison: send SIGBUS with error virutal address")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: Youquan Song <youquan.song@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/memory-failure.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 15dcedbc1730..682eedb5ea75 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -707,8 +707,10 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
 			      (void *)&priv);
 	if (ret == 1 && priv.tk.addr)
 		kill_proc(&priv.tk, pfn, flags);
+	else
+		ret = 0;
 	mmap_read_unlock(p->mm);
-	return ret ? -EFAULT : -EHWPOISON;
+	return ret > 0 ? -EHWPOISON : -EFAULT;
 }
 
 static const char *action_name[] = {
-- 
2.36.0
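
The return-code change in the hunk above can be checked with a small
userspace sketch.  model_old_ret() and model_new_ret() are hypothetical
names; walk_ret stands for the result of the page table walk (1 when
the poisoned address was found) and addr for priv.tk.addr.

```c
#include <assert.h>
#include <errno.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* fallback for a libc that does not define it */
#endif

/* Old mapping: any nonzero walk result, including "found", is -EFAULT. */
static int model_old_ret(int walk_ret)
{
	return walk_ret ? -EFAULT : -EHWPOISON;
}

/* New mapping: -EHWPOISON only if the address was found and signalled. */
static int model_new_ret(int walk_ret, unsigned long addr)
{
	int ret = walk_ret;

	if (!(ret == 1 && addr))
		ret = 0;	/* no SIGBUS was sent on this path */
	return ret > 0 ? -EHWPOISON : -EFAULT;
}
```

In the old mapping, the case where SIGBUS was actually sent
(walk_ret == 1) came back as -EFAULT, which matches the spurious
"Memory error not recovered" line in the console log quoted in the
changelog.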



* [PATCH AUTOSEL 10/14] mm/mlock: fix potential imbalanced rlimit ucounts adjustment
  2022-04-28 15:42 [PATCH AUTOSEL 01/14] mm: fix invalid page pointer returned with FOLL_PIN gups Greg Kroah-Hartman
                   ` (7 preceding siblings ...)
  2022-04-28 15:42 ` [PATCH AUTOSEL 09/14] mm/hwpoison: fix error page recovered but reported "not recovered" Greg Kroah-Hartman
@ 2022-04-28 15:42 ` Greg Kroah-Hartman
  2022-04-28 15:42 ` [PATCH AUTOSEL 11/14] mm,migrate: fix establishing demotion target Greg Kroah-Hartman
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-28 15:42 UTC (permalink / raw)
  To: stable, linux-kernel
  Cc: Miaohe Lin, Andrew Morton, Hugh Dickins, Herbert van den Bergh,
	Chris Mason, Linus Torvalds, Greg Kroah-Hartman

From: Miaohe Lin <linmiaohe@huawei.com>

commit 5c2a956c3eea173b2bc89f632507c0eeaebf6c4a upstream.

user_shm_lock() forgets to set allowed to 0 when get_ucounts() fails, so
a later user_shm_unlock() might do an extra dec_rlimit_ucounts().  Fix
this by resetting allowed to 0.

Link: https://lkml.kernel.org/r/20220310132417.41189-1-linmiaohe@huawei.com
Fixes: d7c9e99aee48 ("Reimplement RLIMIT_MEMLOCK on top of ucounts")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/mlock.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/mlock.c b/mm/mlock.c
index 37f969ec68fa..b565b1aac8d4 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -838,6 +838,7 @@ int user_shm_lock(size_t size, struct ucounts *ucounts)
 	}
 	if (!get_ucounts(ucounts)) {
 		dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked);
+		allowed = 0;
 		goto out;
 	}
 	allowed = 1;
-- 
2.36.0


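The imbalance fixed in user_shm_lock() above can be illustrated with a
userspace model.  This is only a sketch under assumptions: the sketch_*
names are hypothetical, the rlimit check is left out, and "allowed" is
assumed to be already 1 when the failing path is reached (modelled here by
an "unlimited" flag), which is the situation in which a missing reset would
matter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-user locked-pages counter. */
struct sketch_ucounts {
	long locked_pages;
};

/* Model of the lock path: charge first, undo the charge if the reference fails. */
static int sketch_shm_lock(struct sketch_ucounts *uc, long pages,
			   bool unlimited, bool ref_ok, bool with_fix)
{
	int allowed = unlimited ? 1 : 0;

	assert(pages > 0);
	uc->locked_pages += pages;		/* charge */
	if (!ref_ok) {
		uc->locked_pages -= pages;	/* undo the charge */
		if (with_fix)
			allowed = 0;		/* the one-line fix */
		return allowed;
	}
	return 1;
}

/* Model of the unlock path: always uncharges. */
static void sketch_shm_unlock(struct sketch_ucounts *uc, long pages)
{
	uc->locked_pages -= pages;
}

/* Lock, then unlock only if the lock reported success; return the final count. */
static long sketch_round_trip(bool with_fix)
{
	struct sketch_ucounts uc = { 0 };

	if (sketch_shm_lock(&uc, 4, true, false, with_fix))
		sketch_shm_unlock(&uc, 4);
	return uc.locked_pages;
}
```

Without the reset the caller believes the lock succeeded and uncharges a
second time, so the model's counter ends below zero; with the reset it
returns to zero.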

* [PATCH AUTOSEL 11/14] mm,migrate: fix establishing demotion target
  2022-04-28 15:42 [PATCH AUTOSEL 01/14] mm: fix invalid page pointer returned with FOLL_PIN gups Greg Kroah-Hartman
                   ` (8 preceding siblings ...)
  2022-04-28 15:42 ` [PATCH AUTOSEL 10/14] mm/mlock: fix potential imbalanced rlimit ucounts adjustment Greg Kroah-Hartman
@ 2022-04-28 15:42 ` Greg Kroah-Hartman
  2022-04-28 15:42 ` [PATCH AUTOSEL 12/14] mm/thp: refix __split_huge_pmd_locked() for migration PMD Greg Kroah-Hartman
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-28 15:42 UTC (permalink / raw)
  To: stable, linux-kernel
  Cc: Huang Ying, Baolin Wang, Dave Hansen, Zi Yan, Oscar Salvador,
	Yang Shi, zhongjiang-ali, Xunlei Pang, Mel Gorman, Andrew Morton,
	Linus Torvalds, Greg Kroah-Hartman

From: Huang Ying <ying.huang@intel.com>

commit fc89213a636c3735eb3386f10a34c082271b4192 upstream.

In commit ac16ec835314 ("mm: migrate: support multiple target nodes
demotion"), after the first demotion target node is found, we will
continue to check the next candidate obtained via find_next_best_node().
This is to find all demotion target nodes with the same NUMA distance.  But
one side effect of find_next_best_node() is that the candidate node it
returns is set in the "used" parameter.  So even if the candidate fails
the subsequent NUMA distance check, it can no longer be used as a demotion
target for the following nodes.  For example, for a system as follows,

node distances:
node   0   1   2   3
  0:  10  21  17  28
  1:  21  10  28  17
  2:  17  28  10  28
  3:  28  17  28  10

when we establish the demotion target node for node 0, in the first round
node 2 is added to the demotion target node set.  Then in the second round,
node 3 is checked and rejected because distance(0, 3) > distance(0, 2).  But
node 3 is set in the "used" nodemask too.  When we establish the demotion
target node for node 1, there is no available node.  This is wrong: node 3
should be set as the demotion target of node 1.

To fix this, if the candidate node fails the distance check, it is
cleared in the "used" nodemask so that it can be used for the following
nodes.

The bug can be reproduced and fixed with this patch on a 2 socket server
machine with DRAM and PMEM.

Link: https://lkml.kernel.org/r/20220128055940.1792614-1-ying.huang@intel.com
Fixes: ac16ec835314 ("mm: migrate: support multiple target nodes demotion")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Cc: zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>
Cc: Xunlei Pang <xlpang@linux.alibaba.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/migrate.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index fc0e14ecd42a..ac7673e43dda 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -3085,18 +3085,21 @@ static int establish_migrate_target(int node, nodemask_t *used,
 	if (best_distance != -1) {
 		val = node_distance(node, migration_target);
 		if (val > best_distance)
-			return NUMA_NO_NODE;
+			goto out_clear;
 	}
 
 	index = nd->nr;
 	if (WARN_ONCE(index >= DEMOTION_TARGET_NODES,
 		      "Exceeds maximum demotion target nodes\n"))
-		return NUMA_NO_NODE;
+		goto out_clear;
 
 	nd->nodes[index] = migration_target;
 	nd->nr++;
 
 	return migration_target;
+out_clear:
+	node_clear(migration_target, *used);
+	return NUMA_NO_NODE;
 }
 
 /*
-- 
2.36.0


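The node 0 / node 1 example from the commit message above can be replayed
in userspace.  This is a simplified sketch, not the kernel code: the
sketch_* names are hypothetical, the "used" nodemask is a plain bitmask,
the stand-in for find_next_best_node() picks the nearest unused node by raw
distance only, and nodes 0 and 1 are assumed to be marked used up front
because they are the migration sources.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NODES	4
#define SKETCH_NO_NODE	(-1)

/* Distance table copied from the commit message. */
static const int sketch_distance[SKETCH_NODES][SKETCH_NODES] = {
	{ 10, 21, 17, 28 },
	{ 21, 10, 28, 17 },
	{ 17, 28, 10, 28 },
	{ 28, 17, 28, 10 },
};

/* Simplified find_next_best_node(): nearest node not in *used, marked used. */
static int sketch_next_best(int node, unsigned int *used)
{
	int best = SKETCH_NO_NODE;

	assert(node >= 0 && node < SKETCH_NODES);
	for (int n = 0; n < SKETCH_NODES; n++) {
		if (*used & (1u << n))
			continue;
		if (best == SKETCH_NO_NODE ||
		    sketch_distance[node][n] < sketch_distance[node][best])
			best = n;
	}
	if (best != SKETCH_NO_NODE)
		*used |= 1u << best;	/* the side effect discussed above */
	return best;
}

/* Collect all equally near demotion targets of 'node' as a bitmask. */
static unsigned int sketch_targets(int node, unsigned int *used, bool with_fix)
{
	unsigned int targets = 0;
	int best_distance = -1;

	for (;;) {
		int cand = sketch_next_best(node, used);

		if (cand == SKETCH_NO_NODE)
			break;
		if (best_distance != -1 &&
		    sketch_distance[node][cand] > best_distance) {
			if (with_fix)
				*used &= ~(1u << cand);	/* give it back */
			break;
		}
		if (best_distance == -1)
			best_distance = sketch_distance[node][cand];
		targets |= 1u << cand;
	}
	return targets;
}

/* Targets chosen for node 1 after node 0 has been processed. */
static unsigned int sketch_node1_targets(bool with_fix)
{
	unsigned int used = (1u << 0) | (1u << 1);	/* sources are not targets */

	(void)sketch_targets(0, &used, with_fix);
	return sketch_targets(1, &used, with_fix);
}
```

In this model node 0 ends up with node 2 in both variants.  Without the
clearing step node 1 is left with an empty target set, and with it node 1
gets node 3, matching the behaviour the commit message describes.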

* [PATCH AUTOSEL 12/14] mm/thp: refix __split_huge_pmd_locked() for migration PMD
  2022-04-28 15:42 [PATCH AUTOSEL 01/14] mm: fix invalid page pointer returned with FOLL_PIN gups Greg Kroah-Hartman
                   ` (9 preceding siblings ...)
  2022-04-28 15:42 ` [PATCH AUTOSEL 11/14] mm,migrate: fix establishing demotion target Greg Kroah-Hartman
@ 2022-04-28 15:42 ` Greg Kroah-Hartman
  2022-04-28 15:42 ` [PATCH AUTOSEL 13/14] mm/thp: ClearPageDoubleMap in first page_add_file_rmap() Greg Kroah-Hartman
  2022-04-28 15:42 ` [PATCH AUTOSEL 14/14] mm/thp: fix NR_FILE_MAPPED accounting in page_*_file_rmap() Greg Kroah-Hartman
  12 siblings, 0 replies; 21+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-28 15:42 UTC (permalink / raw)
  To: stable, linux-kernel
  Cc: Hugh Dickins, Yang Shi, Ralph Campbell, Zi Yan,
	Kirill A. Shutemov, Andrew Morton, Linus Torvalds,
	Greg Kroah-Hartman

From: Hugh Dickins <hughd@google.com>

commit 9d84604b845c3888d1bede43d16ab3ebedb13e24 upstream.

Migration entries do not contribute to a page's reference count: move
__split_huge_pmd_locked()'s page_ref_add() into pmd_migration's else
block (along with the page_count() check - a page is quite likely to
have reference count frozen to 0 when a migration entry is found).

This will fix a very rare anonymous memory leak, after a
split_huge_pmd() raced with an anon split_huge_page() or an anon THP
migrate_pages(): since the wrongly raised refcount stopped the page
(perhaps small, perhaps huge, depending on when the race hit) from ever
being freed.

At first I thought there were worse risks, from prematurely unfreezing a
frozen page: but now think that would only affect page cache pages,
which do not come this way (except for anonymous pages in swap cache,
perhaps).

Link: https://lkml.kernel.org/r/84792468-f512-e48f-378c-e34c3641e97@google.com
Fixes: ec0abae6dcdf ("mm/thp: fix __split_huge_pmd_locked() for migration PMD")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/huge_memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 406a3c28c026..468fca576bc2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2055,9 +2055,9 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		young = pmd_young(old_pmd);
 		soft_dirty = pmd_soft_dirty(old_pmd);
 		uffd_wp = pmd_uffd_wp(old_pmd);
+		VM_BUG_ON_PAGE(!page_count(page), page);
+		page_ref_add(page, HPAGE_PMD_NR - 1);
 	}
-	VM_BUG_ON_PAGE(!page_count(page), page);
-	page_ref_add(page, HPAGE_PMD_NR - 1);
 
 	/*
 	 * Withdraw the table only after we mark the pmd entry invalid.
-- 
2.36.0


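The refcount effect described in the __split_huge_pmd_locked() patch above
can be shown with simple arithmetic.  This is a hedged userspace sketch:
the sketch_* names are hypothetical, and 512 subpages per huge page is an
assumption (2MB huge pages with 4KB base pages), not something the patch
states.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 512 subpages per huge page (2MB THP with 4KB base pages). */
#define SKETCH_HPAGE_PMD_NR 512

/* Reference count of the page after splitting one PMD that refers to it. */
static int sketch_refs_after_split(int refs, bool migration_entry,
				   bool with_fix)
{
	if (migration_entry && with_fix)
		return refs;	/* migration entries hold no references */
	/* Mirrors the page_count() check for a present PMD. */
	assert(migration_entry || refs > 0);
	return refs + SKETCH_HPAGE_PMD_NR - 1;
}
```

For a migration entry found while the count is frozen at 0, the old
placement leaves 511 references in this model that nothing will ever drop,
which is one way to picture the leak; with the fix the count stays at 0.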

* [PATCH AUTOSEL 13/14] mm/thp: ClearPageDoubleMap in first page_add_file_rmap()
  2022-04-28 15:42 [PATCH AUTOSEL 01/14] mm: fix invalid page pointer returned with FOLL_PIN gups Greg Kroah-Hartman
                   ` (10 preceding siblings ...)
  2022-04-28 15:42 ` [PATCH AUTOSEL 12/14] mm/thp: refix __split_huge_pmd_locked() for migration PMD Greg Kroah-Hartman
@ 2022-04-28 15:42 ` Greg Kroah-Hartman
  2022-04-28 16:51   ` Hugh Dickins
  2022-04-28 15:42 ` [PATCH AUTOSEL 14/14] mm/thp: fix NR_FILE_MAPPED accounting in page_*_file_rmap() Greg Kroah-Hartman
  12 siblings, 1 reply; 21+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-28 15:42 UTC (permalink / raw)
  To: stable, linux-kernel
  Cc: Hugh Dickins, Yang Shi, Kirill A. Shutemov, Andrew Morton,
	Linus Torvalds, Greg Kroah-Hartman

From: Hugh Dickins <hughd@google.com>

commit bd55b0c2d64e84a75575f548a33a3dfecc135b65 upstream.

PageDoubleMap is maintained differently for anon and for shmem+file: the
shmem+file one was never cleared, because a safe place to do so could
not be found; so it would blight future use of the cached hugepage until
evicted.

See https://lore.kernel.org/lkml/1571938066-29031-1-git-send-email-yang.shi@linux.alibaba.com/

But page_add_file_rmap() does provide a safe place to do so (though later
than one might wish): allowing testing to return to an initial state
without a damaging drop_caches.

Link: https://lkml.kernel.org/r/61c5cf99-a962-9a25-597a-53ab1bd8fbc0@google.com
Fixes: 9a73f61bdb8a ("thp, mlock: do not mlock PTE-mapped file huge pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/rmap.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/rmap.c b/mm/rmap.c
index 9e27f9f038d3..444d0d958aff 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1252,6 +1252,17 @@ void page_add_file_rmap(struct page *page, bool compound)
 		}
 		if (!atomic_inc_and_test(compound_mapcount_ptr(page)))
 			goto out;
+
+		/*
+		 * It is racy to ClearPageDoubleMap in page_remove_file_rmap();
+		 * but page lock is held by all page_add_file_rmap() compound
+		 * callers, and SetPageDoubleMap below warns if !PageLocked:
+		 * so here is a place that DoubleMap can be safely cleared.
+		 */
+		VM_WARN_ON_ONCE(!PageLocked(page));
+		if (nr == nr_pages && PageDoubleMap(page))
+			ClearPageDoubleMap(page);
+
 		if (PageSwapBacked(page))
 			__mod_lruvec_page_state(page, NR_SHMEM_PMDMAPPED,
 						nr_pages);
-- 
2.36.0



* [PATCH AUTOSEL 14/14] mm/thp: fix NR_FILE_MAPPED accounting in page_*_file_rmap()
  2022-04-28 15:42 [PATCH AUTOSEL 01/14] mm: fix invalid page pointer returned with FOLL_PIN gups Greg Kroah-Hartman
                   ` (11 preceding siblings ...)
  2022-04-28 15:42 ` [PATCH AUTOSEL 13/14] mm/thp: ClearPageDoubleMap in first page_add_file_rmap() Greg Kroah-Hartman
@ 2022-04-28 15:42 ` Greg Kroah-Hartman
  12 siblings, 0 replies; 21+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-28 15:42 UTC (permalink / raw)
  To: stable, linux-kernel
  Cc: Hugh Dickins, Yang Shi, Kirill A. Shutemov, Andrew Morton,
	Linus Torvalds, Greg Kroah-Hartman

From: Hugh Dickins <hughd@google.com>

commit 5d543f13e2f5580828de885c751d68a35b6a493d upstream.

NR_FILE_MAPPED accounting in mm/rmap.c (for /proc/meminfo "Mapped" and
/proc/vmstat "nr_mapped" and the memcg's memory.stat "mapped_file") is
slightly flawed for file or shmem huge pages.

It is well thought out, and looks convincing, but there's a racy case when
the careful counting in page_remove_file_rmap() (without page lock) gets
discarded.  So that in a workload like two "make -j20" kernel builds under
memory pressure, with cc1 on hugepage text, "Mapped" can easily grow by a
spurious 5MB or more on each iteration, ending up implausibly bigger than
most other numbers in /proc/meminfo.  And, hypothetically, might grow to
the point of seriously interfering in mm/vmscan.c's heuristics, which do
take NR_FILE_MAPPED into some consideration.

Fixed by moving the __mod_lruvec_page_state() down to where it will not be
missed before return (and I've grown a bit tired of that oft-repeated
but-not-everywhere comment on the __ness: it gets lost in the move here).

Does page_add_file_rmap() need the same change?  I suspect not, because
page lock is held in all relevant cases, and its skipping case looks safe;
but it's much easier to be sure, if we do make the same change.

Link: https://lkml.kernel.org/r/e02e52a1-8550-a57c-ed29-f51191ea2375@google.com
Fixes: dd78fedde4b9 ("rmap: support file thp")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/rmap.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 444d0d958aff..fa09b5eaff34 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1239,14 +1239,14 @@ void page_add_new_anon_rmap(struct page *page,
  */
 void page_add_file_rmap(struct page *page, bool compound)
 {
-	int i, nr = 1;
+	int i, nr = 0;
 
 	VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
 	lock_page_memcg(page);
 	if (compound && PageTransHuge(page)) {
 		int nr_pages = thp_nr_pages(page);
 
-		for (i = 0, nr = 0; i < nr_pages; i++) {
+		for (i = 0; i < nr_pages; i++) {
 			if (atomic_inc_and_test(&page[i]._mapcount))
 				nr++;
 		}
@@ -1279,17 +1279,18 @@ void page_add_file_rmap(struct page *page, bool compound)
 			if (PageMlocked(page))
 				clear_page_mlock(head);
 		}
-		if (!atomic_inc_and_test(&page->_mapcount))
-			goto out;
+		if (atomic_inc_and_test(&page->_mapcount))
+			nr++;
 	}
-	__mod_lruvec_page_state(page, NR_FILE_MAPPED, nr);
 out:
+	if (nr)
+		__mod_lruvec_page_state(page, NR_FILE_MAPPED, nr);
 	unlock_page_memcg(page);
 }
 
 static void page_remove_file_rmap(struct page *page, bool compound)
 {
-	int i, nr = 1;
+	int i, nr = 0;
 
 	VM_BUG_ON_PAGE(compound && !PageHead(page), page);
 
@@ -1304,12 +1305,12 @@ static void page_remove_file_rmap(struct page *page, bool compound)
 	if (compound && PageTransHuge(page)) {
 		int nr_pages = thp_nr_pages(page);
 
-		for (i = 0, nr = 0; i < nr_pages; i++) {
+		for (i = 0; i < nr_pages; i++) {
 			if (atomic_add_negative(-1, &page[i]._mapcount))
 				nr++;
 		}
 		if (!atomic_add_negative(-1, compound_mapcount_ptr(page)))
-			return;
+			goto out;
 		if (PageSwapBacked(page))
 			__mod_lruvec_page_state(page, NR_SHMEM_PMDMAPPED,
 						-nr_pages);
@@ -1317,16 +1318,13 @@ static void page_remove_file_rmap(struct page *page, bool compound)
 			__mod_lruvec_page_state(page, NR_FILE_PMDMAPPED,
 						-nr_pages);
 	} else {
-		if (!atomic_add_negative(-1, &page->_mapcount))
-			return;
+		if (atomic_add_negative(-1, &page->_mapcount))
+			nr++;
 	}
 
-	/*
-	 * We use the irq-unsafe __{inc|mod}_lruvec_page_state because
-	 * these counters are not modified in interrupt context, and
-	 * pte lock(a spinlock) is held, which implies preemption disabled.
-	 */
-	__mod_lruvec_page_state(page, NR_FILE_MAPPED, -nr);
+out:
+	if (nr)
+		__mod_lruvec_page_state(page, NR_FILE_MAPPED, -nr);
 
 	if (unlikely(PageMlocked(page)))
 		clear_page_mlock(page);
-- 
2.36.0


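The discarded count in page_remove_file_rmap() can be replayed as a fixed
interleaving of two unmappers.  The commit message does not spell out the
exact race, so the ordering below is an assumption that appears to fit its
description.  The sketch_* names are hypothetical, the huge page has only 4
subpages, and the PMDMAPPED counters are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_SUBPAGES 4

/* Mapcounts use the kernel convention that -1 means "not mapped". */
struct sketch_thp {
	int mapcount[SKETCH_SUBPAGES];
	int compound_mapcount;
};

/* Step 1 of a PMD unmap: drop each subpage once, count those now unmapped. */
static int sketch_drop_subpages(struct sketch_thp *p)
{
	int nr = 0;

	for (int i = 0; i < SKETCH_SUBPAGES; i++) {
		if (--p->mapcount[i] < 0)
			nr++;
	}
	return nr;
}

/* Step 2: drop the compound mapcount; return the change to "Mapped". */
static int sketch_finish(struct sketch_thp *p, int nr, bool with_fix)
{
	bool last = --p->compound_mapcount < 0;

	assert(nr >= 0 && nr <= SKETCH_SUBPAGES);
	if (!last && !with_fix)
		return 0;	/* old early return: nr is thrown away */
	return -nr;		/* new code: always applied at "out" */
}

/* Unmappers A and B race; B finishes step 2 before A does. */
static int sketch_race(bool with_fix)
{
	/* Two PMD mappings: every mapcount is 1, compound mapcount is 1. */
	struct sketch_thp p = { { 1, 1, 1, 1 }, 1 };
	int mapped = SKETCH_SUBPAGES;	/* counted when first mapped */
	int nr_a = sketch_drop_subpages(&p);	/* A: 1 -> 0, nr_a = 0 */
	int nr_b = sketch_drop_subpages(&p);	/* B: 0 -> -1, nr_b = 4 */

	mapped += sketch_finish(&p, nr_b, with_fix);	/* B: not the last */
	mapped += sketch_finish(&p, nr_a, with_fix);	/* A: the last */
	return mapped;
}
```

With the old early return, the unmapper that counted the subpages is not
the one that sees the compound mapcount go negative, so its count is lost
and the model's "Mapped" value stays at 4 after everything is unmapped.
Applying the count on every exit path brings it back to 0.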

* Re: [PATCH AUTOSEL 13/14] mm/thp: ClearPageDoubleMap in first page_add_file_rmap()
  2022-04-28 15:42 ` [PATCH AUTOSEL 13/14] mm/thp: ClearPageDoubleMap in first page_add_file_rmap() Greg Kroah-Hartman
@ 2022-04-28 16:51   ` Hugh Dickins
  2022-04-28 16:58     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 21+ messages in thread
From: Hugh Dickins @ 2022-04-28 16:51 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: stable, linux-kernel, Hugh Dickins, Yang Shi, Kirill A. Shutemov,
	Andrew Morton, Linus Torvalds, linux-mm

On Thu, 28 Apr 2022, Greg Kroah-Hartman wrote:

> From: Hugh Dickins <hughd@google.com>
> 
> commit bd55b0c2d64e84a75575f548a33a3dfecc135b65 upstream.
> 
> PageDoubleMap is maintained differently for anon and for shmem+file: the
> shmem+file one was never cleared, because a safe place to do so could
> not be found; so it would blight future use of the cached hugepage until
> evicted.
> 
> See https://lore.kernel.org/lkml/1571938066-29031-1-git-send-email-yang.shi@linux.alibaba.com/
> 
> But page_add_file_rmap() does provide a safe place to do so (though later
> than one might wish): allowing testing to return to an initial state
> without a damaging drop_caches.
> 
> Link: https://lkml.kernel.org/r/61c5cf99-a962-9a25-597a-53ab1bd8fbc0@google.com
> Fixes: 9a73f61bdb8a ("thp, mlock: do not mlock PTE-mapped file huge pages")
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Reviewed-by: Yang Shi <shy828301@gmail.com>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NAK.

I thought we had a long-standing agreement that AUTOSEL does not try
to add patches from akpm's tree which had not been marked for stable.

(Whereas, if a developer asks for such a patch to be added to stable
later, and verifies the result, that's of course a different matter.)

I've chosen to answer to this patch of my 3 in your 14 AUTOSELs,
because this one is just an improvement, not at all a bugfix needed
for stable (maybe AUTOSEL noticed "racy" or "safely" in the comments,
and misunderstood).  The "Fixes" was intended to help any humans who
wanted to backport into their trees.

I do recall that this 13/14, and 14/14, are mods to mm/rmap.c
which followed other (mm/munlock) mods to mm/rmap.c in 5.18-rc1,
which affected the out path of the function involved, and somehow
made 14/14 a little cleaner.  I'm sorry, but I just don't rate it
worth my time at the moment, to verify whether 14/14 happens to
have ended up as a correct patch or not.

And nobody can verify them without these AUTOSELs saying to which
tree they are targeted - 5.17 I suppose.

Hugh

> ---
>  mm/rmap.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 9e27f9f038d3..444d0d958aff 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1252,6 +1252,17 @@ void page_add_file_rmap(struct page *page, bool compound)
>  		}
>  		if (!atomic_inc_and_test(compound_mapcount_ptr(page)))
>  			goto out;
> +
> +		/*
> +		 * It is racy to ClearPageDoubleMap in page_remove_file_rmap();
> +		 * but page lock is held by all page_add_file_rmap() compound
> +		 * callers, and SetPageDoubleMap below warns if !PageLocked:
> +		 * so here is a place that DoubleMap can be safely cleared.
> +		 */
> +		VM_WARN_ON_ONCE(!PageLocked(page));
> +		if (nr == nr_pages && PageDoubleMap(page))
> +			ClearPageDoubleMap(page);
> +
>  		if (PageSwapBacked(page))
>  			__mod_lruvec_page_state(page, NR_SHMEM_PMDMAPPED,
>  						nr_pages);
> -- 
> 2.36.0


* Re: [PATCH AUTOSEL 13/14] mm/thp: ClearPageDoubleMap in first page_add_file_rmap()
  2022-04-28 16:51   ` Hugh Dickins
@ 2022-04-28 16:58     ` Greg Kroah-Hartman
  2022-04-28 19:27       ` Hugh Dickins
  2022-05-02  8:45       ` Pavel Machek
  0 siblings, 2 replies; 21+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-28 16:58 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: stable, linux-kernel, Yang Shi, Kirill A. Shutemov,
	Andrew Morton, Linus Torvalds, linux-mm

On Thu, Apr 28, 2022 at 09:51:58AM -0700, Hugh Dickins wrote:
> On Thu, 28 Apr 2022, Greg Kroah-Hartman wrote:
> 
> > From: Hugh Dickins <hughd@google.com>
> > 
> > commit bd55b0c2d64e84a75575f548a33a3dfecc135b65 upstream.
> > 
> > PageDoubleMap is maintained differently for anon and for shmem+file: the
> > shmem+file one was never cleared, because a safe place to do so could
> > not be found; so it would blight future use of the cached hugepage until
> > evicted.
> > 
> > See https://lore.kernel.org/lkml/1571938066-29031-1-git-send-email-yang.shi@linux.alibaba.com/
> > 
> > But page_add_file_rmap() does provide a safe place to do so (though later
> > than one might wish): allowing testing to return to an initial state
> > without a damaging drop_caches.
> > 
> > Link: https://lkml.kernel.org/r/61c5cf99-a962-9a25-597a-53ab1bd8fbc0@google.com
> > Fixes: 9a73f61bdb8a ("thp, mlock: do not mlock PTE-mapped file huge pages")
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > Reviewed-by: Yang Shi <shy828301@gmail.com>
> > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> NAK.
> 
> I thought we had a long-standing agreement that AUTOSEL does not try
> to add patches from akpm's tree which had not been marked for stable.

True, this was my attempt at saying "hey these all look like they should
go to stable trees, why not?"

> I've chosen to answer to this patch of my 3 in your 14 AUTOSELs,
> because this one is just an improvement, not at all a bugfix needed
> for stable (maybe AUTOSEL noticed "racy" or "safely" in the comments,
> and misunderstood).  The "Fixes" was intended to help any humans who
> wanted to backport into their trees.

This all was off of the Fixes: tag.  Again, if these commits fix
something why are they not for stable?  I'm a human asking to backport
these into the stable trees based on that :)

> I do recall that this 13/14, and 14/14, are mods to mm/rmap.c
> which followed other (mm/munlock) mods to mm/rmap.c in 5.18-rc1,
> which affected the out path of the function involved, and somehow
> made 14/14 a little cleaner.  I'm sorry, but I just don't rate it
> worth my time at the moment, to verify whether 14/14 happens to
> have ended up as a correct patch or not.
> 
> And nobody can verify them without these AUTOSELs saying to which
> tree they are targeted - 5.17 I suppose.

5.17 to start with, older ones based on where the Fixes: tags went to.

So do you really want me to drop these?  I will but why are you adding
fixes: tags if you don't want people to take them?

thanks,

greg k-h


* Re: [PATCH AUTOSEL 13/14] mm/thp: ClearPageDoubleMap in first page_add_file_rmap()
  2022-04-28 16:58     ` Greg Kroah-Hartman
@ 2022-04-28 19:27       ` Hugh Dickins
  2022-04-28 22:45         ` Sean Christopherson
  2022-04-30  0:27         ` Sasha Levin
  2022-05-02  8:45       ` Pavel Machek
  1 sibling, 2 replies; 21+ messages in thread
From: Hugh Dickins @ 2022-04-28 19:27 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Hugh Dickins, stable, linux-kernel, Yang Shi, Kirill A. Shutemov,
	Andrew Morton, Linus Torvalds, linux-mm

On Thu, 28 Apr 2022, Greg Kroah-Hartman wrote:
> On Thu, Apr 28, 2022 at 09:51:58AM -0700, Hugh Dickins wrote:
> > On Thu, 28 Apr 2022, Greg Kroah-Hartman wrote:
> > 
> > > From: Hugh Dickins <hughd@google.com>
> > > 
> > > commit bd55b0c2d64e84a75575f548a33a3dfecc135b65 upstream.
> > > 
> > > PageDoubleMap is maintained differently for anon and for shmem+file: the
> > > shmem+file one was never cleared, because a safe place to do so could
> > > not be found; so it would blight future use of the cached hugepage until
> > > evicted.
> > > 
> > > See https://lore.kernel.org/lkml/1571938066-29031-1-git-send-email-yang.shi@linux.alibaba.com/
> > > 
> > > But page_add_file_rmap() does provide a safe place to do so (though later
> > > than one might wish): allowing testing to return to an initial state
> > > without a damaging drop_caches.
> > > 
> > > Link: https://lkml.kernel.org/r/61c5cf99-a962-9a25-597a-53ab1bd8fbc0@google.com
> > > Fixes: 9a73f61bdb8a ("thp, mlock: do not mlock PTE-mapped file huge pages")
> > > Signed-off-by: Hugh Dickins <hughd@google.com>
> > > Reviewed-by: Yang Shi <shy828301@gmail.com>
> > > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> > > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > 
> > NAK.
> > 
> > I thought we had a long-standing agreement that AUTOSEL does not try
> > to add patches from akpm's tree which had not been marked for stable.
> 
> True, this was my attempt at saying "hey these all look like they should
> go to stable trees, why not?"

Okay, it seems I should have read "AUTOSEL" as "Hey, GregKH here,
these all look like they should go to stable trees, why not?",
which would have drawn a friendlier response.

The answer is that I considered stable at the time, and akpm did too,
and none of my three (I've not looked through the other 11) are serious
enough to be needed in stable; and I'm cautious about backports, because
I know that the tree they went on top of differs thereabouts from 5.17.

Of course I think the patches in 5.18-rc are good, and yes, they're
things I've thought worthwhile enough for me personally to port forward
over several releases until I had time to send in.  But that doesn't
make them safe stable candidates, without someone to verify and vouch
for the results in this or that tree - I run on a much slower clock
than you and most around here, I do not have time for that at present
(and would prefer not even to be having this conversation).

But I'm happily overruled if any mm guys think they are worth that
extra effort, and will verify and vouch for them.

> 
> > I've chosen to answer to this patch of my 3 in your 14 AUTOSELs,
> > because this one is just an improvement, not at all a bugfix needed
> > for stable (maybe AUTOSEL noticed "racy" or "safely" in the comments,
> > and misunderstood).  The "Fixes" was intended to help any humans who
> > wanted to backport into their trees.
> 
> This all was off of the Fixes: tag.  Again, if these commits fix
> something why are they not for stable?  I'm a human asking to backport
> these into the stable trees based on that :)

Your humanity is not in doubt :)  But I think we've gone over this
too many times - each year?  There's a "Fixes:" tag and "Cc: stable"
tag, and in akpm's tree we prefer to be able to specify "Fixes:" to
help each other, without that automatically implying "Cc: stable".
Andrew goes to considerable trouble to determine when "Cc: stable"
is appropriate.

> 
> > I do recall that this 13/14, and 14/14, are mods to mm/rmap.c
> > which followed other (mm/munlock) mods to mm/rmap.c in 5.18-rc1,
> > which affected the out path of the function involved, and somehow
> > made 14/14 a little cleaner.  I'm sorry, but I just don't rate it
> > worth my time at the moment, to verify whether 14/14 happens to
> > have ended up as a correct patch or not.
> > 
> > And nobody can verify them without these AUTOSELs saying to which
> > tree they are targeted - 5.17 I suppose.
> 
> 5.17 to start with, older ones based on where the Fixes: tags went to.
> 
> So do you really want me to drop these?  I will but why are you adding
> fixes: tags if you don't want people to take them?

Yes, please drop them - thanks.  As to the other 11: I hope authors
will speak up one way or the other, but I'll drop out now.

Hugh


* Re: [PATCH AUTOSEL 13/14] mm/thp: ClearPageDoubleMap in first page_add_file_rmap()
  2022-04-28 19:27       ` Hugh Dickins
@ 2022-04-28 22:45         ` Sean Christopherson
  2022-04-29 12:13           ` Greg Kroah-Hartman
  2022-04-30  0:27         ` Sasha Levin
  1 sibling, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2022-04-28 22:45 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Greg Kroah-Hartman, stable, linux-kernel, Yang Shi,
	Kirill A. Shutemov, Andrew Morton, Linus Torvalds, linux-mm,
	Sasha Levin, Paolo Bonzini

+Sasha and Paolo

On Thu, Apr 28, 2022, Hugh Dickins wrote:
> On Thu, 28 Apr 2022, Greg Kroah-Hartman wrote:
> > On Thu, Apr 28, 2022 at 09:51:58AM -0700, Hugh Dickins wrote:
> > > On Thu, 28 Apr 2022, Greg Kroah-Hartman wrote:
> > > 
> > > > From: Hugh Dickins <hughd@google.com>
> > > > 
> > > > commit bd55b0c2d64e84a75575f548a33a3dfecc135b65 upstream.
> > > > 
> > > > PageDoubleMap is maintained differently for anon and for shmem+file: the
> > > > shmem+file one was never cleared, because a safe place to do so could
> > > > not be found; so it would blight future use of the cached hugepage until
> > > > evicted.
> > > > 
> > > > See https://lore.kernel.org/lkml/1571938066-29031-1-git-send-email-yang.shi@linux.alibaba.com/
> > > > 
> > > > But page_add_file_rmap() does provide a safe place to do so (though later
> > > > than one might wish): allowing testing to return to an initial state
> > > > without a damaging drop_caches.
> > > > 
> > > > Link: https://lkml.kernel.org/r/61c5cf99-a962-9a25-597a-53ab1bd8fbc0@google.com
> > > > Fixes: 9a73f61bdb8a ("thp, mlock: do not mlock PTE-mapped file huge pages")
> > > > Signed-off-by: Hugh Dickins <hughd@google.com>
> > > > Reviewed-by: Yang Shi <shy828301@gmail.com>
> > > > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > > > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> > > > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > 
> > > NAK.
> > > 
> > > I thought we had a long-standing agreement that AUTOSEL does not try
> > > to add patches from akpm's tree which had not been marked for stable.
> > 
> > True, this was my attempt at saying "hey these all look like they should
> > go to stable trees, why not?"
> 
> Okay, it seems I should have read "AUTOSEL" as "Hey, GregKH here,
> these all look like they should go to stable trees, why not?",
> which would have drawn a friendlier response.

FWIW, Sasha has been using MANUALSEL for the KVM tree to solicit an explicit ACK
from Paolo for these types of patches.  AFAICT, it has been working quite well.


* Re: [PATCH AUTOSEL 13/14] mm/thp: ClearPageDoubleMap in first page_add_file_rmap()
  2022-04-28 22:45         ` Sean Christopherson
@ 2022-04-29 12:13           ` Greg Kroah-Hartman
  0 siblings, 0 replies; 21+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-29 12:13 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Hugh Dickins, stable, linux-kernel, Yang Shi, Kirill A. Shutemov,
	Andrew Morton, Linus Torvalds, linux-mm, Sasha Levin,
	Paolo Bonzini

On Thu, Apr 28, 2022 at 10:45:18PM +0000, Sean Christopherson wrote:
> +Sasha and Paolo
> 
> On Thu, Apr 28, 2022, Hugh Dickins wrote:
> > On Thu, 28 Apr 2022, Greg Kroah-Hartman wrote:
> > > On Thu, Apr 28, 2022 at 09:51:58AM -0700, Hugh Dickins wrote:
> > > > On Thu, 28 Apr 2022, Greg Kroah-Hartman wrote:
> > > > 
> > > > > From: Hugh Dickins <hughd@google.com>
> > > > > 
> > > > > commit bd55b0c2d64e84a75575f548a33a3dfecc135b65 upstream.
> > > > > 
> > > > > PageDoubleMap is maintained differently for anon and for shmem+file: the
> > > > > shmem+file one was never cleared, because a safe place to do so could
> > > > > not be found; so it would blight future use of the cached hugepage until
> > > > > evicted.
> > > > > 
> > > > > See https://lore.kernel.org/lkml/1571938066-29031-1-git-send-email-yang.shi@linux.alibaba.com/
> > > > > 
> > > > > But page_add_file_rmap() does provide a safe place to do so (though later
> > > > > than one might wish): allowing testing to return to an initial state
> > > > > without a damaging drop_caches.
> > > > > 
> > > > > Link: https://lkml.kernel.org/r/61c5cf99-a962-9a25-597a-53ab1bd8fbc0@google.com
> > > > > Fixes: 9a73f61bdb8a ("thp, mlock: do not mlock PTE-mapped file huge pages")
> > > > > Signed-off-by: Hugh Dickins <hughd@google.com>
> > > > > Reviewed-by: Yang Shi <shy828301@gmail.com>
> > > > > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > > > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > > > > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> > > > > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > 
> > > > NAK.
> > > > 
> > > > I thought we had a long-standing agreement that AUTOSEL does not try
> > > > to add patches from akpm's tree which had not been marked for stable.
> > > 
> > > True, this was my attempt at saying "hey these all look like they should
> > > go to stable trees, why not?"
> > 
> > Okay, it seems I should have read "AUTOSEL" as "Hey, GregKH here,
> > these all look like they should go to stable trees, why not?",
> > which would have drawn a friendlier response.
> 
> FWIW, Sasha has been using MANUALSEL for the KVM tree to solicit an explicit ACK
> from Paolo for these types of patches.  AFAICT, it has been working quite well.

Yes, that is what I should have put here, sorry about that.  These were
manually picked by me and I am asking if they should be included or not.
I'll resend after dropping Hugh's patches from the series.

thanks,

greg k-h


* Re: [PATCH AUTOSEL 13/14] mm/thp: ClearPageDoubleMap in first page_add_file_rmap()
  2022-04-28 19:27       ` Hugh Dickins
  2022-04-28 22:45         ` Sean Christopherson
@ 2022-04-30  0:27         ` Sasha Levin
  1 sibling, 0 replies; 21+ messages in thread
From: Sasha Levin @ 2022-04-30  0:27 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Greg Kroah-Hartman, stable, linux-kernel, Yang Shi,
	Kirill A. Shutemov, Andrew Morton, Linus Torvalds, linux-mm

On Thu, Apr 28, 2022 at 12:27:40PM -0700, Hugh Dickins wrote:
>On Thu, 28 Apr 2022, Greg Kroah-Hartman wrote:
>> On Thu, Apr 28, 2022 at 09:51:58AM -0700, Hugh Dickins wrote:
>> > On Thu, 28 Apr 2022, Greg Kroah-Hartman wrote:
>> >
>> > > From: Hugh Dickins <hughd@google.com>
>> > >
>> > > commit bd55b0c2d64e84a75575f548a33a3dfecc135b65 upstream.
>> > >
>> > > PageDoubleMap is maintained differently for anon and for shmem+file: the
>> > > shmem+file one was never cleared, because a safe place to do so could
>> > > not be found; so it would blight future use of the cached hugepage until
>> > > evicted.
>> > >
>> > > See https://lore.kernel.org/lkml/1571938066-29031-1-git-send-email-yang.shi@linux.alibaba.com/
>> > >
>> > > But page_add_file_rmap() does provide a safe place to do so (though later
>> > > than one might wish): allowing testing to return to an initial state
>> > > without a damaging drop_caches.
>> > >
>> > > Link: https://lkml.kernel.org/r/61c5cf99-a962-9a25-597a-53ab1bd8fbc0@google.com
>> > > Fixes: 9a73f61bdb8a ("thp, mlock: do not mlock PTE-mapped file huge pages")
>> > > Signed-off-by: Hugh Dickins <hughd@google.com>
>> > > Reviewed-by: Yang Shi <shy828301@gmail.com>
>> > > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>> > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> > > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>> > > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> >
>> > NAK.
>> >
>> > I thought we had a long-standing agreement that AUTOSEL does not try
>> > to add patches from akpm's tree which had not been marked for stable.

I guess it was only between myself and mm/ :p

>> True, this was my attempt at saying "hey these all look like they should
>> go to stable trees, why not?"
>
>Okay, it seems I should have read "AUTOSEL" as "Hey, GregKH here,
>these all look like they should go to stable trees, why not?",
>which would have drawn a friendlier response.

FRIENDLYGREGBOT :)

>The answer is that I considered stable at the time, and akpm did too,
>and none of my three (I've not looked through the other 11) are serious
>enough to be needed in stable; and I'm cautious about backports, because
>I know that the tree they went on top of differs thereabouts from 5.17.
>
>Of course I think the patches in 5.18-rc are good, and yes, they're
>things I've thought worthwhile enough for me personally to port forward
>over several releases until I had time to send in.  But that doesn't
>make them safe stable candidates, without someone to verify and vouch
>for the results in this or that tree - I run on a much slower clock
>than you and most around here, I do not have time for that at present
>(and would prefer not even to be having this conversation).
>
>But I'm happily overruled if any mm guys think they are worth that
>extra effort, and will verify and vouch for them.

What's the extra effort here? We're seeing so many cases where we see
issues with LTS kernels and we end up spending so much time triaging and
diagnosing them only to find out that they've already been fixed.

Honestly, having them in -stable seems like *less* effort to me.

-- 
Thanks,
Sasha


* Re: [PATCH AUTOSEL 13/14] mm/thp: ClearPageDoubleMap in first page_add_file_rmap()
  2022-04-28 16:58     ` Greg Kroah-Hartman
  2022-04-28 19:27       ` Hugh Dickins
@ 2022-05-02  8:45       ` Pavel Machek
  1 sibling, 0 replies; 21+ messages in thread
From: Pavel Machek @ 2022-05-02  8:45 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Hugh Dickins, stable, linux-kernel, Yang Shi, Kirill A. Shutemov,
	Andrew Morton, Linus Torvalds, linux-mm

Hi!

> > I've chosen to answer to this patch of my 3 in your 14 AUTOSELs,
> > because this one is just an improvement, not at all a bugfix needed
> > for stable (maybe AUTOSEL noticed "racy" or "safely" in the comments,
> > and misunderstood).  The "Fixes" was intended to help any humans who
> > wanted to backport into their trees.
> 
> This all was off of the Fixes: tag.  Again, if these commits fix
> something why are they not for stable?  I'm a human asking to backport
> these into the stable trees based on that :)

I see this as a repeated pattern: people add a Fixes: tag for trivial
things that should not really go to stable (a typo in a comment?) and
stable takes it as a serious bug that needs to be fixed in stable.

Best regards,
								Pavel



Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
2022-04-28 15:42 [PATCH AUTOSEL 01/14] mm: fix invalid page pointer returned with FOLL_PIN gups Greg Kroah-Hartman
2022-04-28 15:42 ` [PATCH AUTOSEL 02/14] mm: fix missing cache flush for all tail pages of compound page Greg Kroah-Hartman
2022-04-28 15:42 ` [PATCH AUTOSEL 03/14] mm: hugetlb: fix missing cache flush in copy_huge_page_from_user() Greg Kroah-Hartman
2022-04-28 15:42 ` [PATCH AUTOSEL 04/14] mm: hugetlb: fix missing cache flush in hugetlb_mcopy_atomic_pte() Greg Kroah-Hartman
2022-04-28 15:42 ` [PATCH AUTOSEL 05/14] mm: shmem: fix missing cache flush in shmem_mfill_atomic_pte() Greg Kroah-Hartman
2022-04-28 15:42 ` [PATCH AUTOSEL 06/14] mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic() Greg Kroah-Hartman
2022-04-28 15:42 ` [PATCH AUTOSEL 07/14] mm/page_alloc: fetch the correct pcp buddy during bulk free Greg Kroah-Hartman
2022-04-28 15:42 ` [PATCH AUTOSEL 08/14] mm/page_alloc: check high-order pages for corruption during PCP operations Greg Kroah-Hartman
2022-04-28 15:42 ` [PATCH AUTOSEL 09/14] mm/hwpoison: fix error page recovered but reported "not recovered" Greg Kroah-Hartman
2022-04-28 15:42 ` [PATCH AUTOSEL 10/14] mm/mlock: fix potential imbalanced rlimit ucounts adjustment Greg Kroah-Hartman
2022-04-28 15:42 ` [PATCH AUTOSEL 11/14] mm,migrate: fix establishing demotion target Greg Kroah-Hartman
2022-04-28 15:42 ` [PATCH AUTOSEL 12/14] mm/thp: refix __split_huge_pmd_locked() for migration PMD Greg Kroah-Hartman
2022-04-28 15:42 ` [PATCH AUTOSEL 13/14] mm/thp: ClearPageDoubleMap in first page_add_file_rmap() Greg Kroah-Hartman
2022-04-28 16:51   ` Hugh Dickins
2022-04-28 16:58     ` Greg Kroah-Hartman
2022-04-28 19:27       ` Hugh Dickins
2022-04-28 22:45         ` Sean Christopherson
2022-04-29 12:13           ` Greg Kroah-Hartman
2022-04-30  0:27         ` Sasha Levin
2022-05-02  8:45       ` Pavel Machek
2022-04-28 15:42 ` [PATCH AUTOSEL 14/14] mm/thp: fix NR_FILE_MAPPED accounting in page_*_file_rmap() Greg Kroah-Hartman
