* incoming
@ 2020-01-14  0:28 Andrew Morton
  2020-01-14  0:29 ` [patch 01/11] mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations Andrew Morton
                   ` (10 more replies)
  0 siblings, 11 replies; 15+ messages in thread
From: Andrew Morton @ 2020-01-14  0:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits


11 MM fixes, based on b3a987b0264d3ddbb24293ebff10eddfc472f653:


    Vlastimil Babka <vbabka@suse.cz>:
      mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations

    David Hildenbrand <david@redhat.com>:
      mm/memory_hotplug: don't free usage map when removing a re-added early section

    "Kirill A. Shutemov" <kirill@shutemov.name>:
    Patch series "Fix two above-47bit hint address vs. THP bugs":
      mm/huge_memory.c: thp: fix conflict of above-47bit hint address and PMD alignment
      mm/shmem.c: thp, shmem: fix conflict of above-47bit hint address and PMD alignment

    Roman Gushchin <guro@fb.com>:
      mm: memcg/slab: fix percpu slab vmstats flushing

    Vlastimil Babka <vbabka@suse.cz>:
      mm, debug_pagealloc: don't rely on static keys too early

    Wen Yang <wenyang@linux.alibaba.com>:
    Patch series "use div64_ul() instead of div_u64() if the divisor is:
      mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio()
      mm/page-writeback.c: use div64_ul() for u64-by-unsigned-long divide
      mm/page-writeback.c: improve arithmetic divisions

    Adrian Huang <ahuang12@lenovo.com>:
      mm: memcg/slab: call flush_memcg_workqueue() only if memcg workqueue is valid

    Yang Shi <yang.shi@linux.alibaba.com>:
      mm: khugepaged: add trace status description for SCAN_PAGE_HAS_PRIVATE

 include/linux/mm.h                 |   18 +++++++++-
 include/linux/mmzone.h             |    5 +--
 include/trace/events/huge_memory.h |    3 +
 init/main.c                        |    1 
 mm/huge_memory.c                   |   38 ++++++++++++++---------
 mm/memcontrol.c                    |   37 +++++-----------------
 mm/mempolicy.c                     |   10 ++++--
 mm/page-writeback.c                |   10 +++---
 mm/page_alloc.c                    |   61 ++++++++++---------------------------
 mm/shmem.c                         |    7 ++--
 mm/slab.c                          |    4 +-
 mm/slab_common.c                   |    3 +
 mm/slub.c                          |    2 -
 mm/sparse.c                        |    9 ++++-
 mm/vmalloc.c                       |    4 +-
 15 files changed, 102 insertions(+), 110 deletions(-)




* [patch 01/11] mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations
  2020-01-14  0:28 incoming Andrew Morton
@ 2020-01-14  0:29 ` Andrew Morton
  2020-01-14  2:16   ` Linus Torvalds
  2020-01-14  0:29 ` [patch 02/11] mm/memory_hotplug: don't free usage map when removing a re-added early section Andrew Morton
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2020-01-14  0:29 UTC (permalink / raw)
  To: rientjes, mhocko, mgorman, kirill, aarcange, vbabka, akpm,
	linux-mm, mm-commits, torvalds

From: Vlastimil Babka <vbabka@suse.cz>
Subject: mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations

THP page faults now attempt a __GFP_THISNODE allocation first, which
should only compact existing free memory, followed by another attempt that
can allocate from any node using reclaim/compaction effort specified by
global defrag setting and madvise.

This patch makes the following changes to the scheme:

- before the patch, the first allocation relies on a check for pageblock
  order and __GFP_IO to prevent excessive reclaim.  This, however, also
  affects the second attempt, which is not limited to a single node.
  Instead of that, reuse the existing check for costly order __GFP_NORETRY
  allocations, and make sure the first THP attempt uses __GFP_NORETRY.  As
  a side effect, all costly order __GFP_NORETRY allocations will bail out
  if compaction needs reclaim, while previously they only bailed out when
  compaction was deferred due to previous failures.  This should still be
  acceptable within the __GFP_NORETRY semantics.

- before the patch, the second allocation attempt (on all nodes) was
  passing __GFP_NORETRY.  This is redundant as the check for pageblock
  order (discussed above) was stronger.  It's also contrary to
  madvise(MADV_HUGEPAGE), which means that some effort to allocate THP is
  requested.  After this patch, the second attempt doesn't pass
  __GFP_THISNODE nor __GFP_NORETRY.

To sum up, THP page faults now try the following attempts (sketched in the
snippet below):

1. local node only THP allocation with no reclaim, just compaction.
2. for madvised VMAs, or when synchronous compaction is always enabled - THP
   allocation from any node with effort determined by the global defrag
   setting and VMA madvise
3. fallback to base pages on any node
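
The resulting order can be illustrated with a few lines of stand-alone C;
the flag values and the try_alloc() stub are made up for illustration and
are not the kernel's definitions:

#include <stdbool.h>
#include <stdio.h>

/* Assumed illustrative flag values - not the kernel's. */
#define __GFP_THISNODE		0x1u
#define __GFP_NORETRY		0x2u
#define __GFP_DIRECT_RECLAIM	0x4u	/* set for madvised VMAs / defrag=always */

static bool try_alloc(unsigned int gfp, const char *what)
{
	printf("attempt: %-26s gfp=%#x\n", what, gfp);
	return false;			/* pretend it fails, to show the order */
}

static void thp_fault_alloc(unsigned int gfp)
{
	/* 1. local node only, compaction but no reclaim */
	if (try_alloc(gfp | __GFP_THISNODE | __GFP_NORETRY, "local node, no reclaim"))
		return;

	/* 2. any node, reclaim/compaction effort per defrag setting / madvise */
	if ((gfp & __GFP_DIRECT_RECLAIM) &&
	    try_alloc(gfp, "any node, reclaim+compact"))
		return;

	/* 3. the caller falls back to base pages */
	printf("fallback to base pages\n");
}

int main(void)
{
	thp_fault_alloc(__GFP_DIRECT_RECLAIM);
	return 0;
}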

Link: http://lkml.kernel.org/r/08a3f4dd-c3ce-0009-86c5-9ee51aba8557@suse.cz
Fixes: b39d0ee2632d ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mempolicy.c  |   10 +++++++---
 mm/page_alloc.c |   24 +++++-------------------
 2 files changed, 12 insertions(+), 22 deletions(-)

--- a/mm/mempolicy.c~mm-thp-tweak-reclaim-compaction-effort-of-local-only-and-all-node-allocations
+++ a/mm/mempolicy.c
@@ -2148,18 +2148,22 @@ alloc_pages_vma(gfp_t gfp, int order, st
 		nmask = policy_nodemask(gfp, pol);
 		if (!nmask || node_isset(hpage_node, *nmask)) {
 			mpol_cond_put(pol);
+			/*
+			 * First, try to allocate THP only on local node, but
+			 * don't reclaim unnecessarily, just compact.
+			 */
 			page = __alloc_pages_node(hpage_node,
-						gfp | __GFP_THISNODE, order);
+				gfp | __GFP_THISNODE | __GFP_NORETRY, order);
 
 			/*
 			 * If hugepage allocations are configured to always
 			 * synchronous compact or the vma has been madvised
 			 * to prefer hugepage backing, retry allowing remote
-			 * memory as well.
+			 * memory with both reclaim and compact as well.
 			 */
 			if (!page && (gfp & __GFP_DIRECT_RECLAIM))
 				page = __alloc_pages_node(hpage_node,
-						gfp | __GFP_NORETRY, order);
+								gfp, order);
 
 			goto out;
 		}
--- a/mm/page_alloc.c~mm-thp-tweak-reclaim-compaction-effort-of-local-only-and-all-node-allocations
+++ a/mm/page_alloc.c
@@ -4476,8 +4476,11 @@ retry_cpuset:
 		if (page)
 			goto got_pg;
 
-		 if (order >= pageblock_order && (gfp_mask & __GFP_IO) &&
-		     !(gfp_mask & __GFP_RETRY_MAYFAIL)) {
+		/*
+		 * Checks for costly allocations with __GFP_NORETRY, which
+		 * includes some THP page fault allocations
+		 */
+		if (costly_order && (gfp_mask & __GFP_NORETRY)) {
 			/*
 			 * If allocating entire pageblock(s) and compaction
 			 * failed because all zones are below low watermarks
@@ -4498,23 +4501,6 @@ retry_cpuset:
 			if (compact_result == COMPACT_SKIPPED ||
 			    compact_result == COMPACT_DEFERRED)
 				goto nopage;
-		}
-
-		/*
-		 * Checks for costly allocations with __GFP_NORETRY, which
-		 * includes THP page fault allocations
-		 */
-		if (costly_order && (gfp_mask & __GFP_NORETRY)) {
-			/*
-			 * If compaction is deferred for high-order allocations,
-			 * it is because sync compaction recently failed. If
-			 * this is the case and the caller requested a THP
-			 * allocation, we do not want to heavily disrupt the
-			 * system, so we fail the allocation instead of entering
-			 * direct reclaim.
-			 */
-			if (compact_result == COMPACT_DEFERRED)
-				goto nopage;
 
 			/*
 			 * Looks like reclaim/compaction is worth trying, but
_



* [patch 02/11] mm/memory_hotplug: don't free usage map when removing a re-added early section
  2020-01-14  0:28 incoming Andrew Morton
  2020-01-14  0:29 ` [patch 01/11] mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations Andrew Morton
@ 2020-01-14  0:29 ` Andrew Morton
  2020-01-14  0:29 ` [patch 03/11] mm/huge_memory.c: thp: fix conflict of above-47bit hint address and PMD alignment Andrew Morton
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2020-01-14  0:29 UTC (permalink / raw)
  To: piliu, osalvador, mhocko, dan.j.williams, david, akpm, linux-mm,
	mm-commits, torvalds

From: David Hildenbrand <david@redhat.com>
Subject: mm/memory_hotplug: don't free usage map when removing a re-added early section

When we remove an early section, we don't free the usage map, as the usage
maps of other sections are placed into the same page.  Once the section is
removed, it is no longer an early section (in particular, the memmap is
freed).  When we re-add that section, the usage map is reused; however, the
section is no longer an early section.  When removing that section again,
we try to kfree() a usage map that was allocated during early boot - bad.

Let's check against PageReserved() to see if we are dealing with a usage
map that was allocated during boot.  We could also check against
!(PageSlab(usage_page) || PageCompound(usage_page)), but PageReserved() is
cleaner.
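
As a rough userspace model of that decision (plain booleans and
malloc()/free() stand in for PageReserved() and kfree(); nothing below is
kernel API):

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct section_model {
	void *usage;
	bool usage_page_reserved; /* stands in for PageReserved(virt_to_page(ms->usage)) */
};

static void section_deactivate_model(struct section_model *ms)
{
	/* Boot-allocated usage maps sit in reserved pages and must be kept. */
	if (!ms->usage_page_reserved) {
		free(ms->usage);
		ms->usage = NULL;
		printf("usage map freed\n");
	} else {
		printf("usage map kept (allocated during boot)\n");
	}
}

int main(void)
{
	struct section_model early = { .usage = NULL, .usage_page_reserved = true };
	struct section_model hotplugged = { .usage = malloc(64), .usage_page_reserved = false };

	section_deactivate_model(&early);	/* early section: keep the map */
	section_deactivate_model(&hotplugged);	/* hot-added section: free it */
	return 0;
}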

Can be triggered using memtrace under ppc64/powernv:

$ mount -t debugfs none /sys/kernel/debug/
$ echo 0x20000000 > /sys/kernel/debug/powerpc/memtrace/enable
$ echo 0x20000000 > /sys/kernel/debug/powerpc/memtrace/enable
[   12.093442] ------------[ cut here ]------------
[   12.093469] kernel BUG at mm/slub.c:3969!
[   12.093656] Oops: Exception in kernel mode, sig: 5 [#1]
[   12.093961] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[   12.094320] Modules linked in:
[   12.094615] CPU: 0 PID: 154 Comm: sh Not tainted 5.5.0-rc2-next-20191216-00005-g0be1dba7b7c0 #61
[   12.095066] NIP:  c000000000396b38 LR: c000000000385848 CTR: c000000000143d30
[   12.095427] REGS: c000000073077680 TRAP: 0700   Not tainted  (5.5.0-rc2-next-20191216-00005-g0be1dba7b7c0)
[   12.095886] MSR:  900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28004828  XER: 20000000
[   12.096395] CFAR: c000000000396b9c IRQMASK: 0
[   12.096395] GPR00: c000000000385848 c000000073077910 c00000000110f300 c00000007ffffc00
[   12.096395] GPR04: 0000000000000000 ffffffffffffffff 0000000000000000 0000000000000000
[   12.096395] GPR08: 0000000000000000 0000000000000001 0000000000000000 ffffffffffffffc8
[   12.096395] GPR12: 0000000000004000 c0000000012d0000 0000000000001000 c000000000d33c78
[   12.096395] GPR16: 0000000000000000 c0000000011bfeb0 ffffffffffffe000 c0000000000b6370
[   12.096395] GPR20: ffffffffe0000000 c0000000011411c0 0000000000006000 c0000000000b6390
[   12.096395] GPR24: 0000000010000000 0000000000000040 0000000000000000 0000000000000000
[   12.096395] GPR28: c000000000385848 c00c0000001fffc0 0000000000004000 0000000000000000
[   12.099882] NIP [c000000000396b38] kfree+0x338/0x3b0
[   12.100135] LR [c000000000385848] section_deactivate+0x138/0x200
[   12.100508] Call Trace:
[   12.100927] [c000000073077910] [c0000000010599a8] 0xc0000000010599a8 (unreliable)
[   12.101338] [c000000073077960] [c000000000385848] section_deactivate+0x138/0x200
[   12.101696] [c000000073077a10] [c00000000039b9f4] __remove_pages+0x114/0x150
[   12.102025] [c000000073077a60] [c00000000006793c] arch_remove_memory+0x3c/0x160
[   12.102381] [c000000073077ae0] [c00000000039c154] try_remove_memory+0x114/0x1a0
[   12.102715] [c000000073077b90] [c00000000039c020] __remove_memory+0x20/0x40
[   12.103041] [c000000073077bb0] [c0000000000b6714] memtrace_enable_set+0x254/0x850
[   12.103402] [c000000073077cb0] [c0000000004197e8] simple_attr_write+0x138/0x160
[   12.103751] [c000000073077d10] [c000000000558c9c] full_proxy_write+0x8c/0x110
[   12.104100] [c000000073077d60] [c0000000003d02a8] __vfs_write+0x38/0x70
[   12.104409] [c000000073077d80] [c0000000003d3c5c] vfs_write+0x11c/0x2a0
[   12.104711] [c000000073077dd0] [c0000000003d4054] ksys_write+0x84/0x140
[   12.105011] [c000000073077e20] [c00000000000b594] system_call+0x5c/0x68
[   12.105357] Instruction dump:
[   12.105606] e93d0000 75290001 41820090 8bfd0051 38a0ffff 7ca5f830 7bff0020 7ca507b4
[   12.105993] e95d0000 39200000 754a0001 4182005c <0b090000> 893d0007 3d42000b 38800006
[   12.106583] ---[ end trace 4b053cbd84e0db62 ]---

The first invocation will offline+remove memory blocks. The second
invocation will first add+online them again, in order to offline+remove
them again (usually we are lucky and the exact same memory blocks will
get "reallocated").

Tested on powernv with boot memory: The usage map will not get freed.
Tested on x86-64 with DIMMs: The usage map will get freed.

Using Dynamic Memory under a Power DLPAR can trigger it easily.
Triggering removal of memory from the HMC GUI (I assume after it was
previously removed and re-added) can crash the kernel with the same call
trace and is fixed by this patch.

Link: http://lkml.kernel.org/r/20191217104637.5509-1-david@redhat.com
Fixes: 326e1b8f83a4 ("mm/sparsemem: introduce a SECTION_IS_EARLY flag")
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Pingfan Liu <piliu@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/sparse.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/mm/sparse.c~mm-memory_hotplug-dont-free-usage-map-when-removing-a-re-added-early-section
+++ a/mm/sparse.c
@@ -777,7 +777,14 @@ static void section_deactivate(unsigned
 	if (bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION)) {
 		unsigned long section_nr = pfn_to_section_nr(pfn);
 
-		if (!section_is_early) {
+		/*
+		 * When removing an early section, the usage map is kept (as the
+		 * usage maps of other sections fall into the same page). It
+		 * will be re-used when re-adding the section - which is then no
+		 * longer an early section. If the usage map is PageReserved, it
+		 * was allocated during boot.
+		 */
+		if (!PageReserved(virt_to_page(ms->usage))) {
 			kfree(ms->usage);
 			ms->usage = NULL;
 		}
_



* [patch 03/11] mm/huge_memory.c: thp: fix conflict of above-47bit hint address and PMD alignment
  2020-01-14  0:28 incoming Andrew Morton
  2020-01-14  0:29 ` [patch 01/11] mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations Andrew Morton
  2020-01-14  0:29 ` [patch 02/11] mm/memory_hotplug: don't free usage map when removing a re-added early section Andrew Morton
@ 2020-01-14  0:29 ` Andrew Morton
  2020-01-14  0:29 ` [patch 04/11] mm/shmem.c: thp, shmem: " Andrew Morton
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2020-01-14  0:29 UTC (permalink / raw)
  To: thomas.willhalm, stable, otto.g.bruggeman, kirill.shutemov,
	dan.j.williams, aneesh.kumar, kirill, akpm, linux-mm, mm-commits,
	torvalds

From: "Kirill A. Shutemov" <kirill@shutemov.name>
Subject: mm/huge_memory.c: thp: fix conflict of above-47bit hint address and PMD alignment

Patch series "Fix two above-47bit hint address vs. THP bugs".

The two get_unmapped_area() implementations have to be fixed to provide
THP-friendly mappings if an above-47bit hint address is specified.


This patch (of 2):

Filesystems use thp_get_unmapped_area() to provide THP-friendly mappings,
for DAX in particular.

Normally, the kernel doesn't create userspace mappings above 47-bit, even
if the machine allows this (such as with 5-level paging on x86-64).  Not
all user space is ready to handle wide addresses.  It's known that at
least some JIT compilers use higher bits in pointers to encode their
information.

Userspace can ask for allocation from the full address space by specifying
a hint address (with or without MAP_FIXED) above 47 bits.  If the
application doesn't need a particular address, but wants to allocate from
the whole address space, it can specify -1 as the hint address.

Unfortunately, this trick breaks thp_get_unmapped_area(): the function
would not try to allocate a PMD-aligned area if *any* hint address was
specified.

Modify the routine to handle it correctly:

 - Try to allocate the space at the specified hint address with length
   padding required for PMD alignment.
 - If that fails, retry without length padding (but with the same hint
   address).
 - If the returned address matches the hint address, return it.
 - Otherwise, align the address as required for THP and return.

The user-specified hint address is passed down to get_unmapped_area(), so
an above-47bit hint address will be taken into account without breaking
the alignment requirements.
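
For illustration, a tiny userspace sketch of the final alignment step;
PMD_SIZE and the addresses are assumed example values, and thp_align()
only mirrors the arithmetic of the hunk below, it is not the kernel
function:

#include <stdio.h>

#define PMD_SIZE (2UL << 20)	/* assumed 2MiB PMD, as on x86-64 */

/*
 * Mirrors the tail of __thp_get_unmapped_area(): shift the address returned
 * by get_unmapped_area() so that it is congruent to the file offset modulo
 * PMD_SIZE, which is what allows PMD-sized page table entries.
 */
static unsigned long thp_align(unsigned long ret, unsigned long off)
{
	return ret + ((off - ret) & (PMD_SIZE - 1));
}

int main(void)
{
	unsigned long off = 0;			/* mapping's file offset */
	unsigned long ret = 0x7f0000001000UL;	/* address found with length padding */

	/* The length padding guarantees the aligned address still fits. */
	printf("aligned address: %#lx\n", thp_align(ret, off)); /* 0x7f0000200000 */
	return 0;
}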

Link: http://lkml.kernel.org/r/20191220142548.7118-2-kirill.shutemov@linux.intel.com
Fixes: b569bab78d8d ("x86/mm: Prepare to expose larger address space to userspace")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Thomas Willhalm <thomas.willhalm@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Bruggeman, Otto G" <otto.g.bruggeman@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/huge_memory.c |   38 ++++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 14 deletions(-)

--- a/mm/huge_memory.c~thp-fix-conflict-of-above-47bit-hint-address-and-pmd-alignment
+++ a/mm/huge_memory.c
@@ -527,13 +527,13 @@ void prep_transhuge_page(struct page *pa
 	set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
 }
 
-static unsigned long __thp_get_unmapped_area(struct file *filp, unsigned long len,
+static unsigned long __thp_get_unmapped_area(struct file *filp,
+		unsigned long addr, unsigned long len,
 		loff_t off, unsigned long flags, unsigned long size)
 {
-	unsigned long addr;
 	loff_t off_end = off + len;
 	loff_t off_align = round_up(off, size);
-	unsigned long len_pad;
+	unsigned long len_pad, ret;
 
 	if (off_end <= off_align || (off_end - off_align) < size)
 		return 0;
@@ -542,30 +542,40 @@ static unsigned long __thp_get_unmapped_
 	if (len_pad < len || (off + len_pad) < off)
 		return 0;
 
-	addr = current->mm->get_unmapped_area(filp, 0, len_pad,
+	ret = current->mm->get_unmapped_area(filp, addr, len_pad,
 					      off >> PAGE_SHIFT, flags);
-	if (IS_ERR_VALUE(addr))
+
+	/*
+	 * The failure might be due to length padding. The caller will retry
+	 * without the padding.
+	 */
+	if (IS_ERR_VALUE(ret))
 		return 0;
 
-	addr += (off - addr) & (size - 1);
-	return addr;
+	/*
+	 * Do not try to align to THP boundary if allocation at the address
+	 * hint succeeds.
+	 */
+	if (ret == addr)
+		return addr;
+
+	ret += (off - ret) & (size - 1);
+	return ret;
 }
 
 unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
 		unsigned long len, unsigned long pgoff, unsigned long flags)
 {
+	unsigned long ret;
 	loff_t off = (loff_t)pgoff << PAGE_SHIFT;
 
-	if (addr)
-		goto out;
 	if (!IS_DAX(filp->f_mapping->host) || !IS_ENABLED(CONFIG_FS_DAX_PMD))
 		goto out;
 
-	addr = __thp_get_unmapped_area(filp, len, off, flags, PMD_SIZE);
-	if (addr)
-		return addr;
-
- out:
+	ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE);
+	if (ret)
+		return ret;
+out:
 	return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
 }
 EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
_



* [patch 04/11] mm/shmem.c: thp, shmem: fix conflict of above-47bit hint address and PMD alignment
  2020-01-14  0:28 incoming Andrew Morton
                   ` (2 preceding siblings ...)
  2020-01-14  0:29 ` [patch 03/11] mm/huge_memory.c: thp: fix conflict of above-47bit hint address and PMD alignment Andrew Morton
@ 2020-01-14  0:29 ` Andrew Morton
  2020-01-14  0:29 ` [patch 05/11] mm: memcg/slab: fix percpu slab vmstats flushing Andrew Morton
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2020-01-14  0:29 UTC (permalink / raw)
  To: thomas.willhalm, stable, otto.g.bruggeman, kirill.shutemov,
	dan.j.williams, aneesh.kumar, kirill, akpm, linux-mm, mm-commits,
	torvalds

From: "Kirill A. Shutemov" <kirill@shutemov.name>
Subject: mm/shmem.c: thp, shmem: fix conflict of above-47bit hint address and PMD alignment

Shmem/tmpfs tries to provide THP-friendly mappings if huge pages are
enabled.  But it doesn't work well with above-47bit hint address.

Normally, the kernel doesn't create userspace mappings above 47-bit, even
if the machine allows this (such as with 5-level paging on x86-64).  Not
all user space is ready to handle wide addresses.  It's known that at
least some JIT compilers use higher bits in pointers to encode their
information.

Userspace can ask for allocation from the full address space by specifying
a hint address (with or without MAP_FIXED) above 47 bits.  If the
application doesn't need a particular address, but wants to allocate from
the whole address space, it can specify -1 as the hint address.

Unfortunately, this trick breaks THP alignment in shmem/tmpfs:
shmem_get_unmapped_area() would not try to allocate a PMD-aligned area if
*any* hint address was specified.

This can be fixed by requesting the aligned area only if we failed to
allocate at the user-specified hint address.  The request with inflated
length will also take the user-specified hint address.  This way we will
not lose an allocation request from the full address space.

[kirill@shutemov.name: fold in a fixup]
  Link: http://lkml.kernel.org/r/20191223231309.t6bh5hkbmokihpfu@box
Link: http://lkml.kernel.org/r/20191220142548.7118-3-kirill.shutemov@linux.intel.com
Fixes: b569bab78d8d ("x86/mm: Prepare to expose larger address space to userspace")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Willhalm, Thomas" <thomas.willhalm@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "Bruggeman, Otto G" <otto.g.bruggeman@intel.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/shmem.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/mm/shmem.c~thp-shmem-fix-conflict-of-above-47bit-hint-address-and-pmd-alignment
+++ a/mm/shmem.c
@@ -2107,9 +2107,10 @@ unsigned long shmem_get_unmapped_area(st
 	/*
 	 * Our priority is to support MAP_SHARED mapped hugely;
 	 * and support MAP_PRIVATE mapped hugely too, until it is COWed.
-	 * But if caller specified an address hint, respect that as before.
+	 * But if caller specified an address hint and we allocated area there
+	 * successfully, respect that as before.
 	 */
-	if (uaddr)
+	if (uaddr == addr)
 		return addr;
 
 	if (shmem_huge != SHMEM_HUGE_FORCE) {
@@ -2143,7 +2144,7 @@ unsigned long shmem_get_unmapped_area(st
 	if (inflated_len < len)
 		return addr;
 
-	inflated_addr = get_area(NULL, 0, inflated_len, 0, flags);
+	inflated_addr = get_area(NULL, uaddr, inflated_len, 0, flags);
 	if (IS_ERR_VALUE(inflated_addr))
 		return addr;
 	if (inflated_addr & ~PAGE_MASK)
_



* [patch 05/11] mm: memcg/slab: fix percpu slab vmstats flushing
  2020-01-14  0:28 incoming Andrew Morton
                   ` (3 preceding siblings ...)
  2020-01-14  0:29 ` [patch 04/11] mm/shmem.c: thp, shmem: " Andrew Morton
@ 2020-01-14  0:29 ` Andrew Morton
  2020-01-14  0:29 ` [patch 06/11] mm, debug_pagealloc: don't rely on static keys too early Andrew Morton
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2020-01-14  0:29 UTC (permalink / raw)
  To: stable, mhocko, hannes, guro, akpm, linux-mm, mm-commits, torvalds

From: Roman Gushchin <guro@fb.com>
Subject: mm: memcg/slab: fix percpu slab vmstats flushing

Currently slab percpu vmstats are flushed twice: during the memcg
offlining and just before freeing the memcg structure.  Each time percpu
counters are summed, added to the atomic counterparts and propagated up
the cgroup tree.

The second flushing is required due to how recursive vmstats are
implemented: counters are batched in percpu variables on a local level,
and once a percpu value crosses some predefined threshold, it spills
over to atomic values on the local and each ancestor level.  It means
that without flushing, some numbers cached in percpu variables will be
dropped on the floor each time a cgroup is destroyed.  And with uptime the
error on upper levels might become noticeable.

The first flushing aims to make counters on ancestor levels more precise.
Dying cgroups may remain in the dying state for a long time.  After
kmem_cache reparenting, which is performed during the offlining, slab
counters of the dying cgroup don't have any chance to be updated, because
any slab operations will be performed on the parent level.  It means that
the inaccuracy caused by percpu batching will not decrease up to the final
destruction of the cgroup.  The original idea was that flushing slab
counters during the offlining should minimize the visible inaccuracy of
slab counters on the parent level.

The problem is that percpu counters are not zeroed after the first
flushing, so every cached percpu value is summed twice.  It creates a
small error (up to 32 pages per cpu, but usually less) which accumulates
on the parent cgroup level.  After creating and destroying thousands of
child cgroups, the slab counter on the parent level can be way off the
real value.
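
A toy model of the double counting, with plain longs standing in for the
percpu caches and the atomic parent counter (the numbers are made up):

#include <stdio.h>

#define NR_CPUS 4

/* Per-cpu cached deltas that have not yet crossed the spill threshold. */
static long percpu_delta[NR_CPUS] = { 3, -1, 7, 2 };
static long parent_counter;

static void flush(int zero_after)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		parent_counter += percpu_delta[cpu];
		if (zero_after)
			percpu_delta[cpu] = 0;
	}
}

int main(void)
{
	/* Before the fix: flushed on offlining and again before freeing,
	 * without zeroing in between, so every cached delta counts twice. */
	flush(0);
	flush(0);
	printf("parent sees %ld, real value is %d\n", parent_counter, 3 - 1 + 7 + 2);
	return 0;
}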

For now, let's just stop flushing slab counters on memcg offlining.  It
can't be done correctly without scheduling a work on each cpu: reading and
zeroing it during css offlining can race with an asynchronous update,
which doesn't expect values to be changed underneath.

With this change, slab counters on parent level will become eventually
consistent.  Once all dying children are gone, values are correct.  And if
not, the error is capped by 32 * NR_CPUS pages per dying cgroup.

It's not perfect, as slabs are reparented, so any updates after the
reparenting will happen on the parent level.  It means that if a slab page
was allocated, a counter on the child level was bumped, then the page was
reparented and freed, the annihilation of positive and negative counter
values will not happen until the child cgroup is released.  It makes slab
counters different from others, and we might eventually want to implement
flushing in a correct form again.  But it's also a question of performance:
scheduling a work item on each cpu isn't free, and it's an open question
whether the benefit of having more accurate counters is worth it.

We might also consider flushing all counters on offlining, not only slab
counters.

So let's fix the main problem now: make the slab counters eventually
consistent, so at least the error won't grow with uptime (or more
precisely the number of created and destroyed cgroups).  And think about
the accuracy of counters separately.

Link: http://lkml.kernel.org/r/20191220042728.1045881-1-guro@fb.com
Fixes: bee07b33db78 ("mm: memcontrol: flush percpu slab vmstats on kmem offlining")
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mmzone.h |    5 ++---
 mm/memcontrol.c        |   37 +++++++++----------------------------
 2 files changed, 11 insertions(+), 31 deletions(-)

--- a/include/linux/mmzone.h~mm-memcg-slab-fix-percpu-slab-vmstats-flushing
+++ a/include/linux/mmzone.h
@@ -215,9 +215,8 @@ enum node_stat_item {
 	NR_INACTIVE_FILE,	/*  "     "     "   "       "         */
 	NR_ACTIVE_FILE,		/*  "     "     "   "       "         */
 	NR_UNEVICTABLE,		/*  "     "     "   "       "         */
-	NR_SLAB_RECLAIMABLE,	/* Please do not reorder this item */
-	NR_SLAB_UNRECLAIMABLE,	/* and this one without looking at
-				 * memcg_flush_percpu_vmstats() first. */
+	NR_SLAB_RECLAIMABLE,
+	NR_SLAB_UNRECLAIMABLE,
 	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
 	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
 	WORKINGSET_NODES,
--- a/mm/memcontrol.c~mm-memcg-slab-fix-percpu-slab-vmstats-flushing
+++ a/mm/memcontrol.c
@@ -3287,49 +3287,34 @@ static u64 mem_cgroup_read_u64(struct cg
 	}
 }
 
-static void memcg_flush_percpu_vmstats(struct mem_cgroup *memcg, bool slab_only)
+static void memcg_flush_percpu_vmstats(struct mem_cgroup *memcg)
 {
-	unsigned long stat[MEMCG_NR_STAT];
+	unsigned long stat[MEMCG_NR_STAT] = {0};
 	struct mem_cgroup *mi;
 	int node, cpu, i;
-	int min_idx, max_idx;
-
-	if (slab_only) {
-		min_idx = NR_SLAB_RECLAIMABLE;
-		max_idx = NR_SLAB_UNRECLAIMABLE;
-	} else {
-		min_idx = 0;
-		max_idx = MEMCG_NR_STAT;
-	}
-
-	for (i = min_idx; i < max_idx; i++)
-		stat[i] = 0;
 
 	for_each_online_cpu(cpu)
-		for (i = min_idx; i < max_idx; i++)
+		for (i = 0; i < MEMCG_NR_STAT; i++)
 			stat[i] += per_cpu(memcg->vmstats_percpu->stat[i], cpu);
 
 	for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
-		for (i = min_idx; i < max_idx; i++)
+		for (i = 0; i < MEMCG_NR_STAT; i++)
 			atomic_long_add(stat[i], &mi->vmstats[i]);
 
-	if (!slab_only)
-		max_idx = NR_VM_NODE_STAT_ITEMS;
-
 	for_each_node(node) {
 		struct mem_cgroup_per_node *pn = memcg->nodeinfo[node];
 		struct mem_cgroup_per_node *pi;
 
-		for (i = min_idx; i < max_idx; i++)
+		for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
 			stat[i] = 0;
 
 		for_each_online_cpu(cpu)
-			for (i = min_idx; i < max_idx; i++)
+			for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
 				stat[i] += per_cpu(
 					pn->lruvec_stat_cpu->count[i], cpu);
 
 		for (pi = pn; pi; pi = parent_nodeinfo(pi, node))
-			for (i = min_idx; i < max_idx; i++)
+			for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
 				atomic_long_add(stat[i], &pi->lruvec_stat[i]);
 	}
 }
@@ -3403,13 +3388,9 @@ static void memcg_offline_kmem(struct me
 		parent = root_mem_cgroup;
 
 	/*
-	 * Deactivate and reparent kmem_caches. Then flush percpu
-	 * slab statistics to have precise values at the parent and
-	 * all ancestor levels. It's required to keep slab stats
-	 * accurate after the reparenting of kmem_caches.
+	 * Deactivate and reparent kmem_caches.
 	 */
 	memcg_deactivate_kmem_caches(memcg, parent);
-	memcg_flush_percpu_vmstats(memcg, true);
 
 	kmemcg_id = memcg->kmemcg_id;
 	BUG_ON(kmemcg_id < 0);
@@ -4913,7 +4894,7 @@ static void mem_cgroup_free(struct mem_c
 	 * Flush percpu vmstats and vmevents to guarantee the value correctness
 	 * on parent's and all ancestor levels.
 	 */
-	memcg_flush_percpu_vmstats(memcg, false);
+	memcg_flush_percpu_vmstats(memcg);
 	memcg_flush_percpu_vmevents(memcg);
 	__mem_cgroup_free(memcg);
 }
_



* [patch 06/11] mm, debug_pagealloc: don't rely on static keys too early
  2020-01-14  0:28 incoming Andrew Morton
                   ` (4 preceding siblings ...)
  2020-01-14  0:29 ` [patch 05/11] mm: memcg/slab: fix percpu slab vmstats flushing Andrew Morton
@ 2020-01-14  0:29 ` Andrew Morton
  2020-01-14  0:29 ` [patch 07/11] mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio() Andrew Morton
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2020-01-14  0:29 UTC (permalink / raw)
  To: willy, stable, sfr, peterz, mhocko, mgorman, kirill.shutemov,
	iamjoonsoo.kim, cai, bp, vbabka, akpm, linux-mm, mm-commits,
	torvalds

From: Vlastimil Babka <vbabka@suse.cz>
Subject: mm, debug_pagealloc: don't rely on static keys too early

Commit 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable
debugging") has introduced a static key to reduce overhead when
debug_pagealloc is compiled in but not enabled.  It relied on the
assumption that jump_label_init() is called before parse_early_param() as
in start_kernel(), so when the "debug_pagealloc=on" option is parsed, it
is safe to enable the static key.

However, it turns out multiple architectures call parse_early_param()
earlier from their setup_arch().  x86 also calls jump_label_init() even
earlier, so no issue was found while testing the commit, but the same is
not true for e.g. ppc64 and s390, where the kernel would not boot with
debug_pagealloc=on, as found by our QA.

To fix this without tricky changes to the init code of multiple
architectures, this patch partially reverts the static key conversion from
96a2b03f281d.  Init-time and non-fastpath calls (such as in arch code) of
debug_pagealloc_enabled() will again test a simple bool variable.
Fastpath mm code is converted to a new debug_pagealloc_enabled_static()
variant that relies on the static key, which is enabled at a well-defined
point in mm_init() where it's guaranteed that jump_label_init() has been
called, regardless of architecture.
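
A userspace toy of the resulting split, where ordinary bools stand in for
the jump label; the names mimic the patch but this is not kernel code:

#include <stdbool.h>
#include <stdio.h>

static bool _debug_pagealloc_enabled_early;	/* set while parsing early params */
static bool _debug_pagealloc_static_key;	/* stands in for the static key */

/* Safe at any time, including from early arch code: just reads the bool. */
static bool debug_pagealloc_enabled(void)
{
	return _debug_pagealloc_enabled_early;
}

/* Fast-path variant: only meaningful once init_debug_pagealloc() has run. */
static bool debug_pagealloc_enabled_static(void)
{
	return _debug_pagealloc_static_key;
}

/* Called from a well-defined point (mm_init() in the patch), after
 * jump_label_init() would have run on every architecture. */
static void init_debug_pagealloc(void)
{
	if (debug_pagealloc_enabled())
		_debug_pagealloc_static_key = true;
}

int main(void)
{
	_debug_pagealloc_enabled_early = true;	/* "debug_pagealloc=on" was parsed */

	printf("early check: %d, static check before init: %d\n",
	       debug_pagealloc_enabled(), debug_pagealloc_enabled_static());
	init_debug_pagealloc();
	printf("static check after init: %d\n", debug_pagealloc_enabled_static());
	return 0;
}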

[sfr@canb.auug.org.au: export _debug_pagealloc_enabled_early]
  Link: http://lkml.kernel.org/r/20200106164944.063ac07b@canb.auug.org.au
Link: http://lkml.kernel.org/r/20191219130612.23171-1-vbabka@suse.cz
Fixes: 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable debugging")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Qian Cai <cai@lca.pw>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mm.h |   18 +++++++++++++++---
 init/main.c        |    1 +
 mm/page_alloc.c    |   37 +++++++++++++------------------------
 mm/slab.c          |    4 ++--
 mm/slub.c          |    2 +-
 mm/vmalloc.c       |    4 ++--
 6 files changed, 34 insertions(+), 32 deletions(-)

--- a/include/linux/mm.h~mm-debug_pagealloc-dont-rely-on-static-keys-too-early
+++ a/include/linux/mm.h
@@ -2658,14 +2658,26 @@ static inline bool want_init_on_free(voi
 	       !page_poisoning_enabled();
 }
 
-#ifdef CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT
-DECLARE_STATIC_KEY_TRUE(_debug_pagealloc_enabled);
+#ifdef CONFIG_DEBUG_PAGEALLOC
+extern void init_debug_pagealloc(void);
 #else
-DECLARE_STATIC_KEY_FALSE(_debug_pagealloc_enabled);
+static inline void init_debug_pagealloc(void) {}
 #endif
+extern bool _debug_pagealloc_enabled_early;
+DECLARE_STATIC_KEY_FALSE(_debug_pagealloc_enabled);
 
 static inline bool debug_pagealloc_enabled(void)
 {
+	return IS_ENABLED(CONFIG_DEBUG_PAGEALLOC) &&
+		_debug_pagealloc_enabled_early;
+}
+
+/*
+ * For use in fast paths after init_debug_pagealloc() has run, or when a
+ * false negative result is not harmful when called too early.
+ */
+static inline bool debug_pagealloc_enabled_static(void)
+{
 	if (!IS_ENABLED(CONFIG_DEBUG_PAGEALLOC))
 		return false;
 
--- a/init/main.c~mm-debug_pagealloc-dont-rely-on-static-keys-too-early
+++ a/init/main.c
@@ -553,6 +553,7 @@ static void __init mm_init(void)
 	 * bigger than MAX_ORDER unless SPARSEMEM.
 	 */
 	page_ext_init_flatmem();
+	init_debug_pagealloc();
 	report_meminit();
 	mem_init();
 	kmem_cache_init();
--- a/mm/page_alloc.c~mm-debug_pagealloc-dont-rely-on-static-keys-too-early
+++ a/mm/page_alloc.c
@@ -694,34 +694,27 @@ void prep_compound_page(struct page *pag
 #ifdef CONFIG_DEBUG_PAGEALLOC
 unsigned int _debug_guardpage_minorder;
 
-#ifdef CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT
-DEFINE_STATIC_KEY_TRUE(_debug_pagealloc_enabled);
-#else
+bool _debug_pagealloc_enabled_early __read_mostly
+			= IS_ENABLED(CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT);
+EXPORT_SYMBOL(_debug_pagealloc_enabled_early);
 DEFINE_STATIC_KEY_FALSE(_debug_pagealloc_enabled);
-#endif
 EXPORT_SYMBOL(_debug_pagealloc_enabled);
 
 DEFINE_STATIC_KEY_FALSE(_debug_guardpage_enabled);
 
 static int __init early_debug_pagealloc(char *buf)
 {
-	bool enable = false;
-
-	if (kstrtobool(buf, &enable))
-		return -EINVAL;
-
-	if (enable)
-		static_branch_enable(&_debug_pagealloc_enabled);
-
-	return 0;
+	return kstrtobool(buf, &_debug_pagealloc_enabled_early);
 }
 early_param("debug_pagealloc", early_debug_pagealloc);
 
-static void init_debug_guardpage(void)
+void init_debug_pagealloc(void)
 {
 	if (!debug_pagealloc_enabled())
 		return;
 
+	static_branch_enable(&_debug_pagealloc_enabled);
+
 	if (!debug_guardpage_minorder())
 		return;
 
@@ -1186,7 +1179,7 @@ static __always_inline bool free_pages_p
 	 */
 	arch_free_page(page, order);
 
-	if (debug_pagealloc_enabled())
+	if (debug_pagealloc_enabled_static())
 		kernel_map_pages(page, 1 << order, 0);
 
 	kasan_free_nondeferred_pages(page, order);
@@ -1207,7 +1200,7 @@ static bool free_pcp_prepare(struct page
 
 static bool bulkfree_pcp_prepare(struct page *page)
 {
-	if (debug_pagealloc_enabled())
+	if (debug_pagealloc_enabled_static())
 		return free_pages_check(page);
 	else
 		return false;
@@ -1221,7 +1214,7 @@ static bool bulkfree_pcp_prepare(struct
  */
 static bool free_pcp_prepare(struct page *page)
 {
-	if (debug_pagealloc_enabled())
+	if (debug_pagealloc_enabled_static())
 		return free_pages_prepare(page, 0, true);
 	else
 		return free_pages_prepare(page, 0, false);
@@ -1973,10 +1966,6 @@ void __init page_alloc_init_late(void)
 
 	for_each_populated_zone(zone)
 		set_zone_contiguous(zone);
-
-#ifdef CONFIG_DEBUG_PAGEALLOC
-	init_debug_guardpage();
-#endif
 }
 
 #ifdef CONFIG_CMA
@@ -2106,7 +2095,7 @@ static inline bool free_pages_prezeroed(
  */
 static inline bool check_pcp_refill(struct page *page)
 {
-	if (debug_pagealloc_enabled())
+	if (debug_pagealloc_enabled_static())
 		return check_new_page(page);
 	else
 		return false;
@@ -2128,7 +2117,7 @@ static inline bool check_pcp_refill(stru
 }
 static inline bool check_new_pcp(struct page *page)
 {
-	if (debug_pagealloc_enabled())
+	if (debug_pagealloc_enabled_static())
 		return check_new_page(page);
 	else
 		return false;
@@ -2155,7 +2144,7 @@ inline void post_alloc_hook(struct page
 	set_page_refcounted(page);
 
 	arch_alloc_page(page, order);
-	if (debug_pagealloc_enabled())
+	if (debug_pagealloc_enabled_static())
 		kernel_map_pages(page, 1 << order, 1);
 	kasan_alloc_pages(page, order);
 	kernel_poison_pages(page, 1 << order, 1);
--- a/mm/slab.c~mm-debug_pagealloc-dont-rely-on-static-keys-too-early
+++ a/mm/slab.c
@@ -1416,7 +1416,7 @@ static void kmem_rcu_free(struct rcu_hea
 #if DEBUG
 static bool is_debug_pagealloc_cache(struct kmem_cache *cachep)
 {
-	if (debug_pagealloc_enabled() && OFF_SLAB(cachep) &&
+	if (debug_pagealloc_enabled_static() && OFF_SLAB(cachep) &&
 		(cachep->size % PAGE_SIZE) == 0)
 		return true;
 
@@ -2008,7 +2008,7 @@ int __kmem_cache_create(struct kmem_cach
 	 * to check size >= 256. It guarantees that all necessary small
 	 * sized slab is initialized in current slab initialization sequence.
 	 */
-	if (debug_pagealloc_enabled() && (flags & SLAB_POISON) &&
+	if (debug_pagealloc_enabled_static() && (flags & SLAB_POISON) &&
 		size >= 256 && cachep->object_size > cache_line_size()) {
 		if (size < PAGE_SIZE || size % PAGE_SIZE == 0) {
 			size_t tmp_size = ALIGN(size, PAGE_SIZE);
--- a/mm/slub.c~mm-debug_pagealloc-dont-rely-on-static-keys-too-early
+++ a/mm/slub.c
@@ -288,7 +288,7 @@ static inline void *get_freepointer_safe
 	unsigned long freepointer_addr;
 	void *p;
 
-	if (!debug_pagealloc_enabled())
+	if (!debug_pagealloc_enabled_static())
 		return get_freepointer(s, object);
 
 	freepointer_addr = (unsigned long)object + s->offset;
--- a/mm/vmalloc.c~mm-debug_pagealloc-dont-rely-on-static-keys-too-early
+++ a/mm/vmalloc.c
@@ -1383,7 +1383,7 @@ static void free_unmap_vmap_area(struct
 {
 	flush_cache_vunmap(va->va_start, va->va_end);
 	unmap_vmap_area(va);
-	if (debug_pagealloc_enabled())
+	if (debug_pagealloc_enabled_static())
 		flush_tlb_kernel_range(va->va_start, va->va_end);
 
 	free_vmap_area_noflush(va);
@@ -1681,7 +1681,7 @@ static void vb_free(const void *addr, un
 
 	vunmap_page_range((unsigned long)addr, (unsigned long)addr + size);
 
-	if (debug_pagealloc_enabled())
+	if (debug_pagealloc_enabled_static())
 		flush_tlb_kernel_range((unsigned long)addr,
 					(unsigned long)addr + size);
 
_



* [patch 07/11] mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio()
  2020-01-14  0:28 incoming Andrew Morton
                   ` (5 preceding siblings ...)
  2020-01-14  0:29 ` [patch 06/11] mm, debug_pagealloc: don't rely on static keys too early Andrew Morton
@ 2020-01-14  0:29 ` Andrew Morton
  2020-01-14  0:29 ` [patch 08/11] mm/page-writeback.c: use div64_ul() for u64-by-unsigned-long divide Andrew Morton
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2020-01-14  0:29 UTC (permalink / raw)
  To: tj, cai, axboe, wenyang, akpm, linux-mm, mm-commits, torvalds

From: Wen Yang <wenyang@linux.alibaba.com>
Subject: mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio()

Patch series "use div64_ul() instead of div_u64() if the divisor is
unsigned long".

We were first inspired by commit b0ab99e7736a ("sched: Fix possible divide
by zero in avg_atom() calculation"); then, looking at recently analyzed
mm code, we found this suspicious place:

 201                 if (min) {
 202                         min *= this_bw;
 203                         do_div(min, tot_bw);
 204                 }

And we also disassembled and confirmed it:

/usr/src/debug/kernel-4.9.168-016.ali3000/linux-4.9.168-016.ali3000.alios7.x86_64/mm/page-writeback.c: 201
0xffffffff811c37da <__wb_calc_thresh+234>:      xor    %r10d,%r10d
0xffffffff811c37dd <__wb_calc_thresh+237>:      test   %rax,%rax
0xffffffff811c37e0 <__wb_calc_thresh+240>:      je 0xffffffff811c3800 <__wb_calc_thresh+272>
/usr/src/debug/kernel-4.9.168-016.ali3000/linux-4.9.168-016.ali3000.alios7.x86_64/mm/page-writeback.c: 202
0xffffffff811c37e2 <__wb_calc_thresh+242>:      imul   %r8,%rax
/usr/src/debug/kernel-4.9.168-016.ali3000/linux-4.9.168-016.ali3000.alios7.x86_64/mm/page-writeback.c: 203
0xffffffff811c37e6 <__wb_calc_thresh+246>:      mov    %r9d,%r10d    ---> truncates it to 32 bits here
0xffffffff811c37e9 <__wb_calc_thresh+249>:      xor    %edx,%edx
0xffffffff811c37eb <__wb_calc_thresh+251>:      div    %r10
0xffffffff811c37ee <__wb_calc_thresh+254>:      imul   %rbx,%rax
0xffffffff811c37f2 <__wb_calc_thresh+258>:      shr    $0x2,%rax
0xffffffff811c37f6 <__wb_calc_thresh+262>:      mul    %rcx
0xffffffff811c37f9 <__wb_calc_thresh+265>:      shr    $0x2,%rdx
0xffffffff811c37fd <__wb_calc_thresh+269>:      mov    %rdx,%r10

This series uses div64_ul() instead of div_u64() if the divisor is
unsigned long, to avoid truncation to 32-bit on 64-bit platforms.


This patch (of 3):

The divisor 'tot_bw' is an unsigned long, and do_div() truncates it to 32
bits, which means it can test as non-zero yet be truncated to zero for the
division.  Fix this issue by using div64_ul() instead.
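
A minimal userspace demonstration of the truncation; the value of tot_bw
is made up, and unsigned long is assumed to be 64-bit (LP64), as in the
kernel case above:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Non-zero as an unsigned long, but the low 32 bits are all zero. */
	unsigned long tot_bw = 0x100000000UL;
	unsigned long min = 10 * tot_bw;

	/* A do_div()-style 64-by-32 helper truncates the divisor first: */
	uint32_t truncated = (uint32_t)tot_bw;
	printf("truncated divisor: %u\n", truncated);	/* 0 -> division by zero */

	/* div64_ul() does a full 64-by-64 division instead: */
	printf("full division:     %lu\n", min / tot_bw);	/* 10 */
	return 0;
}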

Link: http://lkml.kernel.org/r/20200102081442.8273-2-wenyang@linux.alibaba.com
Fixes: 693108a8a667 ("writeback: make bdi->min/max_ratio handling cgroup writeback aware")
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page-writeback.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/page-writeback.c~mm-page-writebackc-avoid-potential-division-by-zero-in-wb_min_max_ratio
+++ a/mm/page-writeback.c
@@ -201,11 +201,11 @@ static void wb_min_max_ratio(struct bdi_
 	if (this_bw < tot_bw) {
 		if (min) {
 			min *= this_bw;
-			do_div(min, tot_bw);
+			min = div64_ul(min, tot_bw);
 		}
 		if (max < 100) {
 			max *= this_bw;
-			do_div(max, tot_bw);
+			max = div64_ul(max, tot_bw);
 		}
 	}
 
_



* [patch 08/11] mm/page-writeback.c: use div64_ul() for u64-by-unsigned-long divide
  2020-01-14  0:28 incoming Andrew Morton
                   ` (6 preceding siblings ...)
  2020-01-14  0:29 ` [patch 07/11] mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio() Andrew Morton
@ 2020-01-14  0:29 ` Andrew Morton
  2020-01-14  0:29 ` [patch 09/11] mm/page-writeback.c: improve arithmetic divisions Andrew Morton
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2020-01-14  0:29 UTC (permalink / raw)
  To: tj, cai, axboe, wenyang, akpm, linux-mm, mm-commits, torvalds

From: Wen Yang <wenyang@linux.alibaba.com>
Subject: mm/page-writeback.c: use div64_ul() for u64-by-unsigned-long divide

The two variables 'numerator' and 'denominator', though declared as long,
should actually be unsigned long (according to the implementation of the
fprop_fraction_percpu() function).

And do_div() does a 64-by-32 division, while the divisor 'denominator' is
unsigned long, thus 64-bit on 64-bit platforms.  Hence the proper function
to call is div64_ul().

Link: http://lkml.kernel.org/r/20200102081442.8273-3-wenyang@linux.alibaba.com
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page-writeback.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/page-writeback.c~mm-page-writebackc-use-div64_ul-for-u64-by-unsigned-long-divide
+++ a/mm/page-writeback.c
@@ -766,7 +766,7 @@ static unsigned long __wb_calc_thresh(st
 	struct wb_domain *dom = dtc_dom(dtc);
 	unsigned long thresh = dtc->thresh;
 	u64 wb_thresh;
-	long numerator, denominator;
+	unsigned long numerator, denominator;
 	unsigned long wb_min_ratio, wb_max_ratio;
 
 	/*
@@ -777,7 +777,7 @@ static unsigned long __wb_calc_thresh(st
 
 	wb_thresh = (thresh * (100 - bdi_min_ratio)) / 100;
 	wb_thresh *= numerator;
-	do_div(wb_thresh, denominator);
+	wb_thresh = div64_ul(wb_thresh, denominator);
 
 	wb_min_max_ratio(dtc->wb, &wb_min_ratio, &wb_max_ratio);
 
_



* [patch 09/11] mm/page-writeback.c: improve arithmetic divisions
  2020-01-14  0:28 incoming Andrew Morton
                   ` (7 preceding siblings ...)
  2020-01-14  0:29 ` [patch 08/11] mm/page-writeback.c: use div64_ul() for u64-by-unsigned-long divide Andrew Morton
@ 2020-01-14  0:29 ` Andrew Morton
  2020-01-14  0:29 ` [patch 10/11] mm: memcg/slab: call flush_memcg_workqueue() only if memcg workqueue is valid Andrew Morton
  2020-01-14  0:29 ` [patch 11/11] mm: khugepaged: add trace status description for SCAN_PAGE_HAS_PRIVATE Andrew Morton
  10 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2020-01-14  0:29 UTC (permalink / raw)
  To: tj, cai, axboe, wenyang, akpm, linux-mm, mm-commits, torvalds

From: Wen Yang <wenyang@linux.alibaba.com>
Subject: mm/page-writeback.c: improve arithmetic divisions

Use div64_ul() instead of do_div() if the divisor is unsigned long, to
avoid truncation to 32-bit on 64-bit platforms.

Link: http://lkml.kernel.org/r/20200102081442.8273-4-wenyang@linux.alibaba.com
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page-writeback.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/page-writeback.c~mm-page-writebackc-improve-arithmetic-divisions
+++ a/mm/page-writeback.c
@@ -1102,7 +1102,7 @@ static void wb_update_write_bandwidth(st
 	bw = written - min(written, wb->written_stamp);
 	bw *= HZ;
 	if (unlikely(elapsed > period)) {
-		do_div(bw, elapsed);
+		bw = div64_ul(bw, elapsed);
 		avg = bw;
 		goto out;
 	}
_



* [patch 10/11] mm: memcg/slab: call flush_memcg_workqueue() only if memcg workqueue is valid
  2020-01-14  0:28 incoming Andrew Morton
                   ` (8 preceding siblings ...)
  2020-01-14  0:29 ` [patch 09/11] mm/page-writeback.c: improve arithmetic divisions Andrew Morton
@ 2020-01-14  0:29 ` Andrew Morton
  2020-01-14  0:29 ` [patch 11/11] mm: khugepaged: add trace status description for SCAN_PAGE_HAS_PRIVATE Andrew Morton
  10 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2020-01-14  0:29 UTC (permalink / raw)
  To: stable, shakeelb, rientjes, penberg, mhocko, lixc17, jroedel,
	iamjoonsoo.kim, hannes, cl, ahuang12, akpm, linux-mm, mm-commits,
	torvalds

From: Adrian Huang <ahuang12@lenovo.com>
Subject: mm: memcg/slab: call flush_memcg_workqueue() only if memcg workqueue is valid

When booting with amd_iommu=off, the following WARNING message
appears:
  AMD-Vi: AMD IOMMU disabled on kernel command-line
  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 0 at kernel/workqueue.c:2772 flush_workqueue+0x42e/0x450
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-rc3-amd-iommu #6
  Hardware name: Lenovo ThinkSystem SR655-2S/7D2WRCZ000, BIOS D8E101L-1.00 12/05/2019
  RIP: 0010:flush_workqueue+0x42e/0x450
  Code: ff 0f 0b e9 7a fd ff ff 4d 89 ef e9 33 fe ff ff 0f 0b e9 7f fd ff ff 0f 0b e9 bc fd ff ff 0f 0b e9 a8 fd ff ff e8 52 2c fe ff <0f> 0b 31 d2 48 c7 c6 e0 88 c5 95 48 c7 c7 d8 ad f0 95 e8 19 f5 04
  RSP: 0000:ffffffff96203d80 EFLAGS: 00010246
  RAX: ffffffff96203dc8 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: ffffffff96a63120 RSI: ffffffff95efcba2 RDI: ffffffff96203dc0
  RBP: ffffffff96203e08 R08: 0000000000000000 R09: ffffffff962a1828
  R10: 00000000f0000080 R11: dead000000000100 R12: ffff8d8a87c0a770
  R13: dead000000000100 R14: 0000000000000456 R15: ffffffff96203da0
  FS:  0000000000000000(0000) GS:ffff8d8dbd000000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffff8d91cfbff000 CR3: 000000078920a000 CR4: 00000000000406b0
  Call Trace:
   ? wait_for_completion+0x51/0x180
   kmem_cache_destroy+0x69/0x260
   iommu_go_to_state+0x40c/0x5ab
   amd_iommu_prepare+0x16/0x2a
   irq_remapping_prepare+0x36/0x5f
   enable_IR_x2apic+0x21/0x172
   default_setup_apic_routing+0x12/0x6f
   apic_intr_mode_init+0x1a1/0x1f1
   x86_late_time_init+0x17/0x1c
   start_kernel+0x480/0x53f
   secondary_startup_64+0xb6/0xc0
  ---[ end trace 30894107c3749449 ]---
  x2apic: IRQ remapping doesn't support X2APIC mode
  x2apic disabled

The warning is caused by calling kmem_cache_destroy()
in free_iommu_resources(). Here is the call path:
  free_iommu_resources
    kmem_cache_destroy
      flush_memcg_workqueue
        flush_workqueue

The root cause is that the IOMMU subsystem runs before the workqueue
subsystem, at which point the variable 'wq_online' is still 'false'.  This
makes the check 'if (WARN_ON(!wq_online))' in flush_workqueue() trigger.

Since the variable 'memcg_kmem_cache_wq' has not been allocated at that
point, it is unnecessary to call flush_memcg_workqueue().  This prevents
the WARNING message triggered by flush_workqueue().

Link: http://lkml.kernel.org/r/20200103085503.1665-1-ahuang12@lenovo.com
Fixes: 92ee383f6daab ("mm: fix race between kmem_cache destroy, create and deactivate")
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Reported-by: Xiaochun Lee <lixc17@lenovo.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/slab_common.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/mm/slab_common.c~mm-memcg-slab-call-flush_memcg_workqueue-only-if-memcg-workqueue-is-valid
+++ a/mm/slab_common.c
@@ -903,7 +903,8 @@ static void flush_memcg_workqueue(struct
 	 * deactivates the memcg kmem_caches through workqueue. Make sure all
 	 * previous workitems on workqueue are processed.
 	 */
-	flush_workqueue(memcg_kmem_cache_wq);
+	if (likely(memcg_kmem_cache_wq))
+		flush_workqueue(memcg_kmem_cache_wq);
 
 	/*
 	 * If we're racing with children kmem_cache deactivation, it might
_



* [patch 11/11] mm: khugepaged: add trace status description for SCAN_PAGE_HAS_PRIVATE
  2020-01-14  0:28 incoming Andrew Morton
                   ` (9 preceding siblings ...)
  2020-01-14  0:29 ` [patch 10/11] mm: memcg/slab: call flush_memcg_workqueue() only if memcg workqueue is valid Andrew Morton
@ 2020-01-14  0:29 ` Andrew Morton
  10 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2020-01-14  0:29 UTC (permalink / raw)
  To: songliubraving, kirill.shutemov, anshuman.khandual, yang.shi,
	akpm, linux-mm, mm-commits, torvalds

From: Yang Shi <yang.shi@linux.alibaba.com>
Subject: mm: khugepaged: add trace status description for SCAN_PAGE_HAS_PRIVATE

Commit 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem)
FS") introduced a new khugepaged scan result, SCAN_PAGE_HAS_PRIVATE, but
the corresponding description for trace events was not added.

Link: http://lkml.kernel.org/r/1574793844-2914-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/trace/events/huge_memory.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/include/trace/events/huge_memory.h~mm-khugepaged-add-trace-status-description-for-scan_page_has_private
+++ a/include/trace/events/huge_memory.h
@@ -31,7 +31,8 @@
 	EM( SCAN_ALLOC_HUGE_PAGE_FAIL,	"alloc_huge_page_failed")	\
 	EM( SCAN_CGROUP_CHARGE_FAIL,	"ccgroup_charge_failed")	\
 	EM( SCAN_EXCEED_SWAP_PTE,	"exceed_swap_pte")		\
-	EMe(SCAN_TRUNCATED,		"truncated")			\
+	EM( SCAN_TRUNCATED,		"truncated")			\
+	EMe(SCAN_PAGE_HAS_PRIVATE,	"page_has_private")		\
 
 #undef EM
 #undef EMe
_



* Re: [patch 01/11] mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations
  2020-01-14  0:29 ` [patch 01/11] mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations Andrew Morton
@ 2020-01-14  2:16   ` Linus Torvalds
  2020-01-14  8:05     ` Vlastimil Babka
  0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2020-01-14  2:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Michal Hocko, Mel Gorman, Kirill A . Shutemov,
	Andrea Arcangeli, Vlastimil Babka, Linux-MM, mm-commits

On Mon, Jan 13, 2020 at 4:29 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> From: Vlastimil Babka <vbabka@suse.cz>
> Subject: mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations

I absolutely _detest_ how we've had this pattern of trying to change
the THP logic in a late -rc game.

Considering how unsuccessful some of the earlier attempts have been,
late -rc kernels really are *not* the time to make changes like this.

I'm going to take it this time, but

 (a) I'm going to revert *immediately* if anybody even peeps about the
new behavior

 (b) I want this to stop. Andrew - send THP "tweaks" during the merge
window, and not like some late-rc "fixes", when there are absolutely
_zero_ performance numbers or anything else critical associated with
the patch.

IOW, there is a decided lack of "bugfix" here. And if there actually
_were_ performance numbers, they should have been in the explanation.

So stop randomly messing with the THP allocation code, since it's
_known_ to be fragile with subtle side effects.

                    Linus



* Re: [patch 01/11] mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations
  2020-01-14  2:16   ` Linus Torvalds
@ 2020-01-14  8:05     ` Vlastimil Babka
  2020-01-14  8:46       ` Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2020-01-14  8:05 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton
  Cc: David Rientjes, Michal Hocko, Mel Gorman, Kirill A . Shutemov,
	Andrea Arcangeli, Linux-MM, mm-commits

On 1/14/20 3:16 AM, Linus Torvalds wrote:
> On Mon, Jan 13, 2020 at 4:29 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>>
>> From: Vlastimil Babka <vbabka@suse.cz>
>> Subject: mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations
> 
> I absolutely _detest_ how we've had this pattern of trying to change
> the THP logic in a late -rc game.
> 
> Considering how unsuccessful some of the earlier attempts have been,
> late -rc kernels really are *not* the time to make changes like this.

Agreed, I myself would have been much more comfortable in rc1.

> I'm going to take it this time, but

Thanks.

>  (a) I'm going to revert *immediately* if anybody even peeps about the
> new behavior
> 
>  (b) I want this to stop. Andrew - send THP "tweaks" during the merge
> window, and not like some late-rc "fixes", when there are absolutely
> _zero_ performance numbers or anything else critical associated with
> the patch.
> 
> IOW, there is a decided lack of "bugfix" here. And if there actually
> _were_ performance numbers, they should have been in the explanation.

FWIW Mel did some tests [1] which showed some improvement to the THP success
rate and none of the significant extra swapping that was feared. We didn't
get confirmation from Andrea about his real life workloads though.

[1] https://lore.kernel.org/linux-mm/20191113112042.GG28938@suse.de/

> So stop randomly messing with the THP allocation code, since it's
> _known_ to be fragile with subtle side effects.
> 
>                     Linus
> 




* Re: [patch 01/11] mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations
  2020-01-14  8:05     ` Vlastimil Babka
@ 2020-01-14  8:46       ` Michal Hocko
  0 siblings, 0 replies; 15+ messages in thread
From: Michal Hocko @ 2020-01-14  8:46 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linus Torvalds, Andrew Morton, David Rientjes, Mel Gorman,
	Kirill A . Shutemov, Andrea Arcangeli, Linux-MM, mm-commits

On Tue 14-01-20 09:05:57, Vlastimil Babka wrote:
> On 1/14/20 3:16 AM, Linus Torvalds wrote:
> > On Mon, Jan 13, 2020 at 4:29 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >>
> >> From: Vlastimil Babka <vbabka@suse.cz>
> >> Subject: mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations
> > 
> > I absolutely _detest_ how we've had this pattern of trying to change
> > the THP logic in a late -rc game.
> > 
> > Considering how unsuccessful some of the earlier attempts have been,
> > late -rc kernels really are *not* the time to make changes like this.
> 
> Agreed, myself would have been much more comfortable in rc1.

Yeah I do agree. The patch is a result of a regression I have reported
back in early October (http://lkml.kernel.org/r/20191001083743.GC15624@dhcp22.suse.cz).
The workload is quite trivial, albeit artificial, which doesn't make it
super urgent. Still something to have addressed though.  It has been
mostly ignored for quite some time until Vlastimil came up with the patch,
which improved the situation while not causing any other obvious problems
(http://lkml.kernel.org/r/20191029151549.GO31513@dhcp22.suse.cz).
It has been sitting in the mmotm tree since Oct 29 and in linux-next
around a similar time without any reports. 5.5-rc1 (Dec 8) would have
been a good time to target, but I understand that more time in
linux-next wouldn't hurt and targeting 5.6-rc1 should be sufficient.

-- 
Michal Hocko
SUSE Labs


