mm-commits Archive on lore.kernel.org
 help / color / Atom feed
* incoming
@ 2020-08-15  0:29 Andrew Morton
  2020-08-15  0:30 ` [patch 01/39] asm-generic: pgalloc.h: use correct #ifdef to enable pud_alloc_one() Andrew Morton
                   ` (139 more replies)
  0 siblings, 140 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits


39 patches, based on b923f1247b72fc100b87792fd2129d026bb10e66.

Subsystems affected by this patch series:

  mm/hotfixes
  lz4
  exec
  mailmap
  mm/thp
  autofs
  mm/madvise
  sysctl
  mm/kmemleak
  mm/misc
  lib

Subsystem: mm/hotfixes

    Mike Rapoport <rppt@linux.ibm.com>:
      asm-generic: pgalloc.h: use correct #ifdef to enable pud_alloc_one()

    Baoquan He <bhe@redhat.com>:
      Revert "mm/vmstat.c: do not show lowmem reserve protection information of empty zone"

Subsystem: lz4

    Nick Terrell <terrelln@fb.com>:
      lz4: fix kernel decompression speed

Subsystem: exec

    Kees Cook <keescook@chromium.org>:
    Patch series "Fix S_ISDIR execve() errno":
      exec: restore EACCES of S_ISDIR execve()
      selftests/exec: add file type errno tests

Subsystem: mailmap

    Greg Kurz <groug@kaod.org>:
      mailmap: add entry for Greg Kurz

Subsystem: mm/thp

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
    Patch series "THP prep patches":
      mm: store compound_nr as well as compound_order
      mm: move page-flags include to top of file
      mm: add thp_order
      mm: add thp_size
      mm: replace hpage_nr_pages with thp_nr_pages
      mm: add thp_head
      mm: introduce offset_in_thp

Subsystem: autofs

    Randy Dunlap <rdunlap@infradead.org>:
      fs: autofs: delete repeated words in comments

Subsystem: mm/madvise

    Minchan Kim <minchan@kernel.org>:
    Patch series "introduce memory hinting API for external process", v8:
      mm/madvise: pass task and mm to do_madvise
      pid: move pidfd_get_pid() to pid.c
      mm/madvise: introduce process_madvise() syscall: an external memory hinting API
      mm/madvise: check fatal signal pending of target process

Subsystem: sysctl

    Xiaoming Ni <nixiaoming@huawei.com>:
      all arch: remove system call sys_sysctl

Subsystem: mm/kmemleak

    Qian Cai <cai@lca.pw>:
      mm/kmemleak: silence KCSAN splats in checksum

Subsystem: mm/misc

    Qian Cai <cai@lca.pw>:
      mm/frontswap: mark various intentional data races
      mm/page_io: mark various intentional data races
      mm/swap_state: mark various intentional data races

    Kirill A. Shutemov <kirill@shutemov.name>:
      mm/filemap.c: fix a data race in filemap_fault()

    Qian Cai <cai@lca.pw>:
      mm/swapfile: fix and annotate various data races
      mm/page_counter: fix various data races at memsw
      mm/memcontrol: fix a data race in scan count
      mm/list_lru: fix a data race in list_lru_count_one
      mm/mempool: fix a data race in mempool_free()
      mm/rmap: annotate a data race at tlb_flush_batched
      mm/swap.c: annotate data races for lru_rotate_pvecs
      mm: annotate a data race in page_zonenum()

    Romain Naour <romain.naour@gmail.com>:
      include/asm-generic/vmlinux.lds.h: align ro_after_init

    Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>:
      sh: clkfwk: remove r8/r16/r32
      sh: use generic strncpy()

Subsystem: lib

    Krzysztof Kozlowski <krzk@kernel.org>:
    Patch series "iomap: Constify ioreadX() iomem argument", v3:
      iomap: constify ioreadX() iomem argument (as in generic implementation)
      rtl818x: constify ioreadX() iomem argument (as in generic implementation)
      ntb: intel: constify ioreadX() iomem argument (as in generic implementation)
      virtio: pci: constify ioreadX() iomem argument (as in generic implementation)

 .mailmap                                               |    1 
 arch/alpha/include/asm/core_apecs.h                    |    6 
 arch/alpha/include/asm/core_cia.h                      |    6 
 arch/alpha/include/asm/core_lca.h                      |    6 
 arch/alpha/include/asm/core_marvel.h                   |    4 
 arch/alpha/include/asm/core_mcpcia.h                   |    6 
 arch/alpha/include/asm/core_t2.h                       |    2 
 arch/alpha/include/asm/io.h                            |   12 -
 arch/alpha/include/asm/io_trivial.h                    |   16 -
 arch/alpha/include/asm/jensen.h                        |    2 
 arch/alpha/include/asm/machvec.h                       |    6 
 arch/alpha/kernel/core_marvel.c                        |    2 
 arch/alpha/kernel/io.c                                 |   12 -
 arch/alpha/kernel/syscalls/syscall.tbl                 |    3 
 arch/arm/configs/am200epdkit_defconfig                 |    1 
 arch/arm/tools/syscall.tbl                             |    3 
 arch/arm64/include/asm/unistd.h                        |    2 
 arch/arm64/include/asm/unistd32.h                      |    6 
 arch/ia64/kernel/syscalls/syscall.tbl                  |    3 
 arch/m68k/kernel/syscalls/syscall.tbl                  |    3 
 arch/microblaze/kernel/syscalls/syscall.tbl            |    3 
 arch/mips/configs/cu1000-neo_defconfig                 |    1 
 arch/mips/kernel/syscalls/syscall_n32.tbl              |    3 
 arch/mips/kernel/syscalls/syscall_n64.tbl              |    3 
 arch/mips/kernel/syscalls/syscall_o32.tbl              |    3 
 arch/parisc/include/asm/io.h                           |    4 
 arch/parisc/kernel/syscalls/syscall.tbl                |    3 
 arch/parisc/lib/iomap.c                                |   72 +++---
 arch/powerpc/kernel/iomap.c                            |   28 +-
 arch/powerpc/kernel/syscalls/syscall.tbl               |    3 
 arch/s390/kernel/syscalls/syscall.tbl                  |    3 
 arch/sh/configs/dreamcast_defconfig                    |    1 
 arch/sh/configs/espt_defconfig                         |    1 
 arch/sh/configs/hp6xx_defconfig                        |    1 
 arch/sh/configs/landisk_defconfig                      |    1 
 arch/sh/configs/lboxre2_defconfig                      |    1 
 arch/sh/configs/microdev_defconfig                     |    1 
 arch/sh/configs/migor_defconfig                        |    1 
 arch/sh/configs/r7780mp_defconfig                      |    1 
 arch/sh/configs/r7785rp_defconfig                      |    1 
 arch/sh/configs/rts7751r2d1_defconfig                  |    1 
 arch/sh/configs/rts7751r2dplus_defconfig               |    1 
 arch/sh/configs/se7206_defconfig                       |    1 
 arch/sh/configs/se7343_defconfig                       |    1 
 arch/sh/configs/se7619_defconfig                       |    1 
 arch/sh/configs/se7705_defconfig                       |    1 
 arch/sh/configs/se7750_defconfig                       |    1 
 arch/sh/configs/se7751_defconfig                       |    1 
 arch/sh/configs/secureedge5410_defconfig               |    1 
 arch/sh/configs/sh03_defconfig                         |    1 
 arch/sh/configs/sh7710voipgw_defconfig                 |    1 
 arch/sh/configs/sh7757lcr_defconfig                    |    1 
 arch/sh/configs/sh7763rdp_defconfig                    |    1 
 arch/sh/configs/shmin_defconfig                        |    1 
 arch/sh/configs/titan_defconfig                        |    1 
 arch/sh/include/asm/string_32.h                        |   26 --
 arch/sh/kernel/iomap.c                                 |   22 -
 arch/sh/kernel/syscalls/syscall.tbl                    |    3 
 arch/sparc/kernel/syscalls/syscall.tbl                 |    3 
 arch/x86/entry/syscalls/syscall_32.tbl                 |    3 
 arch/x86/entry/syscalls/syscall_64.tbl                 |    4 
 arch/xtensa/kernel/syscalls/syscall.tbl                |    3 
 drivers/mailbox/bcm-pdc-mailbox.c                      |    2 
 drivers/net/wireless/realtek/rtl818x/rtl8180/rtl8180.h |    6 
 drivers/ntb/hw/intel/ntb_hw_gen1.c                     |    2 
 drivers/ntb/hw/intel/ntb_hw_gen3.h                     |    2 
 drivers/ntb/hw/intel/ntb_hw_intel.h                    |    2 
 drivers/nvdimm/btt.c                                   |    4 
 drivers/nvdimm/pmem.c                                  |    6 
 drivers/sh/clk/cpg.c                                   |   25 --
 drivers/virtio/virtio_pci_modern.c                     |    6 
 fs/autofs/dev-ioctl.c                                  |    4 
 fs/io_uring.c                                          |    2 
 fs/namei.c                                             |    4 
 include/asm-generic/iomap.h                            |   28 +-
 include/asm-generic/pgalloc.h                          |    2 
 include/asm-generic/vmlinux.lds.h                      |    1 
 include/linux/compat.h                                 |    5 
 include/linux/huge_mm.h                                |   58 ++++-
 include/linux/io-64-nonatomic-hi-lo.h                  |    4 
 include/linux/io-64-nonatomic-lo-hi.h                  |    4 
 include/linux/memcontrol.h                             |    2 
 include/linux/mm.h                                     |   16 -
 include/linux/mm_inline.h                              |    6 
 include/linux/mm_types.h                               |    1 
 include/linux/pagemap.h                                |    6 
 include/linux/pid.h                                    |    1 
 include/linux/syscalls.h                               |    4 
 include/linux/sysctl.h                                 |    6 
 include/uapi/asm-generic/unistd.h                      |    4 
 kernel/Makefile                                        |    2 
 kernel/exit.c                                          |   17 -
 kernel/pid.c                                           |   17 +
 kernel/sys_ni.c                                        |    3 
 kernel/sysctl_binary.c                                 |  171 --------------
 lib/iomap.c                                            |   30 +-
 lib/lz4/lz4_compress.c                                 |    4 
 lib/lz4/lz4_decompress.c                               |   18 -
 lib/lz4/lz4defs.h                                      |   10 
 lib/lz4/lz4hc_compress.c                               |    2 
 mm/compaction.c                                        |    2 
 mm/filemap.c                                           |   22 +
 mm/frontswap.c                                         |    8 
 mm/gup.c                                               |    2 
 mm/internal.h                                          |    4 
 mm/kmemleak.c                                          |    2 
 mm/list_lru.c                                          |    2 
 mm/madvise.c                                           |  190 ++++++++++++++--
 mm/memcontrol.c                                        |   10 
 mm/memory.c                                            |    4 
 mm/memory_hotplug.c                                    |    7 
 mm/mempolicy.c                                         |    2 
 mm/mempool.c                                           |    2 
 mm/migrate.c                                           |   18 -
 mm/mlock.c                                             |    9 
 mm/page_alloc.c                                        |    5 
 mm/page_counter.c                                      |   13 -
 mm/page_io.c                                           |   12 -
 mm/page_vma_mapped.c                                   |    6 
 mm/rmap.c                                              |   10 
 mm/swap.c                                              |   21 -
 mm/swap_state.c                                        |   10 
 mm/swapfile.c                                          |   33 +-
 mm/vmscan.c                                            |    6 
 mm/vmstat.c                                            |   12 -
 mm/workingset.c                                        |    6 
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl     |    2 
 tools/perf/arch/s390/entry/syscalls/syscall.tbl        |    2 
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl      |    2 
 tools/testing/selftests/exec/.gitignore                |    1 
 tools/testing/selftests/exec/Makefile                  |    5 
 tools/testing/selftests/exec/non-regular.c             |  196 +++++++++++++++++
 132 files changed, 815 insertions(+), 614 deletions(-)


^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 01/39] asm-generic: pgalloc.h: use correct #ifdef to enable pud_alloc_one()
  2020-08-15  0:29 incoming Andrew Morton
@ 2020-08-15  0:30 ` Andrew Morton
  2020-08-15  0:30 ` [patch 02/39] Revert "mm/vmstat.c: do not show lowmem reserve protection information of empty zone" Andrew Morton
                   ` (138 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:30 UTC (permalink / raw)
  To: akpm, linux-mm, lkp, mm-commits, rppt, torvalds

From: Mike Rapoport <rppt@linux.ibm.com>
Subject: asm-generic: pgalloc.h: use correct #ifdef to enable pud_alloc_one()

The #ifdef statement that guards the generic version of pud_alloc_one() by
mistake used __HAVE_ARCH_PUD_FREE instead of __HAVE_ARCH_PUD_ALLOC_ONE.

Fix it.

Link: http://lkml.kernel.org/r/20200812191415.GE163101@linux.ibm.com
Fixes: d9e8b929670b ("asm-generic: pgalloc: provide generic pud_alloc_one() and pud_free_one()")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/asm-generic/pgalloc.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/asm-generic/pgalloc.h~asm-generic-pgalloch-use-correct-ifdef-to-enable-pud_alloc_one
+++ a/include/asm-generic/pgalloc.h
@@ -147,7 +147,7 @@ static inline void pmd_free(struct mm_st
 
 #if CONFIG_PGTABLE_LEVELS > 3
 
-#ifndef __HAVE_ARCH_PUD_FREE
+#ifndef __HAVE_ARCH_PUD_ALLOC_ONE
 /**
  * pud_alloc_one - allocate a page for PUD-level page table
  * @mm: the mm_struct of the current context
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 02/39] Revert "mm/vmstat.c: do not show lowmem reserve protection information of empty zone"
  2020-08-15  0:29 incoming Andrew Morton
  2020-08-15  0:30 ` [patch 01/39] asm-generic: pgalloc.h: use correct #ifdef to enable pud_alloc_one() Andrew Morton
@ 2020-08-15  0:30 ` Andrew Morton
  2020-08-15  0:30 ` [patch 03/39] lz4: fix kernel decompression speed Andrew Morton
                   ` (137 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:30 UTC (permalink / raw)
  To: akpm, bhe, david, linux-mm, mm-commits, rientjes, sonnyrao,
	stable, torvalds

From: Baoquan He <bhe@redhat.com>
Subject: Revert "mm/vmstat.c: do not show lowmem reserve protection information of empty zone"

This reverts commit 26e7deadaae175.

Sonny reported that one of their tests started failing on the latest
kernel on their Chrome OS platform.  The root cause is that the above
commit removed the protection line of empty zone, while the parser used in
the test relies on the protection line to mark the end of each zone.

Let's revert it to avoid breaking userspace testing or applications.

Link: http://lkml.kernel.org/r/20200811075412.12872-1-bhe@redhat.com
Fixes: 26e7deadaae175 ("mm/vmstat.c: do not show lowmem reserve protection information of empty zone)"
Signed-off-by: Baoquan He <bhe@redhat.com>
Reported-by: Sonny Rao <sonnyrao@chromium.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>	[5.8.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmstat.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/mm/vmstat.c~revert-mm-vmstatc-do-not-show-lowmem-reserve-protection-information-of-empty-zone
+++ a/mm/vmstat.c
@@ -1642,12 +1642,6 @@ static void zoneinfo_show_print(struct s
 		   zone->present_pages,
 		   zone_managed_pages(zone));
 
-	/* If unpopulated, no other information is useful */
-	if (!populated_zone(zone)) {
-		seq_putc(m, '\n');
-		return;
-	}
-
 	seq_printf(m,
 		   "\n        protection: (%ld",
 		   zone->lowmem_reserve[0]);
@@ -1655,6 +1649,12 @@ static void zoneinfo_show_print(struct s
 		seq_printf(m, ", %ld", zone->lowmem_reserve[i]);
 	seq_putc(m, ')');
 
+	/* If unpopulated, no other information is useful */
+	if (!populated_zone(zone)) {
+		seq_putc(m, '\n');
+		return;
+	}
+
 	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
 		seq_printf(m, "\n      %-12s %lu", zone_stat_name(i),
 			   zone_page_state(zone, i));
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 03/39] lz4: fix kernel decompression speed
  2020-08-15  0:29 incoming Andrew Morton
  2020-08-15  0:30 ` [patch 01/39] asm-generic: pgalloc.h: use correct #ifdef to enable pud_alloc_one() Andrew Morton
  2020-08-15  0:30 ` [patch 02/39] Revert "mm/vmstat.c: do not show lowmem reserve protection information of empty zone" Andrew Morton
@ 2020-08-15  0:30 ` Andrew Morton
  2020-08-15  0:30 ` [patch 04/39] exec: restore EACCES of S_ISDIR execve() Andrew Morton
                   ` (136 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:30 UTC (permalink / raw)
  To: 4sschmid, akpm, gaoxiang25, gregkh, linux-mm, mingo, mm-commits,
	nivedita, terrelln, torvalds, yann.collet.73

From: Nick Terrell <terrelln@fb.com>
Subject: lz4: fix kernel decompression speed

This patch replaces all memcpy() calls with LZ4_memcpy() which calls
__builtin_memcpy() so the compiler can inline it.

LZ4 relies heavily on memcpy() with a constant size being inlined.  In x86
and i386 pre-boot environments memcpy() cannot be inlined because memcpy()
doesn't get defined as __builtin_memcpy().

An equivalent patch has been applied upstream so that the next import
won't lose this change [1].

I've measured the kernel decompression speed using QEMU before and after
this patch for the x86_64 and i386 architectures.  The speed-up is about
10x as shown below.

Code	Arch	Kernel Size	Time	Speed
v5.8	x86_64	11504832 B	148 ms	 79 MB/s
patch	x86_64	11503872 B	 13 ms	885 MB/s
v5.8	i386	 9621216 B	 91 ms	106 MB/s
patch	i386	 9620224 B	 10 ms	962 MB/s

I also measured the time to decompress the initramfs on x86_64, i386, and
arm.  All three show the same decompression speed before and after, as
expected.

[1] https://github.com/lz4/lz4/pull/890

Link: http://lkml.kernel.org/r/20200803194022.2966806-1-nickrterrell@gmail.com
Signed-off-by: Nick Terrell <terrelln@fb.com>
Cc: Yann Collet <yann.collet.73@gmail.com>
Cc: Gao Xiang <gaoxiang25@huawei.com>
Cc: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/lz4/lz4_compress.c   |    4 ++--
 lib/lz4/lz4_decompress.c |   18 +++++++++---------
 lib/lz4/lz4defs.h        |   10 ++++++++++
 lib/lz4/lz4hc_compress.c |    2 +-
 4 files changed, 22 insertions(+), 12 deletions(-)

--- a/lib/lz4/lz4_compress.c~lz4-fix-kernel-decompression-speed
+++ a/lib/lz4/lz4_compress.c
@@ -446,7 +446,7 @@ _last_literals:
 			*op++ = (BYTE)(lastRun << ML_BITS);
 		}
 
-		memcpy(op, anchor, lastRun);
+		LZ4_memcpy(op, anchor, lastRun);
 
 		op += lastRun;
 	}
@@ -708,7 +708,7 @@ _last_literals:
 		} else {
 			*op++ = (BYTE)(lastRunSize<<ML_BITS);
 		}
-		memcpy(op, anchor, lastRunSize);
+		LZ4_memcpy(op, anchor, lastRunSize);
 		op += lastRunSize;
 	}
 
--- a/lib/lz4/lz4_decompress.c~lz4-fix-kernel-decompression-speed
+++ a/lib/lz4/lz4_decompress.c
@@ -153,7 +153,7 @@ static FORCE_INLINE int LZ4_decompress_g
 		   && likely((endOnInput ? ip < shortiend : 1) &
 			     (op <= shortoend))) {
 			/* Copy the literals */
-			memcpy(op, ip, endOnInput ? 16 : 8);
+			LZ4_memcpy(op, ip, endOnInput ? 16 : 8);
 			op += length; ip += length;
 
 			/*
@@ -172,9 +172,9 @@ static FORCE_INLINE int LZ4_decompress_g
 			    (offset >= 8) &&
 			    (dict == withPrefix64k || match >= lowPrefix)) {
 				/* Copy the match. */
-				memcpy(op + 0, match + 0, 8);
-				memcpy(op + 8, match + 8, 8);
-				memcpy(op + 16, match + 16, 2);
+				LZ4_memcpy(op + 0, match + 0, 8);
+				LZ4_memcpy(op + 8, match + 8, 8);
+				LZ4_memcpy(op + 16, match + 16, 2);
 				op += length + MINMATCH;
 				/* Both stages worked, load the next token. */
 				continue;
@@ -263,7 +263,7 @@ static FORCE_INLINE int LZ4_decompress_g
 				}
 			}
 
-			memcpy(op, ip, length);
+			LZ4_memcpy(op, ip, length);
 			ip += length;
 			op += length;
 
@@ -350,7 +350,7 @@ _copy_match:
 				size_t const copySize = (size_t)(lowPrefix - match);
 				size_t const restSize = length - copySize;
 
-				memcpy(op, dictEnd - copySize, copySize);
+				LZ4_memcpy(op, dictEnd - copySize, copySize);
 				op += copySize;
 				if (restSize > (size_t)(op - lowPrefix)) {
 					/* overlap copy */
@@ -360,7 +360,7 @@ _copy_match:
 					while (op < endOfMatch)
 						*op++ = *copyFrom++;
 				} else {
-					memcpy(op, lowPrefix, restSize);
+					LZ4_memcpy(op, lowPrefix, restSize);
 					op += restSize;
 				}
 			}
@@ -386,7 +386,7 @@ _copy_match:
 				while (op < copyEnd)
 					*op++ = *match++;
 			} else {
-				memcpy(op, match, mlen);
+				LZ4_memcpy(op, match, mlen);
 			}
 			op = copyEnd;
 			if (op == oend)
@@ -400,7 +400,7 @@ _copy_match:
 			op[2] = match[2];
 			op[3] = match[3];
 			match += inc32table[offset];
-			memcpy(op + 4, match, 4);
+			LZ4_memcpy(op + 4, match, 4);
 			match -= dec64table[offset];
 		} else {
 			LZ4_copy8(op, match);
--- a/lib/lz4/lz4defs.h~lz4-fix-kernel-decompression-speed
+++ a/lib/lz4/lz4defs.h
@@ -137,6 +137,16 @@ static FORCE_INLINE void LZ4_writeLE16(v
 	return put_unaligned_le16(value, memPtr);
 }
 
+/*
+ * LZ4 relies on memcpy with a constant size being inlined. In freestanding
+ * environments, the compiler can't assume the implementation of memcpy() is
+ * standard compliant, so apply its specialized memcpy() inlining logic. When
+ * possible, use __builtin_memcpy() to tell the compiler to analyze memcpy()
+ * as-if it were standard compliant, so it can inline it in freestanding
+ * environments. This is needed when decompressing the Linux Kernel, for example.
+ */
+#define LZ4_memcpy(dst, src, size) __builtin_memcpy(dst, src, size)
+
 static FORCE_INLINE void LZ4_copy8(void *dst, const void *src)
 {
 #if LZ4_ARCH64
--- a/lib/lz4/lz4hc_compress.c~lz4-fix-kernel-decompression-speed
+++ a/lib/lz4/lz4hc_compress.c
@@ -570,7 +570,7 @@ _Search3:
 			*op++ = (BYTE) lastRun;
 		} else
 			*op++ = (BYTE)(lastRun<<ML_BITS);
-		memcpy(op, anchor, iend - anchor);
+		LZ4_memcpy(op, anchor, iend - anchor);
 		op += iend - anchor;
 	}
 
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 04/39] exec: restore EACCES of S_ISDIR execve()
  2020-08-15  0:29 incoming Andrew Morton
                   ` (2 preceding siblings ...)
  2020-08-15  0:30 ` [patch 03/39] lz4: fix kernel decompression speed Andrew Morton
@ 2020-08-15  0:30 ` Andrew Morton
  2020-08-15  0:30 ` [patch 05/39] selftests/exec: add file type errno tests Andrew Morton
                   ` (135 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:30 UTC (permalink / raw)
  To: akpm, gregkh, gregkh, keescook, linux-mm, maz, mm-commits, torvalds

From: Kees Cook <keescook@chromium.org>
Subject: exec: restore EACCES of S_ISDIR execve()

Patch series "Fix S_ISDIR execve() errno".

Fix an errno change for execve() of directories, noticed by Marc Zyngier. 
Along with the fix, include a regression test to avoid seeing this return
in the future.


This patch (of 2):

The return code for attempting to execute a directory has always been
EACCES.  Adjust the S_ISDIR exec test to reflect the old errno instead of
the general EISDIR for other kinds of "open" attempts on directories.

Link: http://lkml.kernel.org/r/20200813231723.2725102-2-keescook@chromium.org
Link: https://lore.kernel.org/lkml/20200813151305.6191993b@why
Fixes: 633fb6ac3980 ("exec: move S_ISREG() check earlier")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@google.com>
Tested-by: Greg Kroah-Hartman <gregkh@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/namei.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/fs/namei.c~exec-restore-eacces-of-s_isdir-execve
+++ a/fs/namei.c
@@ -2849,8 +2849,10 @@ static int may_open(const struct path *p
 	case S_IFLNK:
 		return -ELOOP;
 	case S_IFDIR:
-		if (acc_mode & (MAY_WRITE | MAY_EXEC))
+		if (acc_mode & MAY_WRITE)
 			return -EISDIR;
+		if (acc_mode & MAY_EXEC)
+			return -EACCES;
 		break;
 	case S_IFBLK:
 	case S_IFCHR:
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 05/39] selftests/exec: add file type errno tests
  2020-08-15  0:29 incoming Andrew Morton
                   ` (3 preceding siblings ...)
  2020-08-15  0:30 ` [patch 04/39] exec: restore EACCES of S_ISDIR execve() Andrew Morton
@ 2020-08-15  0:30 ` Andrew Morton
  2020-08-15  0:30 ` [patch 06/39] mailmap: add entry for Greg Kurz Andrew Morton
                   ` (134 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:30 UTC (permalink / raw)
  To: akpm, keescook, linux-mm, maz, mm-commits, torvalds

From: Kees Cook <keescook@chromium.org>
Subject: selftests/exec: add file type errno tests

Make sure execve() returns the expected errno values for non-regular
files.

Link: http://lkml.kernel.org/r/20200813231723.2725102-3-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 tools/testing/selftests/exec/.gitignore    |    1 
 tools/testing/selftests/exec/Makefile      |    5 
 tools/testing/selftests/exec/non-regular.c |  196 +++++++++++++++++++
 3 files changed, 200 insertions(+), 2 deletions(-)

--- a/tools/testing/selftests/exec/.gitignore~selftests-exec-add-file-type-errno-tests
+++ a/tools/testing/selftests/exec/.gitignore
@@ -10,3 +10,4 @@ execveat.denatured
 /recursion-depth
 xxxxxxxx*
 pipe
+S_I*.test
--- a/tools/testing/selftests/exec/Makefile~selftests-exec-add-file-type-errno-tests
+++ a/tools/testing/selftests/exec/Makefile
@@ -3,7 +3,7 @@ CFLAGS = -Wall
 CFLAGS += -Wno-nonnull
 CFLAGS += -D_GNU_SOURCE
 
-TEST_PROGS := binfmt_script
+TEST_PROGS := binfmt_script non-regular
 TEST_GEN_PROGS := execveat
 TEST_GEN_FILES := execveat.symlink execveat.denatured script subdir pipe
 # Makefile is a run-time dependency, since it's accessed by the execveat test
@@ -11,7 +11,8 @@ TEST_FILES := Makefile
 
 TEST_GEN_PROGS += recursion-depth
 
-EXTRA_CLEAN := $(OUTPUT)/subdir.moved $(OUTPUT)/execveat.moved $(OUTPUT)/xxxxx*
+EXTRA_CLEAN := $(OUTPUT)/subdir.moved $(OUTPUT)/execveat.moved $(OUTPUT)/xxxxx*	\
+	       $(OUTPUT)/S_I*.test
 
 include ../lib.mk
 
--- /dev/null
+++ a/tools/testing/selftests/exec/non-regular.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0+
+#include <errno.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/sysmacros.h>
+#include <sys/types.h>
+
+#include "../kselftest_harness.h"
+
+/* Remove a file, ignoring the result if it didn't exist. */
+void rm(struct __test_metadata *_metadata, const char *pathname,
+	int is_dir)
+{
+	int rc;
+
+	if (is_dir)
+		rc = rmdir(pathname);
+	else
+		rc = unlink(pathname);
+
+	if (rc < 0) {
+		ASSERT_EQ(errno, ENOENT) {
+			TH_LOG("Not ENOENT: %s", pathname);
+		}
+	} else {
+		ASSERT_EQ(rc, 0) {
+			TH_LOG("Failed to remove: %s", pathname);
+		}
+	}
+}
+
+FIXTURE(file) {
+	char *pathname;
+	int is_dir;
+};
+
+FIXTURE_VARIANT(file)
+{
+	const char *name;
+	int expected;
+	int is_dir;
+	void (*setup)(struct __test_metadata *_metadata,
+		      FIXTURE_DATA(file) *self,
+		      const FIXTURE_VARIANT(file) *variant);
+	int major, minor, mode; /* for mknod() */
+};
+
+void setup_link(struct __test_metadata *_metadata,
+		FIXTURE_DATA(file) *self,
+		const FIXTURE_VARIANT(file) *variant)
+{
+	const char * const paths[] = {
+		"/bin/true",
+		"/usr/bin/true",
+	};
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(paths); i++) {
+		if (access(paths[i], X_OK) == 0) {
+			ASSERT_EQ(symlink(paths[i], self->pathname), 0);
+			return;
+		}
+	}
+	ASSERT_EQ(1, 0) {
+		TH_LOG("Could not find viable 'true' binary");
+	}
+}
+
+FIXTURE_VARIANT_ADD(file, S_IFLNK)
+{
+	.name = "S_IFLNK",
+	.expected = ELOOP,
+	.setup = setup_link,
+};
+
+void setup_dir(struct __test_metadata *_metadata,
+	       FIXTURE_DATA(file) *self,
+	       const FIXTURE_VARIANT(file) *variant)
+{
+	ASSERT_EQ(mkdir(self->pathname, 0755), 0);
+}
+
+FIXTURE_VARIANT_ADD(file, S_IFDIR)
+{
+	.name = "S_IFDIR",
+	.is_dir = 1,
+	.expected = EACCES,
+	.setup = setup_dir,
+};
+
+void setup_node(struct __test_metadata *_metadata,
+		FIXTURE_DATA(file) *self,
+		const FIXTURE_VARIANT(file) *variant)
+{
+	dev_t dev;
+	int rc;
+
+	dev = makedev(variant->major, variant->minor);
+	rc = mknod(self->pathname, 0755 | variant->mode, dev);
+	ASSERT_EQ(rc, 0) {
+		if (errno == EPERM)
+			SKIP(return, "Please run as root; cannot mknod(%s)",
+				variant->name);
+	}
+}
+
+FIXTURE_VARIANT_ADD(file, S_IFBLK)
+{
+	.name = "S_IFBLK",
+	.expected = EACCES,
+	.setup = setup_node,
+	/* /dev/loop0 */
+	.major = 7,
+	.minor = 0,
+	.mode = S_IFBLK,
+};
+
+FIXTURE_VARIANT_ADD(file, S_IFCHR)
+{
+	.name = "S_IFCHR",
+	.expected = EACCES,
+	.setup = setup_node,
+	/* /dev/zero */
+	.major = 1,
+	.minor = 5,
+	.mode = S_IFCHR,
+};
+
+void setup_fifo(struct __test_metadata *_metadata,
+		FIXTURE_DATA(file) *self,
+		const FIXTURE_VARIANT(file) *variant)
+{
+	ASSERT_EQ(mkfifo(self->pathname, 0755), 0);
+}
+
+FIXTURE_VARIANT_ADD(file, S_IFIFO)
+{
+	.name = "S_IFIFO",
+	.expected = EACCES,
+	.setup = setup_fifo,
+};
+
+FIXTURE_SETUP(file)
+{
+	ASSERT_GT(asprintf(&self->pathname, "%s.test", variant->name), 6);
+	self->is_dir = variant->is_dir;
+
+	rm(_metadata, self->pathname, variant->is_dir);
+	variant->setup(_metadata, self, variant);
+}
+
+FIXTURE_TEARDOWN(file)
+{
+	rm(_metadata, self->pathname, self->is_dir);
+}
+
+TEST_F(file, exec_errno)
+{
+	char * const argv[2] = { (char * const)self->pathname, NULL };
+
+	EXPECT_LT(execv(argv[0], argv), 0);
+	EXPECT_EQ(errno, variant->expected);
+}
+
+/* S_IFSOCK */
+FIXTURE(sock)
+{
+	int fd;
+};
+
+FIXTURE_SETUP(sock)
+{
+	self->fd = socket(AF_INET, SOCK_STREAM, 0);
+	ASSERT_GE(self->fd, 0);
+}
+
+FIXTURE_TEARDOWN(sock)
+{
+	if (self->fd >= 0)
+		ASSERT_EQ(close(self->fd), 0);
+}
+
+TEST_F(sock, exec_errno)
+{
+	char * const argv[2] = { " magic socket ", NULL };
+	char * const envp[1] = { NULL };
+
+	EXPECT_LT(fexecve(self->fd, argv, envp), 0);
+	EXPECT_EQ(errno, EACCES);
+}
+
+TEST_HARNESS_MAIN
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 06/39] mailmap: add entry for Greg Kurz
  2020-08-15  0:29 incoming Andrew Morton
                   ` (4 preceding siblings ...)
  2020-08-15  0:30 ` [patch 05/39] selftests/exec: add file type errno tests Andrew Morton
@ 2020-08-15  0:30 ` Andrew Morton
  2020-08-15  0:30 ` [patch 07/39] mm: store compound_nr as well as compound_order Andrew Morton
                   ` (133 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:30 UTC (permalink / raw)
  To: akpm, groug, linux-mm, mm-commits, torvalds

From: Greg Kurz <groug@kaod.org>
Subject: mailmap: add entry for Greg Kurz

I had stopped using gkurz@linux.vnet.ibm.com a while back already but this
email address was shutdown last June when I quit IBM.  It's about time to
map it to groug@kaod.org.

Link: http://lkml.kernel.org/r/159724692879.76040.4938578139173154028.stgit@bahia.lan
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 .mailmap |    1 +
 1 file changed, 1 insertion(+)

--- a/.mailmap~mailmap-add-entry-for-greg-kurz
+++ a/.mailmap
@@ -104,6 +104,7 @@ Gerald Schaefer <gerald.schaefer@linux.i
 Greg Kroah-Hartman <greg@echidna.(none)>
 Greg Kroah-Hartman <gregkh@suse.de>
 Greg Kroah-Hartman <greg@kroah.com>
+Greg Kurz <groug@kaod.org> <gkurz@linux.vnet.ibm.com>
 Gregory CLEMENT <gregory.clement@bootlin.com> <gregory.clement@free-electrons.com>
 Hanjun Guo <guohanjun@huawei.com> <hanjun.guo@linaro.org>
 Heiko Carstens <hca@linux.ibm.com> <h.carstens@de.ibm.com>
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 07/39] mm: store compound_nr as well as compound_order
  2020-08-15  0:29 incoming Andrew Morton
                   ` (5 preceding siblings ...)
  2020-08-15  0:30 ` [patch 06/39] mailmap: add entry for Greg Kurz Andrew Morton
@ 2020-08-15  0:30 ` Andrew Morton
  2020-08-15  0:30 ` [patch 08/39] mm: move page-flags include to top of file Andrew Morton
                   ` (132 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:30 UTC (permalink / raw)
  To: akpm, david, kirill.shutemov, linux-mm, mike.kravetz, mm-commits,
	torvalds, william.kucharski, willy, ziy

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm: store compound_nr as well as compound_order

Patch series "THP prep patches".

These are some generic cleanups and improvements, which I would like
merged into mmotm soon.  The first one should be a performance improvement
for all users of compound pages, and the others are aimed at getting code
to compile away when CONFIG_TRANSPARENT_HUGEPAGE is disabled (ie small
systems).  Also better documented / less confusing than the current prefix
mixture of compound, hpage and thp.


This patch (of 7):

This removes a few instructions from functions which need to know how many
pages are in a compound page.  The storage used is either page->mapping on
64-bit or page->index on 32-bit.  Both of these are fine to overlay on
tail pages.

Link: http://lkml.kernel.org/r/20200629151959.15779-1-willy@infradead.org
Link: http://lkml.kernel.org/r/20200629151959.15779-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mm.h       |    5 ++++-
 include/linux/mm_types.h |    1 +
 mm/page_alloc.c          |    5 +++--
 3 files changed, 8 insertions(+), 3 deletions(-)

--- a/include/linux/mm.h~mm-store-compound_nr-as-well-as-compound_order
+++ a/include/linux/mm.h
@@ -922,12 +922,15 @@ static inline int compound_pincount(stru
 static inline void set_compound_order(struct page *page, unsigned int order)
 {
 	page[1].compound_order = order;
+	page[1].compound_nr = 1U << order;
 }
 
 /* Returns the number of pages in this potentially compound page. */
 static inline unsigned long compound_nr(struct page *page)
 {
-	return 1UL << compound_order(page);
+	if (!PageHead(page))
+		return 1;
+	return page[1].compound_nr;
 }
 
 /* Returns the number of bytes in this potentially compound page. */
--- a/include/linux/mm_types.h~mm-store-compound_nr-as-well-as-compound_order
+++ a/include/linux/mm_types.h
@@ -134,6 +134,7 @@ struct page {
 			unsigned char compound_dtor;
 			unsigned char compound_order;
 			atomic_t compound_mapcount;
+			unsigned int compound_nr; /* 1 << compound_order */
 		};
 		struct {	/* Second tail page of compound page */
 			unsigned long _compound_pad_1;	/* compound_head */
--- a/mm/page_alloc.c~mm-store-compound_nr-as-well-as-compound_order
+++ a/mm/page_alloc.c
@@ -666,8 +666,6 @@ void prep_compound_page(struct page *pag
 	int i;
 	int nr_pages = 1 << order;
 
-	set_compound_page_dtor(page, COMPOUND_PAGE_DTOR);
-	set_compound_order(page, order);
 	__SetPageHead(page);
 	for (i = 1; i < nr_pages; i++) {
 		struct page *p = page + i;
@@ -675,6 +673,9 @@ void prep_compound_page(struct page *pag
 		p->mapping = TAIL_MAPPING;
 		set_compound_head(p, page);
 	}
+
+	set_compound_page_dtor(page, COMPOUND_PAGE_DTOR);
+	set_compound_order(page, order);
 	atomic_set(compound_mapcount_ptr(page), -1);
 	if (hpage_pincount_available(page))
 		atomic_set(compound_pincount_ptr(page), 0);
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 08/39] mm: move page-flags include to top of file
  2020-08-15  0:29 incoming Andrew Morton
                   ` (6 preceding siblings ...)
  2020-08-15  0:30 ` [patch 07/39] mm: store compound_nr as well as compound_order Andrew Morton
@ 2020-08-15  0:30 ` Andrew Morton
  2020-08-15  0:30 ` [patch 09/39] mm: add thp_order Andrew Morton
                   ` (131 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:30 UTC (permalink / raw)
  To: akpm, david, kirill.shutemov, linux-mm, mike.kravetz, mm-commits,
	torvalds, william.kucharski, willy, ziy

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm: move page-flags include to top of file

Give up on the notion that we can remove page-flags.h from mm.h.  There
are currently 14 inline functions which use a PageFoo function.  Also, two
of the files directly included by mm.h include page-flags.h themselves,
and there are probably more indirect inclusions.  So just include it at
the top like any other header file.

Link: http://lkml.kernel.org/r/20200629151959.15779-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mm.h |    6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

--- a/include/linux/mm.h~mm-move-page-flags-include-to-top-of-file
+++ a/include/linux/mm.h
@@ -24,6 +24,7 @@
 #include <linux/resource.h>
 #include <linux/page_ext.h>
 #include <linux/err.h>
+#include <linux/page-flags.h>
 #include <linux/page_ref.h>
 #include <linux/memremap.h>
 #include <linux/overflow.h>
@@ -668,11 +669,6 @@ int vma_is_stack_for_current(struct vm_a
 struct mmu_gather;
 struct inode;
 
-/*
- * FIXME: take this include out, include page-flags.h in
- * files which need it (119 of them)
- */
-#include <linux/page-flags.h>
 #include <linux/huge_mm.h>
 
 /*
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 09/39] mm: add thp_order
  2020-08-15  0:29 incoming Andrew Morton
                   ` (7 preceding siblings ...)
  2020-08-15  0:30 ` [patch 08/39] mm: move page-flags include to top of file Andrew Morton
@ 2020-08-15  0:30 ` Andrew Morton
  2020-08-15  0:30 ` [patch 10/39] mm: add thp_size Andrew Morton
                   ` (130 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:30 UTC (permalink / raw)
  To: akpm, david, kirill.shutemov, linux-mm, mike.kravetz, mm-commits,
	torvalds, william.kucharski, willy, ziy

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm: add thp_order

This function returns the order of a transparent huge page.  It compiles
to 0 if CONFIG_TRANSPARENT_HUGEPAGE is disabled.

Link: http://lkml.kernel.org/r/20200629151959.15779-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/huge_mm.h |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

--- a/include/linux/huge_mm.h~mm-add-thp_order
+++ a/include/linux/huge_mm.h
@@ -258,6 +258,19 @@ static inline spinlock_t *pud_trans_huge
 	else
 		return NULL;
 }
+
+/**
+ * thp_order - Order of a transparent huge page.
+ * @page: Head page of a transparent huge page.
+ */
+static inline unsigned int thp_order(struct page *page)
+{
+	VM_BUG_ON_PGFLAGS(PageTail(page), page);
+	if (PageHead(page))
+		return HPAGE_PMD_ORDER;
+	return 0;
+}
+
 static inline int hpage_nr_pages(struct page *page)
 {
 	if (unlikely(PageTransHuge(page)))
@@ -317,6 +330,12 @@ static inline struct list_head *page_def
 #define HPAGE_PUD_MASK ({ BUILD_BUG(); 0; })
 #define HPAGE_PUD_SIZE ({ BUILD_BUG(); 0; })
 
+static inline unsigned int thp_order(struct page *page)
+{
+	VM_BUG_ON_PGFLAGS(PageTail(page), page);
+	return 0;
+}
+
 static inline int hpage_nr_pages(struct page *page)
 {
 	VM_BUG_ON_PAGE(PageTail(page), page);
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 10/39] mm: add thp_size
  2020-08-15  0:29 incoming Andrew Morton
                   ` (8 preceding siblings ...)
  2020-08-15  0:30 ` [patch 09/39] mm: add thp_order Andrew Morton
@ 2020-08-15  0:30 ` Andrew Morton
  2020-08-15  0:30 ` [patch 11/39] mm: replace hpage_nr_pages with thp_nr_pages Andrew Morton
                   ` (129 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:30 UTC (permalink / raw)
  To: akpm, david, kirill.shutemov, linux-mm, mike.kravetz, mm-commits,
	torvalds, william.kucharski, willy, ziy

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm: add thp_size

This function returns the number of bytes in a THP.  It is like
page_size(), but compiles to just PAGE_SIZE if CONFIG_TRANSPARENT_HUGEPAGE
is disabled.

Link: http://lkml.kernel.org/r/20200629151959.15779-5-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/nvdimm/btt.c    |    4 +---
 drivers/nvdimm/pmem.c   |    6 ++----
 include/linux/huge_mm.h |   11 +++++++++++
 mm/internal.h           |    2 +-
 mm/page_io.c            |    2 +-
 mm/page_vma_mapped.c    |    4 ++--
 6 files changed, 18 insertions(+), 11 deletions(-)

--- a/drivers/nvdimm/btt.c~mm-add-thp_size
+++ a/drivers/nvdimm/btt.c
@@ -1490,10 +1490,8 @@ static int btt_rw_page(struct block_devi
 {
 	struct btt *btt = bdev->bd_disk->private_data;
 	int rc;
-	unsigned int len;
 
-	len = hpage_nr_pages(page) * PAGE_SIZE;
-	rc = btt_do_bvec(btt, NULL, page, len, 0, op, sector);
+	rc = btt_do_bvec(btt, NULL, page, thp_size(page), 0, op, sector);
 	if (rc == 0)
 		page_endio(page, op_is_write(op), 0);
 
--- a/drivers/nvdimm/pmem.c~mm-add-thp_size
+++ a/drivers/nvdimm/pmem.c
@@ -238,11 +238,9 @@ static int pmem_rw_page(struct block_dev
 	blk_status_t rc;
 
 	if (op_is_write(op))
-		rc = pmem_do_write(pmem, page, 0, sector,
-				   hpage_nr_pages(page) * PAGE_SIZE);
+		rc = pmem_do_write(pmem, page, 0, sector, thp_size(page));
 	else
-		rc = pmem_do_read(pmem, page, 0, sector,
-				   hpage_nr_pages(page) * PAGE_SIZE);
+		rc = pmem_do_read(pmem, page, 0, sector, thp_size(page));
 	/*
 	 * The ->rw_page interface is subtle and tricky.  The core
 	 * retries on any error, so we can only invoke page_endio() in
--- a/include/linux/huge_mm.h~mm-add-thp_size
+++ a/include/linux/huge_mm.h
@@ -462,4 +462,15 @@ static inline bool thp_migration_support
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+/**
+ * thp_size - Size of a transparent huge page.
+ * @page: Head page of a transparent huge page.
+ *
+ * Return: Number of bytes in this page.
+ */
+static inline unsigned long thp_size(struct page *page)
+{
+	return PAGE_SIZE << thp_order(page);
+}
+
 #endif /* _LINUX_HUGE_MM_H */
--- a/mm/internal.h~mm-add-thp_size
+++ a/mm/internal.h
@@ -396,7 +396,7 @@ vma_address(struct page *page, struct vm
 	unsigned long start, end;
 
 	start = __vma_address(page, vma);
-	end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
+	end = start + thp_size(page) - PAGE_SIZE;
 
 	/* page should be within @vma mapping range */
 	VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma);
--- a/mm/page_io.c~mm-add-thp_size
+++ a/mm/page_io.c
@@ -40,7 +40,7 @@ static struct bio *get_swap_bio(gfp_t gf
 		bio->bi_iter.bi_sector <<= PAGE_SHIFT - 9;
 		bio->bi_end_io = end_io;
 
-		bio_add_page(bio, page, PAGE_SIZE * hpage_nr_pages(page), 0);
+		bio_add_page(bio, page, thp_size(page), 0);
 	}
 	return bio;
 }
--- a/mm/page_vma_mapped.c~mm-add-thp_size
+++ a/mm/page_vma_mapped.c
@@ -227,7 +227,7 @@ next_pte:
 			if (pvmw->address >= pvmw->vma->vm_end ||
 			    pvmw->address >=
 					__vma_address(pvmw->page, pvmw->vma) +
-					hpage_nr_pages(pvmw->page) * PAGE_SIZE)
+					thp_size(pvmw->page))
 				return not_found(pvmw);
 			/* Did we cross page table boundary? */
 			if (pvmw->address % PMD_SIZE == 0) {
@@ -268,7 +268,7 @@ int page_mapped_in_vma(struct page *page
 	unsigned long start, end;
 
 	start = __vma_address(page, vma);
-	end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
+	end = start + thp_size(page) - PAGE_SIZE;
 
 	if (unlikely(end < vma->vm_start || start >= vma->vm_end))
 		return 0;
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 11/39] mm: replace hpage_nr_pages with thp_nr_pages
  2020-08-15  0:29 incoming Andrew Morton
                   ` (9 preceding siblings ...)
  2020-08-15  0:30 ` [patch 10/39] mm: add thp_size Andrew Morton
@ 2020-08-15  0:30 ` Andrew Morton
  2020-08-15  0:30 ` [patch 12/39] mm: add thp_head Andrew Morton
                   ` (128 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:30 UTC (permalink / raw)
  To: akpm, david, kirill.shutemov, linux-mm, mike.kravetz, mm-commits,
	torvalds, william.kucharski, willy, ziy

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm: replace hpage_nr_pages with thp_nr_pages

The thp prefix is more frequently used than hpage and we should be
consistent between the various functions.

[akpm@linux-foundation.org: fix mm/migrate.c]
Link: http://lkml.kernel.org/r/20200629151959.15779-6-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/huge_mm.h   |   13 +++++++++----
 include/linux/mm_inline.h |    6 +++---
 include/linux/pagemap.h   |    6 +++---
 mm/compaction.c           |    2 +-
 mm/filemap.c              |    2 +-
 mm/gup.c                  |    2 +-
 mm/internal.h             |    2 +-
 mm/memcontrol.c           |   10 +++++-----
 mm/memory_hotplug.c       |    7 +++----
 mm/mempolicy.c            |    2 +-
 mm/migrate.c              |   18 +++++++++---------
 mm/mlock.c                |    9 ++++-----
 mm/page_io.c              |    2 +-
 mm/page_vma_mapped.c      |    2 +-
 mm/rmap.c                 |    8 ++++----
 mm/swap.c                 |   16 ++++++++--------
 mm/swap_state.c           |    6 +++---
 mm/swapfile.c             |    2 +-
 mm/vmscan.c               |    6 +++---
 mm/workingset.c           |    6 +++---
 20 files changed, 65 insertions(+), 62 deletions(-)

--- a/include/linux/huge_mm.h~mm-replace-hpage_nr_pages-with-thp_nr_pages
+++ a/include/linux/huge_mm.h
@@ -271,9 +271,14 @@ static inline unsigned int thp_order(str
 	return 0;
 }
 
-static inline int hpage_nr_pages(struct page *page)
+/**
+ * thp_nr_pages - The number of regular pages in this huge page.
+ * @page: The head page of a huge page.
+ */
+static inline int thp_nr_pages(struct page *page)
 {
-	if (unlikely(PageTransHuge(page)))
+	VM_BUG_ON_PGFLAGS(PageTail(page), page);
+	if (PageHead(page))
 		return HPAGE_PMD_NR;
 	return 1;
 }
@@ -336,9 +341,9 @@ static inline unsigned int thp_order(str
 	return 0;
 }
 
-static inline int hpage_nr_pages(struct page *page)
+static inline int thp_nr_pages(struct page *page)
 {
-	VM_BUG_ON_PAGE(PageTail(page), page);
+	VM_BUG_ON_PGFLAGS(PageTail(page), page);
 	return 1;
 }
 
--- a/include/linux/mm_inline.h~mm-replace-hpage_nr_pages-with-thp_nr_pages
+++ a/include/linux/mm_inline.h
@@ -48,14 +48,14 @@ static __always_inline void update_lru_s
 static __always_inline void add_page_to_lru_list(struct page *page,
 				struct lruvec *lruvec, enum lru_list lru)
 {
-	update_lru_size(lruvec, lru, page_zonenum(page), hpage_nr_pages(page));
+	update_lru_size(lruvec, lru, page_zonenum(page), thp_nr_pages(page));
 	list_add(&page->lru, &lruvec->lists[lru]);
 }
 
 static __always_inline void add_page_to_lru_list_tail(struct page *page,
 				struct lruvec *lruvec, enum lru_list lru)
 {
-	update_lru_size(lruvec, lru, page_zonenum(page), hpage_nr_pages(page));
+	update_lru_size(lruvec, lru, page_zonenum(page), thp_nr_pages(page));
 	list_add_tail(&page->lru, &lruvec->lists[lru]);
 }
 
@@ -63,7 +63,7 @@ static __always_inline void del_page_fro
 				struct lruvec *lruvec, enum lru_list lru)
 {
 	list_del(&page->lru);
-	update_lru_size(lruvec, lru, page_zonenum(page), -hpage_nr_pages(page));
+	update_lru_size(lruvec, lru, page_zonenum(page), -thp_nr_pages(page));
 }
 
 /**
--- a/include/linux/pagemap.h~mm-replace-hpage_nr_pages-with-thp_nr_pages
+++ a/include/linux/pagemap.h
@@ -381,7 +381,7 @@ static inline struct page *find_subpage(
 	if (PageHuge(head))
 		return head;
 
-	return head + (index & (hpage_nr_pages(head) - 1));
+	return head + (index & (thp_nr_pages(head) - 1));
 }
 
 struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
@@ -773,7 +773,7 @@ static inline struct page *readahead_pag
 
 	page = xa_load(&rac->mapping->i_pages, rac->_index);
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
-	rac->_batch_count = hpage_nr_pages(page);
+	rac->_batch_count = thp_nr_pages(page);
 
 	return page;
 }
@@ -796,7 +796,7 @@ static inline unsigned int __readahead_b
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
 		VM_BUG_ON_PAGE(PageTail(page), page);
 		array[i++] = page;
-		rac->_batch_count += hpage_nr_pages(page);
+		rac->_batch_count += thp_nr_pages(page);
 
 		/*
 		 * The page cache isn't using multi-index entries yet,
--- a/mm/compaction.c~mm-replace-hpage_nr_pages-with-thp_nr_pages
+++ a/mm/compaction.c
@@ -1009,7 +1009,7 @@ isolate_migratepages_block(struct compac
 		del_page_from_lru_list(page, lruvec, page_lru(page));
 		mod_node_page_state(page_pgdat(page),
 				NR_ISOLATED_ANON + page_is_file_lru(page),
-				hpage_nr_pages(page));
+				thp_nr_pages(page));
 
 isolate_success:
 		list_add(&page->lru, &cc->migratepages);
--- a/mm/filemap.c~mm-replace-hpage_nr_pages-with-thp_nr_pages
+++ a/mm/filemap.c
@@ -198,7 +198,7 @@ static void unaccount_page_cache_page(st
 	if (PageHuge(page))
 		return;
 
-	nr = hpage_nr_pages(page);
+	nr = thp_nr_pages(page);
 
 	__mod_lruvec_page_state(page, NR_FILE_PAGES, -nr);
 	if (PageSwapBacked(page)) {
--- a/mm/gup.c~mm-replace-hpage_nr_pages-with-thp_nr_pages
+++ a/mm/gup.c
@@ -1637,7 +1637,7 @@ check_again:
 					mod_node_page_state(page_pgdat(head),
 							    NR_ISOLATED_ANON +
 							    page_is_file_lru(head),
-							    hpage_nr_pages(head));
+							    thp_nr_pages(head));
 				}
 			}
 		}
--- a/mm/internal.h~mm-replace-hpage_nr_pages-with-thp_nr_pages
+++ a/mm/internal.h
@@ -369,7 +369,7 @@ extern void clear_page_mlock(struct page
 static inline void mlock_migrate_page(struct page *newpage, struct page *page)
 {
 	if (TestClearPageMlocked(page)) {
-		int nr_pages = hpage_nr_pages(page);
+		int nr_pages = thp_nr_pages(page);
 
 		/* Holding pmd lock, no change in irq context: __mod is safe */
 		__mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
--- a/mm/memcontrol.c~mm-replace-hpage_nr_pages-with-thp_nr_pages
+++ a/mm/memcontrol.c
@@ -5589,7 +5589,7 @@ static int mem_cgroup_move_account(struc
 {
 	struct lruvec *from_vec, *to_vec;
 	struct pglist_data *pgdat;
-	unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
+	unsigned int nr_pages = compound ? thp_nr_pages(page) : 1;
 	int ret;
 
 	VM_BUG_ON(from == to);
@@ -6682,7 +6682,7 @@ void mem_cgroup_calculate_protection(str
  */
 int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
 {
-	unsigned int nr_pages = hpage_nr_pages(page);
+	unsigned int nr_pages = thp_nr_pages(page);
 	struct mem_cgroup *memcg = NULL;
 	int ret = 0;
 
@@ -6912,7 +6912,7 @@ void mem_cgroup_migrate(struct page *old
 		return;
 
 	/* Force-charge the new page. The old one will be freed soon */
-	nr_pages = hpage_nr_pages(newpage);
+	nr_pages = thp_nr_pages(newpage);
 
 	page_counter_charge(&memcg->memory, nr_pages);
 	if (do_memsw_account())
@@ -7114,7 +7114,7 @@ void mem_cgroup_swapout(struct page *pag
 	 * ancestor for the swap instead and transfer the memory+swap charge.
 	 */
 	swap_memcg = mem_cgroup_id_get_online(memcg);
-	nr_entries = hpage_nr_pages(page);
+	nr_entries = thp_nr_pages(page);
 	/* Get references for the tail pages, too */
 	if (nr_entries > 1)
 		mem_cgroup_id_get_many(swap_memcg, nr_entries - 1);
@@ -7158,7 +7158,7 @@ void mem_cgroup_swapout(struct page *pag
  */
 int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
 {
-	unsigned int nr_pages = hpage_nr_pages(page);
+	unsigned int nr_pages = thp_nr_pages(page);
 	struct page_counter *counter;
 	struct mem_cgroup *memcg;
 	unsigned short oldid;
--- a/mm/memory_hotplug.c~mm-replace-hpage_nr_pages-with-thp_nr_pages
+++ a/mm/memory_hotplug.c
@@ -1299,7 +1299,7 @@ static int
 do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 {
 	unsigned long pfn;
-	struct page *page;
+	struct page *page, *head;
 	int ret = 0;
 	LIST_HEAD(source);
 
@@ -1307,15 +1307,14 @@ do_migrate_range(unsigned long start_pfn
 		if (!pfn_valid(pfn))
 			continue;
 		page = pfn_to_page(pfn);
+		head = compound_head(page);
 
 		if (PageHuge(page)) {
-			struct page *head = compound_head(page);
 			pfn = page_to_pfn(head) + compound_nr(head) - 1;
 			isolate_huge_page(head, &source);
 			continue;
 		} else if (PageTransHuge(page))
-			pfn = page_to_pfn(compound_head(page))
-				+ hpage_nr_pages(page) - 1;
+			pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
 
 		/*
 		 * HWPoison pages have elevated reference counts so the migration would
--- a/mm/mempolicy.c~mm-replace-hpage_nr_pages-with-thp_nr_pages
+++ a/mm/mempolicy.c
@@ -1049,7 +1049,7 @@ static int migrate_page_add(struct page
 			list_add_tail(&head->lru, pagelist);
 			mod_node_page_state(page_pgdat(head),
 				NR_ISOLATED_ANON + page_is_file_lru(head),
-				hpage_nr_pages(head));
+				thp_nr_pages(head));
 		} else if (flags & MPOL_MF_STRICT) {
 			/*
 			 * Non-movable page may reach here.  And, there may be
--- a/mm/migrate.c~mm-replace-hpage_nr_pages-with-thp_nr_pages
+++ a/mm/migrate.c
@@ -193,7 +193,7 @@ void putback_movable_pages(struct list_h
 			put_page(page);
 		} else {
 			mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON +
-					page_is_file_lru(page), -hpage_nr_pages(page));
+					page_is_file_lru(page), -thp_nr_pages(page));
 			putback_lru_page(page);
 		}
 	}
@@ -386,7 +386,7 @@ static int expected_page_refs(struct add
 	 */
 	expected_count += is_device_private_page(page);
 	if (mapping)
-		expected_count += hpage_nr_pages(page) + page_has_private(page);
+		expected_count += thp_nr_pages(page) + page_has_private(page);
 
 	return expected_count;
 }
@@ -441,7 +441,7 @@ int migrate_page_move_mapping(struct add
 	 */
 	newpage->index = page->index;
 	newpage->mapping = page->mapping;
-	page_ref_add(newpage, hpage_nr_pages(page)); /* add cache reference */
+	page_ref_add(newpage, thp_nr_pages(page)); /* add cache reference */
 	if (PageSwapBacked(page)) {
 		__SetPageSwapBacked(newpage);
 		if (PageSwapCache(page)) {
@@ -474,7 +474,7 @@ int migrate_page_move_mapping(struct add
 	 * to one less reference.
 	 * We know this isn't the last reference.
 	 */
-	page_ref_unfreeze(page, expected_count - hpage_nr_pages(page));
+	page_ref_unfreeze(page, expected_count - thp_nr_pages(page));
 
 	xas_unlock(&xas);
 	/* Leave irq disabled to prevent preemption while updating stats */
@@ -591,7 +591,7 @@ static void copy_huge_page(struct page *
 	} else {
 		/* thp page */
 		BUG_ON(!PageTransHuge(src));
-		nr_pages = hpage_nr_pages(src);
+		nr_pages = thp_nr_pages(src);
 	}
 
 	for (i = 0; i < nr_pages; i++) {
@@ -1213,7 +1213,7 @@ out:
 		 */
 		if (likely(!__PageMovable(page)))
 			mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON +
-					page_is_file_lru(page), -hpage_nr_pages(page));
+					page_is_file_lru(page), -thp_nr_pages(page));
 	}
 
 	/*
@@ -1446,7 +1446,7 @@ retry:
 			 * during migration.
 			 */
 			is_thp = PageTransHuge(page);
-			nr_subpages = hpage_nr_pages(page);
+			nr_subpages = thp_nr_pages(page);
 			cond_resched();
 
 			if (PageHuge(page))
@@ -1670,7 +1670,7 @@ static int add_page_for_migration(struct
 		list_add_tail(&head->lru, pagelist);
 		mod_node_page_state(page_pgdat(head),
 			NR_ISOLATED_ANON + page_is_file_lru(head),
-			hpage_nr_pages(head));
+			thp_nr_pages(head));
 	}
 out_putpage:
 	/*
@@ -2034,7 +2034,7 @@ static int numamigrate_isolate_page(pg_d
 
 	page_lru = page_is_file_lru(page);
 	mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_lru,
-				hpage_nr_pages(page));
+				thp_nr_pages(page));
 
 	/*
 	 * Isolating the page has taken another reference, so the
--- a/mm/mlock.c~mm-replace-hpage_nr_pages-with-thp_nr_pages
+++ a/mm/mlock.c
@@ -61,8 +61,7 @@ void clear_page_mlock(struct page *page)
 	if (!TestClearPageMlocked(page))
 		return;
 
-	mod_zone_page_state(page_zone(page), NR_MLOCK,
-			    -hpage_nr_pages(page));
+	mod_zone_page_state(page_zone(page), NR_MLOCK, -thp_nr_pages(page));
 	count_vm_event(UNEVICTABLE_PGCLEARED);
 	/*
 	 * The previous TestClearPageMlocked() corresponds to the smp_mb()
@@ -95,7 +94,7 @@ void mlock_vma_page(struct page *page)
 
 	if (!TestSetPageMlocked(page)) {
 		mod_zone_page_state(page_zone(page), NR_MLOCK,
-				    hpage_nr_pages(page));
+				    thp_nr_pages(page));
 		count_vm_event(UNEVICTABLE_PGMLOCKED);
 		if (!isolate_lru_page(page))
 			putback_lru_page(page);
@@ -192,7 +191,7 @@ unsigned int munlock_vma_page(struct pag
 	/*
 	 * Serialize with any parallel __split_huge_page_refcount() which
 	 * might otherwise copy PageMlocked to part of the tail pages before
-	 * we clear it in the head page. It also stabilizes hpage_nr_pages().
+	 * we clear it in the head page. It also stabilizes thp_nr_pages().
 	 */
 	spin_lock_irq(&pgdat->lru_lock);
 
@@ -202,7 +201,7 @@ unsigned int munlock_vma_page(struct pag
 		goto unlock_out;
 	}
 
-	nr_pages = hpage_nr_pages(page);
+	nr_pages = thp_nr_pages(page);
 	__mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
 
 	if (__munlock_isolate_lru_page(page, true)) {
--- a/mm/page_io.c~mm-replace-hpage_nr_pages-with-thp_nr_pages
+++ a/mm/page_io.c
@@ -274,7 +274,7 @@ static inline void count_swpout_vm_event
 	if (unlikely(PageTransHuge(page)))
 		count_vm_event(THP_SWPOUT);
 #endif
-	count_vm_events(PSWPOUT, hpage_nr_pages(page));
+	count_vm_events(PSWPOUT, thp_nr_pages(page));
 }
 
 #if defined(CONFIG_MEMCG) && defined(CONFIG_BLK_CGROUP)
--- a/mm/page_vma_mapped.c~mm-replace-hpage_nr_pages-with-thp_nr_pages
+++ a/mm/page_vma_mapped.c
@@ -61,7 +61,7 @@ static inline bool pfn_is_match(struct p
 		return page_pfn == pfn;
 
 	/* THP can be referenced by any subpage */
-	return pfn >= page_pfn && pfn - page_pfn < hpage_nr_pages(page);
+	return pfn >= page_pfn && pfn - page_pfn < thp_nr_pages(page);
 }
 
 /**
--- a/mm/rmap.c~mm-replace-hpage_nr_pages-with-thp_nr_pages
+++ a/mm/rmap.c
@@ -1130,7 +1130,7 @@ void do_page_add_anon_rmap(struct page *
 	}
 
 	if (first) {
-		int nr = compound ? hpage_nr_pages(page) : 1;
+		int nr = compound ? thp_nr_pages(page) : 1;
 		/*
 		 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
 		 * these counters are not modified in interrupt context, and
@@ -1169,7 +1169,7 @@ void do_page_add_anon_rmap(struct page *
 void page_add_new_anon_rmap(struct page *page,
 	struct vm_area_struct *vma, unsigned long address, bool compound)
 {
-	int nr = compound ? hpage_nr_pages(page) : 1;
+	int nr = compound ? thp_nr_pages(page) : 1;
 
 	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
 	__SetPageSwapBacked(page);
@@ -1860,7 +1860,7 @@ static void rmap_walk_anon(struct page *
 		return;
 
 	pgoff_start = page_to_pgoff(page);
-	pgoff_end = pgoff_start + hpage_nr_pages(page) - 1;
+	pgoff_end = pgoff_start + thp_nr_pages(page) - 1;
 	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
 			pgoff_start, pgoff_end) {
 		struct vm_area_struct *vma = avc->vma;
@@ -1913,7 +1913,7 @@ static void rmap_walk_file(struct page *
 		return;
 
 	pgoff_start = page_to_pgoff(page);
-	pgoff_end = pgoff_start + hpage_nr_pages(page) - 1;
+	pgoff_end = pgoff_start + thp_nr_pages(page) - 1;
 	if (!locked)
 		i_mmap_lock_read(mapping);
 	vma_interval_tree_foreach(vma, &mapping->i_mmap,
--- a/mm/swap.c~mm-replace-hpage_nr_pages-with-thp_nr_pages
+++ a/mm/swap.c
@@ -241,7 +241,7 @@ static void pagevec_move_tail_fn(struct
 		del_page_from_lru_list(page, lruvec, page_lru(page));
 		ClearPageActive(page);
 		add_page_to_lru_list_tail(page, lruvec, page_lru(page));
-		(*pgmoved) += hpage_nr_pages(page);
+		(*pgmoved) += thp_nr_pages(page);
 	}
 }
 
@@ -312,7 +312,7 @@ void lru_note_cost(struct lruvec *lruvec
 void lru_note_cost_page(struct page *page)
 {
 	lru_note_cost(mem_cgroup_page_lruvec(page, page_pgdat(page)),
-		      page_is_file_lru(page), hpage_nr_pages(page));
+		      page_is_file_lru(page), thp_nr_pages(page));
 }
 
 static void __activate_page(struct page *page, struct lruvec *lruvec,
@@ -320,7 +320,7 @@ static void __activate_page(struct page
 {
 	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
 		int lru = page_lru_base_type(page);
-		int nr_pages = hpage_nr_pages(page);
+		int nr_pages = thp_nr_pages(page);
 
 		del_page_from_lru_list(page, lruvec, lru);
 		SetPageActive(page);
@@ -500,7 +500,7 @@ void lru_cache_add_inactive_or_unevictab
 		 * lock is held(spinlock), which implies preemption disabled.
 		 */
 		__mod_zone_page_state(page_zone(page), NR_MLOCK,
-				    hpage_nr_pages(page));
+				    thp_nr_pages(page));
 		count_vm_event(UNEVICTABLE_PGMLOCKED);
 	}
 	lru_cache_add(page);
@@ -532,7 +532,7 @@ static void lru_deactivate_file_fn(struc
 {
 	int lru;
 	bool active;
-	int nr_pages = hpage_nr_pages(page);
+	int nr_pages = thp_nr_pages(page);
 
 	if (!PageLRU(page))
 		return;
@@ -580,7 +580,7 @@ static void lru_deactivate_fn(struct pag
 {
 	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
 		int lru = page_lru_base_type(page);
-		int nr_pages = hpage_nr_pages(page);
+		int nr_pages = thp_nr_pages(page);
 
 		del_page_from_lru_list(page, lruvec, lru + LRU_ACTIVE);
 		ClearPageActive(page);
@@ -599,7 +599,7 @@ static void lru_lazyfree_fn(struct page
 	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
 	    !PageSwapCache(page) && !PageUnevictable(page)) {
 		bool active = PageActive(page);
-		int nr_pages = hpage_nr_pages(page);
+		int nr_pages = thp_nr_pages(page);
 
 		del_page_from_lru_list(page, lruvec,
 				       LRU_INACTIVE_ANON + active);
@@ -972,7 +972,7 @@ static void __pagevec_lru_add_fn(struct
 {
 	enum lru_list lru;
 	int was_unevictable = TestClearPageUnevictable(page);
-	int nr_pages = hpage_nr_pages(page);
+	int nr_pages = thp_nr_pages(page);
 
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 
--- a/mm/swapfile.c~mm-replace-hpage_nr_pages-with-thp_nr_pages
+++ a/mm/swapfile.c
@@ -1370,7 +1370,7 @@ void put_swap_page(struct page *page, sw
 	unsigned char *map;
 	unsigned int i, free_entries = 0;
 	unsigned char val;
-	int size = swap_entry_size(hpage_nr_pages(page));
+	int size = swap_entry_size(thp_nr_pages(page));
 
 	si = _swap_info_get(entry);
 	if (!si)
--- a/mm/swap_state.c~mm-replace-hpage_nr_pages-with-thp_nr_pages
+++ a/mm/swap_state.c
@@ -130,7 +130,7 @@ int add_to_swap_cache(struct page *page,
 	struct address_space *address_space = swap_address_space(entry);
 	pgoff_t idx = swp_offset(entry);
 	XA_STATE_ORDER(xas, &address_space->i_pages, idx, compound_order(page));
-	unsigned long i, nr = hpage_nr_pages(page);
+	unsigned long i, nr = thp_nr_pages(page);
 	void *old;
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
@@ -183,7 +183,7 @@ void __delete_from_swap_cache(struct pag
 			swp_entry_t entry, void *shadow)
 {
 	struct address_space *address_space = swap_address_space(entry);
-	int i, nr = hpage_nr_pages(page);
+	int i, nr = thp_nr_pages(page);
 	pgoff_t idx = swp_offset(entry);
 	XA_STATE(xas, &address_space->i_pages, idx);
 
@@ -278,7 +278,7 @@ void delete_from_swap_cache(struct page
 	xa_unlock_irq(&address_space->i_pages);
 
 	put_swap_page(page, entry);
-	page_ref_sub(page, hpage_nr_pages(page));
+	page_ref_sub(page, thp_nr_pages(page));
 }
 
 void clear_shadow_from_swap_cache(int type, unsigned long begin,
--- a/mm/vmscan.c~mm-replace-hpage_nr_pages-with-thp_nr_pages
+++ a/mm/vmscan.c
@@ -1354,7 +1354,7 @@ static unsigned int shrink_page_list(str
 			case PAGE_ACTIVATE:
 				goto activate_locked;
 			case PAGE_SUCCESS:
-				stat->nr_pageout += hpage_nr_pages(page);
+				stat->nr_pageout += thp_nr_pages(page);
 
 				if (PageWriteback(page))
 					goto keep;
@@ -1862,7 +1862,7 @@ static unsigned noinline_for_stack move_
 		SetPageLRU(page);
 		lru = page_lru(page);
 
-		nr_pages = hpage_nr_pages(page);
+		nr_pages = thp_nr_pages(page);
 		update_lru_size(lruvec, lru, page_zonenum(page), nr_pages);
 		list_move(&page->lru, &lruvec->lists[lru]);
 
@@ -2065,7 +2065,7 @@ static void shrink_active_list(unsigned
 			 * so we ignore them here.
 			 */
 			if ((vm_flags & VM_EXEC) && page_is_file_lru(page)) {
-				nr_rotated += hpage_nr_pages(page);
+				nr_rotated += thp_nr_pages(page);
 				list_add(&page->lru, &l_active);
 				continue;
 			}
--- a/mm/workingset.c~mm-replace-hpage_nr_pages-with-thp_nr_pages
+++ a/mm/workingset.c
@@ -263,7 +263,7 @@ void *workingset_eviction(struct page *p
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 
 	lruvec = mem_cgroup_lruvec(target_memcg, pgdat);
-	workingset_age_nonresident(lruvec, hpage_nr_pages(page));
+	workingset_age_nonresident(lruvec, thp_nr_pages(page));
 	/* XXX: target_memcg can be NULL, go through lruvec */
 	memcgid = mem_cgroup_id(lruvec_memcg(lruvec));
 	eviction = atomic_long_read(&lruvec->nonresident_age);
@@ -374,7 +374,7 @@ void workingset_refault(struct page *pag
 		goto out;
 
 	SetPageActive(page);
-	workingset_age_nonresident(lruvec, hpage_nr_pages(page));
+	workingset_age_nonresident(lruvec, thp_nr_pages(page));
 	inc_lruvec_state(lruvec, WORKINGSET_ACTIVATE_BASE + file);
 
 	/* Page was active prior to eviction */
@@ -411,7 +411,7 @@ void workingset_activation(struct page *
 	if (!mem_cgroup_disabled() && !memcg)
 		goto out;
 	lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
-	workingset_age_nonresident(lruvec, hpage_nr_pages(page));
+	workingset_age_nonresident(lruvec, thp_nr_pages(page));
 out:
 	rcu_read_unlock();
 }
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 12/39] mm: add thp_head
  2020-08-15  0:29 incoming Andrew Morton
                   ` (10 preceding siblings ...)
  2020-08-15  0:30 ` [patch 11/39] mm: replace hpage_nr_pages with thp_nr_pages Andrew Morton
@ 2020-08-15  0:30 ` Andrew Morton
  2020-08-15  0:30 ` [patch 13/39] mm: introduce offset_in_thp Andrew Morton
                   ` (127 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:30 UTC (permalink / raw)
  To: akpm, david, kirill.shutemov, linux-mm, mike.kravetz, mm-commits,
	torvalds, william.kucharski, willy, ziy

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm: add thp_head

This is like compound_head() but compiles away when
CONFIG_TRANSPARENT_HUGEPAGE is not enabled.

Link: http://lkml.kernel.org/r/20200629151959.15779-7-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/huge_mm.h |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

--- a/include/linux/huge_mm.h~mm-add-thp_head
+++ a/include/linux/huge_mm.h
@@ -260,6 +260,15 @@ static inline spinlock_t *pud_trans_huge
 }
 
 /**
+ * thp_head - Head page of a transparent huge page.
+ * @page: Any page (tail, head or regular) found in the page cache.
+ */
+static inline struct page *thp_head(struct page *page)
+{
+	return compound_head(page);
+}
+
+/**
  * thp_order - Order of a transparent huge page.
  * @page: Head page of a transparent huge page.
  */
@@ -335,6 +344,12 @@ static inline struct list_head *page_def
 #define HPAGE_PUD_MASK ({ BUILD_BUG(); 0; })
 #define HPAGE_PUD_SIZE ({ BUILD_BUG(); 0; })
 
+static inline struct page *thp_head(struct page *page)
+{
+	VM_BUG_ON_PGFLAGS(PageTail(page), page);
+	return page;
+}
+
 static inline unsigned int thp_order(struct page *page)
 {
 	VM_BUG_ON_PGFLAGS(PageTail(page), page);
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 13/39] mm: introduce offset_in_thp
  2020-08-15  0:29 incoming Andrew Morton
                   ` (11 preceding siblings ...)
  2020-08-15  0:30 ` [patch 12/39] mm: add thp_head Andrew Morton
@ 2020-08-15  0:30 ` Andrew Morton
  2020-08-15  0:30 ` [patch 14/39] fs: autofs: delete repeated words in comments Andrew Morton
                   ` (126 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:30 UTC (permalink / raw)
  To: akpm, david, kirill.shutemov, linux-mm, mike.kravetz, mm-commits,
	torvalds, william.kucharski, willy, ziy

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm: introduce offset_in_thp

Mirroring offset_in_page(), this gives you the offset within this
particular page, no matter what size page it is.  It optimises down to
offset_in_page() if CONFIG_TRANSPARENT_HUGEPAGE is not set.

Link: http://lkml.kernel.org/r/20200629151959.15779-8-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mm.h |    1 +
 1 file changed, 1 insertion(+)

--- a/include/linux/mm.h~mm-introduce-offset_in_thp
+++ a/include/linux/mm.h
@@ -1594,6 +1594,7 @@ static inline void clear_page_pfmemalloc
 extern void pagefault_out_of_memory(void);
 
 #define offset_in_page(p)	((unsigned long)(p) & ~PAGE_MASK)
+#define offset_in_thp(page, p)	((unsigned long)(p) & (thp_size(page) - 1))
 
 /*
  * Flags passed to show_mem() and show_free_areas() to suppress output in
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 14/39] fs: autofs: delete repeated words in comments
  2020-08-15  0:29 incoming Andrew Morton
                   ` (12 preceding siblings ...)
  2020-08-15  0:30 ` [patch 13/39] mm: introduce offset_in_thp Andrew Morton
@ 2020-08-15  0:30 ` Andrew Morton
  2020-08-15  0:30 ` [patch 15/39] mm/madvise: pass task and mm to do_madvise Andrew Morton
                   ` (125 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:30 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, raven, rdunlap, torvalds

From: Randy Dunlap <rdunlap@infradead.org>
Subject: fs: autofs: delete repeated words in comments

Drop duplicated words {the, at} in comments.

Link: http://lkml.kernel.org/r/20200811021817.24982-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/autofs/dev-ioctl.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/fs/autofs/dev-ioctl.c~fs-autofs-delete-repeated-words-in-comments
+++ a/fs/autofs/dev-ioctl.c
@@ -20,7 +20,7 @@
  * another mount. This situation arises when starting automount(8)
  * or other user space daemon which uses direct mounts or offset
  * mounts (used for autofs lazy mount/umount of nested mount trees),
- * which have been left busy at at service shutdown.
+ * which have been left busy at service shutdown.
  */
 
 typedef int (*ioctl_fn)(struct file *, struct autofs_sb_info *,
@@ -496,7 +496,7 @@ static int autofs_dev_ioctl_askumount(st
  * located path is the root of a mount we return 1 along with
  * the super magic of the mount or 0 otherwise.
  *
- * In both cases the the device number (as returned by
+ * In both cases the device number (as returned by
  * new_encode_dev()) is also returned.
  */
 static int autofs_dev_ioctl_ismountpoint(struct file *fp,
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 15/39] mm/madvise: pass task and mm to do_madvise
  2020-08-15  0:29 incoming Andrew Morton
                   ` (13 preceding siblings ...)
  2020-08-15  0:30 ` [patch 14/39] fs: autofs: delete repeated words in comments Andrew Morton
@ 2020-08-15  0:30 ` Andrew Morton
  2020-08-15  0:30 ` [patch 16/39] pid: move pidfd_get_pid() to pid.c Andrew Morton
                   ` (124 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:30 UTC (permalink / raw)
  To: akpm, alexander.h.duyck, axboe, bgeffon, christian.brauner,
	christian, dancol, hannes, jannh, joaodias, joel, ktkhai,
	linux-man, linux-mm, mhocko, minchan, mm-commits, oleksandr,
	rientjes, shakeelb, sj38.park, sjpark, sonnyrao, sspatil, surenb,
	timmurray, torvalds, vbabka

From: Minchan Kim <minchan@kernel.org>
Subject: mm/madvise: pass task and mm to do_madvise

Patch series "introduce memory hinting API for external process", v8.

Now, we have MADV_PAGEOUT and MADV_COLD as madvise hinting API.  With
that, application could give hints to kernel what memory range are
preferred to be reclaimed.  However, in some platform(e.g., Android), the
information required to make the hinting decision is not known to the app.
Instead, it is known to a centralized userspace daemon(e.g.,
ActivityManagerService), and that daemon must be able to initiate reclaim
on its own without any app involvement.

To solve the concern, this patch introduces new syscall -
process_madvise(2).  Bascially, it's same with madvise(2) syscall but it
has some differences.

1. It needs pidfd of target process to provide the hint

2.  It supports only MADV_{COLD|PAGEOUT|MERGEABLE|UNMEREABLE} at this
   moment.  Other hints in madvise will be opened when there are explicit
   requests from community to prevent unexpected bugs we couldn't support.

3.  Only privileged processes can do something for other process's
   address space.

For more detail of the new API, please see "mm: introduce external memory
hinting API" description in this patchset.


This patch (of 4):

In upcoming patches, do_madvise will be called from external process
context so we shouldn't asssume "current" is always hinted process's
task_struct.

Furthermore, we must not access mm_struct via task->mm, but obtain it
via access_mm() once (in the following patch) and only use that pointer
[1], so pass it to do_madvise() as well.  Note the vma->vm_mm pointers
are safe, so we can use them further down the call stack.

And let's pass *current* and current->mm as arguments of do_madvise so
it shouldn't change existing behavior but prepare next patch to make
review easy.

Note: io_madvise passes NULL as target_task argument of do_madvise because
it couldn't know who is target.

[1] http://lore.kernel.org/r/CAG48ez27=pwm5m_N_988xT1huO7g7h6arTQL44zev6TD-h-7Tg@mail.gmail.com

[vbabka@suse.cz: changelog tweak]
[minchan@kernel.org: use current->mm for io_uring]
  Link: http://lkml.kernel.org/r/20200423145215.72666-1-minchan@kernel.org
[akpm@linux-foundation.org: fix it for upstream changes]
[akpm@linux-foundation.org: whoops]
[rdunlap@infradead.org: add missing includes]
Link: http://lkml.kernel.org/r/20200622192900.22757-1-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200302193630.68771-2-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200622192900.22757-2-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jann Horn <jannh@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: John Dias <joaodias@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: SeongJae Park <sj38.park@gmail.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: <linux-man@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/io_uring.c      |    2 +-
 include/linux/mm.h |    3 ++-
 mm/madvise.c       |   40 +++++++++++++++++++++++-----------------
 3 files changed, 26 insertions(+), 19 deletions(-)

--- a/fs/io_uring.c~mm-madvise-pass-task-and-mm-to-do_madvise
+++ a/fs/io_uring.c
@@ -3731,7 +3731,7 @@ static int io_madvise(struct io_kiocb *r
 	if (force_nonblock)
 		return -EAGAIN;
 
-	ret = do_madvise(ma->addr, ma->len, ma->advice);
+	ret = do_madvise(NULL, current->mm, ma->addr, ma->len, ma->advice);
 	if (ret < 0)
 		req_set_fail_links(req);
 	io_req_complete(req, ret);
--- a/include/linux/mm.h~mm-madvise-pass-task-and-mm-to-do_madvise
+++ a/include/linux/mm.h
@@ -2547,7 +2547,8 @@ extern int __do_munmap(struct mm_struct
 		       struct list_head *uf, bool downgrade);
 extern int do_munmap(struct mm_struct *, unsigned long, size_t,
 		     struct list_head *uf);
-extern int do_madvise(unsigned long start, size_t len_in, int behavior);
+extern int do_madvise(struct task_struct *target_task, struct mm_struct *mm,
+		unsigned long start, size_t len_in, int behavior);
 
 #ifdef CONFIG_MMU
 extern int __mm_populate(unsigned long addr, unsigned long len,
--- a/mm/madvise.c~mm-madvise-pass-task-and-mm-to-do_madvise
+++ a/mm/madvise.c
@@ -22,12 +22,14 @@
 #include <linux/file.h>
 #include <linux/blkdev.h>
 #include <linux/backing-dev.h>
+#include <linux/compat.h>
 #include <linux/pagewalk.h>
 #include <linux/swap.h>
 #include <linux/swapops.h>
 #include <linux/shmem_fs.h>
 #include <linux/mmu_notifier.h>
 #include <linux/sched/mm.h>
+#include <linux/uio.h>
 
 #include <asm/tlb.h>
 
@@ -255,6 +257,7 @@ static long madvise_willneed(struct vm_a
 			     struct vm_area_struct **prev,
 			     unsigned long start, unsigned long end)
 {
+	struct mm_struct *mm = vma->vm_mm;
 	struct file *file = vma->vm_file;
 	loff_t offset;
 
@@ -289,12 +292,12 @@ static long madvise_willneed(struct vm_a
 	 */
 	*prev = NULL;	/* tell sys_madvise we drop mmap_lock */
 	get_file(file);
-	mmap_read_unlock(current->mm);
+	mmap_read_unlock(mm);
 	offset = (loff_t)(start - vma->vm_start)
 			+ ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
 	vfs_fadvise(file, offset, end - start, POSIX_FADV_WILLNEED);
 	fput(file);
-	mmap_read_lock(current->mm);
+	mmap_read_lock(mm);
 	return 0;
 }
 
@@ -683,7 +686,6 @@ out:
 	if (nr_swap) {
 		if (current->mm == mm)
 			sync_mm_rss(mm);
-
 		add_mm_counter(mm, MM_SWAPENTS, nr_swap);
 	}
 	arch_leave_lazy_mmu_mode();
@@ -763,6 +765,8 @@ static long madvise_dontneed_free(struct
 				  unsigned long start, unsigned long end,
 				  int behavior)
 {
+	struct mm_struct *mm = vma->vm_mm;
+
 	*prev = vma;
 	if (!can_madv_lru_vma(vma))
 		return -EINVAL;
@@ -770,8 +774,8 @@ static long madvise_dontneed_free(struct
 	if (!userfaultfd_remove(vma, start, end)) {
 		*prev = NULL; /* mmap_lock has been dropped, prev is stale */
 
-		mmap_read_lock(current->mm);
-		vma = find_vma(current->mm, start);
+		mmap_read_lock(mm);
+		vma = find_vma(mm, start);
 		if (!vma)
 			return -ENOMEM;
 		if (start < vma->vm_start) {
@@ -825,6 +829,7 @@ static long madvise_remove(struct vm_are
 	loff_t offset;
 	int error;
 	struct file *f;
+	struct mm_struct *mm = vma->vm_mm;
 
 	*prev = NULL;	/* tell sys_madvise we drop mmap_lock */
 
@@ -852,13 +857,13 @@ static long madvise_remove(struct vm_are
 	get_file(f);
 	if (userfaultfd_remove(vma, start, end)) {
 		/* mmap_lock was not released by userfaultfd_remove() */
-		mmap_read_unlock(current->mm);
+		mmap_read_unlock(mm);
 	}
 	error = vfs_fallocate(f,
 				FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
 				offset, end - start);
 	fput(f);
-	mmap_read_lock(current->mm);
+	mmap_read_lock(mm);
 	return error;
 }
 
@@ -1051,7 +1056,8 @@ madvise_behavior_valid(int behavior)
  *  -EBADF  - map exists, but area maps something that isn't a file.
  *  -EAGAIN - a kernel resource was temporarily unavailable.
  */
-int do_madvise(unsigned long start, size_t len_in, int behavior)
+int do_madvise(struct task_struct *target_task, struct mm_struct *mm,
+		unsigned long start, size_t len_in, int behavior)
 {
 	unsigned long end, tmp;
 	struct vm_area_struct *vma, *prev;
@@ -1089,7 +1095,7 @@ int do_madvise(unsigned long start, size
 
 	write = madvise_need_mmap_write(behavior);
 	if (write) {
-		if (mmap_write_lock_killable(current->mm))
+		if (mmap_write_lock_killable(mm))
 			return -EINTR;
 
 		/*
@@ -1104,12 +1110,12 @@ int do_madvise(unsigned long start, size
 		 * but for now we have the mmget_still_valid()
 		 * model.
 		 */
-		if (!mmget_still_valid(current->mm)) {
-			mmap_write_unlock(current->mm);
+		if (!mmget_still_valid(mm)) {
+			mmap_write_unlock(mm);
 			return -EINTR;
 		}
 	} else {
-		mmap_read_lock(current->mm);
+		mmap_read_lock(mm);
 	}
 
 	/*
@@ -1117,7 +1123,7 @@ int do_madvise(unsigned long start, size
 	 * ranges, just ignore them, but return -ENOMEM at the end.
 	 * - different from the way of handling in mlock etc.
 	 */
-	vma = find_vma_prev(current->mm, start, &prev);
+	vma = find_vma_prev(mm, start, &prev);
 	if (vma && start > vma->vm_start)
 		prev = vma;
 
@@ -1154,19 +1160,19 @@ int do_madvise(unsigned long start, size
 		if (prev)
 			vma = prev->vm_next;
 		else	/* madvise_remove dropped mmap_lock */
-			vma = find_vma(current->mm, start);
+			vma = find_vma(mm, start);
 	}
 out:
 	blk_finish_plug(&plug);
 	if (write)
-		mmap_write_unlock(current->mm);
+		mmap_write_unlock(mm);
 	else
-		mmap_read_unlock(current->mm);
+		mmap_read_unlock(mm);
 
 	return error;
 }
 
 SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
 {
-	return do_madvise(start, len_in, behavior);
+	return do_madvise(current, current->mm, start, len_in, behavior);
 }
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 16/39] pid: move pidfd_get_pid() to pid.c
  2020-08-15  0:29 incoming Andrew Morton
                   ` (14 preceding siblings ...)
  2020-08-15  0:30 ` [patch 15/39] mm/madvise: pass task and mm to do_madvise Andrew Morton
@ 2020-08-15  0:30 ` Andrew Morton
  2020-08-15  0:30 ` [patch 17/39] mm/madvise: introduce process_madvise() syscall: an external memory hinting API Andrew Morton
                   ` (123 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:30 UTC (permalink / raw)
  To: akpm, alexander.h.duyck, axboe, bgeffon, christian.brauner,
	dancol, hannes, jannh, joaodias, joel, ktkhai, linux-man,
	linux-mm, mhocko, minchan, mm-commits, oleksandr, rientjes,
	shakeelb, sj38.park, sjpark, sonnyrao, sspatil, surenb,
	timmurray, torvalds, vbabka

From: Minchan Kim <minchan@kernel.org>
Subject: pid: move pidfd_get_pid() to pid.c

process_madvise syscall needs pidfd_get_pid function to translate pidfd to
pid so this patch move the function to kernel/pid.c.

Link: http://lkml.kernel.org/r/20200302193630.68771-5-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200622192900.22757-3-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jann Horn <jannh@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Dias <joaodias@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: SeongJae Park <sj38.park@gmail.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: <linux-man@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/pid.h |    1 +
 kernel/exit.c       |   17 -----------------
 kernel/pid.c        |   17 +++++++++++++++++
 3 files changed, 18 insertions(+), 17 deletions(-)

--- a/include/linux/pid.h~pid-move-pidfd_get_pid-to-pidc
+++ a/include/linux/pid.h
@@ -77,6 +77,7 @@ extern const struct file_operations pidf
 struct file;
 
 extern struct pid *pidfd_pid(const struct file *file);
+struct pid *pidfd_get_pid(unsigned int fd);
 
 static inline struct pid *get_pid(struct pid *pid)
 {
--- a/kernel/exit.c~pid-move-pidfd_get_pid-to-pidc
+++ a/kernel/exit.c
@@ -1474,23 +1474,6 @@ end:
 	return retval;
 }
 
-static struct pid *pidfd_get_pid(unsigned int fd)
-{
-	struct fd f;
-	struct pid *pid;
-
-	f = fdget(fd);
-	if (!f.file)
-		return ERR_PTR(-EBADF);
-
-	pid = pidfd_pid(f.file);
-	if (!IS_ERR(pid))
-		get_pid(pid);
-
-	fdput(f);
-	return pid;
-}
-
 static long kernel_waitid(int which, pid_t upid, struct waitid_info *infop,
 			  int options, struct rusage *ru)
 {
--- a/kernel/pid.c~pid-move-pidfd_get_pid-to-pidc
+++ a/kernel/pid.c
@@ -519,6 +519,23 @@ struct pid *find_ge_pid(int nr, struct p
 	return idr_get_next(&ns->idr, &nr);
 }
 
+struct pid *pidfd_get_pid(unsigned int fd)
+{
+	struct fd f;
+	struct pid *pid;
+
+	f = fdget(fd);
+	if (!f.file)
+		return ERR_PTR(-EBADF);
+
+	pid = pidfd_pid(f.file);
+	if (!IS_ERR(pid))
+		get_pid(pid);
+
+	fdput(f);
+	return pid;
+}
+
 /**
  * pidfd_create() - Create a new pid file descriptor.
  *
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 17/39] mm/madvise: introduce process_madvise() syscall: an external memory hinting API
  2020-08-15  0:29 incoming Andrew Morton
                   ` (15 preceding siblings ...)
  2020-08-15  0:30 ` [patch 16/39] pid: move pidfd_get_pid() to pid.c Andrew Morton
@ 2020-08-15  0:30 ` Andrew Morton
  2020-08-15  0:31 ` [patch 18/39] mm/madvise: check fatal signal pending of target process Andrew Morton
                   ` (122 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:30 UTC (permalink / raw)
  To: akpm, alexander.h.duyck, axboe, bgeffon, christian.brauner,
	christian, dancol, hannes, jannh, joaodias, joel, ktkhai,
	linux-man, linux-mm, mhocko, minchan, mm-commits, oleksandr,
	rientjes, shakeelb, sj38.park, sjpark, sonnyrao, sspatil, surenb,
	timmurray, torvalds, vbabka

From: Minchan Kim <minchan@kernel.org>
Subject: mm/madvise: introduce process_madvise() syscall: an external memory hinting API

There is usecase that System Management Software(SMS) want to give a
memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the
case of Android, it is the ActivityManagerService.

The information required to make the reclaim decision is not known to
the app.  Instead, it is known to the centralized userspace
daemon(ActivityManagerService), and that daemon must be able to
initiate reclaim on its own without any app involvement.

To solve the issue, this patch introduces a new syscall process_madvise(2).
It uses pidfd of an external process to give the hint. It also supports
vector address range because Android app has thousands of vmas due to
zygote so it's totally waste of CPU and power if we should call the
syscall one by one for each vma.(With testing 2000-vma syscall vs
1-vector syscall, it showed 15% performance improvement.  I think it
would be bigger in real practice because the testing ran very cache
friendly environment).

Another potential use case for the vector range is to amortize the cost
ofTLB shootdowns for multiple ranges when using MADV_DONTNEED; this
could benefit users like TCP receive zerocopy and malloc implementations.
In future, we could find more usecases for other advises so let's make it
happens as API since we introduce a new syscall at this moment.  With
that, existing madvise(2) user could replace it with process_madvise(2)
with their own pid if they want to have batch address ranges support
feature.

ince it could affect other process's address range, only privileged
process(PTRACE_MODE_ATTACH_FSCREDS) or something else(e.g., being the
same UID) gives it the right to ptrace the process could use it
successfully. The flag argument is reserved for future use if we need to
extend the API.

I think supporting all hints madvise has/will supported/support to
process_madvise is rather risky.  Because we are not sure all hints
make sense from external process and implementation for the hint may
rely on the caller being in the current context so it could be
error-prone.  Thus, I just limited hints as MADV_[COLD|PAGEOUT] in this
patch.

If someone want to add other hints, we could hear hear the usecase and
review it for each hint.  It's safer for maintenance rather than
introducing a buggy syscall but hard to fix it later.

So finally, the API is as follows,

      ssize_t process_madvise(int pidfd, const struct iovec *iovec,
                unsigned long vlen, int advice, unsigned int flags);

    DESCRIPTION
      The process_madvise() system call is used to give advice or directions
      to the kernel about the address ranges from external process as well as
      local process. It provides the advice to address ranges of process
      described by iovec and vlen. The goal of such advice is to improve system
      or application performance.

      The pidfd selects the process referred to by the PID file descriptor
      specified in pidfd. (See pidofd_open(2) for further information)

      The pointer iovec points to an array of iovec structures, defined in
      <sys/uio.h> as:

        struct iovec {
            void *iov_base;         /* starting address */
            size_t iov_len;         /* number of bytes to be advised */
        };

      The iovec describes address ranges beginning at address(iov_base)
      and with size length of bytes(iov_len).

      The vlen represents the number of elements in iovec.

      The advice is indicated in the advice argument, which is one of the
      following at this moment if the target process specified by pidfd is
      external.

        MADV_COLD
        MADV_PAGEOUT

      Permission to provide a hint to external process is governed by a
      ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).

      The process_madvise supports every advice madvise(2) has if target
      process is in same thread group with calling process so user could
      use process_madvise(2) to extend existing madvise(2) to support
      vector address ranges.

    RETURN VALUE
      On success, process_madvise() returns the number of bytes advised.
      This return value may be less than the total number of requested
      bytes, if an error occurred. The caller should check return value
      to determine whether a partial advice occurred.

FAQ:

Q.1 - Why does any external entity have better knowledge?

Quote from Sandeep

"For Android, every application (including the special SystemServer)
are forked from Zygote.  The reason of course is to share as many
libraries and classes between the two as possible to benefit from the
preloading during boot.

After applications start, (almost) all of the APIs end up calling into
this SystemServer process over IPC (binder) and back to the
application.

In a fully running system, the SystemServer monitors every single
process periodically to calculate their PSS / RSS and also decides
which process is "important" to the user for interactivity.

So, because of how these processes start _and_ the fact that the
SystemServer is looping to monitor each process, it does tend to *know*
which address range of the application is not used / useful.

Besides, we can never rely on applications to clean things up
themselves.  We've had the "hey app1, the system is low on memory,
please trim your memory usage down" notifications for a long time[1].
They rely on applications honoring the broadcasts and very few do.

So, if we want to avoid the inevitable killing of the application and
restarting it, some way to be able to tell the OS about unimportant
memory in these applications will be useful.

- ssp

Q.2 - How to guarantee the race(i.e., object validation) between when
giving a hint from an external process and get the hint from the target
process?

process_madvise operates on the target process's address space as it
exists at the instant that process_madvise is called.  If the space
target process can run between the time the process_madvise process
inspects the target process address space and the time that
process_madvise is actually called, process_madvise may operate on
memory regions that the calling process does not expect.  It's the
responsibility of the process calling process_madvise to close this
race condition.  For example, the calling process can suspend the
target process with ptrace, SIGSTOP, or the freezer cgroup so that it
doesn't have an opportunity to change its own address space before
process_madvise is called.  Another option is to operate on memory
regions that the caller knows a priori will be unchanged in the target
process.  Yet another option is to accept the race for certain
process_madvise calls after reasoning that mistargeting will do no
harm.  The suggested API itself does not provide synchronization.  It
also apply other APIs like move_pages, process_vm_write.

The race isn't really a problem though.  Why is it so wrong to require
that callers do their own synchronization in some manner?  Nobody
objects to write(2) merely because it's possible for two processes to
open the same file and clobber each other's writes --- instead, we tell
people to use flock or something.  Think about mmap.  It never
guarantees newly allocated address space is still valid when the user
tries to access it because other threads could unmap the memory right
before.  That's where we need synchronization by using other API or
design from userside.  It shouldn't be part of API itself.  If someone
needs more fine-grained synchronization rather than process level,
there were two ideas suggested - cookie[2] and anon-fd[3].  Both are
applicable via using last reserved argument of the API but I don't
think it's necessary right now since we have already ways to prevent
the race so don't want to add additional complexity with more
fine-grained optimization model.

To make the API extend, it reserved an unsigned long as last argument
so we could support it in future if someone really needs it.

Q.3 - Why doesn't ptrace work?

Injecting an madvise in the target process using ptrace would not work
for us because such injected madvise would have to be executed by the
target process, which means that process would have to be runnable and
that creates the risk of the abovementioned race and hinting a wrong
VMA.  Furthermore, we want to act the hint in caller's context, not the
callee's, because the callee is usually limited in cpuset/cgroups or
even freezed state so they can't act by themselves quick enough, which
causes more thrashing/kill.  It doesn't work if the target process are
ptraced(e.g., strace, debugger, minidump) because a process can have at
most one ptracer.

[1] https://developer.android.com/topic/performance/memory"

[2] process_getinfo for getting the cookie which is updated whenever
    vma of process address layout are changed - Daniel Colascione -
    https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224

[3] anonymous fd which is used for the object(i.e., address range)
    validation - Michal Hocko -
    https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/

[minchan@kernel.org: fix process_madvise build break for arm64]
  Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com
[minchan@kernel.org: fix build error for mips of process_madvise]
  Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com
[akpm@linux-foundation.org: fix patch ordering issue]
[akpm@linux-foundation.org: fix arm64 whoops]
Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com
Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Daniel Colascione <dancol@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Dias <joaodias@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: SeongJae Park <sj38.park@gmail.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: <linux-man@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/alpha/kernel/syscalls/syscall.tbl      |    1 
 arch/arm/tools/syscall.tbl                  |    1 
 arch/arm64/include/asm/unistd.h             |    2 
 arch/arm64/include/asm/unistd32.h           |    2 
 arch/ia64/kernel/syscalls/syscall.tbl       |    1 
 arch/m68k/kernel/syscalls/syscall.tbl       |    1 
 arch/microblaze/kernel/syscalls/syscall.tbl |    1 
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 
 arch/parisc/kernel/syscalls/syscall.tbl     |    1 
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1 
 arch/s390/kernel/syscalls/syscall.tbl       |    1 
 arch/sh/kernel/syscalls/syscall.tbl         |    1 
 arch/sparc/kernel/syscalls/syscall.tbl      |    1 
 arch/x86/entry/syscalls/syscall_32.tbl      |    1 
 arch/x86/entry/syscalls/syscall_64.tbl      |    2 
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1 
 include/linux/compat.h                      |    4 
 include/linux/syscalls.h                    |    2 
 include/uapi/asm-generic/unistd.h           |    4 
 kernel/sys_ni.c                             |    2 
 mm/madvise.c                                |  121 ++++++++++++++++++
 23 files changed, 152 insertions(+), 2 deletions(-)

--- a/arch/alpha/kernel/syscalls/syscall.tbl~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/arch/alpha/kernel/syscalls/syscall.tbl
@@ -479,3 +479,4 @@
 547	common	openat2				sys_openat2
 548	common	pidfd_getfd			sys_pidfd_getfd
 549	common	faccessat2			sys_faccessat2
+550	common	process_madvise			sys_process_madvise
--- a/arch/arm64/include/asm/unistd32.h~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/arch/arm64/include/asm/unistd32.h
@@ -887,6 +887,8 @@ __SYSCALL(__NR_openat2, sys_openat2)
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 #define __NR_faccessat2 439
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
+#define __NR_process_madvise 440
+__SYSCALL(__NR_process_madvise, compat_sys_process_madvise)
 
 /*
  * Please add new compat syscalls above this comment and update
--- a/arch/arm64/include/asm/unistd.h~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/arch/arm64/include/asm/unistd.h
@@ -38,7 +38,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		440
+#define __NR_compat_syscalls		441
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
--- a/arch/arm/tools/syscall.tbl~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/arch/arm/tools/syscall.tbl
@@ -453,3 +453,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	process_madvise			sys_process_madvise
--- a/arch/ia64/kernel/syscalls/syscall.tbl~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/arch/ia64/kernel/syscalls/syscall.tbl
@@ -360,3 +360,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	process_madvise			sys_process_madvise
--- a/arch/m68k/kernel/syscalls/syscall.tbl~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/arch/m68k/kernel/syscalls/syscall.tbl
@@ -439,3 +439,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	process_madvise			sys_process_madvise
--- a/arch/microblaze/kernel/syscalls/syscall.tbl~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -445,3 +445,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	process_madvise			sys_process_madvise
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -378,3 +378,4 @@
 437	n32	openat2				sys_openat2
 438	n32	pidfd_getfd			sys_pidfd_getfd
 439	n32	faccessat2			sys_faccessat2
+440	n32	process_madvise			compat_sys_process_madvise
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -354,3 +354,4 @@
 437	n64	openat2				sys_openat2
 438	n64	pidfd_getfd			sys_pidfd_getfd
 439	n64	faccessat2			sys_faccessat2
+440	n64	process_madvise			sys_process_madvise
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -427,3 +427,4 @@
 437	o32	openat2				sys_openat2
 438	o32	pidfd_getfd			sys_pidfd_getfd
 439	o32	faccessat2			sys_faccessat2
+440	o32	process_madvise			sys_process_madvise		compat_sys_process_madvise
--- a/arch/parisc/kernel/syscalls/syscall.tbl~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/arch/parisc/kernel/syscalls/syscall.tbl
@@ -437,3 +437,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	process_madvise			sys_process_madvise		compat_sys_process_madvise
--- a/arch/powerpc/kernel/syscalls/syscall.tbl~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -529,3 +529,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	process_madvise			sys_process_madvise		compat_sys_process_madvise
--- a/arch/s390/kernel/syscalls/syscall.tbl~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/arch/s390/kernel/syscalls/syscall.tbl
@@ -442,3 +442,4 @@
 437  common	openat2			sys_openat2			sys_openat2
 438  common	pidfd_getfd		sys_pidfd_getfd			sys_pidfd_getfd
 439  common	faccessat2		sys_faccessat2			sys_faccessat2
+440	common	process_madvise		sys_process_madvise		compat_sys_process_madvise
--- a/arch/sh/kernel/syscalls/syscall.tbl~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/arch/sh/kernel/syscalls/syscall.tbl
@@ -442,3 +442,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	process_madvise			sys_process_madvise
--- a/arch/sparc/kernel/syscalls/syscall.tbl~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/arch/sparc/kernel/syscalls/syscall.tbl
@@ -485,3 +485,4 @@
 437	common	openat2			sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	process_madvise			sys_process_madvise		compat_sys_process_madvise
--- a/arch/x86/entry/syscalls/syscall_32.tbl~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/arch/x86/entry/syscalls/syscall_32.tbl
@@ -444,3 +444,4 @@
 437	i386	openat2			sys_openat2
 438	i386	pidfd_getfd		sys_pidfd_getfd
 439	i386	faccessat2		sys_faccessat2
+440	i386	process_madvise		sys_process_madvise		compat_sys_process_madvise
--- a/arch/x86/entry/syscalls/syscall_64.tbl~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/arch/x86/entry/syscalls/syscall_64.tbl
@@ -361,6 +361,7 @@
 437	common	openat2			sys_openat2
 438	common	pidfd_getfd		sys_pidfd_getfd
 439	common	faccessat2		sys_faccessat2
+440	64	process_madvise		sys_process_madvise
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
@@ -404,3 +405,4 @@
 545	x32	execveat		compat_sys_execveat
 546	x32	preadv2			compat_sys_preadv64v2
 547	x32	pwritev2		compat_sys_pwritev64v2
+548	x32	process_madvise		compat_sys_process_madvise
--- a/arch/xtensa/kernel/syscalls/syscall.tbl~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -410,3 +410,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	process_madvise			sys_process_madvise
--- a/include/linux/compat.h~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/include/linux/compat.h
@@ -823,6 +823,10 @@ asmlinkage long compat_sys_pwritev64v2(u
 		unsigned long vlen, loff_t pos, rwf_t flags);
 #endif
 
+asmlinkage ssize_t compat_sys_process_madvise(compat_int_t pidfd,
+		const struct compat_iovec __user *vec,
+		compat_ulong_t vlen, compat_int_t behavior,
+		compat_uint_t flags);
 
 /*
  * Deprecated system calls which are still defined in
--- a/include/linux/syscalls.h~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/include/linux/syscalls.h
@@ -880,6 +880,8 @@ asmlinkage long sys_munlockall(void);
 asmlinkage long sys_mincore(unsigned long start, size_t len,
 				unsigned char __user * vec);
 asmlinkage long sys_madvise(unsigned long start, size_t len, int behavior);
+asmlinkage long sys_process_madvise(int pidfd, const struct iovec __user *vec,
+		unsigned long vlen, int behavior, unsigned int flags);
 asmlinkage long sys_remap_file_pages(unsigned long start, unsigned long size,
 			unsigned long prot, unsigned long pgoff,
 			unsigned long flags);
--- a/include/uapi/asm-generic/unistd.h~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/include/uapi/asm-generic/unistd.h
@@ -859,9 +859,11 @@ __SYSCALL(__NR_openat2, sys_openat2)
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 #define __NR_faccessat2 439
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
+#define __NR_process_madvise 440
+__SC_COMP(__NR_process_madvise, sys_process_madvise, compat_sys_process_madvise)
 
 #undef __NR_syscalls
-#define __NR_syscalls 440
+#define __NR_syscalls 441
 
 /*
  * 32 bit systems traditionally used different
--- a/kernel/sys_ni.c~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/kernel/sys_ni.c
@@ -280,6 +280,8 @@ COND_SYSCALL(mlockall);
 COND_SYSCALL(munlockall);
 COND_SYSCALL(mincore);
 COND_SYSCALL(madvise);
+COND_SYSCALL(process_madvise);
+COND_SYSCALL_COMPAT(process_madvise);
 COND_SYSCALL(remap_file_pages);
 COND_SYSCALL(mbind);
 COND_SYSCALL_COMPAT(mbind);
--- a/mm/madvise.c~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api
+++ a/mm/madvise.c
@@ -17,6 +17,7 @@
 #include <linux/falloc.h>
 #include <linux/fadvise.h>
 #include <linux/sched.h>
+#include <linux/sched/mm.h>
 #include <linux/ksm.h>
 #include <linux/fs.h>
 #include <linux/file.h>
@@ -995,6 +996,18 @@ madvise_behavior_valid(int behavior)
 	}
 }
 
+static bool
+process_madvise_behavior_valid(int behavior)
+{
+	switch (behavior) {
+	case MADV_COLD:
+	case MADV_PAGEOUT:
+		return true;
+	default:
+		return false;
+	}
+}
+
 /*
  * The madvise(2) system call.
  *
@@ -1042,6 +1055,11 @@ madvise_behavior_valid(int behavior)
  *  MADV_DONTDUMP - the application wants to prevent pages in the given range
  *		from being included in its core dump.
  *  MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump.
+ *  MADV_COLD - the application is not expected to use this memory soon,
+ *		deactivate pages in this range so that they can be reclaimed
+ *		easily if memory pressure hanppens.
+ *  MADV_PAGEOUT - the application is not expected to use this memory soon,
+ *		page out the pages in this range immediately.
  *
  * return values:
  *  zero    - success
@@ -1176,3 +1194,106 @@ SYSCALL_DEFINE3(madvise, unsigned long,
 {
 	return do_madvise(current, current->mm, start, len_in, behavior);
 }
+
+static int process_madvise_vec(struct task_struct *target_task,
+		struct mm_struct *mm, struct iov_iter *iter, int behavior)
+{
+	struct iovec iovec;
+	int ret = 0;
+
+	while (iov_iter_count(iter)) {
+		iovec = iov_iter_iovec(iter);
+		ret = do_madvise(target_task, mm, (unsigned long)iovec.iov_base,
+					iovec.iov_len, behavior);
+		if (ret < 0)
+			break;
+		iov_iter_advance(iter, iovec.iov_len);
+	}
+
+	return ret;
+}
+
+static ssize_t do_process_madvise(int pidfd, struct iov_iter *iter,
+				int behavior, unsigned int flags)
+{
+	ssize_t ret;
+	struct pid *pid;
+	struct task_struct *task;
+	struct mm_struct *mm;
+	size_t total_len = iov_iter_count(iter);
+
+	if (flags != 0)
+		return -EINVAL;
+
+	pid = pidfd_get_pid(pidfd);
+	if (IS_ERR(pid))
+		return PTR_ERR(pid);
+
+	task = get_pid_task(pid, PIDTYPE_PID);
+	if (!task) {
+		ret = -ESRCH;
+		goto put_pid;
+	}
+
+	if (task->mm != current->mm &&
+			!process_madvise_behavior_valid(behavior)) {
+		ret = -EINVAL;
+		goto release_task;
+	}
+
+	mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
+	if (IS_ERR_OR_NULL(mm)) {
+		ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
+		goto release_task;
+	}
+
+	ret = process_madvise_vec(task, mm, iter, behavior);
+	if (ret >= 0)
+		ret = total_len - iov_iter_count(iter);
+
+	mmput(mm);
+release_task:
+	put_task_struct(task);
+put_pid:
+	put_pid(pid);
+	return ret;
+}
+
+SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
+		unsigned long, vlen, int, behavior, unsigned int, flags)
+{
+	ssize_t ret;
+	struct iovec iovstack[UIO_FASTIOV];
+	struct iovec *iov = iovstack;
+	struct iov_iter iter;
+
+	ret = import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
+	if (ret >= 0) {
+		ret = do_process_madvise(pidfd, &iter, behavior, flags);
+		kfree(iov);
+	}
+	return ret;
+}
+
+#ifdef CONFIG_COMPAT
+COMPAT_SYSCALL_DEFINE5(process_madvise, compat_int_t, pidfd,
+			const struct compat_iovec __user *, vec,
+			compat_ulong_t, vlen,
+			compat_int_t, behavior,
+			compat_uint_t, flags)
+
+{
+	ssize_t ret;
+	struct iovec iovstack[UIO_FASTIOV];
+	struct iovec *iov = iovstack;
+	struct iov_iter iter;
+
+	ret = compat_import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack),
+				&iov, &iter);
+	if (ret >= 0) {
+		ret = do_process_madvise(pidfd, &iter, behavior, flags);
+		kfree(iov);
+	}
+	return ret;
+}
+#endif
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 18/39] mm/madvise: check fatal signal pending of target process
  2020-08-15  0:29 incoming Andrew Morton
                   ` (16 preceding siblings ...)
  2020-08-15  0:30 ` [patch 17/39] mm/madvise: introduce process_madvise() syscall: an external memory hinting API Andrew Morton
@ 2020-08-15  0:31 ` Andrew Morton
  2020-08-15  0:31 ` [patch 19/39] all arch: remove system call sys_sysctl Andrew Morton
                   ` (121 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:31 UTC (permalink / raw)
  To: akpm, alexander.h.duyck, axboe, bgeffon, christian.brauner,
	christian, dancol, hannes, jannh, joaodias, joel, ktkhai,
	linux-man, linux-mm, mhocko, minchan, mm-commits, oleksandr,
	rientjes, shakeelb, sj38.park, sjpark, sonnyrao, sspatil, surenb,
	timmurray, torvalds, vbabka

From: Minchan Kim <minchan@kernel.org>
Subject: mm/madvise: check fatal signal pending of target process

Bail out to prevent unnecessary CPU overhead if target process has pending
fatal signal during (MADV_COLD|MADV_PAGEOUT) operation.

Link: http://lkml.kernel.org/r/20200302193630.68771-4-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200622192900.22757-5-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Daniel Colascione <dancol@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Dias <joaodias@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: SeongJae Park <sj38.park@gmail.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: <linux-man@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/madvise.c |   29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

--- a/mm/madvise.c~mm-madvise-check-fatal-signal-pending-of-target-process
+++ a/mm/madvise.c
@@ -39,6 +39,7 @@
 struct madvise_walk_private {
 	struct mmu_gather *tlb;
 	bool pageout;
+	struct task_struct *target_task;
 };
 
 /*
@@ -319,6 +320,10 @@ static int madvise_cold_or_pageout_pte_r
 	if (fatal_signal_pending(current))
 		return -EINTR;
 
+	if (private->target_task &&
+			fatal_signal_pending(private->target_task))
+		return -EINTR;
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	if (pmd_trans_huge(*pmd)) {
 		pmd_t orig_pmd;
@@ -480,12 +485,14 @@ static const struct mm_walk_ops cold_wal
 };
 
 static void madvise_cold_page_range(struct mmu_gather *tlb,
+			     struct task_struct *task,
 			     struct vm_area_struct *vma,
 			     unsigned long addr, unsigned long end)
 {
 	struct madvise_walk_private walk_private = {
 		.pageout = false,
 		.tlb = tlb,
+		.target_task = task,
 	};
 
 	tlb_start_vma(tlb, vma);
@@ -493,7 +500,8 @@ static void madvise_cold_page_range(stru
 	tlb_end_vma(tlb, vma);
 }
 
-static long madvise_cold(struct vm_area_struct *vma,
+static long madvise_cold(struct task_struct *task,
+			struct vm_area_struct *vma,
 			struct vm_area_struct **prev,
 			unsigned long start_addr, unsigned long end_addr)
 {
@@ -506,19 +514,21 @@ static long madvise_cold(struct vm_area_
 
 	lru_add_drain();
 	tlb_gather_mmu(&tlb, mm, start_addr, end_addr);
-	madvise_cold_page_range(&tlb, vma, start_addr, end_addr);
+	madvise_cold_page_range(&tlb, task, vma, start_addr, end_addr);
 	tlb_finish_mmu(&tlb, start_addr, end_addr);
 
 	return 0;
 }
 
 static void madvise_pageout_page_range(struct mmu_gather *tlb,
+			     struct task_struct *task,
 			     struct vm_area_struct *vma,
 			     unsigned long addr, unsigned long end)
 {
 	struct madvise_walk_private walk_private = {
 		.pageout = true,
 		.tlb = tlb,
+		.target_task = task,
 	};
 
 	tlb_start_vma(tlb, vma);
@@ -542,7 +552,8 @@ static inline bool can_do_pageout(struct
 		inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0;
 }
 
-static long madvise_pageout(struct vm_area_struct *vma,
+static long madvise_pageout(struct task_struct *task,
+			struct vm_area_struct *vma,
 			struct vm_area_struct **prev,
 			unsigned long start_addr, unsigned long end_addr)
 {
@@ -558,7 +569,7 @@ static long madvise_pageout(struct vm_ar
 
 	lru_add_drain();
 	tlb_gather_mmu(&tlb, mm, start_addr, end_addr);
-	madvise_pageout_page_range(&tlb, vma, start_addr, end_addr);
+	madvise_pageout_page_range(&tlb, task, vma, start_addr, end_addr);
 	tlb_finish_mmu(&tlb, start_addr, end_addr);
 
 	return 0;
@@ -938,7 +949,8 @@ static int madvise_inject_error(int beha
 #endif
 
 static long
-madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
+madvise_vma(struct task_struct *task, struct vm_area_struct *vma,
+		struct vm_area_struct **prev,
 		unsigned long start, unsigned long end, int behavior)
 {
 	switch (behavior) {
@@ -947,9 +959,9 @@ madvise_vma(struct vm_area_struct *vma,
 	case MADV_WILLNEED:
 		return madvise_willneed(vma, prev, start, end);
 	case MADV_COLD:
-		return madvise_cold(vma, prev, start, end);
+		return madvise_cold(task, vma, prev, start, end);
 	case MADV_PAGEOUT:
-		return madvise_pageout(vma, prev, start, end);
+		return madvise_pageout(task, vma, prev, start, end);
 	case MADV_FREE:
 	case MADV_DONTNEED:
 		return madvise_dontneed_free(vma, prev, start, end, behavior);
@@ -1166,7 +1178,8 @@ int do_madvise(struct task_struct *targe
 			tmp = end;
 
 		/* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
-		error = madvise_vma(vma, &prev, start, tmp, behavior);
+		error = madvise_vma(target_task, vma, &prev,
+					start, tmp, behavior);
 		if (error)
 			goto out;
 		start = tmp;
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 19/39] all arch: remove system call sys_sysctl
  2020-08-15  0:29 incoming Andrew Morton
                   ` (17 preceding siblings ...)
  2020-08-15  0:31 ` [patch 18/39] mm/madvise: check fatal signal pending of target process Andrew Morton
@ 2020-08-15  0:31 ` Andrew Morton
  2020-08-15  0:31 ` [patch 20/39] mm/kmemleak: silence KCSAN splats in checksum Andrew Morton
                   ` (120 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:31 UTC (permalink / raw)
  To: acme, ak, akpm, alexander.shishkin, arnd, axboe, bauerman, benh,
	bin.meng, borntraeger, bp, brgerst, catalin.marinas, chenzefeng2,
	chris, christian, cyphar, dalias, davem, deller, dhowells,
	dvyukov, ebiederm, elver, fenghua.yu, flameeyes, geert, gor,
	heiko.carstens, hpa, ink, James.Bottomley, jcmvbkbc, jolsa,
	jongk, keescook, krzk, linux-mm, linux, linux, luto,
	mark.rutland, martin.petersen, mattst88, mcgrof, minchan, mingo,
	mm-commits, monstr, mpe, mszeredi, namhyung, naveen.n.rao,
	nixiaoming, npiggin, oleg, olof, paulburton, paulmck, paulus,
	peterz, ravi.bangoria, rdunlap, rth, samitolvanen, sargun, sfr,
	sudeep.holla, svens, tglx, tony.luck, torvalds, tsbogend, vbabka,
	viro, will, yamada.masahiro, ysato, yzaikin, zhouyanjie

From: Xiaoming Ni <nixiaoming@huawei.com>
Subject: all arch: remove system call sys_sysctl

Since commit 61a47c1ad3a4dc ("sysctl: Remove the sysctl system call"),
sys_sysctl is actually unavailable: any input can only return an error.

We have been warning about people using the sysctl system call for years
and believe there are no more users.  Even if there are users of this
interface if they have not complained or fixed their code by now they
probably are not going to, so there is no point in warning them any
longer.

So completely remove sys_sysctl on all architectures.

[nixiaoming@huawei.com: s390: fix build error for sys_call_table_emu]
 Link: http://lkml.kernel.org/r/20200618141426.16884-1-nixiaoming@huawei.com
Link: http://lkml.kernel.org/r/20200616030734.87257-1-nixiaoming@huawei.com
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Acked-by: Will Deacon <will@kernel.org>		[arm/arm64]
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bin Meng <bin.meng@windriver.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: chenzefeng <chenzefeng2@huawei.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Howells <dhowells@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Diego Elio Pettenò <flameeyes@flameeyes.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kars de Jong <jongk@linux-m68k.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Paul Burton <paulburton@kernel.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Sven Schnelle <svens@stackframe.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Zhou Yanjie <zhouyanjie@wanyeetech.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/alpha/kernel/syscalls/syscall.tbl             |    2 
 arch/arm/configs/am200epdkit_defconfig             |    1 
 arch/arm/tools/syscall.tbl                         |    2 
 arch/arm64/include/asm/unistd32.h                  |    4 
 arch/ia64/kernel/syscalls/syscall.tbl              |    2 
 arch/m68k/kernel/syscalls/syscall.tbl              |    2 
 arch/microblaze/kernel/syscalls/syscall.tbl        |    2 
 arch/mips/configs/cu1000-neo_defconfig             |    1 
 arch/mips/kernel/syscalls/syscall_n32.tbl          |    2 
 arch/mips/kernel/syscalls/syscall_n64.tbl          |    2 
 arch/mips/kernel/syscalls/syscall_o32.tbl          |    2 
 arch/parisc/kernel/syscalls/syscall.tbl            |    2 
 arch/powerpc/kernel/syscalls/syscall.tbl           |    2 
 arch/s390/kernel/syscalls/syscall.tbl              |    2 
 arch/sh/configs/dreamcast_defconfig                |    1 
 arch/sh/configs/espt_defconfig                     |    1 
 arch/sh/configs/hp6xx_defconfig                    |    1 
 arch/sh/configs/landisk_defconfig                  |    1 
 arch/sh/configs/lboxre2_defconfig                  |    1 
 arch/sh/configs/microdev_defconfig                 |    1 
 arch/sh/configs/migor_defconfig                    |    1 
 arch/sh/configs/r7780mp_defconfig                  |    1 
 arch/sh/configs/r7785rp_defconfig                  |    1 
 arch/sh/configs/rts7751r2d1_defconfig              |    1 
 arch/sh/configs/rts7751r2dplus_defconfig           |    1 
 arch/sh/configs/se7206_defconfig                   |    1 
 arch/sh/configs/se7343_defconfig                   |    1 
 arch/sh/configs/se7619_defconfig                   |    1 
 arch/sh/configs/se7705_defconfig                   |    1 
 arch/sh/configs/se7750_defconfig                   |    1 
 arch/sh/configs/se7751_defconfig                   |    1 
 arch/sh/configs/secureedge5410_defconfig           |    1 
 arch/sh/configs/sh03_defconfig                     |    1 
 arch/sh/configs/sh7710voipgw_defconfig             |    1 
 arch/sh/configs/sh7757lcr_defconfig                |    1 
 arch/sh/configs/sh7763rdp_defconfig                |    1 
 arch/sh/configs/shmin_defconfig                    |    1 
 arch/sh/configs/titan_defconfig                    |    1 
 arch/sh/kernel/syscalls/syscall.tbl                |    2 
 arch/sparc/kernel/syscalls/syscall.tbl             |    2 
 arch/x86/entry/syscalls/syscall_32.tbl             |    2 
 arch/x86/entry/syscalls/syscall_64.tbl             |    2 
 arch/xtensa/kernel/syscalls/syscall.tbl            |    2 
 include/linux/compat.h                             |    1 
 include/linux/syscalls.h                           |    2 
 include/linux/sysctl.h                             |    6 
 kernel/Makefile                                    |    2 
 kernel/sys_ni.c                                    |    1 
 kernel/sysctl_binary.c                             |  171 -----------
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl |    2 
 tools/perf/arch/s390/entry/syscalls/syscall.tbl    |    2 
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl  |    2 
 52 files changed, 24 insertions(+), 227 deletions(-)

--- a/arch/alpha/kernel/syscalls/syscall.tbl~all-arch-remove-system-call-sys_sysctl
+++ a/arch/alpha/kernel/syscalls/syscall.tbl
@@ -249,7 +249,7 @@
 316	common	mlockall			sys_mlockall
 317	common	munlockall			sys_munlockall
 318	common	sysinfo				sys_sysinfo
-319	common	_sysctl				sys_sysctl
+319	common	_sysctl				sys_ni_syscall
 # 320 was sys_idle
 321	common	oldumount			sys_oldumount
 322	common	swapon				sys_swapon
--- a/arch/arm64/include/asm/unistd32.h~all-arch-remove-system-call-sys_sysctl
+++ a/arch/arm64/include/asm/unistd32.h
@@ -308,8 +308,8 @@ __SYSCALL(__NR_writev, compat_sys_writev
 __SYSCALL(__NR_getsid, sys_getsid)
 #define __NR_fdatasync 148
 __SYSCALL(__NR_fdatasync, sys_fdatasync)
-#define __NR__sysctl 149
-__SYSCALL(__NR__sysctl, compat_sys_sysctl)
+			/* 149 was sys_sysctl */
+__SYSCALL(149, sys_ni_syscall)
 #define __NR_mlock 150
 __SYSCALL(__NR_mlock, sys_mlock)
 #define __NR_munlock 151
--- a/arch/arm/configs/am200epdkit_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/arm/configs/am200epdkit_defconfig
@@ -3,7 +3,6 @@ CONFIG_LOCALVERSION="gum"
 CONFIG_SYSVIPC=y
 CONFIG_SYSFS_DEPRECATED_V2=y
 CONFIG_EXPERT=y
-# CONFIG_SYSCTL_SYSCALL is not set
 # CONFIG_EPOLL is not set
 # CONFIG_SHMEM is not set
 # CONFIG_VM_EVENT_COUNTERS is not set
--- a/arch/arm/tools/syscall.tbl~all-arch-remove-system-call-sys_sysctl
+++ a/arch/arm/tools/syscall.tbl
@@ -162,7 +162,7 @@
 146	common	writev			sys_writev
 147	common	getsid			sys_getsid
 148	common	fdatasync		sys_fdatasync
-149	common	_sysctl			sys_sysctl
+149	common	_sysctl			sys_ni_syscall
 150	common	mlock			sys_mlock
 151	common	munlock			sys_munlock
 152	common	mlockall		sys_mlockall
--- a/arch/ia64/kernel/syscalls/syscall.tbl~all-arch-remove-system-call-sys_sysctl
+++ a/arch/ia64/kernel/syscalls/syscall.tbl
@@ -135,7 +135,7 @@
 123	common	writev				sys_writev
 124	common	pread64				sys_pread64
 125	common	pwrite64			sys_pwrite64
-126	common	_sysctl				sys_sysctl
+126	common	_sysctl				sys_ni_syscall
 127	common	mmap				sys_mmap
 128	common	munmap				sys_munmap
 129	common	mlock				sys_mlock
--- a/arch/m68k/kernel/syscalls/syscall.tbl~all-arch-remove-system-call-sys_sysctl
+++ a/arch/m68k/kernel/syscalls/syscall.tbl
@@ -156,7 +156,7 @@
 146	common	writev				sys_writev
 147	common	getsid				sys_getsid
 148	common	fdatasync			sys_fdatasync
-149	common	_sysctl				sys_sysctl
+149	common	_sysctl				sys_ni_syscall
 150	common	mlock				sys_mlock
 151	common	munlock				sys_munlock
 152	common	mlockall			sys_mlockall
--- a/arch/microblaze/kernel/syscalls/syscall.tbl~all-arch-remove-system-call-sys_sysctl
+++ a/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -156,7 +156,7 @@
 146	common	writev				sys_writev
 147	common	getsid				sys_getsid
 148	common	fdatasync			sys_fdatasync
-149	common	_sysctl				sys_sysctl
+149	common	_sysctl				sys_ni_syscall
 150	common	mlock				sys_mlock
 151	common	munlock				sys_munlock
 152	common	mlockall			sys_mlockall
--- a/arch/mips/configs/cu1000-neo_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/mips/configs/cu1000-neo_defconfig
@@ -17,7 +17,6 @@ CONFIG_CGROUP_CPUACCT=y
 CONFIG_NAMESPACES=y
 CONFIG_USER_NS=y
 CONFIG_CC_OPTIMIZE_FOR_SIZE=y
-CONFIG_SYSCTL_SYSCALL=y
 CONFIG_KALLSYMS_ALL=y
 CONFIG_EMBEDDED=y
 # CONFIG_VM_EVENT_COUNTERS is not set
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl~all-arch-remove-system-call-sys_sysctl
+++ a/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -159,7 +159,7 @@
 149	n32	munlockall			sys_munlockall
 150	n32	vhangup				sys_vhangup
 151	n32	pivot_root			sys_pivot_root
-152	n32	_sysctl				compat_sys_sysctl
+152	n32	_sysctl				sys_ni_syscall
 153	n32	prctl				sys_prctl
 154	n32	adjtimex			sys_adjtimex_time32
 155	n32	setrlimit			compat_sys_setrlimit
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl~all-arch-remove-system-call-sys_sysctl
+++ a/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -159,7 +159,7 @@
 149	n64	munlockall			sys_munlockall
 150	n64	vhangup				sys_vhangup
 151	n64	pivot_root			sys_pivot_root
-152	n64	_sysctl				sys_sysctl
+152	n64	_sysctl				sys_ni_syscall
 153	n64	prctl				sys_prctl
 154	n64	adjtimex			sys_adjtimex
 155	n64	setrlimit			sys_setrlimit
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl~all-arch-remove-system-call-sys_sysctl
+++ a/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -164,7 +164,7 @@
 150	o32	unused150			sys_ni_syscall
 151	o32	getsid				sys_getsid
 152	o32	fdatasync			sys_fdatasync
-153	o32	_sysctl				sys_sysctl			compat_sys_sysctl
+153	o32	_sysctl				sys_ni_syscall
 154	o32	mlock				sys_mlock
 155	o32	munlock				sys_munlock
 156	o32	mlockall			sys_mlockall
--- a/arch/parisc/kernel/syscalls/syscall.tbl~all-arch-remove-system-call-sys_sysctl
+++ a/arch/parisc/kernel/syscalls/syscall.tbl
@@ -163,7 +163,7 @@
 146	common	writev			sys_writev			compat_sys_writev
 147	common	getsid			sys_getsid
 148	common	fdatasync		sys_fdatasync
-149	common	_sysctl			sys_sysctl			compat_sys_sysctl
+149	common	_sysctl			sys_ni_syscall
 150	common	mlock			sys_mlock
 151	common	munlock			sys_munlock
 152	common	mlockall		sys_mlockall
--- a/arch/powerpc/kernel/syscalls/syscall.tbl~all-arch-remove-system-call-sys_sysctl
+++ a/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -197,7 +197,7 @@
 146	common	writev				sys_writev			compat_sys_writev
 147	common	getsid				sys_getsid
 148	common	fdatasync			sys_fdatasync
-149	nospu	_sysctl				sys_sysctl			compat_sys_sysctl
+149	nospu	_sysctl				sys_ni_syscall
 150	common	mlock				sys_mlock
 151	common	munlock				sys_munlock
 152	common	mlockall			sys_mlockall
--- a/arch/s390/kernel/syscalls/syscall.tbl~all-arch-remove-system-call-sys_sysctl
+++ a/arch/s390/kernel/syscalls/syscall.tbl
@@ -138,7 +138,7 @@
 146  common	writev			sys_writev			compat_sys_writev
 147  common	getsid			sys_getsid			sys_getsid
 148  common	fdatasync		sys_fdatasync			sys_fdatasync
-149  common	_sysctl			sys_sysctl			compat_sys_sysctl
+149  common	_sysctl			-				-
 150  common	mlock			sys_mlock			sys_mlock
 151  common	munlock			sys_munlock			sys_munlock
 152  common	mlockall		sys_mlockall			sys_mlockall
--- a/arch/sh/configs/dreamcast_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/dreamcast_defconfig
@@ -1,7 +1,6 @@
 CONFIG_SYSVIPC=y
 CONFIG_BSD_PROCESS_ACCT=y
 CONFIG_LOG_BUF_SHIFT=14
-# CONFIG_SYSCTL_SYSCALL is not set
 CONFIG_SLAB=y
 CONFIG_PROFILING=y
 CONFIG_MODULES=y
--- a/arch/sh/configs/espt_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/espt_defconfig
@@ -5,7 +5,6 @@ CONFIG_LOG_BUF_SHIFT=14
 CONFIG_NAMESPACES=y
 CONFIG_UTS_NS=y
 CONFIG_IPC_NS=y
-# CONFIG_SYSCTL_SYSCALL is not set
 CONFIG_SLAB=y
 CONFIG_PROFILING=y
 CONFIG_OPROFILE=y
--- a/arch/sh/configs/hp6xx_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/hp6xx_defconfig
@@ -3,7 +3,6 @@ CONFIG_IKCONFIG=y
 CONFIG_IKCONFIG_PROC=y
 CONFIG_LOG_BUF_SHIFT=14
 # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
-# CONFIG_SYSCTL_SYSCALL is not set
 CONFIG_SLAB=y
 # CONFIG_BLK_DEV_BSG is not set
 CONFIG_CPU_SUBTYPE_SH7709=y
--- a/arch/sh/configs/landisk_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/landisk_defconfig
@@ -1,6 +1,5 @@
 CONFIG_SYSVIPC=y
 CONFIG_LOG_BUF_SHIFT=14
-# CONFIG_SYSCTL_SYSCALL is not set
 CONFIG_KALLSYMS_EXTRA_PASS=y
 CONFIG_SLAB=y
 CONFIG_MODULES=y
--- a/arch/sh/configs/lboxre2_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/lboxre2_defconfig
@@ -1,6 +1,5 @@
 CONFIG_SYSVIPC=y
 CONFIG_LOG_BUF_SHIFT=14
-# CONFIG_SYSCTL_SYSCALL is not set
 CONFIG_KALLSYMS_EXTRA_PASS=y
 CONFIG_SLAB=y
 CONFIG_MODULES=y
--- a/arch/sh/configs/microdev_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/microdev_defconfig
@@ -2,7 +2,6 @@ CONFIG_BSD_PROCESS_ACCT=y
 CONFIG_LOG_BUF_SHIFT=14
 CONFIG_BLK_DEV_INITRD=y
 # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
-# CONFIG_SYSCTL_SYSCALL is not set
 CONFIG_SLAB=y
 # CONFIG_BLK_DEV_BSG is not set
 CONFIG_CPU_SUBTYPE_SH4_202=y
--- a/arch/sh/configs/migor_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/migor_defconfig
@@ -4,7 +4,6 @@ CONFIG_IKCONFIG_PROC=y
 CONFIG_LOG_BUF_SHIFT=14
 CONFIG_BLK_DEV_INITRD=y
 # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
-# CONFIG_SYSCTL_SYSCALL is not set
 CONFIG_SLAB=y
 CONFIG_PROFILING=y
 CONFIG_OPROFILE=y
--- a/arch/sh/configs/r7780mp_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/r7780mp_defconfig
@@ -3,7 +3,6 @@ CONFIG_BSD_PROCESS_ACCT=y
 CONFIG_IKCONFIG=y
 CONFIG_IKCONFIG_PROC=y
 CONFIG_LOG_BUF_SHIFT=14
-# CONFIG_SYSCTL_SYSCALL is not set
 # CONFIG_FUTEX is not set
 # CONFIG_EPOLL is not set
 CONFIG_SLAB=y
--- a/arch/sh/configs/r7785rp_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/r7785rp_defconfig
@@ -7,7 +7,6 @@ CONFIG_RCU_TRACE=y
 CONFIG_IKCONFIG=y
 CONFIG_IKCONFIG_PROC=y
 CONFIG_LOG_BUF_SHIFT=14
-# CONFIG_SYSCTL_SYSCALL is not set
 CONFIG_SLAB=y
 CONFIG_PROFILING=y
 CONFIG_OPROFILE=y
--- a/arch/sh/configs/rts7751r2d1_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/rts7751r2d1_defconfig
@@ -1,7 +1,6 @@
 CONFIG_SYSVIPC=y
 CONFIG_LOG_BUF_SHIFT=14
 # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
-# CONFIG_SYSCTL_SYSCALL is not set
 CONFIG_SLAB=y
 CONFIG_PROFILING=y
 CONFIG_OPROFILE=y
--- a/arch/sh/configs/rts7751r2dplus_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/rts7751r2dplus_defconfig
@@ -1,7 +1,6 @@
 CONFIG_SYSVIPC=y
 CONFIG_LOG_BUF_SHIFT=14
 # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
-# CONFIG_SYSCTL_SYSCALL is not set
 CONFIG_SLAB=y
 CONFIG_PROFILING=y
 CONFIG_OPROFILE=y
--- a/arch/sh/configs/se7206_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/se7206_defconfig
@@ -18,7 +18,6 @@ CONFIG_USER_NS=y
 CONFIG_PID_NS=y
 CONFIG_BLK_DEV_INITRD=y
 # CONFIG_UID16 is not set
-# CONFIG_SYSCTL_SYSCALL is not set
 CONFIG_KALLSYMS_ALL=y
 # CONFIG_ELF_CORE is not set
 # CONFIG_COMPAT_BRK is not set
--- a/arch/sh/configs/se7343_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/se7343_defconfig
@@ -2,7 +2,6 @@
 CONFIG_SYSVIPC=y
 CONFIG_POSIX_MQUEUE=y
 CONFIG_LOG_BUF_SHIFT=14
-# CONFIG_SYSCTL_SYSCALL is not set
 # CONFIG_FUTEX is not set
 # CONFIG_EPOLL is not set
 # CONFIG_SHMEM is not set
--- a/arch/sh/configs/se7619_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/se7619_defconfig
@@ -1,7 +1,6 @@
 # CONFIG_LOCALVERSION_AUTO is not set
 CONFIG_LOG_BUF_SHIFT=14
 # CONFIG_UID16 is not set
-# CONFIG_SYSCTL_SYSCALL is not set
 # CONFIG_KALLSYMS is not set
 # CONFIG_HOTPLUG is not set
 # CONFIG_ELF_CORE is not set
--- a/arch/sh/configs/se7705_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/se7705_defconfig
@@ -2,7 +2,6 @@
 CONFIG_LOG_BUF_SHIFT=14
 CONFIG_BLK_DEV_INITRD=y
 # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
-# CONFIG_SYSCTL_SYSCALL is not set
 # CONFIG_KALLSYMS is not set
 # CONFIG_HOTPLUG is not set
 CONFIG_SLAB=y
--- a/arch/sh/configs/se7750_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/se7750_defconfig
@@ -5,7 +5,6 @@ CONFIG_IKCONFIG=y
 CONFIG_IKCONFIG_PROC=y
 CONFIG_LOG_BUF_SHIFT=14
 # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
-# CONFIG_SYSCTL_SYSCALL is not set
 # CONFIG_HOTPLUG is not set
 CONFIG_SLAB=y
 CONFIG_MODULES=y
--- a/arch/sh/configs/se7751_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/se7751_defconfig
@@ -3,7 +3,6 @@ CONFIG_BSD_PROCESS_ACCT=y
 CONFIG_LOG_BUF_SHIFT=14
 CONFIG_BLK_DEV_INITRD=y
 # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
-# CONFIG_SYSCTL_SYSCALL is not set
 # CONFIG_HOTPLUG is not set
 CONFIG_SLAB=y
 CONFIG_MODULES=y
--- a/arch/sh/configs/secureedge5410_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/secureedge5410_defconfig
@@ -1,7 +1,6 @@
 # CONFIG_SWAP is not set
 CONFIG_LOG_BUF_SHIFT=14
 CONFIG_BLK_DEV_INITRD=y
-# CONFIG_SYSCTL_SYSCALL is not set
 # CONFIG_HOTPLUG is not set
 CONFIG_SLAB=y
 # CONFIG_BLK_DEV_BSG is not set
--- a/arch/sh/configs/sh03_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/sh03_defconfig
@@ -3,7 +3,6 @@ CONFIG_POSIX_MQUEUE=y
 CONFIG_BSD_PROCESS_ACCT=y
 CONFIG_LOG_BUF_SHIFT=14
 CONFIG_BLK_DEV_INITRD=y
-# CONFIG_SYSCTL_SYSCALL is not set
 CONFIG_SLAB=y
 CONFIG_PROFILING=y
 CONFIG_OPROFILE=m
--- a/arch/sh/configs/sh7710voipgw_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/sh7710voipgw_defconfig
@@ -2,7 +2,6 @@
 CONFIG_SYSVIPC=y
 CONFIG_POSIX_MQUEUE=y
 CONFIG_LOG_BUF_SHIFT=14
-# CONFIG_SYSCTL_SYSCALL is not set
 # CONFIG_FUTEX is not set
 # CONFIG_EPOLL is not set
 # CONFIG_SHMEM is not set
--- a/arch/sh/configs/sh7757lcr_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/sh7757lcr_defconfig
@@ -8,7 +8,6 @@ CONFIG_TASK_XACCT=y
 CONFIG_TASK_IO_ACCOUNTING=y
 CONFIG_LOG_BUF_SHIFT=14
 CONFIG_BLK_DEV_INITRD=y
-# CONFIG_SYSCTL_SYSCALL is not set
 CONFIG_KALLSYMS_ALL=y
 CONFIG_SLAB=y
 CONFIG_MODULES=y
--- a/arch/sh/configs/sh7763rdp_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/sh7763rdp_defconfig
@@ -5,7 +5,6 @@ CONFIG_LOG_BUF_SHIFT=14
 CONFIG_NAMESPACES=y
 CONFIG_UTS_NS=y
 CONFIG_IPC_NS=y
-# CONFIG_SYSCTL_SYSCALL is not set
 CONFIG_SLAB=y
 CONFIG_PROFILING=y
 CONFIG_OPROFILE=y
--- a/arch/sh/configs/shmin_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/shmin_defconfig
@@ -1,7 +1,6 @@
 # CONFIG_SWAP is not set
 CONFIG_LOG_BUF_SHIFT=14
 # CONFIG_UID16 is not set
-# CONFIG_SYSCTL_SYSCALL is not set
 # CONFIG_KALLSYMS is not set
 # CONFIG_HOTPLUG is not set
 # CONFIG_BUG is not set
--- a/arch/sh/configs/titan_defconfig~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/configs/titan_defconfig
@@ -6,7 +6,6 @@ CONFIG_IKCONFIG_PROC=y
 CONFIG_LOG_BUF_SHIFT=16
 CONFIG_BLK_DEV_INITRD=y
 # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
-# CONFIG_SYSCTL_SYSCALL is not set
 CONFIG_SLAB=y
 CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
--- a/arch/sh/kernel/syscalls/syscall.tbl~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sh/kernel/syscalls/syscall.tbl
@@ -156,7 +156,7 @@
 146	common	writev				sys_writev
 147	common	getsid				sys_getsid
 148	common	fdatasync			sys_fdatasync
-149	common	_sysctl				sys_sysctl
+149	common	_sysctl				sys_ni_syscall
 150	common	mlock				sys_mlock
 151	common	munlock				sys_munlock
 152	common	mlockall			sys_mlockall
--- a/arch/sparc/kernel/syscalls/syscall.tbl~all-arch-remove-system-call-sys_sysctl
+++ a/arch/sparc/kernel/syscalls/syscall.tbl
@@ -300,7 +300,7 @@
 249	64	nanosleep		sys_nanosleep
 250	32	mremap			sys_mremap
 250	64	mremap			sys_64_mremap
-251	common	_sysctl			sys_sysctl			compat_sys_sysctl
+251	common	_sysctl			sys_ni_syscall
 252	common	getsid			sys_getsid
 253	common	fdatasync		sys_fdatasync
 254	32	nfsservctl		sys_ni_syscall			sys_nis_syscall
--- a/arch/x86/entry/syscalls/syscall_32.tbl~all-arch-remove-system-call-sys_sysctl
+++ a/arch/x86/entry/syscalls/syscall_32.tbl
@@ -160,7 +160,7 @@
 146	i386	writev			sys_writev			compat_sys_writev
 147	i386	getsid			sys_getsid
 148	i386	fdatasync		sys_fdatasync
-149	i386	_sysctl			sys_sysctl			compat_sys_sysctl
+149	i386	_sysctl			sys_ni_syscall
 150	i386	mlock			sys_mlock
 151	i386	munlock			sys_munlock
 152	i386	mlockall		sys_mlockall
--- a/arch/x86/entry/syscalls/syscall_64.tbl~all-arch-remove-system-call-sys_sysctl
+++ a/arch/x86/entry/syscalls/syscall_64.tbl
@@ -164,7 +164,7 @@
 153	common	vhangup			sys_vhangup
 154	common	modify_ldt		sys_modify_ldt
 155	common	pivot_root		sys_pivot_root
-156	64	_sysctl			sys_sysctl
+156	64	_sysctl			sys_ni_syscall
 157	common	prctl			sys_prctl
 158	common	arch_prctl		sys_arch_prctl
 159	common	adjtimex		sys_adjtimex
--- a/arch/xtensa/kernel/syscalls/syscall.tbl~all-arch-remove-system-call-sys_sysctl
+++ a/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -222,7 +222,7 @@
 204	common	quotactl			sys_quotactl
 # 205 was old nfsservctl
 205	common	nfsservctl			sys_ni_syscall
-206	common	_sysctl				sys_sysctl
+206	common	_sysctl				sys_ni_syscall
 207	common	bdflush				sys_bdflush
 208	common	uname				sys_newuname
 209	common	sysinfo				sys_sysinfo
--- a/include/linux/compat.h~all-arch-remove-system-call-sys_sysctl
+++ a/include/linux/compat.h
@@ -855,7 +855,6 @@ asmlinkage long compat_sys_select(int n,
 asmlinkage long compat_sys_ustat(unsigned dev, struct compat_ustat __user *u32);
 asmlinkage long compat_sys_recv(int fd, void __user *buf, compat_size_t len,
 				unsigned flags);
-asmlinkage long compat_sys_sysctl(struct compat_sysctl_args __user *args);
 
 /* obsolete: fs/readdir.c */
 asmlinkage long compat_sys_old_readdir(unsigned int fd,
--- a/include/linux/syscalls.h~all-arch-remove-system-call-sys_sysctl
+++ a/include/linux/syscalls.h
@@ -47,7 +47,6 @@ struct stat64;
 struct statfs;
 struct statfs64;
 struct statx;
-struct __sysctl_args;
 struct sysinfo;
 struct timespec;
 struct __kernel_old_timeval;
@@ -1119,7 +1118,6 @@ asmlinkage long sys_send(int, void __use
 asmlinkage long sys_bdflush(int func, long data);
 asmlinkage long sys_oldumount(char __user *name);
 asmlinkage long sys_uselib(const char __user *library);
-asmlinkage long sys_sysctl(struct __sysctl_args __user *args);
 asmlinkage long sys_sysfs(int option,
 				unsigned long arg1, unsigned long arg2);
 asmlinkage long sys_fork(void);
--- a/include/linux/sysctl.h~all-arch-remove-system-call-sys_sysctl
+++ a/include/linux/sysctl.h
@@ -74,15 +74,13 @@ int proc_do_static_key(struct ctl_table
  * sysctl names can be mirrored automatically under /proc/sys.  The
  * procname supplied controls /proc naming.
  *
- * The table's mode will be honoured both for sys_sysctl(2) and
- * proc-fs access.
+ * The table's mode will be honoured for proc-fs access.
  *
  * Leaf nodes in the sysctl tree will be represented by a single file
  * under /proc; non-leaf nodes will be represented by directories.  A
  * null procname disables /proc mirroring at this node.
  *
- * sysctl(2) can automatically manage read and write requests through
- * the sysctl table.  The data and maxlen fields of the ctl_table
+ * The data and maxlen fields of the ctl_table
  * struct enable minimal validation of the values being written to be
  * performed, and the mode field allows minimal authentication.
  * 
--- a/kernel/Makefile~all-arch-remove-system-call-sys_sysctl
+++ a/kernel/Makefile
@@ -5,7 +5,7 @@
 
 obj-y     = fork.o exec_domain.o panic.o \
 	    cpu.o exit.o softirq.o resource.o \
-	    sysctl.o sysctl_binary.o capability.o ptrace.o user.o \
+	    sysctl.o capability.o ptrace.o user.o \
 	    signal.o sys.o umh.o workqueue.o pid.o task_work.o \
 	    extable.o params.o \
 	    kthread.o sys_ni.o nsproxy.o \
--- a/kernel/sysctl_binary.c
+++ /dev/null
@@ -1,171 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include <linux/stat.h>
-#include <linux/sysctl.h>
-#include "../fs/xfs/xfs_sysctl.h"
-#include <linux/sunrpc/debug.h>
-#include <linux/string.h>
-#include <linux/syscalls.h>
-#include <linux/namei.h>
-#include <linux/mount.h>
-#include <linux/fs.h>
-#include <linux/nsproxy.h>
-#include <linux/pid_namespace.h>
-#include <linux/file.h>
-#include <linux/ctype.h>
-#include <linux/netdevice.h>
-#include <linux/kernel.h>
-#include <linux/uuid.h>
-#include <linux/slab.h>
-#include <linux/compat.h>
-
-static ssize_t binary_sysctl(const int *name, int nlen,
-	void __user *oldval, size_t oldlen, void __user *newval, size_t newlen)
-{
-	return -ENOSYS;
-}
-
-static void deprecated_sysctl_warning(const int *name, int nlen)
-{
-	int i;
-
-	/*
-	 * CTL_KERN/KERN_VERSION is used by older glibc and cannot
-	 * ever go away.
-	 */
-	if (nlen >= 2 && name[0] == CTL_KERN && name[1] == KERN_VERSION)
-		return;
-
-	if (printk_ratelimit()) {
-		printk(KERN_INFO
-			"warning: process `%s' used the deprecated sysctl "
-			"system call with ", current->comm);
-		for (i = 0; i < nlen; i++)
-			printk(KERN_CONT "%d.", name[i]);
-		printk(KERN_CONT "\n");
-	}
-	return;
-}
-
-#define WARN_ONCE_HASH_BITS 8
-#define WARN_ONCE_HASH_SIZE (1<<WARN_ONCE_HASH_BITS)
-
-static DECLARE_BITMAP(warn_once_bitmap, WARN_ONCE_HASH_SIZE);
-
-#define FNV32_OFFSET 2166136261U
-#define FNV32_PRIME 0x01000193
-
-/*
- * Print each legacy sysctl (approximately) only once.
- * To avoid making the tables non-const use a external
- * hash-table instead.
- * Worst case hash collision: 6, but very rarely.
- * NOTE! We don't use the SMP-safe bit tests. We simply
- * don't care enough.
- */
-static void warn_on_bintable(const int *name, int nlen)
-{
-	int i;
-	u32 hash = FNV32_OFFSET;
-
-	for (i = 0; i < nlen; i++)
-		hash = (hash ^ name[i]) * FNV32_PRIME;
-	hash %= WARN_ONCE_HASH_SIZE;
-	if (__test_and_set_bit(hash, warn_once_bitmap))
-		return;
-	deprecated_sysctl_warning(name, nlen);
-}
-
-static ssize_t do_sysctl(int __user *args_name, int nlen,
-	void __user *oldval, size_t oldlen, void __user *newval, size_t newlen)
-{
-	int name[CTL_MAXNAME];
-	int i;
-
-	/* Check args->nlen. */
-	if (nlen < 0 || nlen > CTL_MAXNAME)
-		return -ENOTDIR;
-	/* Read in the sysctl name for simplicity */
-	for (i = 0; i < nlen; i++)
-		if (get_user(name[i], args_name + i))
-			return -EFAULT;
-
-	warn_on_bintable(name, nlen);
-
-	return binary_sysctl(name, nlen, oldval, oldlen, newval, newlen);
-}
-
-SYSCALL_DEFINE1(sysctl, struct __sysctl_args __user *, args)
-{
-	struct __sysctl_args tmp;
-	size_t oldlen = 0;
-	ssize_t result;
-
-	if (copy_from_user(&tmp, args, sizeof(tmp)))
-		return -EFAULT;
-
-	if (tmp.oldval && !tmp.oldlenp)
-		return -EFAULT;
-
-	if (tmp.oldlenp && get_user(oldlen, tmp.oldlenp))
-		return -EFAULT;
-
-	result = do_sysctl(tmp.name, tmp.nlen, tmp.oldval, oldlen,
-			   tmp.newval, tmp.newlen);
-
-	if (result >= 0) {
-		oldlen = result;
-		result = 0;
-	}
-
-	if (tmp.oldlenp && put_user(oldlen, tmp.oldlenp))
-		return -EFAULT;
-
-	return result;
-}
-
-
-#ifdef CONFIG_COMPAT
-
-struct compat_sysctl_args {
-	compat_uptr_t	name;
-	int		nlen;
-	compat_uptr_t	oldval;
-	compat_uptr_t	oldlenp;
-	compat_uptr_t	newval;
-	compat_size_t	newlen;
-	compat_ulong_t	__unused[4];
-};
-
-COMPAT_SYSCALL_DEFINE1(sysctl, struct compat_sysctl_args __user *, args)
-{
-	struct compat_sysctl_args tmp;
-	compat_size_t __user *compat_oldlenp;
-	size_t oldlen = 0;
-	ssize_t result;
-
-	if (copy_from_user(&tmp, args, sizeof(tmp)))
-		return -EFAULT;
-
-	if (tmp.oldval && !tmp.oldlenp)
-		return -EFAULT;
-
-	compat_oldlenp = compat_ptr(tmp.oldlenp);
-	if (compat_oldlenp && get_user(oldlen, compat_oldlenp))
-		return -EFAULT;
-
-	result = do_sysctl(compat_ptr(tmp.name), tmp.nlen,
-			   compat_ptr(tmp.oldval), oldlen,
-			   compat_ptr(tmp.newval), tmp.newlen);
-
-	if (result >= 0) {
-		oldlen = result;
-		result = 0;
-	}
-
-	if (compat_oldlenp && put_user(oldlen, compat_oldlenp))
-		return -EFAULT;
-
-	return result;
-}
-
-#endif /* CONFIG_COMPAT */
--- a/kernel/sys_ni.c~all-arch-remove-system-call-sys_sysctl
+++ a/kernel/sys_ni.c
@@ -366,7 +366,6 @@ COND_SYSCALL(socketcall);
 COND_SYSCALL_COMPAT(socketcall);
 
 /* compat syscalls for arm64, x86, ... */
-COND_SYSCALL_COMPAT(sysctl);
 COND_SYSCALL_COMPAT(fanotify_mark);
 
 /* x86 */
--- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl~all-arch-remove-system-call-sys_sysctl
+++ a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
@@ -193,7 +193,7 @@
 146	common	writev				sys_writev			compat_sys_writev
 147	common	getsid				sys_getsid
 148	common	fdatasync			sys_fdatasync
-149	nospu	_sysctl				sys_sysctl			compat_sys_sysctl
+149	nospu	_sysctl				sys_ni_syscall
 150	common	mlock				sys_mlock
 151	common	munlock				sys_munlock
 152	common	mlockall			sys_mlockall
--- a/tools/perf/arch/s390/entry/syscalls/syscall.tbl~all-arch-remove-system-call-sys_sysctl
+++ a/tools/perf/arch/s390/entry/syscalls/syscall.tbl
@@ -138,7 +138,7 @@
 146  common	writev			sys_writev			compat_sys_writev
 147  common	getsid			sys_getsid			sys_getsid
 148  common	fdatasync		sys_fdatasync			sys_fdatasync
-149  common	_sysctl			sys_sysctl			compat_sys_sysctl
+149  common	_sysctl			-				-
 150  common	mlock			sys_mlock			compat_sys_mlock
 151  common	munlock			sys_munlock			compat_sys_munlock
 152  common	mlockall		sys_mlockall			sys_mlockall
--- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl~all-arch-remove-system-call-sys_sysctl
+++ a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
@@ -164,7 +164,7 @@
 153	common	vhangup			sys_vhangup
 154	common	modify_ldt		sys_modify_ldt
 155	common	pivot_root		sys_pivot_root
-156	64	_sysctl			sys_sysctl
+156	64	_sysctl			sys_ni_syscall
 157	common	prctl			sys_prctl
 158	common	arch_prctl		sys_arch_prctl
 159	common	adjtimex		sys_adjtimex
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 20/39] mm/kmemleak: silence KCSAN splats in checksum
  2020-08-15  0:29 incoming Andrew Morton
                   ` (18 preceding siblings ...)
  2020-08-15  0:31 ` [patch 19/39] all arch: remove system call sys_sysctl Andrew Morton
@ 2020-08-15  0:31 ` Andrew Morton
  2020-08-15  0:31 ` [patch 21/39] mm/frontswap: mark various intentional data races Andrew Morton
                   ` (119 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:31 UTC (permalink / raw)
  To: akpm, cai, catalin.marinas, elver, linux-mm, mm-commits, torvalds

From: Qian Cai <cai@lca.pw>
Subject: mm/kmemleak: silence KCSAN splats in checksum

Even if KCSAN is disabled for kmemleak, update_checksum() could still call
crc32() (which is outside of kmemleak.c) to dereference object->pointer. 
Thus, the value of object->pointer could be accessed concurrently as
noticed by KCSAN,

 BUG: KCSAN: data-race in crc32_le_base / do_raw_spin_lock

 write to 0xffffb0ea683a7d50 of 4 bytes by task 23575 on cpu 12:
  do_raw_spin_lock+0x114/0x200
  debug_spin_lock_after at kernel/locking/spinlock_debug.c:91
  (inlined by) do_raw_spin_lock at kernel/locking/spinlock_debug.c:115
  _raw_spin_lock+0x40/0x50
  __handle_mm_fault+0xa9e/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 read to 0xffffb0ea683a7d50 of 4 bytes by task 839 on cpu 60:
  crc32_le_base+0x67/0x350
  crc32_le_base+0x67/0x350:
  crc32_body at lib/crc32.c:106
  (inlined by) crc32_le_generic at lib/crc32.c:179
  (inlined by) crc32_le at lib/crc32.c:197
  kmemleak_scan+0x528/0xd90
  update_checksum at mm/kmemleak.c:1172
  (inlined by) kmemleak_scan at mm/kmemleak.c:1497
  kmemleak_scan_thread+0xcc/0xfa
  kthread+0x1e0/0x200
  ret_from_fork+0x27/0x50

If a shattered value was returned due to a data race, it will be corrected
in the next scan.  Thus, let KCSAN ignore all reads in the region to
silence KCSAN in case the write side is non-atomic.

Link: http://lkml.kernel.org/r/20200317182754.2180-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Suggested-by: Marco Elver <elver@google.com>
Acked-by: Marco Elver <elver@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/kmemleak.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/mm/kmemleak.c~mm-kmemleak-silence-kcsan-splats-in-checksum
+++ a/mm/kmemleak.c
@@ -1169,8 +1169,10 @@ static bool update_checksum(struct kmeml
 	u32 old_csum = object->checksum;
 
 	kasan_disable_current();
+	kcsan_disable_current();
 	object->checksum = crc32(0, (void *)object->pointer, object->size);
 	kasan_enable_current();
+	kcsan_enable_current();
 
 	return object->checksum != old_csum;
 }
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 21/39] mm/frontswap: mark various intentional data races
  2020-08-15  0:29 incoming Andrew Morton
                   ` (19 preceding siblings ...)
  2020-08-15  0:31 ` [patch 20/39] mm/kmemleak: silence KCSAN splats in checksum Andrew Morton
@ 2020-08-15  0:31 ` Andrew Morton
  2020-08-15  0:31 ` [patch 22/39] mm/page_io: " Andrew Morton
                   ` (118 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:31 UTC (permalink / raw)
  To: akpm, cai, elver, konrad.wilk, linux-mm, mm-commits, torvalds

From: Qian Cai <cai@lca.pw>
Subject: mm/frontswap: mark various intentional data races

There are a few information counters that are intentionally not protected
against increment races, so just annotate them using the data_race()
macro.

 BUG: KCSAN: data-race in __frontswap_store / __frontswap_store

 write to 0xffffffff8b7174d8 of 8 bytes by task 6396 on cpu 103:
  __frontswap_store+0x2d0/0x344
  inc_frontswap_failed_stores at mm/frontswap.c:70
  (inlined by) __frontswap_store at mm/frontswap.c:280
  swap_writepage+0x83/0xf0
  pageout+0x33e/0xae0
  shrink_page_list+0x1f57/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x170/0x700
  __handle_mm_fault+0xc9f/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 read to 0xffffffff8b7174d8 of 8 bytes by task 6405 on cpu 47:
  __frontswap_store+0x2b9/0x344
  inc_frontswap_failed_stores at mm/frontswap.c:70
  (inlined by) __frontswap_store at mm/frontswap.c:280
  swap_writepage+0x83/0xf0
  pageout+0x33e/0xae0
  shrink_page_list+0x1f57/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x170/0x700
  __handle_mm_fault+0xc9f/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

Link: http://lkml.kernel.org/r/1581114499-5042-1-git-send-email-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Cc: Marco Elver <elver@google.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/frontswap.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/mm/frontswap.c~mm-frontswap-mark-various-intentional-data-races
+++ a/mm/frontswap.c
@@ -61,16 +61,16 @@ static u64 frontswap_failed_stores;
 static u64 frontswap_invalidates;
 
 static inline void inc_frontswap_loads(void) {
-	frontswap_loads++;
+	data_race(frontswap_loads++);
 }
 static inline void inc_frontswap_succ_stores(void) {
-	frontswap_succ_stores++;
+	data_race(frontswap_succ_stores++);
 }
 static inline void inc_frontswap_failed_stores(void) {
-	frontswap_failed_stores++;
+	data_race(frontswap_failed_stores++);
 }
 static inline void inc_frontswap_invalidates(void) {
-	frontswap_invalidates++;
+	data_race(frontswap_invalidates++);
 }
 #else
 static inline void inc_frontswap_loads(void) { }
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 22/39] mm/page_io: mark various intentional data races
  2020-08-15  0:29 incoming Andrew Morton
                   ` (20 preceding siblings ...)
  2020-08-15  0:31 ` [patch 21/39] mm/frontswap: mark various intentional data races Andrew Morton
@ 2020-08-15  0:31 ` Andrew Morton
  2020-08-15  0:31 ` [patch 23/39] mm/swap_state: " Andrew Morton
                   ` (117 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:31 UTC (permalink / raw)
  To: akpm, cai, elver, linux-mm, mm-commits, torvalds

From: Qian Cai <cai@lca.pw>
Subject: mm/page_io: mark various intentional data races

struct swap_info_struct si.flags could be accessed concurrently as noticed
by KCSAN,

 BUG: KCSAN: data-race in scan_swap_map_slots / swap_readpage

 write to 0xffff9c77b80ac400 of 8 bytes by task 91325 on cpu 16:
  scan_swap_map_slots+0x6fe/0xb50
  scan_swap_map_slots at mm/swapfile.c:887
  get_swap_pages+0x39d/0x5c0
  get_swap_page+0x377/0x524
  add_to_swap+0xe4/0x1c0
  shrink_page_list+0x1740/0x2820
  shrink_inactive_list+0x316/0x8b0
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x170/0x700
  __handle_mm_fault+0xc9f/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 read to 0xffff9c77b80ac400 of 8 bytes by task 5422 on cpu 7:
  swap_readpage+0x204/0x6a0
  swap_readpage at mm/page_io.c:380
  read_swap_cache_async+0xa2/0xb0
  swapin_readahead+0x6a0/0x890
  do_swap_page+0x465/0xeb0
  __handle_mm_fault+0xc7a/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 7 PID: 5422 Comm: gmain Tainted: G        W  O L 5.5.0-next-20200204+ #6
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

Other reads,

 read to 0xffff91ea33eac400 of 8 bytes by task 11276 on cpu 120:
  __swap_writepage+0x140/0xc20
  __swap_writepage at mm/page_io.c:289

 read to 0xffff91ea33eac400 of 8 bytes by task 11264 on cpu 16:
  swap_set_page_dirty+0x44/0x1f4
  swap_set_page_dirty at mm/page_io.c:442

The write is under &si->lock, but the reads are done as lockless.  Since
the reads only check for a specific bit in the flag, it is harmless even
if load tearing happens.  Thus, just mark them as intentional data races
using the data_race() macro.

[cai@lca.pw: add a missing annotation]
  Link: http://lkml.kernel.org/r/1581612585-5812-1-git-send-email-cai@lca.pw
Link: http://lkml.kernel.org/r/20200207003601.1526-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_io.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/mm/page_io.c~mm-page_io-mark-various-intentional-data-races
+++ a/mm/page_io.c
@@ -85,7 +85,7 @@ static void swap_slot_free_notify(struct
 		return;
 
 	sis = page_swap_info(page);
-	if (!(sis->flags & SWP_BLKDEV))
+	if (data_race(!(sis->flags & SWP_BLKDEV)))
 		return;
 
 	/*
@@ -302,7 +302,7 @@ int __swap_writepage(struct page *page,
 	struct swap_info_struct *sis = page_swap_info(page);
 
 	VM_BUG_ON_PAGE(!PageSwapCache(page), page);
-	if (sis->flags & SWP_FS) {
+	if (data_race(sis->flags & SWP_FS)) {
 		struct kiocb kiocb;
 		struct file *swap_file = sis->swap_file;
 		struct address_space *mapping = swap_file->f_mapping;
@@ -393,7 +393,7 @@ int swap_readpage(struct page *page, boo
 		goto out;
 	}
 
-	if (sis->flags & SWP_FS) {
+	if (data_race(sis->flags & SWP_FS)) {
 		struct file *swap_file = sis->swap_file;
 		struct address_space *mapping = swap_file->f_mapping;
 
@@ -455,7 +455,7 @@ int swap_set_page_dirty(struct page *pag
 {
 	struct swap_info_struct *sis = page_swap_info(page);
 
-	if (sis->flags & SWP_FS) {
+	if (data_race(sis->flags & SWP_FS)) {
 		struct address_space *mapping = sis->swap_file->f_mapping;
 
 		VM_BUG_ON_PAGE(!PageSwapCache(page), page);
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 23/39] mm/swap_state: mark various intentional data races
  2020-08-15  0:29 incoming Andrew Morton
                   ` (21 preceding siblings ...)
  2020-08-15  0:31 ` [patch 22/39] mm/page_io: " Andrew Morton
@ 2020-08-15  0:31 ` Andrew Morton
  2020-08-15  0:31 ` [patch 24/39] mm/filemap.c: fix a data race in filemap_fault() Andrew Morton
                   ` (116 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:31 UTC (permalink / raw)
  To: akpm, cai, elver, linux-mm, mm-commits, torvalds

From: Qian Cai <cai@lca.pw>
Subject: mm/swap_state: mark various intentional data races

swap_cache_info.* could be accessed concurrently as noticed by
KCSAN,

 BUG: KCSAN: data-race in lookup_swap_cache / lookup_swap_cache

 write to 0xffffffff85517318 of 8 bytes by task 94138 on cpu 101:
  lookup_swap_cache+0x12e/0x460
  lookup_swap_cache at mm/swap_state.c:322
  do_swap_page+0x112/0xeb0
  __handle_mm_fault+0xc7a/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 read to 0xffffffff85517318 of 8 bytes by task 91655 on cpu 100:
  lookup_swap_cache+0x117/0x460
  lookup_swap_cache at mm/swap_state.c:322
  shmem_swapin_page+0xc7/0x9e0
  shmem_getpage_gfp+0x2ca/0x16c0
  shmem_fault+0xef/0x3c0
  __do_fault+0x9e/0x220
  do_fault+0x4a0/0x920
  __handle_mm_fault+0xc69/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 100 PID: 91655 Comm: systemd-journal Tainted: G        W  O L 5.5.0-next-20200204+ #6
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

 write to 0xffffffff8d717308 of 8 bytes by task 11365 on cpu 87:
   __delete_from_swap_cache+0x681/0x8b0
   __delete_from_swap_cache at mm/swap_state.c:178

 read to 0xffffffff8d717308 of 8 bytes by task 11275 on cpu 53:
   __delete_from_swap_cache+0x66e/0x8b0
   __delete_from_swap_cache at mm/swap_state.c:178

Both the read and write are done as lockless. Since swap_cache_info.*
are only used to print out counter information, even if any of them
missed a few incremental due to data races, it will be harmless, so just
mark it as an intentional data race using the data_race() macro.

While at it, fix a checkpatch.pl warning,

WARNING: Single statement macros should not use a do {} while (0) loop

Link: http://lkml.kernel.org/r/20200207003715.1578-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/swap_state.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/swap_state.c~mm-swap_state-mark-various-intentional-data-races
+++ a/mm/swap_state.c
@@ -57,8 +57,8 @@ static bool enable_vma_readahead __read_
 #define GET_SWAP_RA_VAL(vma)					\
 	(atomic_long_read(&(vma)->swap_readahead_info) ? : 4)
 
-#define INC_CACHE_INFO(x)	do { swap_cache_info.x++; } while (0)
-#define ADD_CACHE_INFO(x, nr)	do { swap_cache_info.x += (nr); } while (0)
+#define INC_CACHE_INFO(x)	data_race(swap_cache_info.x++)
+#define ADD_CACHE_INFO(x, nr)	data_race(swap_cache_info.x += (nr))
 
 static struct {
 	unsigned long add_total;
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 24/39] mm/filemap.c: fix a data race in filemap_fault()
  2020-08-15  0:29 incoming Andrew Morton
                   ` (22 preceding siblings ...)
  2020-08-15  0:31 ` [patch 23/39] mm/swap_state: " Andrew Morton
@ 2020-08-15  0:31 ` Andrew Morton
  2020-08-15  0:31 ` [patch 25/39] mm/swapfile: fix and annotate various data races Andrew Morton
                   ` (115 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:31 UTC (permalink / raw)
  To: akpm, cai, elver, kirill, linux-mm, mm-commits, torvalds, willy

From: Kirill A. Shutemov <kirill@shutemov.name>
Subject: mm/filemap.c: fix a data race in filemap_fault()

struct file_ra_state ra.mmap_miss could be accessed concurrently during
page faults as noticed by KCSAN,

 BUG: KCSAN: data-race in filemap_fault / filemap_map_pages

 write to 0xffff9b1700a2c1b4 of 4 bytes by task 3292 on cpu 30:
  filemap_fault+0x920/0xfc0
  do_sync_mmap_readahead at mm/filemap.c:2384
  (inlined by) filemap_fault at mm/filemap.c:2486
  __xfs_filemap_fault+0x112/0x3e0 [xfs]
  xfs_filemap_fault+0x74/0x90 [xfs]
  __do_fault+0x9e/0x220
  do_fault+0x4a0/0x920
  __handle_mm_fault+0xc69/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 read to 0xffff9b1700a2c1b4 of 4 bytes by task 3313 on cpu 32:
  filemap_map_pages+0xc2e/0xd80
  filemap_map_pages at mm/filemap.c:2625
  do_fault+0x3da/0x920
  __handle_mm_fault+0xc69/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 32 PID: 3313 Comm: systemd-udevd Tainted: G        W    L 5.5.0-next-20200210+ #1
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

ra.mmap_miss is used to contribute the readahead decisions, a data race
could be undesirable.  Both the read and write is only under non-exclusive
mmap_sem, two concurrent writers could even underflow the counter.  Fix
the underflow by writing to a local variable before committing a final
store to ra.mmap_miss given a small inaccuracy of the counter should be
acceptable.

Link: http://lkml.kernel.org/r/20200211030134.1847-1-cai@lca.pw
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Qian Cai <cai@lca.pw>
Tested-by: Qian Cai <cai@lca.pw>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/filemap.c |   20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

--- a/mm/filemap.c~mm-filemap-fix-a-data-race-in-filemap_fault
+++ a/mm/filemap.c
@@ -2468,6 +2468,7 @@ static struct file *do_sync_mmap_readahe
 	struct address_space *mapping = file->f_mapping;
 	struct file *fpin = NULL;
 	pgoff_t offset = vmf->pgoff;
+	unsigned int mmap_miss;
 
 	/* If we don't want any read-ahead, don't bother */
 	if (vmf->vma->vm_flags & VM_RAND_READ)
@@ -2483,14 +2484,15 @@ static struct file *do_sync_mmap_readahe
 	}
 
 	/* Avoid banging the cache line if not needed */
-	if (ra->mmap_miss < MMAP_LOTSAMISS * 10)
-		ra->mmap_miss++;
+	mmap_miss = READ_ONCE(ra->mmap_miss);
+	if (mmap_miss < MMAP_LOTSAMISS * 10)
+		WRITE_ONCE(ra->mmap_miss, ++mmap_miss);
 
 	/*
 	 * Do we miss much more than hit in this file? If so,
 	 * stop bothering with read-ahead. It will only hurt.
 	 */
-	if (ra->mmap_miss > MMAP_LOTSAMISS)
+	if (mmap_miss > MMAP_LOTSAMISS)
 		return fpin;
 
 	/*
@@ -2516,13 +2518,15 @@ static struct file *do_async_mmap_readah
 	struct file_ra_state *ra = &file->f_ra;
 	struct address_space *mapping = file->f_mapping;
 	struct file *fpin = NULL;
+	unsigned int mmap_miss;
 	pgoff_t offset = vmf->pgoff;
 
 	/* If we don't want any read-ahead, don't bother */
 	if (vmf->vma->vm_flags & VM_RAND_READ || !ra->ra_pages)
 		return fpin;
-	if (ra->mmap_miss > 0)
-		ra->mmap_miss--;
+	mmap_miss = READ_ONCE(ra->mmap_miss);
+	if (mmap_miss)
+		WRITE_ONCE(ra->mmap_miss, --mmap_miss);
 	if (PageReadahead(page)) {
 		fpin = maybe_unlock_mmap_for_io(vmf, fpin);
 		page_cache_async_readahead(mapping, ra, file,
@@ -2688,6 +2692,7 @@ void filemap_map_pages(struct vm_fault *
 	unsigned long max_idx;
 	XA_STATE(xas, &mapping->i_pages, start_pgoff);
 	struct page *page;
+	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
 
 	rcu_read_lock();
 	xas_for_each(&xas, page, end_pgoff) {
@@ -2724,8 +2729,8 @@ void filemap_map_pages(struct vm_fault *
 		if (page->index >= max_idx)
 			goto unlock;
 
-		if (file->f_ra.mmap_miss > 0)
-			file->f_ra.mmap_miss--;
+		if (mmap_miss > 0)
+			mmap_miss--;
 
 		vmf->address += (xas.xa_index - last_pgoff) << PAGE_SHIFT;
 		if (vmf->pte)
@@ -2745,6 +2750,7 @@ next:
 			break;
 	}
 	rcu_read_unlock();
+	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
 }
 EXPORT_SYMBOL(filemap_map_pages);
 
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 25/39] mm/swapfile: fix and annotate various data races
  2020-08-15  0:29 incoming Andrew Morton
                   ` (23 preceding siblings ...)
  2020-08-15  0:31 ` [patch 24/39] mm/filemap.c: fix a data race in filemap_fault() Andrew Morton
@ 2020-08-15  0:31 ` Andrew Morton
  2020-08-15  0:31 ` [patch 26/39] mm/page_counter: fix various data races at memsw Andrew Morton
                   ` (114 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:31 UTC (permalink / raw)
  To: akpm, cai, elver, hughd, linux-mm, mm-commits, torvalds

From: Qian Cai <cai@lca.pw>
Subject: mm/swapfile: fix and annotate various data races

swap_info_struct si.highest_bit, si.swap_map[offset] and si.flags could
be accessed concurrently separately as noticed by KCSAN,

=== si.highest_bit ===

 write to 0xffff8d5abccdc4d4 of 4 bytes by task 5353 on cpu 24:
  swap_range_alloc+0x81/0x130
  swap_range_alloc at mm/swapfile.c:681
  scan_swap_map_slots+0x371/0xb90
  get_swap_pages+0x39d/0x5c0
  get_swap_page+0xf2/0x524
  add_to_swap+0xe4/0x1c0
  shrink_page_list+0x1795/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290

 read to 0xffff8d5abccdc4d4 of 4 bytes by task 6672 on cpu 70:
  scan_swap_map_slots+0x4a6/0xb90
  scan_swap_map_slots at mm/swapfile.c:892
  get_swap_pages+0x39d/0x5c0
  get_swap_page+0xf2/0x524
  add_to_swap+0xe4/0x1c0
  shrink_page_list+0x1795/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 70 PID: 6672 Comm: oom01 Tainted: G        W    L 5.5.0-next-20200205+ #3
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

=== si.swap_map[offset] ===

 write to 0xffffbc370c29a64c of 1 bytes by task 6856 on cpu 86:
  __swap_entry_free_locked+0x8c/0x100
  __swap_entry_free_locked at mm/swapfile.c:1209 (discriminator 4)
  __swap_entry_free.constprop.20+0x69/0xb0
  free_swap_and_cache+0x53/0xa0
  unmap_page_range+0x7f8/0x1d70
  unmap_single_vma+0xcd/0x170
  unmap_vmas+0x18b/0x220
  exit_mmap+0xee/0x220
  mmput+0x10e/0x270
  do_exit+0x59b/0xf40
  do_group_exit+0x8b/0x180

 read to 0xffffbc370c29a64c of 1 bytes by task 6855 on cpu 20:
  _swap_info_get+0x81/0xa0
  _swap_info_get at mm/swapfile.c:1140
  free_swap_and_cache+0x40/0xa0
  unmap_page_range+0x7f8/0x1d70
  unmap_single_vma+0xcd/0x170
  unmap_vmas+0x18b/0x220
  exit_mmap+0xee/0x220
  mmput+0x10e/0x270
  do_exit+0x59b/0xf40
  do_group_exit+0x8b/0x180

=== si.flags ===

 write to 0xffff956c8fc6c400 of 8 bytes by task 6087 on cpu 23:
  scan_swap_map_slots+0x6fe/0xb50
  scan_swap_map_slots at mm/swapfile.c:887
  get_swap_pages+0x39d/0x5c0
  get_swap_page+0x377/0x524
  add_to_swap+0xe4/0x1c0
  shrink_page_list+0x1795/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290

 read to 0xffff956c8fc6c400 of 8 bytes by task 6207 on cpu 63:
  _swap_info_get+0x41/0xa0
  __swap_info_get at mm/swapfile.c:1114
  put_swap_page+0x84/0x490
  __remove_mapping+0x384/0x5f0
  shrink_page_list+0xff1/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290

The writes are under si->lock but the reads are not. For si.highest_bit
and si.swap_map[offset], data race could trigger logic bugs, so fix them
by having WRITE_ONCE() for the writes and READ_ONCE() for the reads
except those isolated reads where they compare against zero which a data
race would cause no harm. Thus, annotate them as intentional data races
using the data_race() macro.

For si.flags, the readers are only interested in a single bit where a
data race there would cause no issue there.

[cai@lca.pw: add a missing annotation for si->flags in memory.c]
  Link: http://lkml.kernel.org/r/1581612647-5958-1-git-send-email-cai@lca.pw
Link: http://lkml.kernel.org/r/1581095163-12198-1-git-send-email-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Cc: Marco Elver <elver@google.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory.c   |    4 ++--
 mm/swapfile.c |   31 ++++++++++++++++++-------------
 2 files changed, 20 insertions(+), 15 deletions(-)

--- a/mm/memory.c~mm-swapfile-fix-and-annotate-various-data-races
+++ a/mm/memory.c
@@ -3130,8 +3130,8 @@ vm_fault_t do_swap_page(struct vm_fault
 	if (!page) {
 		struct swap_info_struct *si = swp_swap_info(entry);
 
-		if (si->flags & SWP_SYNCHRONOUS_IO &&
-				__swap_count(entry) == 1) {
+		if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
+		    __swap_count(entry) == 1) {
 			/* skip swapcache */
 			page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma,
 							vmf->address);
--- a/mm/swapfile.c~mm-swapfile-fix-and-annotate-various-data-races
+++ a/mm/swapfile.c
@@ -672,7 +672,7 @@ static void swap_range_alloc(struct swap
 	if (offset == si->lowest_bit)
 		si->lowest_bit += nr_entries;
 	if (end == si->highest_bit)
-		si->highest_bit -= nr_entries;
+		WRITE_ONCE(si->highest_bit, si->highest_bit - nr_entries);
 	si->inuse_pages += nr_entries;
 	if (si->inuse_pages == si->pages) {
 		si->lowest_bit = si->max;
@@ -705,7 +705,7 @@ static void swap_range_free(struct swap_
 	if (end > si->highest_bit) {
 		bool was_full = !si->highest_bit;
 
-		si->highest_bit = end;
+		WRITE_ONCE(si->highest_bit, end);
 		if (was_full && (si->flags & SWP_WRITEOK))
 			add_to_avail_list(si);
 	}
@@ -870,7 +870,7 @@ checks:
 		else
 			goto done;
 	}
-	si->swap_map[offset] = usage;
+	WRITE_ONCE(si->swap_map[offset], usage);
 	inc_cluster_info_page(si, si->cluster_info, offset);
 	unlock_cluster(ci);
 
@@ -929,12 +929,13 @@ done:
 
 scan:
 	spin_unlock(&si->lock);
-	while (++offset <= si->highest_bit) {
-		if (!si->swap_map[offset]) {
+	while (++offset <= READ_ONCE(si->highest_bit)) {
+		if (data_race(!si->swap_map[offset])) {
 			spin_lock(&si->lock);
 			goto checks;
 		}
-		if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) {
+		if (vm_swap_full() &&
+		    READ_ONCE(si->swap_map[offset]) == SWAP_HAS_CACHE) {
 			spin_lock(&si->lock);
 			goto checks;
 		}
@@ -946,11 +947,12 @@ scan:
 	}
 	offset = si->lowest_bit;
 	while (offset < scan_base) {
-		if (!si->swap_map[offset]) {
+		if (data_race(!si->swap_map[offset])) {
 			spin_lock(&si->lock);
 			goto checks;
 		}
-		if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) {
+		if (vm_swap_full() &&
+		    READ_ONCE(si->swap_map[offset]) == SWAP_HAS_CACHE) {
 			spin_lock(&si->lock);
 			goto checks;
 		}
@@ -1149,7 +1151,7 @@ static struct swap_info_struct *__swap_i
 	p = swp_swap_info(entry);
 	if (!p)
 		goto bad_nofile;
-	if (!(p->flags & SWP_USED))
+	if (data_race(!(p->flags & SWP_USED)))
 		goto bad_device;
 	offset = swp_offset(entry);
 	if (offset >= p->max)
@@ -1175,7 +1177,7 @@ static struct swap_info_struct *_swap_in
 	p = __swap_info_get(entry);
 	if (!p)
 		goto out;
-	if (!p->swap_map[swp_offset(entry)])
+	if (data_race(!p->swap_map[swp_offset(entry)]))
 		goto bad_free;
 	return p;
 
@@ -1244,7 +1246,10 @@ static unsigned char __swap_entry_free_l
 	}
 
 	usage = count | has_cache;
-	p->swap_map[offset] = usage ? : SWAP_HAS_CACHE;
+	if (usage)
+		WRITE_ONCE(p->swap_map[offset], usage);
+	else
+		WRITE_ONCE(p->swap_map[offset], SWAP_HAS_CACHE);
 
 	return usage;
 }
@@ -1296,7 +1301,7 @@ struct swap_info_struct *get_swap_device
 		goto bad_nofile;
 
 	rcu_read_lock();
-	if (!(si->flags & SWP_VALID))
+	if (data_race(!(si->flags & SWP_VALID)))
 		goto unlock_out;
 	offset = swp_offset(entry);
 	if (offset >= si->max)
@@ -3484,7 +3489,7 @@ static int __swap_duplicate(swp_entry_t
 	} else
 		err = -ENOENT;			/* unused swap entry */
 
-	p->swap_map[offset] = count | has_cache;
+	WRITE_ONCE(p->swap_map[offset], count | has_cache);
 
 unlock_out:
 	unlock_cluster_or_swap_info(p, ci);
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 26/39] mm/page_counter: fix various data races at memsw
  2020-08-15  0:29 incoming Andrew Morton
                   ` (24 preceding siblings ...)
  2020-08-15  0:31 ` [patch 25/39] mm/swapfile: fix and annotate various data races Andrew Morton
@ 2020-08-15  0:31 ` Andrew Morton
  2020-08-15  0:31 ` [patch 27/39] mm/memcontrol: fix a data race in scan count Andrew Morton
                   ` (113 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:31 UTC (permalink / raw)
  To: akpm, cai, david, dvyukov, elver, hannes, linux-mm, mhocko,
	mm-commits, penguin-kernel, torvalds

From: Qian Cai <cai@lca.pw>
Subject: mm/page_counter: fix various data races at memsw

Commit 3e32cb2e0a12 ("mm: memcontrol: lockless page counters") could had
memcg->memsw->watermark and memcg->memsw->failcnt been accessed
concurrently as reported by KCSAN,

 BUG: KCSAN: data-race in page_counter_try_charge / page_counter_try_charge

 read to 0xffff8fb18c4cd190 of 8 bytes by task 1081 on cpu 59:
  page_counter_try_charge+0x4d/0x150 mm/page_counter.c:138
  try_charge+0x131/0xd50 mm/memcontrol.c:2405
  __memcg_kmem_charge_memcg+0x58/0x140
  __memcg_kmem_charge+0xcc/0x280
  __alloc_pages_nodemask+0x1e1/0x450
  alloc_pages_current+0xa6/0x120
  pte_alloc_one+0x17/0xd0
  __pte_alloc+0x3a/0x1f0
  copy_p4d_range+0xc36/0x1990
  copy_page_range+0x21d/0x360
  dup_mmap+0x5f5/0x7a0
  dup_mm+0xa2/0x240
  copy_process+0x1b3f/0x3460
  _do_fork+0xaa/0xa20
  __x64_sys_clone+0x13b/0x170
  do_syscall_64+0x91/0xb47
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 write to 0xffff8fb18c4cd190 of 8 bytes by task 1153 on cpu 120:
  page_counter_try_charge+0x5b/0x150 mm/page_counter.c:139
  try_charge+0x131/0xd50 mm/memcontrol.c:2405
  mem_cgroup_try_charge+0x159/0x460
  mem_cgroup_try_charge_delay+0x3d/0xa0
  wp_page_copy+0x14d/0x930
  do_wp_page+0x107/0x7b0
  __handle_mm_fault+0xce6/0xd40
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 BUG: KCSAN: data-race in page_counter_try_charge / page_counter_try_charge

 write to 0xffff88809bbf2158 of 8 bytes by task 11782 on cpu 0:
  page_counter_try_charge+0x100/0x170 mm/page_counter.c:129
  try_charge+0x185/0xbf0 mm/memcontrol.c:2405
  __memcg_kmem_charge_memcg+0x4a/0xe0 mm/memcontrol.c:2837
  __memcg_kmem_charge+0xcf/0x1b0 mm/memcontrol.c:2877
  __alloc_pages_nodemask+0x26c/0x310 mm/page_alloc.c:4780

 read to 0xffff88809bbf2158 of 8 bytes by task 11814 on cpu 1:
  page_counter_try_charge+0xef/0x170 mm/page_counter.c:129
  try_charge+0x185/0xbf0 mm/memcontrol.c:2405
  __memcg_kmem_charge_memcg+0x4a/0xe0 mm/memcontrol.c:2837
  __memcg_kmem_charge+0xcf/0x1b0 mm/memcontrol.c:2877
  __alloc_pages_nodemask+0x26c/0x310 mm/page_alloc.c:4780

Since watermark could be compared or set to garbage due to a data race
which would change the code logic, fix it by adding a pair of READ_ONCE()
and WRITE_ONCE() in those places.

The "failcnt" counter is tolerant of some degree of inaccuracy and is only
used to report stats, a data race will not be harmful, thus mark it as an
intentional data race using the data_race() macro.

Link: http://lkml.kernel.org/r/1581519682-23594-1-git-send-email-cai@lca.pw
Fixes: 3e32cb2e0a12 ("mm: memcontrol: lockless page counters")
Signed-off-by: Qian Cai <cai@lca.pw>
Reported-by: syzbot+f36cfe60b1006a94f9dc@syzkaller.appspotmail.com
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_counter.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

--- a/mm/page_counter.c~mm-page_counter-fix-various-data-races-at-memsw
+++ a/mm/page_counter.c
@@ -77,8 +77,8 @@ void page_counter_charge(struct page_cou
 		 * This is indeed racy, but we can live with some
 		 * inaccuracy in the watermark.
 		 */
-		if (new > c->watermark)
-			c->watermark = new;
+		if (new > READ_ONCE(c->watermark))
+			WRITE_ONCE(c->watermark, new);
 	}
 }
 
@@ -119,9 +119,10 @@ bool page_counter_try_charge(struct page
 			propagate_protected_usage(c, new);
 			/*
 			 * This is racy, but we can live with some
-			 * inaccuracy in the failcnt.
+			 * inaccuracy in the failcnt which is only used
+			 * to report stats.
 			 */
-			c->failcnt++;
+			data_race(c->failcnt++);
 			*fail = c;
 			goto failed;
 		}
@@ -130,8 +131,8 @@ bool page_counter_try_charge(struct page
 		 * Just like with failcnt, we can live with some
 		 * inaccuracy in the watermark.
 		 */
-		if (new > c->watermark)
-			c->watermark = new;
+		if (new > READ_ONCE(c->watermark))
+			WRITE_ONCE(c->watermark, new);
 	}
 	return true;
 
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 27/39] mm/memcontrol: fix a data race in scan count
  2020-08-15  0:29 incoming Andrew Morton
                   ` (25 preceding siblings ...)
  2020-08-15  0:31 ` [patch 26/39] mm/page_counter: fix various data races at memsw Andrew Morton
@ 2020-08-15  0:31 ` Andrew Morton
  2020-08-15  0:31 ` [patch 28/39] mm/list_lru: fix a data race in list_lru_count_one Andrew Morton
                   ` (112 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:31 UTC (permalink / raw)
  To: akpm, cai, hannes, linux-mm, mhocko, mm-commits, torvalds, vdavydov.dev

From: Qian Cai <cai@lca.pw>
Subject: mm/memcontrol: fix a data race in scan count

struct mem_cgroup_per_node mz.lru_zone_size[zone_idx][lru] could be
accessed concurrently as noticed by KCSAN,

 BUG: KCSAN: data-race in lruvec_lru_size / mem_cgroup_update_lru_size

 write to 0xffff9c804ca285f8 of 8 bytes by task 50951 on cpu 12:
  mem_cgroup_update_lru_size+0x11c/0x1d0
  mem_cgroup_update_lru_size at mm/memcontrol.c:1266
  isolate_lru_pages+0x6a9/0xf30
  shrink_active_list+0x123/0xcc0
  shrink_lruvec+0x8fd/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x170/0x700
  __handle_mm_fault+0xc9f/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 read to 0xffff9c804ca285f8 of 8 bytes by task 50964 on cpu 95:
  lruvec_lru_size+0xbb/0x270
  mem_cgroup_get_zone_lru_size at include/linux/memcontrol.h:536
  (inlined by) lruvec_lru_size at mm/vmscan.c:326
  shrink_lruvec+0x1d0/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_current+0xa6/0x120
  alloc_slab_page+0x3b1/0x540
  allocate_slab+0x70/0x660
  new_slab+0x46/0x70
  ___slab_alloc+0x4ad/0x7d0
  __slab_alloc+0x43/0x70
  kmem_cache_alloc+0x2c3/0x420
  getname_flags+0x4c/0x230
  getname+0x22/0x30
  do_sys_openat2+0x205/0x3b0
  do_sys_open+0x9a/0xf0
  __x64_sys_openat+0x62/0x80
  do_syscall_64+0x91/0xb47
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 95 PID: 50964 Comm: cc1 Tainted: G        W  O L    5.5.0-next-20200204+ #6
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

The write is under lru_lock, but the read is done as lockless.  The scan
count is used to determine how aggressively the anon and file LRU lists
should be scanned.  Load tearing could generate an inefficient heuristic,
so fix it by adding READ_ONCE() for the read.

Link: http://lkml.kernel.org/r/20200206034945.2481-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/memcontrol.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/memcontrol.h~mm-memcontrol-fix-a-data-race-in-scan-count
+++ a/include/linux/memcontrol.h
@@ -630,7 +630,7 @@ unsigned long mem_cgroup_get_zone_lru_si
 	struct mem_cgroup_per_node *mz;
 
 	mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
-	return mz->lru_zone_size[zone_idx][lru];
+	return READ_ONCE(mz->lru_zone_size[zone_idx][lru]);
 }
 
 void mem_cgroup_handle_over_high(void);
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 28/39] mm/list_lru: fix a data race in list_lru_count_one
  2020-08-15  0:29 incoming Andrew Morton
                   ` (26 preceding siblings ...)
  2020-08-15  0:31 ` [patch 27/39] mm/memcontrol: fix a data race in scan count Andrew Morton
@ 2020-08-15  0:31 ` Andrew Morton
  2020-08-15  0:31 ` [patch 29/39] mm/mempool: fix a data race in mempool_free() Andrew Morton
                   ` (111 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:31 UTC (permalink / raw)
  To: akpm, cai, elver, konrad.wilk, linux-mm, mm-commits, torvalds

From: Qian Cai <cai@lca.pw>
Subject: mm/list_lru: fix a data race in list_lru_count_one

struct list_lru_one l.nr_items could be accessed concurrently as noticed
by KCSAN,

 BUG: KCSAN: data-race in list_lru_count_one / list_lru_isolate_move

 write to 0xffffa102789c4510 of 8 bytes by task 823 on cpu 39:
  list_lru_isolate_move+0xf9/0x130
  list_lru_isolate_move at mm/list_lru.c:180
  inode_lru_isolate+0x12b/0x2a0
  __list_lru_walk_one+0x122/0x3d0
  list_lru_walk_one+0x75/0xa0
  prune_icache_sb+0x8b/0xc0
  super_cache_scan+0x1b8/0x250
  do_shrink_slab+0x256/0x6d0
  shrink_slab+0x41b/0x4a0
  shrink_node+0x35c/0xd80
  balance_pgdat+0x652/0xd90
  kswapd+0x396/0x8d0
  kthread+0x1e0/0x200
  ret_from_fork+0x27/0x50

 read to 0xffffa102789c4510 of 8 bytes by task 6345 on cpu 56:
  list_lru_count_one+0x116/0x2f0
  list_lru_count_one at mm/list_lru.c:193
  super_cache_count+0xe8/0x170
  do_shrink_slab+0x95/0x6d0
  shrink_slab+0x41b/0x4a0
  shrink_node+0x35c/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x170/0x700
  __handle_mm_fault+0xc9f/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 56 PID: 6345 Comm: oom01 Tainted: G        W    L 5.5.0-next-20200205+ #4
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

A shattered l.nr_items could affect the shrinker behaviour due to a data
race. Fix it by adding READ_ONCE() for the read. Since the writes are
aligned and up to word-size, assume those are safe from data races to
avoid readability issues of writing WRITE_ONCE(var, var + val).

Link: http://lkml.kernel.org/r/1581114679-5488-1-git-send-email-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Cc: Marco Elver <elver@google.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/list_lru.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/list_lru.c~mm-list_lru-fix-a-data-race-in-list_lru_count_one
+++ a/mm/list_lru.c
@@ -180,7 +180,7 @@ unsigned long list_lru_count_one(struct
 
 	rcu_read_lock();
 	l = list_lru_from_memcg_idx(nlru, memcg_cache_id(memcg));
-	count = l->nr_items;
+	count = READ_ONCE(l->nr_items);
 	rcu_read_unlock();
 
 	return count;
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 29/39] mm/mempool: fix a data race in mempool_free()
  2020-08-15  0:29 incoming Andrew Morton
                   ` (27 preceding siblings ...)
  2020-08-15  0:31 ` [patch 28/39] mm/list_lru: fix a data race in list_lru_count_one Andrew Morton
@ 2020-08-15  0:31 ` Andrew Morton
  2020-08-15  0:31 ` [patch 30/39] mm/rmap: annotate a data race at tlb_flush_batched Andrew Morton
                   ` (110 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:31 UTC (permalink / raw)
  To: akpm, cai, elver, linux-mm, mm-commits, oleg, tj, torvalds

From: Qian Cai <cai@lca.pw>
Subject: mm/mempool: fix a data race in mempool_free()

mempool_t pool.curr_nr could be accessed concurrently as noticed by
KCSAN,

 BUG: KCSAN: data-race in mempool_free / remove_element

 write to 0xffffffffa937638c of 4 bytes by task 6359 on cpu 113:
  remove_element+0x4a/0x1c0
  remove_element at mm/mempool.c:132
  mempool_alloc+0x102/0x210
  (inlined by) mempool_alloc at mm/mempool.c:399
  bio_alloc_bioset+0x106/0x2c0
  get_swap_bio+0x49/0x230
  __swap_writepage+0x680/0xc30
  swap_writepage+0x9c/0xf0
  pageout+0x33e/0xae0
  shrink_page_list+0x1f57/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  <snip>

 read to 0xffffffffa937638c of 4 bytes by interrupt on cpu 64:
  mempool_free+0x3e/0x150
  mempool_free at mm/mempool.c:492
  bio_free+0x192/0x280
  bio_put+0x91/0xd0
  end_swap_bio_write+0x1d8/0x280
  bio_endio+0x2c2/0x5b0
  dec_pending+0x22b/0x440 [dm_mod]
  clone_endio+0xe4/0x2c0 [dm_mod]
  bio_endio+0x2c2/0x5b0
  blk_update_request+0x217/0x940
  scsi_end_request+0x6b/0x4d0
  scsi_io_completion+0xb7/0x7e0
  scsi_finish_command+0x223/0x310
  scsi_softirq_done+0x1d5/0x210
  blk_mq_complete_request+0x224/0x250
  scsi_mq_done+0xc2/0x250
  pqi_raid_io_complete+0x5a/0x70 [smartpqi]
  pqi_irq_handler+0x150/0x1410 [smartpqi]
  __handle_irq_event_percpu+0x90/0x540
  handle_irq_event_percpu+0x49/0xd0
  handle_irq_event+0x85/0xca
  handle_edge_irq+0x13f/0x3e0
  do_IRQ+0x86/0x190
  <snip>

Since the write is under pool->lock but the read is done as lockless.
Even though the commit 5b990546e334 ("mempool: fix and document
synchronization and memory barrier usage") introduced the smp_wmb() and
smp_rmb() pair to improve the situation, it is adequate to protect it
from data races which could lead to a logic bug, so fix it by adding
READ_ONCE() for the read.

Link: http://lkml.kernel.org/r/1581446384-2131-1-git-send-email-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Cc: Marco Elver <elver@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mempool.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/mempool.c~mm-mempool-fix-a-data-race-in-mempool_free
+++ a/mm/mempool.c
@@ -489,7 +489,7 @@ void mempool_free(void *element, mempool
 	 * ensures that there will be frees which return elements to the
 	 * pool waking up the waiters.
 	 */
-	if (unlikely(pool->curr_nr < pool->min_nr)) {
+	if (unlikely(READ_ONCE(pool->curr_nr) < pool->min_nr)) {
 		spin_lock_irqsave(&pool->lock, flags);
 		if (likely(pool->curr_nr < pool->min_nr)) {
 			add_element(pool, element);
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 30/39] mm/rmap: annotate a data race at tlb_flush_batched
  2020-08-15  0:29 incoming Andrew Morton
                   ` (28 preceding siblings ...)
  2020-08-15  0:31 ` [patch 29/39] mm/mempool: fix a data race in mempool_free() Andrew Morton
@ 2020-08-15  0:31 ` Andrew Morton
  2020-08-15  0:31 ` [patch 31/39] mm/swap.c: annotate data races for lru_rotate_pvecs Andrew Morton
                   ` (109 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:31 UTC (permalink / raw)
  To: akpm, cai, elver, linux-mm, mm-commits, torvalds

From: Qian Cai <cai@lca.pw>
Subject: mm/rmap: annotate a data race at tlb_flush_batched

mm->tlb_flush_batched could be accessed concurrently as noticed by
KCSAN,

 BUG: KCSAN: data-race in flush_tlb_batched_pending / try_to_unmap_one

 write to 0xffff93f754880bd0 of 1 bytes by task 822 on cpu 6:
  try_to_unmap_one+0x59a/0x1ab0
  set_tlb_ubc_flush_pending at mm/rmap.c:635
  (inlined by) try_to_unmap_one at mm/rmap.c:1538
  rmap_walk_anon+0x296/0x650
  rmap_walk+0xdf/0x100
  try_to_unmap+0x18a/0x2f0
  shrink_page_list+0xef6/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  balance_pgdat+0x652/0xd90
  kswapd+0x396/0x8d0
  kthread+0x1e0/0x200
  ret_from_fork+0x27/0x50

 read to 0xffff93f754880bd0 of 1 bytes by task 6364 on cpu 4:
  flush_tlb_batched_pending+0x29/0x90
  flush_tlb_batched_pending at mm/rmap.c:682
  change_p4d_range+0x5dd/0x1030
  change_pte_range at mm/mprotect.c:44
  (inlined by) change_pmd_range at mm/mprotect.c:212
  (inlined by) change_pud_range at mm/mprotect.c:240
  (inlined by) change_p4d_range at mm/mprotect.c:260
  change_protection+0x222/0x310
  change_prot_numa+0x3e/0x60
  task_numa_work+0x219/0x350
  task_work_run+0xed/0x140
  prepare_exit_to_usermode+0x2cc/0x2e0
  ret_from_intr+0x32/0x42

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 4 PID: 6364 Comm: mtest01 Tainted: G        W    L 5.5.0-next-20200210+ #5
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

flush_tlb_batched_pending() is under PTL but the write is not, but
mm->tlb_flush_batched is only a bool type, so the value is unlikely to be
shattered.  Thus, mark it as an intentional data race by using the data
race macro.

Link: http://lkml.kernel.org/r/1581450783-8262-1-git-send-email-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/rmap.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/rmap.c~mm-rmap-annotate-a-data-race-at-tlb_flush_batched
+++ a/mm/rmap.c
@@ -672,7 +672,7 @@ static bool should_defer_flush(struct mm
  */
 void flush_tlb_batched_pending(struct mm_struct *mm)
 {
-	if (mm->tlb_flush_batched) {
+	if (data_race(mm->tlb_flush_batched)) {
 		flush_tlb_mm(mm);
 
 		/*
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 31/39] mm/swap.c: annotate data races for lru_rotate_pvecs
  2020-08-15  0:29 incoming Andrew Morton
                   ` (29 preceding siblings ...)
  2020-08-15  0:31 ` [patch 30/39] mm/rmap: annotate a data race at tlb_flush_batched Andrew Morton
@ 2020-08-15  0:31 ` Andrew Morton
  2020-08-15  0:31 ` [patch 32/39] mm: annotate a data race in page_zonenum() Andrew Morton
                   ` (108 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:31 UTC (permalink / raw)
  To: akpm, cai, elver, linux-mm, mm-commits, torvalds

From: Qian Cai <cai@lca.pw>
Subject: mm/swap.c: annotate data races for lru_rotate_pvecs

Read to lru_add_pvec->nr could be interrupted and then write to the same
variable.  The write has local interrupt disabled, but the plain reads
result in data races.  However, it is unlikely the compilers could do much
damage here given that lru_add_pvec->nr is a "unsigned char" and there is
an existing compiler barrier.  Thus, annotate the reads using the
data_race() macro.  The data races were reported by KCSAN,

 BUG: KCSAN: data-race in lru_add_drain_cpu / rotate_reclaimable_page

 write to 0xffff9291ebcb8a40 of 1 bytes by interrupt on cpu 23:
  rotate_reclaimable_page+0x2df/0x490
  pagevec_add at include/linux/pagevec.h:81
  (inlined by) rotate_reclaimable_page at mm/swap.c:259
  end_page_writeback+0x1b5/0x2b0
  end_swap_bio_write+0x1d0/0x280
  bio_endio+0x297/0x560
  dec_pending+0x218/0x430 [dm_mod]
  clone_endio+0xe4/0x2c0 [dm_mod]
  bio_endio+0x297/0x560
  blk_update_request+0x201/0x920
  scsi_end_request+0x6b/0x4a0
  scsi_io_completion+0xb7/0x7e0
  scsi_finish_command+0x1ed/0x2a0
  scsi_softirq_done+0x1c9/0x1d0
  blk_done_softirq+0x181/0x1d0
  __do_softirq+0xd9/0x57c
  irq_exit+0xa2/0xc0
  do_IRQ+0x8b/0x190
  ret_from_intr+0x0/0x42
  delay_tsc+0x46/0x80
  __const_udelay+0x3c/0x40
  __udelay+0x10/0x20
  kcsan_setup_watchpoint+0x202/0x3a0
  __tsan_read1+0xc2/0x100
  lru_add_drain_cpu+0xb8/0x3f0
  lru_add_drain+0x25/0x40
  shrink_active_list+0xe1/0xc80
  shrink_lruvec+0x766/0xb70
  shrink_node+0x2d6/0xca0
  do_try_to_free_pages+0x1f7/0x9a0
  try_to_free_pages+0x252/0x5b0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x16e/0x6f0
  __handle_mm_fault+0xcd5/0xd40
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 read to 0xffff9291ebcb8a40 of 1 bytes by task 37761 on cpu 23:
  lru_add_drain_cpu+0xb8/0x3f0
  lru_add_drain_cpu at mm/swap.c:602
  lru_add_drain+0x25/0x40
  shrink_active_list+0xe1/0xc80
  shrink_lruvec+0x766/0xb70
  shrink_node+0x2d6/0xca0
  do_try_to_free_pages+0x1f7/0x9a0
  try_to_free_pages+0x252/0x5b0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x16e/0x6f0
  __handle_mm_fault+0xcd5/0xd40
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 2 locks held by oom02/37761:
  #0: ffff9281e5928808 (&mm->mmap_sem#2){++++}, at: do_page_fault
  #1: ffffffffb3ade380 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part
 irq event stamp: 1949217
 trace_hardirqs_on_thunk+0x1a/0x1c
 __do_softirq+0x2e7/0x57c
 __do_softirq+0x34c/0x57c
 irq_exit+0xa2/0xc0

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 23 PID: 37761 Comm: oom02 Not tainted 5.6.0-rc3-next-20200226+ #6
 Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018

Link: http://lkml.kernel.org/r/20200228044018.1263-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/swap.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/mm/swap.c~mm-swap-annotate-data-races-for-lru_rotate_pvecs
+++ a/mm/swap.c
@@ -632,7 +632,8 @@ void lru_add_drain_cpu(int cpu)
 		__pagevec_lru_add(pvec);
 
 	pvec = &per_cpu(lru_rotate.pvec, cpu);
-	if (pagevec_count(pvec)) {
+	/* Disabling interrupts below acts as a compiler barrier. */
+	if (data_race(pagevec_count(pvec))) {
 		unsigned long flags;
 
 		/* No harm done if a racing interrupt already did this */
@@ -793,7 +794,7 @@ void lru_add_drain_all(void)
 		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
 
 		if (pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) ||
-		    pagevec_count(&per_cpu(lru_rotate.pvec, cpu)) ||
+		    data_race(pagevec_count(&per_cpu(lru_rotate.pvec, cpu))) ||
 		    pagevec_count(&per_cpu(lru_pvecs.lru_deactivate_file, cpu)) ||
 		    pagevec_count(&per_cpu(lru_pvecs.lru_deactivate, cpu)) ||
 		    pagevec_count(&per_cpu(lru_pvecs.lru_lazyfree, cpu)) ||
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 32/39] mm: annotate a data race in page_zonenum()
  2020-08-15  0:29 incoming Andrew Morton
                   ` (30 preceding siblings ...)
  2020-08-15  0:31 ` [patch 31/39] mm/swap.c: annotate data races for lru_rotate_pvecs Andrew Morton
@ 2020-08-15  0:31 ` Andrew Morton
  2020-08-15  0:31 ` [patch 33/39] include/asm-generic/vmlinux.lds.h: align ro_after_init Andrew Morton
                   ` (107 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:31 UTC (permalink / raw)
  To: akpm, cai, dan.j.williams, david, elver, ira.weiny, jack,
	jhubbard, linux-mm, mm-commits, paulmck, torvalds

From: Qian Cai <cai@lca.pw>
Subject: mm: annotate a data race in page_zonenum()

 BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page

 write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
  page_cpupid_xchg_last+0x51/0x80
  page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
  wp_page_reuse+0x3e/0xc0
  wp_page_reuse at mm/memory.c:2453
  do_wp_page+0x472/0x7b0
  do_wp_page at mm/memory.c:2798
  __handle_mm_fault+0xcb0/0xd00
  handle_pte_fault at mm/memory.c:4049
  (inlined by) __handle_mm_fault at mm/memory.c:4163
  handle_mm_fault+0xfc/0x2f0
  handle_mm_fault at mm/memory.c:4200
  do_page_fault+0x263/0x6f9
  do_user_addr_fault at arch/x86/mm/fault.c:1465
  (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
  page_fault+0x34/0x40

 read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
  put_page+0x15a/0x1f0
  page_zonenum at include/linux/mm.h:923
  (inlined by) is_zone_device_page at include/linux/mm.h:929
  (inlined by) page_is_devmap_managed at include/linux/mm.h:948
  (inlined by) put_page at include/linux/mm.h:1023
  wp_page_copy+0x571/0x930
  wp_page_copy at mm/memory.c:2615
  do_wp_page+0x107/0x7b0
  __handle_mm_fault+0xcb0/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G        W  O L 5.5.0-next-20200204+ #6
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

A page never changes its zone number. The zone number happens to be
stored in the same word as other bits which are modified, but the zone
number bits will never be modified by any other write, so it can accept
a reload of the zone bits after an intervening write and it don't need
to use READ_ONCE(). Thus, annotate this data race using
ASSERT_EXCLUSIVE_BITS() to also assert that there are no concurrent
writes to it.

Link: http://lkml.kernel.org/r/1581619089-14472-1-git-send-email-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Suggested-by: Marco Elver <elver@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mm.h |    1 +
 1 file changed, 1 insertion(+)

--- a/include/linux/mm.h~mm-annotate-a-data-race-in-page_zonenum
+++ a/include/linux/mm.h
@@ -1067,6 +1067,7 @@ vm_fault_t finish_mkwrite_fault(struct v
 
 static inline enum zone_type page_zonenum(const struct page *page)
 {
+	ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
 	return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
 }
 
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 33/39] include/asm-generic/vmlinux.lds.h: align ro_after_init
  2020-08-15  0:29 incoming Andrew Morton
                   ` (31 preceding siblings ...)
  2020-08-15  0:31 ` [patch 32/39] mm: annotate a data race in page_zonenum() Andrew Morton
@ 2020-08-15  0:31 ` Andrew Morton
  2020-08-15  0:32 ` [patch 34/39] sh: clkfwk: remove r8/r16/r32 Andrew Morton
                   ` (106 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:31 UTC (permalink / raw)
  To: akpm, amodra, arnd, bin.meng, chenzhou10, dalias, geert+renesas,
	glaubitz, krzk, kuninori.morimoto.gx, linux-mm, mm-commits,
	romain.naour, sam, stable, torvalds, ysato

From: Romain Naour <romain.naour@gmail.com>
Subject: include/asm-generic/vmlinux.lds.h: align ro_after_init

Since the patch [1], building the kernel using a toolchain built with
binutils 2.33.1 prevents booting a sh4 system under Qemu.  Apply the patch
provided by Alan Modra [2] that fix alignment of rodata.

[1] https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ebd2263ba9a9124d93bbc0ece63d7e0fae89b40e
[2] https://www.sourceware.org/ml/binutils/2019-12/msg00112.html

Link: https://marc.info/?l=linux-sh&m=158429470221261
Signed-off-by: Romain Naour <romain.naour@gmail.com>
Cc: Alan Modra <amodra@gmail.com>
Cc: Bin Meng <bin.meng@windriver.com>
Cc: Chen Zhou <chenzhou10@huawei.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/asm-generic/vmlinux.lds.h |    1 +
 1 file changed, 1 insertion(+)

--- a/include/asm-generic/vmlinux.lds.h~include-asm-generic-vmlinuxldsh-align-ro_after_init
+++ a/include/asm-generic/vmlinux.lds.h
@@ -394,6 +394,7 @@
  */
 #ifndef RO_AFTER_INIT_DATA
 #define RO_AFTER_INIT_DATA						\
+	. = ALIGN(8);							\
 	__start_ro_after_init = .;					\
 	*(.data..ro_after_init)						\
 	JUMP_TABLE_DATA							\
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 34/39] sh: clkfwk: remove r8/r16/r32
  2020-08-15  0:29 incoming Andrew Morton
                   ` (32 preceding siblings ...)
  2020-08-15  0:31 ` [patch 33/39] include/asm-generic/vmlinux.lds.h: align ro_after_init Andrew Morton
@ 2020-08-15  0:32 ` Andrew Morton
  2020-08-15  0:32 ` [patch 35/39] sh: use generic strncpy() Andrew Morton
                   ` (105 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:32 UTC (permalink / raw)
  To: akpm, amodra, bin.meng, chenzhou10, dalias, geert+renesas,
	glaubitz, krzk, kuninori.morimoto.gx, linux-mm, mm-commits,
	romain.naour, sam, torvalds, ysato

From: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Subject: sh: clkfwk: remove r8/r16/r32

SH will get below warning

${LINUX}/drivers/sh/clk/cpg.c: In function 'r8':
${LINUX}/drivers/sh/clk/cpg.c:41:17: warning: passing argument 1 of 'ioread8'
 discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
  return ioread8(addr);
                 ^~~~
In file included from ${LINUX}/arch/sh/include/asm/io.h:21,
                 from ${LINUX}/include/linux/io.h:13,
                 from ${LINUX}/drivers/sh/clk/cpg.c:14:
${LINUX}/include/asm-generic/iomap.h:29:29: note: expected 'void *' but
argument is of type 'const void *'
 extern unsigned int ioread8(void __iomem *);
                             ^~~~~~~~~~~~~~

We don't need "const" for r8/r16/r32.  And we don't need r8/r16/r32
themselvs.  This patch cleanup these.

X-MARC-Message: https://marc.info/?l=linux-renesas-soc&m=157852973916903
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Cc: Alan Modra <amodra@gmail.com>
Cc: Bin Meng <bin.meng@windriver.com>
Cc: Chen Zhou <chenzhou10@huawei.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Romain Naour <romain.naour@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/sh/clk/cpg.c |   23 ++++-------------------
 1 file changed, 4 insertions(+), 19 deletions(-)

--- a/drivers/sh/clk/cpg.c~sh-clkfwk-remove-r8-r16-r32
+++ a/drivers/sh/clk/cpg.c
@@ -36,36 +36,21 @@ static void sh_clk_write(int value, stru
 		iowrite32(value, clk->mapped_reg);
 }
 
-static unsigned int r8(const void __iomem *addr)
-{
-	return ioread8(addr);
-}
-
-static unsigned int r16(const void __iomem *addr)
-{
-	return ioread16(addr);
-}
-
-static unsigned int r32(const void __iomem *addr)
-{
-	return ioread32(addr);
-}
-
 static int sh_clk_mstp_enable(struct clk *clk)
 {
 	sh_clk_write(sh_clk_read(clk) & ~(1 << clk->enable_bit), clk);
 	if (clk->status_reg) {
-		unsigned int (*read)(const void __iomem *addr);
+		unsigned int (*read)(void __iomem *addr);
 		int i;
 		void __iomem *mapped_status = (phys_addr_t)clk->status_reg -
 			(phys_addr_t)clk->enable_reg + clk->mapped_reg;
 
 		if (clk->flags & CLK_ENABLE_REG_8BIT)
-			read = r8;
+			read = ioread8;
 		else if (clk->flags & CLK_ENABLE_REG_16BIT)
-			read = r16;
+			read = ioread16;
 		else
-			read = r32;
+			read = ioread32;
 
 		for (i = 1000;
 		     (read(mapped_status) & (1 << clk->enable_bit)) && i;
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 35/39] sh: use generic strncpy()
  2020-08-15  0:29 incoming Andrew Morton
                   ` (33 preceding siblings ...)
  2020-08-15  0:32 ` [patch 34/39] sh: clkfwk: remove r8/r16/r32 Andrew Morton
@ 2020-08-15  0:32 ` Andrew Morton
  2020-08-15  0:32 ` [patch 36/39] iomap: constify ioreadX() iomem argument (as in generic implementation) Andrew Morton
                   ` (104 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:32 UTC (permalink / raw)
  To: akpm, amodra, bin.meng, chenzhou10, dalias, geert+renesas,
	glaubitz, krzk, kuninori.morimoto.gx, linux-mm, mm-commits,
	romain.naour, sam, torvalds, ysato

From: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Subject: sh: use generic strncpy()

Current SH will get below warning at strncpy()

In file included from ${LINUX}/arch/sh/include/asm/string.h:3,
                 from ${LINUX}/include/linux/string.h:20,
                 from ${LINUX}/include/linux/bitmap.h:9,
                 from ${LINUX}/include/linux/nodemask.h:95,
                 from ${LINUX}/include/linux/mmzone.h:17,
                 from ${LINUX}/include/linux/gfp.h:6,
                 from ${LINUX}/innclude/linux/slab.h:15,
                 from ${LINUX}/linux/drivers/mmc/host/vub300.c:38:
${LINUX}/drivers/mmc/host/vub300.c: In function 'new_system_port_status':
${LINUX}/arch/sh/include/asm/string_32.h:51:42: warning: array subscript\
  80 is above array bounds of 'char[26]' [-Warray-bounds]
   : "0" (__dest), "1" (__src), "r" (__src+__n)
                                     ~~~~~^~~~

In general, strncpy() should behave like below.

	char dest[10];
	char *src = "12345";

	strncpy(dest, src, 10);
	// dest = {'1', '2', '3', '4', '5',
	           '\0','\0','\0','\0','\0'}

But, current SH strnpy() has 2 issues.
1st is it will access to out-of-memory (= src + 10).
2nd is it needs big fixup for it, and maintenance __asm__
code is difficult.

To solve these issues, this patch simply uses generic strncpy()
instead of architecture specific one.

Link: https://marc.info/?l=linux-renesas-soc&m=157664657013309
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Cc: Alan Modra <amodra@gmail.com>
Cc: Bin Meng <bin.meng@windriver.com>
Cc: Chen Zhou <chenzhou10@huawei.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Romain Naour <romain.naour@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/sh/include/asm/string_32.h |   26 --------------------------
 1 file changed, 26 deletions(-)

--- a/arch/sh/include/asm/string_32.h~sh-use-generic-strncpy
+++ a/arch/sh/include/asm/string_32.h
@@ -28,32 +28,6 @@ static inline char *strcpy(char *__dest,
 	return __xdest;
 }
 
-#define __HAVE_ARCH_STRNCPY
-static inline char *strncpy(char *__dest, const char *__src, size_t __n)
-{
-	register char *__xdest = __dest;
-	unsigned long __dummy;
-
-	if (__n == 0)
-		return __xdest;
-
-	__asm__ __volatile__(
-		"1:\n"
-		"mov.b	@%1+, %2\n\t"
-		"mov.b	%2, @%0\n\t"
-		"cmp/eq	#0, %2\n\t"
-		"bt/s	2f\n\t"
-		" cmp/eq	%5,%1\n\t"
-		"bf/s	1b\n\t"
-		" add	#1, %0\n"
-		"2:"
-		: "=r" (__dest), "=r" (__src), "=&z" (__dummy)
-		: "0" (__dest), "1" (__src), "r" (__src+__n)
-		: "memory", "t");
-
-	return __xdest;
-}
-
 #define __HAVE_ARCH_STRCMP
 static inline int strcmp(const char *__cs, const char *__ct)
 {
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 36/39] iomap: constify ioreadX() iomem argument (as in generic implementation)
  2020-08-15  0:29 incoming Andrew Morton
                   ` (34 preceding siblings ...)
  2020-08-15  0:32 ` [patch 35/39] sh: use generic strncpy() Andrew Morton
@ 2020-08-15  0:32 ` Andrew Morton
  2020-08-15  0:32 ` [patch 37/39] rtl818x: " Andrew Morton
                   ` (103 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:32 UTC (permalink / raw)
  To: akpm, allenbh, arnd, benh, dalias, dave.jiang, davem, deller,
	geert+renesas, geert, ink, James.Bottomley, jasowang, jdmason,
	krzk, kuba, kvalo, linux-mm, mattst88, mm-commits, mpe, mst,
	paulus, rth, torvalds, ysato

From: Krzysztof Kozlowski <krzk@kernel.org>
Subject: iomap: constify ioreadX() iomem argument (as in generic implementation)

Patch series "iomap: Constify ioreadX() iomem argument", v3.

The ioread8/16/32() and others have inconsistent interface among the
architectures: some taking address as const, some not.

It seems there is nothing really stopping all of them to take pointer to
const.


This patch (of 4):

The ioreadX() and ioreadX_rep() helpers have inconsistent interface.  On
some architectures void *__iomem address argument is a pointer to const,
on some not.

Implementations of ioreadX() do not modify the memory under the address so
they can be converted to a "const" version for const-safety and
consistency among architectures.

[krzk@kernel.org: sh: clk: fix assignment from incompatible pointer type for ioreadX()]
  Link: http://lkml.kernel.org/r/20200723082017.24053-1-krzk@kernel.org
[akpm@linux-foundation.org: fix drivers/mailbox/bcm-pdc-mailbox.c]
  Link: http://lkml.kernel.org/r/202007132209.Rxmv4QyS%25lkp@intel.com
Link: http://lkml.kernel.org/r/20200709072837.5869-1-krzk@kernel.org
Link: http://lkml.kernel.org/r/20200709072837.5869-2-krzk@kernel.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Jon Mason <jdmason@kudzu.us>
Cc: Allen Hubbe <allenbh@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/alpha/include/asm/core_apecs.h   |    6 +-
 arch/alpha/include/asm/core_cia.h     |    6 +-
 arch/alpha/include/asm/core_lca.h     |    6 +-
 arch/alpha/include/asm/core_marvel.h  |    4 -
 arch/alpha/include/asm/core_mcpcia.h  |    6 +-
 arch/alpha/include/asm/core_t2.h      |    2 
 arch/alpha/include/asm/io.h           |   12 ++--
 arch/alpha/include/asm/io_trivial.h   |   16 ++---
 arch/alpha/include/asm/jensen.h       |    2 
 arch/alpha/include/asm/machvec.h      |    6 +-
 arch/alpha/kernel/core_marvel.c       |    2 
 arch/alpha/kernel/io.c                |   12 ++--
 arch/parisc/include/asm/io.h          |    4 -
 arch/parisc/lib/iomap.c               |   72 ++++++++++++------------
 arch/powerpc/kernel/iomap.c           |   28 ++++-----
 arch/sh/kernel/iomap.c                |   22 +++----
 drivers/mailbox/bcm-pdc-mailbox.c     |    2 
 drivers/sh/clk/cpg.c                  |    2 
 include/asm-generic/iomap.h           |   28 ++++-----
 include/linux/io-64-nonatomic-hi-lo.h |    4 -
 include/linux/io-64-nonatomic-lo-hi.h |    4 -
 lib/iomap.c                           |   30 +++++-----
 22 files changed, 138 insertions(+), 138 deletions(-)

--- a/arch/alpha/include/asm/core_apecs.h~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/arch/alpha/include/asm/core_apecs.h
@@ -384,7 +384,7 @@ struct el_apecs_procdata
 		}						\
 	} while (0)
 
-__EXTERN_INLINE unsigned int apecs_ioread8(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int apecs_ioread8(const void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	unsigned long result, base_and_type;
@@ -420,7 +420,7 @@ __EXTERN_INLINE void apecs_iowrite8(u8 b
 	*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int apecs_ioread16(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int apecs_ioread16(const void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	unsigned long result, base_and_type;
@@ -456,7 +456,7 @@ __EXTERN_INLINE void apecs_iowrite16(u16
 	*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int apecs_ioread32(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int apecs_ioread32(const void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	if (addr < APECS_DENSE_MEM)
--- a/arch/alpha/include/asm/core_cia.h~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/arch/alpha/include/asm/core_cia.h
@@ -342,7 +342,7 @@ struct el_CIA_sysdata_mcheck {
 #define vuip	volatile unsigned int __force *
 #define vulp	volatile unsigned long __force *
 
-__EXTERN_INLINE unsigned int cia_ioread8(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int cia_ioread8(const void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	unsigned long result, base_and_type;
@@ -374,7 +374,7 @@ __EXTERN_INLINE void cia_iowrite8(u8 b,
 	*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int cia_ioread16(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int cia_ioread16(const void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	unsigned long result, base_and_type;
@@ -404,7 +404,7 @@ __EXTERN_INLINE void cia_iowrite16(u16 b
 	*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int cia_ioread32(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int cia_ioread32(const void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	if (addr < CIA_DENSE_MEM)
--- a/arch/alpha/include/asm/core_lca.h~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/arch/alpha/include/asm/core_lca.h
@@ -230,7 +230,7 @@ union el_lca {
 	} while (0)
 
 
-__EXTERN_INLINE unsigned int lca_ioread8(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int lca_ioread8(const void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	unsigned long result, base_and_type;
@@ -266,7 +266,7 @@ __EXTERN_INLINE void lca_iowrite8(u8 b,
 	*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int lca_ioread16(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int lca_ioread16(const void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	unsigned long result, base_and_type;
@@ -302,7 +302,7 @@ __EXTERN_INLINE void lca_iowrite16(u16 b
 	*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int lca_ioread32(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int lca_ioread32(const void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	if (addr < LCA_DENSE_MEM)
--- a/arch/alpha/include/asm/core_marvel.h~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/arch/alpha/include/asm/core_marvel.h
@@ -332,10 +332,10 @@ struct io7 {
 #define vucp	volatile unsigned char __force *
 #define vusp	volatile unsigned short __force *
 
-extern unsigned int marvel_ioread8(void __iomem *);
+extern unsigned int marvel_ioread8(const void __iomem *);
 extern void marvel_iowrite8(u8 b, void __iomem *);
 
-__EXTERN_INLINE unsigned int marvel_ioread16(void __iomem *addr)
+__EXTERN_INLINE unsigned int marvel_ioread16(const void __iomem *addr)
 {
 	return __kernel_ldwu(*(vusp)addr);
 }
--- a/arch/alpha/include/asm/core_mcpcia.h~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/arch/alpha/include/asm/core_mcpcia.h
@@ -267,7 +267,7 @@ extern inline int __mcpcia_is_mmio(unsig
 	return (addr & 0x80000000UL) == 0;
 }
 
-__EXTERN_INLINE unsigned int mcpcia_ioread8(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int mcpcia_ioread8(const void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long)xaddr & MCPCIA_MEM_MASK;
 	unsigned long hose = (unsigned long)xaddr & ~MCPCIA_MEM_MASK;
@@ -291,7 +291,7 @@ __EXTERN_INLINE void mcpcia_iowrite8(u8
 	*(vuip) ((addr << 5) + hose + 0x00) = w;
 }
 
-__EXTERN_INLINE unsigned int mcpcia_ioread16(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int mcpcia_ioread16(const void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long)xaddr & MCPCIA_MEM_MASK;
 	unsigned long hose = (unsigned long)xaddr & ~MCPCIA_MEM_MASK;
@@ -315,7 +315,7 @@ __EXTERN_INLINE void mcpcia_iowrite16(u1
 	*(vuip) ((addr << 5) + hose + 0x08) = w;
 }
 
-__EXTERN_INLINE unsigned int mcpcia_ioread32(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int mcpcia_ioread32(const void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long)xaddr;
 
--- a/arch/alpha/include/asm/core_t2.h~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/arch/alpha/include/asm/core_t2.h
@@ -572,7 +572,7 @@ __EXTERN_INLINE int t2_is_mmio(const vol
    it doesn't make sense to merge the pio and mmio routines.  */
 
 #define IOPORT(OS, NS)							\
-__EXTERN_INLINE unsigned int t2_ioread##NS(void __iomem *xaddr)		\
+__EXTERN_INLINE unsigned int t2_ioread##NS(const void __iomem *xaddr)		\
 {									\
 	if (t2_is_mmio(xaddr))						\
 		return t2_read##OS(xaddr);				\
--- a/arch/alpha/include/asm/io.h~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/arch/alpha/include/asm/io.h
@@ -150,9 +150,9 @@ static inline void generic_##NAME(TYPE b
 	alpha_mv.mv_##NAME(b, addr);					\
 }
 
-REMAP1(unsigned int, ioread8, /**/)
-REMAP1(unsigned int, ioread16, /**/)
-REMAP1(unsigned int, ioread32, /**/)
+REMAP1(unsigned int, ioread8, const)
+REMAP1(unsigned int, ioread16, const)
+REMAP1(unsigned int, ioread32, const)
 REMAP1(u8, readb, const volatile)
 REMAP1(u16, readw, const volatile)
 REMAP1(u32, readl, const volatile)
@@ -307,7 +307,7 @@ static inline int __is_mmio(const volati
  */
 
 #if IO_CONCAT(__IO_PREFIX,trivial_io_bw)
-extern inline unsigned int ioread8(void __iomem *addr)
+extern inline unsigned int ioread8(const void __iomem *addr)
 {
 	unsigned int ret;
 	mb();
@@ -316,7 +316,7 @@ extern inline unsigned int ioread8(void
 	return ret;
 }
 
-extern inline unsigned int ioread16(void __iomem *addr)
+extern inline unsigned int ioread16(const void __iomem *addr)
 {
 	unsigned int ret;
 	mb();
@@ -359,7 +359,7 @@ extern inline void outw(u16 b, unsigned
 #endif
 
 #if IO_CONCAT(__IO_PREFIX,trivial_io_lq)
-extern inline unsigned int ioread32(void __iomem *addr)
+extern inline unsigned int ioread32(const void __iomem *addr)
 {
 	unsigned int ret;
 	mb();
--- a/arch/alpha/include/asm/io_trivial.h~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/arch/alpha/include/asm/io_trivial.h
@@ -7,15 +7,15 @@
 
 #if IO_CONCAT(__IO_PREFIX,trivial_io_bw)
 __EXTERN_INLINE unsigned int
-IO_CONCAT(__IO_PREFIX,ioread8)(void __iomem *a)
+IO_CONCAT(__IO_PREFIX,ioread8)(const void __iomem *a)
 {
-	return __kernel_ldbu(*(volatile u8 __force *)a);
+	return __kernel_ldbu(*(const volatile u8 __force *)a);
 }
 
 __EXTERN_INLINE unsigned int
-IO_CONCAT(__IO_PREFIX,ioread16)(void __iomem *a)
+IO_CONCAT(__IO_PREFIX,ioread16)(const void __iomem *a)
 {
-	return __kernel_ldwu(*(volatile u16 __force *)a);
+	return __kernel_ldwu(*(const volatile u16 __force *)a);
 }
 
 __EXTERN_INLINE void
@@ -33,9 +33,9 @@ IO_CONCAT(__IO_PREFIX,iowrite16)(u16 b,
 
 #if IO_CONCAT(__IO_PREFIX,trivial_io_lq)
 __EXTERN_INLINE unsigned int
-IO_CONCAT(__IO_PREFIX,ioread32)(void __iomem *a)
+IO_CONCAT(__IO_PREFIX,ioread32)(const void __iomem *a)
 {
-	return *(volatile u32 __force *)a;
+	return *(const volatile u32 __force *)a;
 }
 
 __EXTERN_INLINE void
@@ -73,14 +73,14 @@ IO_CONCAT(__IO_PREFIX,writew)(u16 b, vol
 __EXTERN_INLINE u8
 IO_CONCAT(__IO_PREFIX,readb)(const volatile void __iomem *a)
 {
-	void __iomem *addr = (void __iomem *)a;
+	const void __iomem *addr = (const void __iomem *)a;
 	return IO_CONCAT(__IO_PREFIX,ioread8)(addr);
 }
 
 __EXTERN_INLINE u16
 IO_CONCAT(__IO_PREFIX,readw)(const volatile void __iomem *a)
 {
-	void __iomem *addr = (void __iomem *)a;
+	const void __iomem *addr = (const void __iomem *)a;
 	return IO_CONCAT(__IO_PREFIX,ioread16)(addr);
 }
 
--- a/arch/alpha/include/asm/jensen.h~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/arch/alpha/include/asm/jensen.h
@@ -305,7 +305,7 @@ __EXTERN_INLINE int jensen_is_mmio(const
    that it doesn't make sense to merge them.  */
 
 #define IOPORT(OS, NS)							\
-__EXTERN_INLINE unsigned int jensen_ioread##NS(void __iomem *xaddr)	\
+__EXTERN_INLINE unsigned int jensen_ioread##NS(const void __iomem *xaddr)	\
 {									\
 	if (jensen_is_mmio(xaddr))					\
 		return jensen_read##OS(xaddr - 0x100000000ul);		\
--- a/arch/alpha/include/asm/machvec.h~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/arch/alpha/include/asm/machvec.h
@@ -46,9 +46,9 @@ struct alpha_machine_vector
 	void (*mv_pci_tbi)(struct pci_controller *hose,
 			   dma_addr_t start, dma_addr_t end);
 
-	unsigned int (*mv_ioread8)(void __iomem *);
-	unsigned int (*mv_ioread16)(void __iomem *);
-	unsigned int (*mv_ioread32)(void __iomem *);
+	unsigned int (*mv_ioread8)(const void __iomem *);
+	unsigned int (*mv_ioread16)(const void __iomem *);
+	unsigned int (*mv_ioread32)(const void __iomem *);
 
 	void (*mv_iowrite8)(u8, void __iomem *);
 	void (*mv_iowrite16)(u16, void __iomem *);
--- a/arch/alpha/kernel/core_marvel.c~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/arch/alpha/kernel/core_marvel.c
@@ -806,7 +806,7 @@ void __iomem *marvel_ioportmap (unsigned
 }
 
 unsigned int
-marvel_ioread8(void __iomem *xaddr)
+marvel_ioread8(const void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	if (__marvel_is_port_kbd(addr))
--- a/arch/alpha/kernel/io.c~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/arch/alpha/kernel/io.c
@@ -14,7 +14,7 @@
    "generic", which bumps through the machine vector.  */
 
 unsigned int
-ioread8(void __iomem *addr)
+ioread8(const void __iomem *addr)
 {
 	unsigned int ret;
 	mb();
@@ -23,7 +23,7 @@ ioread8(void __iomem *addr)
 	return ret;
 }
 
-unsigned int ioread16(void __iomem *addr)
+unsigned int ioread16(const void __iomem *addr)
 {
 	unsigned int ret;
 	mb();
@@ -32,7 +32,7 @@ unsigned int ioread16(void __iomem *addr
 	return ret;
 }
 
-unsigned int ioread32(void __iomem *addr)
+unsigned int ioread32(const void __iomem *addr)
 {
 	unsigned int ret;
 	mb();
@@ -257,7 +257,7 @@ EXPORT_SYMBOL(readq_relaxed);
 /*
  * Read COUNT 8-bit bytes from port PORT into memory starting at SRC.
  */
-void ioread8_rep(void __iomem *port, void *dst, unsigned long count)
+void ioread8_rep(const void __iomem *port, void *dst, unsigned long count)
 {
 	while ((unsigned long)dst & 0x3) {
 		if (!count)
@@ -300,7 +300,7 @@ EXPORT_SYMBOL(insb);
  * the interfaces seems to be slow: just using the inlined version
  * of the inw() breaks things.
  */
-void ioread16_rep(void __iomem *port, void *dst, unsigned long count)
+void ioread16_rep(const void __iomem *port, void *dst, unsigned long count)
 {
 	if (unlikely((unsigned long)dst & 0x3)) {
 		if (!count)
@@ -340,7 +340,7 @@ EXPORT_SYMBOL(insw);
  * but the interfaces seems to be slow: just using the inlined version
  * of the inl() breaks things.
  */
-void ioread32_rep(void __iomem *port, void *dst, unsigned long count)
+void ioread32_rep(const void __iomem *port, void *dst, unsigned long count)
 {
 	if (unlikely((unsigned long)dst & 0x3)) {
 		while (count--) {
--- a/arch/parisc/include/asm/io.h~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/arch/parisc/include/asm/io.h
@@ -303,8 +303,8 @@ extern void outsl (unsigned long port, c
 #define ioread64be ioread64be
 #define iowrite64 iowrite64
 #define iowrite64be iowrite64be
-extern u64 ioread64(void __iomem *addr);
-extern u64 ioread64be(void __iomem *addr);
+extern u64 ioread64(const void __iomem *addr);
+extern u64 ioread64be(const void __iomem *addr);
 extern void iowrite64(u64 val, void __iomem *addr);
 extern void iowrite64be(u64 val, void __iomem *addr);
 
--- a/arch/parisc/lib/iomap.c~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/arch/parisc/lib/iomap.c
@@ -43,13 +43,13 @@
 #endif
 
 struct iomap_ops {
-	unsigned int (*read8)(void __iomem *);
-	unsigned int (*read16)(void __iomem *);
-	unsigned int (*read16be)(void __iomem *);
-	unsigned int (*read32)(void __iomem *);
-	unsigned int (*read32be)(void __iomem *);
-	u64 (*read64)(void __iomem *);
-	u64 (*read64be)(void __iomem *);
+	unsigned int (*read8)(const void __iomem *);
+	unsigned int (*read16)(const void __iomem *);
+	unsigned int (*read16be)(const void __iomem *);
+	unsigned int (*read32)(const void __iomem *);
+	unsigned int (*read32be)(const void __iomem *);
+	u64 (*read64)(const void __iomem *);
+	u64 (*read64be)(const void __iomem *);
 	void (*write8)(u8, void __iomem *);
 	void (*write16)(u16, void __iomem *);
 	void (*write16be)(u16, void __iomem *);
@@ -57,9 +57,9 @@ struct iomap_ops {
 	void (*write32be)(u32, void __iomem *);
 	void (*write64)(u64, void __iomem *);
 	void (*write64be)(u64, void __iomem *);
-	void (*read8r)(void __iomem *, void *, unsigned long);
-	void (*read16r)(void __iomem *, void *, unsigned long);
-	void (*read32r)(void __iomem *, void *, unsigned long);
+	void (*read8r)(const void __iomem *, void *, unsigned long);
+	void (*read16r)(const void __iomem *, void *, unsigned long);
+	void (*read32r)(const void __iomem *, void *, unsigned long);
 	void (*write8r)(void __iomem *, const void *, unsigned long);
 	void (*write16r)(void __iomem *, const void *, unsigned long);
 	void (*write32r)(void __iomem *, const void *, unsigned long);
@@ -69,17 +69,17 @@ struct iomap_ops {
 
 #define ADDR2PORT(addr) ((unsigned long __force)(addr) & 0xffffff)
 
-static unsigned int ioport_read8(void __iomem *addr)
+static unsigned int ioport_read8(const void __iomem *addr)
 {
 	return inb(ADDR2PORT(addr));
 }
 
-static unsigned int ioport_read16(void __iomem *addr)
+static unsigned int ioport_read16(const void __iomem *addr)
 {
 	return inw(ADDR2PORT(addr));
 }
 
-static unsigned int ioport_read32(void __iomem *addr)
+static unsigned int ioport_read32(const void __iomem *addr)
 {
 	return inl(ADDR2PORT(addr));
 }
@@ -99,17 +99,17 @@ static void ioport_write32(u32 datum, vo
 	outl(datum, ADDR2PORT(addr));
 }
 
-static void ioport_read8r(void __iomem *addr, void *dst, unsigned long count)
+static void ioport_read8r(const void __iomem *addr, void *dst, unsigned long count)
 {
 	insb(ADDR2PORT(addr), dst, count);
 }
 
-static void ioport_read16r(void __iomem *addr, void *dst, unsigned long count)
+static void ioport_read16r(const void __iomem *addr, void *dst, unsigned long count)
 {
 	insw(ADDR2PORT(addr), dst, count);
 }
 
-static void ioport_read32r(void __iomem *addr, void *dst, unsigned long count)
+static void ioport_read32r(const void __iomem *addr, void *dst, unsigned long count)
 {
 	insl(ADDR2PORT(addr), dst, count);
 }
@@ -150,37 +150,37 @@ static const struct iomap_ops ioport_ops
 
 /* Legacy I/O memory ops */
 
-static unsigned int iomem_read8(void __iomem *addr)
+static unsigned int iomem_read8(const void __iomem *addr)
 {
 	return readb(addr);
 }
 
-static unsigned int iomem_read16(void __iomem *addr)
+static unsigned int iomem_read16(const void __iomem *addr)
 {
 	return readw(addr);
 }
 
-static unsigned int iomem_read16be(void __iomem *addr)
+static unsigned int iomem_read16be(const void __iomem *addr)
 {
 	return __raw_readw(addr);
 }
 
-static unsigned int iomem_read32(void __iomem *addr)
+static unsigned int iomem_read32(const void __iomem *addr)
 {
 	return readl(addr);
 }
 
-static unsigned int iomem_read32be(void __iomem *addr)
+static unsigned int iomem_read32be(const void __iomem *addr)
 {
 	return __raw_readl(addr);
 }
 
-static u64 iomem_read64(void __iomem *addr)
+static u64 iomem_read64(const void __iomem *addr)
 {
 	return readq(addr);
 }
 
-static u64 iomem_read64be(void __iomem *addr)
+static u64 iomem_read64be(const void __iomem *addr)
 {
 	return __raw_readq(addr);
 }
@@ -220,7 +220,7 @@ static void iomem_write64be(u64 datum, v
 	__raw_writel(datum, addr);
 }
 
-static void iomem_read8r(void __iomem *addr, void *dst, unsigned long count)
+static void iomem_read8r(const void __iomem *addr, void *dst, unsigned long count)
 {
 	while (count--) {
 		*(u8 *)dst = __raw_readb(addr);
@@ -228,7 +228,7 @@ static void iomem_read8r(void __iomem *a
 	}
 }
 
-static void iomem_read16r(void __iomem *addr, void *dst, unsigned long count)
+static void iomem_read16r(const void __iomem *addr, void *dst, unsigned long count)
 {
 	while (count--) {
 		*(u16 *)dst = __raw_readw(addr);
@@ -236,7 +236,7 @@ static void iomem_read16r(void __iomem *
 	}
 }
 
-static void iomem_read32r(void __iomem *addr, void *dst, unsigned long count)
+static void iomem_read32r(const void __iomem *addr, void *dst, unsigned long count)
 {
 	while (count--) {
 		*(u32 *)dst = __raw_readl(addr);
@@ -297,49 +297,49 @@ static const struct iomap_ops *iomap_ops
 };
 
 
-unsigned int ioread8(void __iomem *addr)
+unsigned int ioread8(const void __iomem *addr)
 {
 	if (unlikely(INDIRECT_ADDR(addr)))
 		return iomap_ops[ADDR_TO_REGION(addr)]->read8(addr);
 	return *((u8 *)addr);
 }
 
-unsigned int ioread16(void __iomem *addr)
+unsigned int ioread16(const void __iomem *addr)
 {
 	if (unlikely(INDIRECT_ADDR(addr)))
 		return iomap_ops[ADDR_TO_REGION(addr)]->read16(addr);
 	return le16_to_cpup((u16 *)addr);
 }
 
-unsigned int ioread16be(void __iomem *addr)
+unsigned int ioread16be(const void __iomem *addr)
 {
 	if (unlikely(INDIRECT_ADDR(addr)))
 		return iomap_ops[ADDR_TO_REGION(addr)]->read16be(addr);
 	return *((u16 *)addr);
 }
 
-unsigned int ioread32(void __iomem *addr)
+unsigned int ioread32(const void __iomem *addr)
 {
 	if (unlikely(INDIRECT_ADDR(addr)))
 		return iomap_ops[ADDR_TO_REGION(addr)]->read32(addr);
 	return le32_to_cpup((u32 *)addr);
 }
 
-unsigned int ioread32be(void __iomem *addr)
+unsigned int ioread32be(const void __iomem *addr)
 {
 	if (unlikely(INDIRECT_ADDR(addr)))
 		return iomap_ops[ADDR_TO_REGION(addr)]->read32be(addr);
 	return *((u32 *)addr);
 }
 
-u64 ioread64(void __iomem *addr)
+u64 ioread64(const void __iomem *addr)
 {
 	if (unlikely(INDIRECT_ADDR(addr)))
 		return iomap_ops[ADDR_TO_REGION(addr)]->read64(addr);
 	return le64_to_cpup((u64 *)addr);
 }
 
-u64 ioread64be(void __iomem *addr)
+u64 ioread64be(const void __iomem *addr)
 {
 	if (unlikely(INDIRECT_ADDR(addr)))
 		return iomap_ops[ADDR_TO_REGION(addr)]->read64be(addr);
@@ -411,7 +411,7 @@ void iowrite64be(u64 datum, void __iomem
 
 /* Repeating interfaces */
 
-void ioread8_rep(void __iomem *addr, void *dst, unsigned long count)
+void ioread8_rep(const void __iomem *addr, void *dst, unsigned long count)
 {
 	if (unlikely(INDIRECT_ADDR(addr))) {
 		iomap_ops[ADDR_TO_REGION(addr)]->read8r(addr, dst, count);
@@ -423,7 +423,7 @@ void ioread8_rep(void __iomem *addr, voi
 	}
 }
 
-void ioread16_rep(void __iomem *addr, void *dst, unsigned long count)
+void ioread16_rep(const void __iomem *addr, void *dst, unsigned long count)
 {
 	if (unlikely(INDIRECT_ADDR(addr))) {
 		iomap_ops[ADDR_TO_REGION(addr)]->read16r(addr, dst, count);
@@ -435,7 +435,7 @@ void ioread16_rep(void __iomem *addr, vo
 	}
 }
 
-void ioread32_rep(void __iomem *addr, void *dst, unsigned long count)
+void ioread32_rep(const void __iomem *addr, void *dst, unsigned long count)
 {
 	if (unlikely(INDIRECT_ADDR(addr))) {
 		iomap_ops[ADDR_TO_REGION(addr)]->read32r(addr, dst, count);
--- a/arch/powerpc/kernel/iomap.c~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/arch/powerpc/kernel/iomap.c
@@ -15,23 +15,23 @@
  * Here comes the ppc64 implementation of the IOMAP 
  * interfaces.
  */
-unsigned int ioread8(void __iomem *addr)
+unsigned int ioread8(const void __iomem *addr)
 {
 	return readb(addr);
 }
-unsigned int ioread16(void __iomem *addr)
+unsigned int ioread16(const void __iomem *addr)
 {
 	return readw(addr);
 }
-unsigned int ioread16be(void __iomem *addr)
+unsigned int ioread16be(const void __iomem *addr)
 {
 	return readw_be(addr);
 }
-unsigned int ioread32(void __iomem *addr)
+unsigned int ioread32(const void __iomem *addr)
 {
 	return readl(addr);
 }
-unsigned int ioread32be(void __iomem *addr)
+unsigned int ioread32be(const void __iomem *addr)
 {
 	return readl_be(addr);
 }
@@ -41,27 +41,27 @@ EXPORT_SYMBOL(ioread16be);
 EXPORT_SYMBOL(ioread32);
 EXPORT_SYMBOL(ioread32be);
 #ifdef __powerpc64__
-u64 ioread64(void __iomem *addr)
+u64 ioread64(const void __iomem *addr)
 {
 	return readq(addr);
 }
-u64 ioread64_lo_hi(void __iomem *addr)
+u64 ioread64_lo_hi(const void __iomem *addr)
 {
 	return readq(addr);
 }
-u64 ioread64_hi_lo(void __iomem *addr)
+u64 ioread64_hi_lo(const void __iomem *addr)
 {
 	return readq(addr);
 }
-u64 ioread64be(void __iomem *addr)
+u64 ioread64be(const void __iomem *addr)
 {
 	return readq_be(addr);
 }
-u64 ioread64be_lo_hi(void __iomem *addr)
+u64 ioread64be_lo_hi(const void __iomem *addr)
 {
 	return readq_be(addr);
 }
-u64 ioread64be_hi_lo(void __iomem *addr)
+u64 ioread64be_hi_lo(const void __iomem *addr)
 {
 	return readq_be(addr);
 }
@@ -139,15 +139,15 @@ EXPORT_SYMBOL(iowrite64be_hi_lo);
  * FIXME! We could make these do EEH handling if we really
  * wanted. Not clear if we do.
  */
-void ioread8_rep(void __iomem *addr, void *dst, unsigned long count)
+void ioread8_rep(const void __iomem *addr, void *dst, unsigned long count)
 {
 	readsb(addr, dst, count);
 }
-void ioread16_rep(void __iomem *addr, void *dst, unsigned long count)
+void ioread16_rep(const void __iomem *addr, void *dst, unsigned long count)
 {
 	readsw(addr, dst, count);
 }
-void ioread32_rep(void __iomem *addr, void *dst, unsigned long count)
+void ioread32_rep(const void __iomem *addr, void *dst, unsigned long count)
 {
 	readsl(addr, dst, count);
 }
--- a/arch/sh/kernel/iomap.c~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/arch/sh/kernel/iomap.c
@@ -8,31 +8,31 @@
 #include <linux/module.h>
 #include <linux/io.h>
 
-unsigned int ioread8(void __iomem *addr)
+unsigned int ioread8(const void __iomem *addr)
 {
 	return readb(addr);
 }
 EXPORT_SYMBOL(ioread8);
 
-unsigned int ioread16(void __iomem *addr)
+unsigned int ioread16(const void __iomem *addr)
 {
 	return readw(addr);
 }
 EXPORT_SYMBOL(ioread16);
 
-unsigned int ioread16be(void __iomem *addr)
+unsigned int ioread16be(const void __iomem *addr)
 {
 	return be16_to_cpu(__raw_readw(addr));
 }
 EXPORT_SYMBOL(ioread16be);
 
-unsigned int ioread32(void __iomem *addr)
+unsigned int ioread32(const void __iomem *addr)
 {
 	return readl(addr);
 }
 EXPORT_SYMBOL(ioread32);
 
-unsigned int ioread32be(void __iomem *addr)
+unsigned int ioread32be(const void __iomem *addr)
 {
 	return be32_to_cpu(__raw_readl(addr));
 }
@@ -74,7 +74,7 @@ EXPORT_SYMBOL(iowrite32be);
  * convert to CPU byte order. We write in "IO byte
  * order" (we also don't have IO barriers).
  */
-static inline void mmio_insb(void __iomem *addr, u8 *dst, int count)
+static inline void mmio_insb(const void __iomem *addr, u8 *dst, int count)
 {
 	while (--count >= 0) {
 		u8 data = __raw_readb(addr);
@@ -83,7 +83,7 @@ static inline void mmio_insb(void __iome
 	}
 }
 
-static inline void mmio_insw(void __iomem *addr, u16 *dst, int count)
+static inline void mmio_insw(const void __iomem *addr, u16 *dst, int count)
 {
 	while (--count >= 0) {
 		u16 data = __raw_readw(addr);
@@ -92,7 +92,7 @@ static inline void mmio_insw(void __iome
 	}
 }
 
-static inline void mmio_insl(void __iomem *addr, u32 *dst, int count)
+static inline void mmio_insl(const void __iomem *addr, u32 *dst, int count)
 {
 	while (--count >= 0) {
 		u32 data = __raw_readl(addr);
@@ -125,19 +125,19 @@ static inline void mmio_outsl(void __iom
 	}
 }
 
-void ioread8_rep(void __iomem *addr, void *dst, unsigned long count)
+void ioread8_rep(const void __iomem *addr, void *dst, unsigned long count)
 {
 	mmio_insb(addr, dst, count);
 }
 EXPORT_SYMBOL(ioread8_rep);
 
-void ioread16_rep(void __iomem *addr, void *dst, unsigned long count)
+void ioread16_rep(const void __iomem *addr, void *dst, unsigned long count)
 {
 	mmio_insw(addr, dst, count);
 }
 EXPORT_SYMBOL(ioread16_rep);
 
-void ioread32_rep(void __iomem *addr, void *dst, unsigned long count)
+void ioread32_rep(const void __iomem *addr, void *dst, unsigned long count)
 {
 	mmio_insl(addr, dst, count);
 }
--- a/drivers/mailbox/bcm-pdc-mailbox.c~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/drivers/mailbox/bcm-pdc-mailbox.c
@@ -679,7 +679,7 @@ pdc_receive(struct pdc_state *pdcs)
 
 	/* read last_rx_curr from register once */
 	pdcs->last_rx_curr =
-	    (ioread32(&pdcs->rxregs_64->status0) &
+	    (ioread32((const void __iomem *)&pdcs->rxregs_64->status0) &
 	     CRYPTO_D64_RS0_CD_MASK) / RING_ENTRY_SIZE;
 
 	do {
--- a/drivers/sh/clk/cpg.c~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/drivers/sh/clk/cpg.c
@@ -40,7 +40,7 @@ static int sh_clk_mstp_enable(struct clk
 {
 	sh_clk_write(sh_clk_read(clk) & ~(1 << clk->enable_bit), clk);
 	if (clk->status_reg) {
-		unsigned int (*read)(void __iomem *addr);
+		unsigned int (*read)(const void __iomem *addr);
 		int i;
 		void __iomem *mapped_status = (phys_addr_t)clk->status_reg -
 			(phys_addr_t)clk->enable_reg + clk->mapped_reg;
--- a/include/asm-generic/iomap.h~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/include/asm-generic/iomap.h
@@ -26,14 +26,14 @@
  * in the low address range. Architectures for which this is not
  * true can't use this generic implementation.
  */
-extern unsigned int ioread8(void __iomem *);
-extern unsigned int ioread16(void __iomem *);
-extern unsigned int ioread16be(void __iomem *);
-extern unsigned int ioread32(void __iomem *);
-extern unsigned int ioread32be(void __iomem *);
+extern unsigned int ioread8(const void __iomem *);
+extern unsigned int ioread16(const void __iomem *);
+extern unsigned int ioread16be(const void __iomem *);
+extern unsigned int ioread32(const void __iomem *);
+extern unsigned int ioread32be(const void __iomem *);
 #ifdef CONFIG_64BIT
-extern u64 ioread64(void __iomem *);
-extern u64 ioread64be(void __iomem *);
+extern u64 ioread64(const void __iomem *);
+extern u64 ioread64be(const void __iomem *);
 #endif
 
 #ifdef readq
@@ -41,10 +41,10 @@ extern u64 ioread64be(void __iomem *);
 #define ioread64_hi_lo ioread64_hi_lo
 #define ioread64be_lo_hi ioread64be_lo_hi
 #define ioread64be_hi_lo ioread64be_hi_lo
-extern u64 ioread64_lo_hi(void __iomem *addr);
-extern u64 ioread64_hi_lo(void __iomem *addr);
-extern u64 ioread64be_lo_hi(void __iomem *addr);
-extern u64 ioread64be_hi_lo(void __iomem *addr);
+extern u64 ioread64_lo_hi(const void __iomem *addr);
+extern u64 ioread64_hi_lo(const void __iomem *addr);
+extern u64 ioread64be_lo_hi(const void __iomem *addr);
+extern u64 ioread64be_hi_lo(const void __iomem *addr);
 #endif
 
 extern void iowrite8(u8, void __iomem *);
@@ -79,9 +79,9 @@ extern void iowrite64be_hi_lo(u64 val, v
  * memory across multiple ports, use "memcpy_toio()"
  * and friends.
  */
-extern void ioread8_rep(void __iomem *port, void *buf, unsigned long count);
-extern void ioread16_rep(void __iomem *port, void *buf, unsigned long count);
-extern void ioread32_rep(void __iomem *port, void *buf, unsigned long count);
+extern void ioread8_rep(const void __iomem *port, void *buf, unsigned long count);
+extern void ioread16_rep(const void __iomem *port, void *buf, unsigned long count);
+extern void ioread32_rep(const void __iomem *port, void *buf, unsigned long count);
 
 extern void iowrite8_rep(void __iomem *port, const void *buf, unsigned long count);
 extern void iowrite16_rep(void __iomem *port, const void *buf, unsigned long count);
--- a/include/linux/io-64-nonatomic-hi-lo.h~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/include/linux/io-64-nonatomic-hi-lo.h
@@ -57,7 +57,7 @@ static inline void hi_lo_writeq_relaxed(
 
 #ifndef ioread64_hi_lo
 #define ioread64_hi_lo ioread64_hi_lo
-static inline u64 ioread64_hi_lo(void __iomem *addr)
+static inline u64 ioread64_hi_lo(const void __iomem *addr)
 {
 	u32 low, high;
 
@@ -79,7 +79,7 @@ static inline void iowrite64_hi_lo(u64 v
 
 #ifndef ioread64be_hi_lo
 #define ioread64be_hi_lo ioread64be_hi_lo
-static inline u64 ioread64be_hi_lo(void __iomem *addr)
+static inline u64 ioread64be_hi_lo(const void __iomem *addr)
 {
 	u32 low, high;
 
--- a/include/linux/io-64-nonatomic-lo-hi.h~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/include/linux/io-64-nonatomic-lo-hi.h
@@ -57,7 +57,7 @@ static inline void lo_hi_writeq_relaxed(
 
 #ifndef ioread64_lo_hi
 #define ioread64_lo_hi ioread64_lo_hi
-static inline u64 ioread64_lo_hi(void __iomem *addr)
+static inline u64 ioread64_lo_hi(const void __iomem *addr)
 {
 	u32 low, high;
 
@@ -79,7 +79,7 @@ static inline void iowrite64_lo_hi(u64 v
 
 #ifndef ioread64be_lo_hi
 #define ioread64be_lo_hi ioread64be_lo_hi
-static inline u64 ioread64be_lo_hi(void __iomem *addr)
+static inline u64 ioread64be_lo_hi(const void __iomem *addr)
 {
 	u32 low, high;
 
--- a/lib/iomap.c~iomap-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/lib/iomap.c
@@ -70,27 +70,27 @@ static void bad_io_access(unsigned long
 #define mmio_read64be(addr) swab64(readq(addr))
 #endif
 
-unsigned int ioread8(void __iomem *addr)
+unsigned int ioread8(const void __iomem *addr)
 {
 	IO_COND(addr, return inb(port), return readb(addr));
 	return 0xff;
 }
-unsigned int ioread16(void __iomem *addr)
+unsigned int ioread16(const void __iomem *addr)
 {
 	IO_COND(addr, return inw(port), return readw(addr));
 	return 0xffff;
 }
-unsigned int ioread16be(void __iomem *addr)
+unsigned int ioread16be(const void __iomem *addr)
 {
 	IO_COND(addr, return pio_read16be(port), return mmio_read16be(addr));
 	return 0xffff;
 }
-unsigned int ioread32(void __iomem *addr)
+unsigned int ioread32(const void __iomem *addr)
 {
 	IO_COND(addr, return inl(port), return readl(addr));
 	return 0xffffffff;
 }
-unsigned int ioread32be(void __iomem *addr)
+unsigned int ioread32be(const void __iomem *addr)
 {
 	IO_COND(addr, return pio_read32be(port), return mmio_read32be(addr));
 	return 0xffffffff;
@@ -142,26 +142,26 @@ static u64 pio_read64be_hi_lo(unsigned l
 	return lo | (hi << 32);
 }
 
-u64 ioread64_lo_hi(void __iomem *addr)
+u64 ioread64_lo_hi(const void __iomem *addr)
 {
 	IO_COND(addr, return pio_read64_lo_hi(port), return readq(addr));
 	return 0xffffffffffffffffULL;
 }
 
-u64 ioread64_hi_lo(void __iomem *addr)
+u64 ioread64_hi_lo(const void __iomem *addr)
 {
 	IO_COND(addr, return pio_read64_hi_lo(port), return readq(addr));
 	return 0xffffffffffffffffULL;
 }
 
-u64 ioread64be_lo_hi(void __iomem *addr)
+u64 ioread64be_lo_hi(const void __iomem *addr)
 {
 	IO_COND(addr, return pio_read64be_lo_hi(port),
 		return mmio_read64be(addr));
 	return 0xffffffffffffffffULL;
 }
 
-u64 ioread64be_hi_lo(void __iomem *addr)
+u64 ioread64be_hi_lo(const void __iomem *addr)
 {
 	IO_COND(addr, return pio_read64be_hi_lo(port),
 		return mmio_read64be(addr));
@@ -275,7 +275,7 @@ EXPORT_SYMBOL(iowrite64be_hi_lo);
  * order" (we also don't have IO barriers).
  */
 #ifndef mmio_insb
-static inline void mmio_insb(void __iomem *addr, u8 *dst, int count)
+static inline void mmio_insb(const void __iomem *addr, u8 *dst, int count)
 {
 	while (--count >= 0) {
 		u8 data = __raw_readb(addr);
@@ -283,7 +283,7 @@ static inline void mmio_insb(void __iome
 		dst++;
 	}
 }
-static inline void mmio_insw(void __iomem *addr, u16 *dst, int count)
+static inline void mmio_insw(const void __iomem *addr, u16 *dst, int count)
 {
 	while (--count >= 0) {
 		u16 data = __raw_readw(addr);
@@ -291,7 +291,7 @@ static inline void mmio_insw(void __iome
 		dst++;
 	}
 }
-static inline void mmio_insl(void __iomem *addr, u32 *dst, int count)
+static inline void mmio_insl(const void __iomem *addr, u32 *dst, int count)
 {
 	while (--count >= 0) {
 		u32 data = __raw_readl(addr);
@@ -325,15 +325,15 @@ static inline void mmio_outsl(void __iom
 }
 #endif
 
-void ioread8_rep(void __iomem *addr, void *dst, unsigned long count)
+void ioread8_rep(const void __iomem *addr, void *dst, unsigned long count)
 {
 	IO_COND(addr, insb(port,dst,count), mmio_insb(addr, dst, count));
 }
-void ioread16_rep(void __iomem *addr, void *dst, unsigned long count)
+void ioread16_rep(const void __iomem *addr, void *dst, unsigned long count)
 {
 	IO_COND(addr, insw(port,dst,count), mmio_insw(addr, dst, count));
 }
-void ioread32_rep(void __iomem *addr, void *dst, unsigned long count)
+void ioread32_rep(const void __iomem *addr, void *dst, unsigned long count)
 {
 	IO_COND(addr, insl(port,dst,count), mmio_insl(addr, dst, count));
 }
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 37/39] rtl818x: constify ioreadX() iomem argument (as in generic implementation)
  2020-08-15  0:29 incoming Andrew Morton
                   ` (35 preceding siblings ...)
  2020-08-15  0:32 ` [patch 36/39] iomap: constify ioreadX() iomem argument (as in generic implementation) Andrew Morton
@ 2020-08-15  0:32 ` Andrew Morton
  2020-08-15  0:32 ` [patch 38/39] ntb: intel: " Andrew Morton
                   ` (102 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:32 UTC (permalink / raw)
  To: akpm, allenbh, arnd, benh, dalias, dave.jiang, davem, deller,
	geert+renesas, geert, ink, James.Bottomley, jasowang, jdmason,
	krzk, kuba, kvalo, linux-mm, mattst88, mm-commits, mpe, mst,
	paulus, rth, torvalds, ysato

From: Krzysztof Kozlowski <krzk@kernel.org>
Subject: rtl818x: constify ioreadX() iomem argument (as in generic implementation)

The ioreadX() helpers have inconsistent interface.  On some architectures
void *__iomem address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address so
they can be converted to a "const" version for const-safety and
consistency among architectures.

Link: http://lkml.kernel.org/r/20200709072837.5869-3-krzk@kernel.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Cc: Allen Hubbe <allenbh@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jon Mason <jdmason@kudzu.us>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/net/wireless/realtek/rtl818x/rtl8180/rtl8180.h |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/net/wireless/realtek/rtl818x/rtl8180/rtl8180.h~rtl818x-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/drivers/net/wireless/realtek/rtl818x/rtl8180/rtl8180.h
@@ -150,17 +150,17 @@ void rtl8180_write_phy(struct ieee80211_
 void rtl8180_set_anaparam(struct rtl8180_priv *priv, u32 anaparam);
 void rtl8180_set_anaparam2(struct rtl8180_priv *priv, u32 anaparam2);
 
-static inline u8 rtl818x_ioread8(struct rtl8180_priv *priv, u8 __iomem *addr)
+static inline u8 rtl818x_ioread8(struct rtl8180_priv *priv, const u8 __iomem *addr)
 {
 	return ioread8(addr);
 }
 
-static inline u16 rtl818x_ioread16(struct rtl8180_priv *priv, __le16 __iomem *addr)
+static inline u16 rtl818x_ioread16(struct rtl8180_priv *priv, const __le16 __iomem *addr)
 {
 	return ioread16(addr);
 }
 
-static inline u32 rtl818x_ioread32(struct rtl8180_priv *priv, __le32 __iomem *addr)
+static inline u32 rtl818x_ioread32(struct rtl8180_priv *priv, const __le32 __iomem *addr)
 {
 	return ioread32(addr);
 }
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 38/39] ntb: intel: constify ioreadX() iomem argument (as in generic implementation)
  2020-08-15  0:29 incoming Andrew Morton
                   ` (36 preceding siblings ...)
  2020-08-15  0:32 ` [patch 37/39] rtl818x: " Andrew Morton
@ 2020-08-15  0:32 ` Andrew Morton
  2020-08-15  0:32 ` [patch 39/39] virtio: pci: " Andrew Morton
                   ` (101 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:32 UTC (permalink / raw)
  To: akpm, allenbh, arnd, benh, dalias, dave.jiang, davem, deller,
	geert+renesas, geert, ink, James.Bottomley, jasowang, jdmason,
	krzk, kuba, kvalo, linux-mm, mattst88, mm-commits, mpe, mst,
	paulus, rth, torvalds, ysato

From: Krzysztof Kozlowski <krzk@kernel.org>
Subject: ntb: intel: constify ioreadX() iomem argument (as in generic implementation)

The ioreadX() helpers have inconsistent interface.  On some architectures
void *__iomem address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address so
they can be converted to a "const" version for const-safety and
consistency among architectures.

Link: http://lkml.kernel.org/r/20200709072837.5869-4-krzk@kernel.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Cc: Allen Hubbe <allenbh@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jon Mason <jdmason@kudzu.us>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/ntb/hw/intel/ntb_hw_gen1.c  |    2 +-
 drivers/ntb/hw/intel/ntb_hw_gen3.h  |    2 +-
 drivers/ntb/hw/intel/ntb_hw_intel.h |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/ntb/hw/intel/ntb_hw_gen1.c~ntb-intel-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/drivers/ntb/hw/intel/ntb_hw_gen1.c
@@ -1205,7 +1205,7 @@ int intel_ntb_peer_spad_write(struct ntb
 			       ndev->peer_reg->spad);
 }
 
-static u64 xeon_db_ioread(void __iomem *mmio)
+static u64 xeon_db_ioread(const void __iomem *mmio)
 {
 	return (u64)ioread16(mmio);
 }
--- a/drivers/ntb/hw/intel/ntb_hw_gen3.h~ntb-intel-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/drivers/ntb/hw/intel/ntb_hw_gen3.h
@@ -91,7 +91,7 @@
 #define GEN3_DB_TOTAL_SHIFT		33
 #define GEN3_SPAD_COUNT			16
 
-static inline u64 gen3_db_ioread(void __iomem *mmio)
+static inline u64 gen3_db_ioread(const void __iomem *mmio)
 {
 	return ioread64(mmio);
 }
--- a/drivers/ntb/hw/intel/ntb_hw_intel.h~ntb-intel-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/drivers/ntb/hw/intel/ntb_hw_intel.h
@@ -103,7 +103,7 @@ struct intel_ntb_dev;
 struct intel_ntb_reg {
 	int (*poll_link)(struct intel_ntb_dev *ndev);
 	int (*link_is_up)(struct intel_ntb_dev *ndev);
-	u64 (*db_ioread)(void __iomem *mmio);
+	u64 (*db_ioread)(const void __iomem *mmio);
 	void (*db_iowrite)(u64 db_bits, void __iomem *mmio);
 	unsigned long			ntb_ctl;
 	resource_size_t			db_size;
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* [patch 39/39] virtio: pci: constify ioreadX() iomem argument (as in generic implementation)
  2020-08-15  0:29 incoming Andrew Morton
                   ` (37 preceding siblings ...)
  2020-08-15  0:32 ` [patch 38/39] ntb: intel: " Andrew Morton
@ 2020-08-15  0:32 ` Andrew Morton
  2020-08-18 23:03 ` + mailmap-add-andi-kleen.patch added to -mm tree Andrew Morton
                   ` (100 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-15  0:32 UTC (permalink / raw)
  To: akpm, allenbh, arnd, benh, dalias, dave.jiang, davem, deller,
	geert+renesas, geert, ink, James.Bottomley, jasowang, jdmason,
	krzk, kuba, kvalo, linux-mm, mattst88, mm-commits, mpe, mst,
	paulus, rth, torvalds, ysato

From: Krzysztof Kozlowski <krzk@kernel.org>
Subject: virtio: pci: constify ioreadX() iomem argument (as in generic implementation)

The ioreadX() helpers have inconsistent interface.  On some architectures
void *__iomem address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address so
they can be converted to a "const" version for const-safety and
consistency among architectures.

Link: http://lkml.kernel.org/r/20200709072837.5869-5-krzk@kernel.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Allen Hubbe <allenbh@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jon Mason <jdmason@kudzu.us>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/virtio/virtio_pci_modern.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/virtio/virtio_pci_modern.c~virtio-pci-constify-ioreadx-iomem-argument-as-in-generic-implementation
+++ a/drivers/virtio/virtio_pci_modern.c
@@ -27,16 +27,16 @@
  * method, i.e. 32-bit accesses for 32-bit fields, 16-bit accesses
  * for 16-bit fields and 8-bit accesses for 8-bit fields.
  */
-static inline u8 vp_ioread8(u8 __iomem *addr)
+static inline u8 vp_ioread8(const u8 __iomem *addr)
 {
 	return ioread8(addr);
 }
-static inline u16 vp_ioread16 (__le16 __iomem *addr)
+static inline u16 vp_ioread16 (const __le16 __iomem *addr)
 {
 	return ioread16(addr);
 }
 
-static inline u32 vp_ioread32(__le32 __iomem *addr)
+static inline u32 vp_ioread32(const __le32 __iomem *addr)
 {
 	return ioread32(addr);
 }
_

^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mailmap-add-andi-kleen.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (38 preceding siblings ...)
  2020-08-15  0:32 ` [patch 39/39] virtio: pci: " Andrew Morton
@ 2020-08-18 23:03 ` Andrew Morton
  2020-08-18 23:05 ` + mm-account-pmd-tables-like-pte-tables.patch " Andrew Morton
                   ` (99 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-18 23:03 UTC (permalink / raw)
  To: ak, corbet, keescook, mm-commits, ndesaulniers, qperret


The patch titled
     Subject: mailmap: add Andi Kleen
has been added to the -mm tree.  Its filename is
     mailmap-add-andi-kleen.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mailmap-add-andi-kleen.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mailmap-add-andi-kleen.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Nick Desaulniers <ndesaulniers@google.com>
Subject: mailmap: add Andi Kleen

I keep getting bounce back from the suse.de address.

Link: http://lkml.kernel.org/r/20200818203214.659955-1-ndesaulniers@google.com
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Quentin Perret <qperret@qperret.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 .mailmap |    1 +
 1 file changed, 1 insertion(+)

--- a/.mailmap~mailmap-add-andi-kleen
+++ a/.mailmap
@@ -32,6 +32,7 @@ Alex Shi <alex.shi@linux.alibaba.com> <a
 Alex Shi <alex.shi@linux.alibaba.com> <alex.shi@linaro.org>
 Al Viro <viro@ftp.linux.org.uk>
 Al Viro <viro@zenIV.linux.org.uk>
+Andi Kleen <ak@linux.intel.com> <ak@suse.de>
 Andi Shyti <andi@etezian.org> <andi.shyti@samsung.com>
 Andreas Herrmann <aherrman@de.ibm.com>
 Andrew Morton <akpm@linux-foundation.org>
_

Patches currently in -mm which might be from ndesaulniers@google.com are

mailmap-add-andi-kleen.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-account-pmd-tables-like-pte-tables.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (39 preceding siblings ...)
  2020-08-18 23:03 ` + mailmap-add-andi-kleen.patch added to -mm tree Andrew Morton
@ 2020-08-18 23:05 ` Andrew Morton
  2020-08-18 23:09 ` + mm-remove-activate_page-from-unuse_pte.patch " Andrew Morton
                   ` (98 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-18 23:05 UTC (permalink / raw)
  To: abdhalee, arnd, christophe.leroy, jcmvbkbc, joro, luto,
	mm-commits, peterz, rppt, sathnaga, shorne, willy


The patch titled
     Subject: mm: account PMD tables like PTE tables
has been added to the -mm tree.  Its filename is
     mm-account-pmd-tables-like-pte-tables.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-account-pmd-tables-like-pte-tables.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-account-pmd-tables-like-pte-tables.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Matthew Wilcox <willy@infradead.org>
Subject: mm: account PMD tables like PTE tables

We account the PTE level of the page tables to the process in order to
make smarter OOM decisions and help diagnose why memory is fragmented. 
For these same reasons, we should account pages allocated for PMDs.  With
larger process address spaces and ASLR, the number of PMDs in use is
higher than it used to be so the inaccuracy is starting to matter.

Link: http://lkml.kernel.org/r/20200627184642.GF25039@casper.infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mm.h |   24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

--- a/include/linux/mm.h~mm-account-pmd-tables-like-pte-tables
+++ a/include/linux/mm.h
@@ -2239,7 +2239,7 @@ static inline spinlock_t *pmd_lockptr(st
 	return ptlock_ptr(pmd_to_page(pmd));
 }
 
-static inline bool pgtable_pmd_page_ctor(struct page *page)
+static inline bool pmd_ptlock_init(struct page *page)
 {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	page->pmd_huge_pte = NULL;
@@ -2247,7 +2247,7 @@ static inline bool pgtable_pmd_page_ctor
 	return ptlock_init(page);
 }
 
-static inline void pgtable_pmd_page_dtor(struct page *page)
+static inline void pmd_ptlock_free(struct page *page)
 {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	VM_BUG_ON_PAGE(page->pmd_huge_pte, page);
@@ -2264,8 +2264,8 @@ static inline spinlock_t *pmd_lockptr(st
 	return &mm->page_table_lock;
 }
 
-static inline bool pgtable_pmd_page_ctor(struct page *page) { return true; }
-static inline void pgtable_pmd_page_dtor(struct page *page) {}
+static inline bool pmd_ptlock_init(struct page *page) { return true; }
+static inline void pmd_ptlock_free(struct page *page) {}
 
 #define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
 
@@ -2278,6 +2278,22 @@ static inline spinlock_t *pmd_lock(struc
 	return ptl;
 }
 
+static inline bool pgtable_pmd_page_ctor(struct page *page)
+{
+	if (!pmd_ptlock_init(page))
+		return false;
+	__SetPageTable(page);
+	inc_zone_page_state(page, NR_PAGETABLE);
+	return true;
+}
+
+static inline void pgtable_pmd_page_dtor(struct page *page)
+{
+	pmd_ptlock_free(page);
+	__ClearPageTable(page);
+	dec_zone_page_state(page, NR_PAGETABLE);
+}
+
 /*
  * No scalability reason to split PUD locks yet, but follow the same pattern
  * as the PMD locks to make it easier if we decide to.  The VM should not be
_

Patches currently in -mm which might be from willy@infradead.org are

mm-account-pmd-tables-like-pte-tables.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-remove-activate_page-from-unuse_pte.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (40 preceding siblings ...)
  2020-08-18 23:05 ` + mm-account-pmd-tables-like-pte-tables.patch " Andrew Morton
@ 2020-08-18 23:09 ` Andrew Morton
  2020-08-18 23:09 ` + mm-remove-superfluous-__clearpageactive.patch " Andrew Morton
                   ` (97 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-18 23:09 UTC (permalink / raw)
  To: alexander.h.duyck, cai, david, hughd, iamjoonsoo.kim, mgorman,
	mhocko, mm-commits, npiggin, yang.shi, ying.huang, yuzhao


The patch titled
     Subject: mm: remove activate_page() from unuse_pte()
has been added to the -mm tree.  Its filename is
     mm-remove-activate_page-from-unuse_pte.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-remove-activate_page-from-unuse_pte.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-remove-activate_page-from-unuse_pte.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Yu Zhao <yuzhao@google.com>
Subject: mm: remove activate_page() from unuse_pte()

We don't initially add anon pages to active lruvec after commit
b518154e59aa ("mm/vmscan: protect the workingset on anonymous LRU"). 
Remove activate_page() from unuse_pte(), which seems to be missed by the
commit.  And make the function static while we are at it.

Before the commit, we called lru_cache_add_active_or_unevictable() to add
new ksm pages to active lruvec.  Therefore, activate_page() wasn't
necessary for them in the first place.

Link: http://lkml.kernel.org/r/20200818184704.3625199-1-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/swap.h |    1 -
 mm/swap.c            |    4 ++--
 mm/swapfile.c        |    5 -----
 3 files changed, 2 insertions(+), 8 deletions(-)

--- a/include/linux/swap.h~mm-remove-activate_page-from-unuse_pte
+++ a/include/linux/swap.h
@@ -340,7 +340,6 @@ extern void lru_note_cost_page(struct pa
 extern void lru_cache_add(struct page *);
 extern void lru_add_page_tail(struct page *page, struct page *page_tail,
 			 struct lruvec *lruvec, struct list_head *head);
-extern void activate_page(struct page *);
 extern void mark_page_accessed(struct page *);
 extern void lru_add_drain(void);
 extern void lru_add_drain_cpu(int cpu);
--- a/mm/swap.c~mm-remove-activate_page-from-unuse_pte
+++ a/mm/swap.c
@@ -348,7 +348,7 @@ static bool need_activate_page_drain(int
 	return pagevec_count(&per_cpu(lru_pvecs.activate_page, cpu)) != 0;
 }
 
-void activate_page(struct page *page)
+static void activate_page(struct page *page)
 {
 	page = compound_head(page);
 	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
@@ -368,7 +368,7 @@ static inline void activate_page_drain(i
 {
 }
 
-void activate_page(struct page *page)
+static void activate_page(struct page *page)
 {
 	pg_data_t *pgdat = page_pgdat(page);
 
--- a/mm/swapfile.c~mm-remove-activate_page-from-unuse_pte
+++ a/mm/swapfile.c
@@ -1925,11 +1925,6 @@ static int unuse_pte(struct vm_area_stru
 		lru_cache_add_inactive_or_unevictable(page, vma);
 	}
 	swap_free(entry);
-	/*
-	 * Move the page to the active list so it is not
-	 * immediately swapped out again after swapon.
-	 */
-	activate_page(page);
 out:
 	pte_unmap_unlock(pte, ptl);
 	if (page != swapcache) {
_

Patches currently in -mm which might be from yuzhao@google.com are

mm-remove-activate_page-from-unuse_pte.patch
mm-remove-superfluous-__clearpageactive.patch
mm-remove-superfluous-__clearpagewaiters.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-remove-superfluous-__clearpageactive.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (41 preceding siblings ...)
  2020-08-18 23:09 ` + mm-remove-activate_page-from-unuse_pte.patch " Andrew Morton
@ 2020-08-18 23:09 ` Andrew Morton
  2020-08-18 23:09 ` + mm-remove-superfluous-__clearpagewaiters.patch " Andrew Morton
                   ` (96 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-18 23:09 UTC (permalink / raw)
  To: alexander.h.duyck, cai, david, hughd, iamjoonsoo.kim, mgorman,
	mhocko, mm-commits, npiggin, yang.shi, ying.huang, yuzhao


The patch titled
     Subject: mm: remove superfluous __ClearPageActive()
has been added to the -mm tree.  Its filename is
     mm-remove-superfluous-__clearpageactive.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-remove-superfluous-__clearpageactive.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-remove-superfluous-__clearpageactive.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Yu Zhao <yuzhao@google.com>
Subject: mm: remove superfluous __ClearPageActive()

To activate a page, mark_page_accessed() always holds a reference on it. 
It either gets a new reference when adding a page to
lru_pvecs.activate_page or reuses an existing one it previously got when
it added a page to lru_pvecs.lru_add.  So it doesn't call SetPageActive()
on a page that doesn't have any reference left.  Therefore, the race is
impossible these days (I didn't brother to dig into its history).

For other paths, namely reclaim and migration, a reference count is always
held while calling SetPageActive() on a page.

SetPageSlabPfmemalloc() also uses SetPageActive(), but it's irrelevant to
LRU pages.

Link: http://lkml.kernel.org/r/20200818184704.3625199-2-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memremap.c |    2 --
 mm/swap.c     |    2 --
 2 files changed, 4 deletions(-)

--- a/mm/memremap.c~mm-remove-superfluous-__clearpageactive
+++ a/mm/memremap.c
@@ -471,8 +471,6 @@ void free_devmap_managed_page(struct pag
 		return;
 	}
 
-	/* Clear Active bit in case of parallel mark_page_accessed */
-	__ClearPageActive(page);
 	__ClearPageWaiters(page);
 
 	mem_cgroup_uncharge(page);
--- a/mm/swap.c~mm-remove-superfluous-__clearpageactive
+++ a/mm/swap.c
@@ -900,8 +900,6 @@ void release_pages(struct page **pages,
 			del_page_from_lru_list(page, lruvec, page_off_lru(page));
 		}
 
-		/* Clear Active bit in case of parallel mark_page_accessed */
-		__ClearPageActive(page);
 		__ClearPageWaiters(page);
 
 		list_add(&page->lru, &pages_to_free);
_

Patches currently in -mm which might be from yuzhao@google.com are

mm-remove-activate_page-from-unuse_pte.patch
mm-remove-superfluous-__clearpageactive.patch
mm-remove-superfluous-__clearpagewaiters.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-remove-superfluous-__clearpagewaiters.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (42 preceding siblings ...)
  2020-08-18 23:09 ` + mm-remove-superfluous-__clearpageactive.patch " Andrew Morton
@ 2020-08-18 23:09 ` Andrew Morton
  2020-08-18 23:49 ` + mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api-fix.patch " Andrew Morton
                   ` (95 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-18 23:09 UTC (permalink / raw)
  To: alexander.h.duyck, cai, david, hughd, iamjoonsoo.kim, mgorman,
	mhocko, mm-commits, npiggin, yang.shi, ying.huang, yuzhao


The patch titled
     Subject: mm: remove superfluous __ClearPageWaiters()
has been added to the -mm tree.  Its filename is
     mm-remove-superfluous-__clearpagewaiters.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-remove-superfluous-__clearpagewaiters.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-remove-superfluous-__clearpagewaiters.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Yu Zhao <yuzhao@google.com>
Subject: mm: remove superfluous __ClearPageWaiters()

Presumably __ClearPageWaiters() was added to follow the previously removed
__ClearPageActive() pattern.

Only flags that are in PAGE_FLAGS_CHECK_AT_FREE needs to be properly
cleared because otherwise we think there may be some kind of leak. 
PG_waiters is not one of those flags and leaving the clearing to
PAGE_FLAGS_CHECK_AT_PREP is more appropriate.

Link: http://lkml.kernel.org/r/20200818184704.3625199-3-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/page-flags.h |    2 +-
 mm/filemap.c               |    2 ++
 mm/memremap.c              |    2 --
 mm/swap.c                  |    3 ---
 4 files changed, 3 insertions(+), 6 deletions(-)

--- a/include/linux/page-flags.h~mm-remove-superfluous-__clearpagewaiters
+++ a/include/linux/page-flags.h
@@ -318,7 +318,7 @@ static inline int TestClearPage##uname(s
 	TESTSETFLAG_FALSE(uname) TESTCLEARFLAG_FALSE(uname)
 
 __PAGEFLAG(Locked, locked, PF_NO_TAIL)
-PAGEFLAG(Waiters, waiters, PF_ONLY_HEAD) __CLEARPAGEFLAG(Waiters, waiters, PF_ONLY_HEAD)
+PAGEFLAG(Waiters, waiters, PF_ONLY_HEAD)
 PAGEFLAG(Error, error, PF_NO_TAIL) TESTCLEARFLAG(Error, error, PF_NO_TAIL)
 PAGEFLAG(Referenced, referenced, PF_HEAD)
 	TESTCLEARFLAG(Referenced, referenced, PF_HEAD)
--- a/mm/filemap.c~mm-remove-superfluous-__clearpagewaiters
+++ a/mm/filemap.c
@@ -1079,6 +1079,8 @@ static void wake_up_page_bit(struct page
 		 * other pages on it.
 		 *
 		 * That's okay, it's a rare case. The next waker will clear it.
+		 * Otherwise the bit will be cleared by PAGE_FLAGS_CHECK_AT_PREP
+		 * when the page is being freed.
 		 */
 	}
 	spin_unlock_irqrestore(&q->lock, flags);
--- a/mm/memremap.c~mm-remove-superfluous-__clearpagewaiters
+++ a/mm/memremap.c
@@ -471,8 +471,6 @@ void free_devmap_managed_page(struct pag
 		return;
 	}
 
-	__ClearPageWaiters(page);
-
 	mem_cgroup_uncharge(page);
 
 	/*
--- a/mm/swap.c~mm-remove-superfluous-__clearpagewaiters
+++ a/mm/swap.c
@@ -90,7 +90,6 @@ static void __page_cache_release(struct
 		del_page_from_lru_list(page, lruvec, page_off_lru(page));
 		spin_unlock_irqrestore(&pgdat->lru_lock, flags);
 	}
-	__ClearPageWaiters(page);
 }
 
 static void __put_single_page(struct page *page)
@@ -900,8 +899,6 @@ void release_pages(struct page **pages,
 			del_page_from_lru_list(page, lruvec, page_off_lru(page));
 		}
 
-		__ClearPageWaiters(page);
-
 		list_add(&page->lru, &pages_to_free);
 	}
 	if (locked_pgdat)
_

Patches currently in -mm which might be from yuzhao@google.com are

mm-remove-activate_page-from-unuse_pte.patch
mm-remove-superfluous-__clearpageactive.patch
mm-remove-superfluous-__clearpagewaiters.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api-fix.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (43 preceding siblings ...)
  2020-08-18 23:09 ` + mm-remove-superfluous-__clearpagewaiters.patch " Andrew Morton
@ 2020-08-18 23:49 ` Andrew Morton
  2020-08-18 23:50 ` + mm-slab-remove-duplicate-include.patch " Andrew Morton
                   ` (94 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-18 23:49 UTC (permalink / raw)
  To: minchan, mm-commits, yuehaibing


The patch titled
     Subject: mm/madvise: Remove duplicate include
has been added to the -mm tree.  Its filename is
     mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api-fix.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api-fix.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api-fix.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: YueHaibing <yuehaibing@huawei.com>
Subject: mm/madvise: Remove duplicate include

Remove duplicate header which is included twice.

Link: http://lkml.kernel.org/r/20200818114448.45492-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/madvise.c |    1 -
 1 file changed, 1 deletion(-)

--- a/mm/madvise.c~mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api-fix
+++ a/mm/madvise.c
@@ -17,7 +17,6 @@
 #include <linux/falloc.h>
 #include <linux/fadvise.h>
 #include <linux/sched.h>
-#include <linux/sched/mm.h>
 #include <linux/ksm.h>
 #include <linux/fs.h>
 #include <linux/file.h>
_

Patches currently in -mm which might be from yuehaibing@huawei.com are

mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api-fix.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-slab-remove-duplicate-include.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (44 preceding siblings ...)
  2020-08-18 23:49 ` + mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api-fix.patch " Andrew Morton
@ 2020-08-18 23:50 ` Andrew Morton
  2020-08-18 23:53 ` + mm-memory-fix-typo-in-__do_fault-comment.patch " Andrew Morton
                   ` (93 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-18 23:50 UTC (permalink / raw)
  To: mm-commits, yuehaibing


The patch titled
     Subject: mm/slab.h: remove duplicate include
has been added to the -mm tree.  Its filename is
     mm-slab-remove-duplicate-include.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-slab-remove-duplicate-include.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-slab-remove-duplicate-include.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: YueHaibing <yuehaibing@huawei.com>
Subject: mm/slab.h: remove duplicate include

Remove duplicate header which is included twice.

Link: http://lkml.kernel.org/r/20200818114323.58156-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/slab.h |    1 -
 1 file changed, 1 deletion(-)

--- a/mm/slab.h~mm-slab-remove-duplicate-include
+++ a/mm/slab.h
@@ -46,7 +46,6 @@ struct kmem_cache {
 #include <linux/kmemleak.h>
 #include <linux/random.h>
 #include <linux/sched/mm.h>
-#include <linux/kmemleak.h>
 
 /*
  * State of the slab allocator.
_

Patches currently in -mm which might be from yuehaibing@huawei.com are

mm-slab-remove-duplicate-include.patch
mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api-fix.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-memory-fix-typo-in-__do_fault-comment.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (45 preceding siblings ...)
  2020-08-18 23:50 ` + mm-slab-remove-duplicate-include.patch " Andrew Morton
@ 2020-08-18 23:53 ` Andrew Morton
  2020-08-18 23:56 ` + proc-add-struct-mount-struct-super_block-addr-in-lx-mounts-command.patch " Andrew Morton
                   ` (92 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-18 23:53 UTC (permalink / raw)
  To: david, mm-commits, yanfei.xu


The patch titled
     Subject: mm/memory.c: fix typo in __do_fault() comment
has been added to the -mm tree.  Its filename is
     mm-memory-fix-typo-in-__do_fault-comment.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memory-fix-typo-in-__do_fault-comment.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memory-fix-typo-in-__do_fault-comment.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Yanfei Xu <yanfei.xu@windriver.com>
Subject: mm/memory.c: fix typo in __do_fault() comment

It's "pte_alloc_one", not "pte_alloc_pne". Let's fix that.

Link: http://lkml.kernel.org/r/20200818104339.5310-1-yanfei.xu@windriver.com
Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/memory.c~mm-memory-fix-typo-in-__do_fault-comment
+++ a/mm/memory.c
@@ -3446,7 +3446,7 @@ static vm_fault_t __do_fault(struct vm_f
 	 *				unlock_page(A)
 	 * lock_page(B)
 	 *				lock_page(B)
-	 * pte_alloc_pne
+	 * pte_alloc_one
 	 *   shrink_page_list
 	 *     wait_on_page_writeback(A)
 	 *				SetPageWriteback(B)
_

Patches currently in -mm which might be from yanfei.xu@windriver.com are

mm-memory-fix-typo-in-__do_fault-comment.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + proc-add-struct-mount-struct-super_block-addr-in-lx-mounts-command.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (46 preceding siblings ...)
  2020-08-18 23:53 ` + mm-memory-fix-typo-in-__do_fault-comment.patch " Andrew Morton
@ 2020-08-18 23:56 ` Andrew Morton
  2020-08-18 23:56 ` + tasks-add-headers-and-improve-spacing-format.patch " Andrew Morton
                   ` (91 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-18 23:56 UTC (permalink / raw)
  To: jan.kiszka, kbingham, mm-commits, riteshh


The patch titled
     Subject: scripts/gdb/proc: add struct mount & struct super_block addr in lx-mounts command
has been added to the -mm tree.  Its filename is
     proc-add-struct-mount-struct-super_block-addr-in-lx-mounts-command.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/proc-add-struct-mount-struct-super_block-addr-in-lx-mounts-command.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/proc-add-struct-mount-struct-super_block-addr-in-lx-mounts-command.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Ritesh Harjani <riteshh@linux.ibm.com>
Subject: scripts/gdb/proc: add struct mount & struct super_block addr in lx-mounts command

This is many times found useful while debugging some FS related
issue.

<e.g. output>
      mount          super_block     devname pathname fstype options
0xffff888a0bfa4b40 0xffff888a0bfc1000 none / rootfs rw 0 0
0xffff888a033f75c0 0xffff8889fcf65000 /dev/root / ext4 rw,relatime 0 0
0xffff8889fc8ce040 0xffff888a0bb51000 devtmpfs /dev devtmpfs rw,relatime 0 0

Link: http://lkml.kernel.org/r/a3c4177e1597b3e06d66d55e07d72c0c46a03571.1597742951.git.riteshh@linux.ibm.com
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/gdb/linux/proc.py |   15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

--- a/scripts/gdb/linux/proc.py~proc-add-struct-mount-struct-super_block-addr-in-lx-mounts-command
+++ a/scripts/gdb/linux/proc.py
@@ -167,6 +167,9 @@ values of that process namespace"""
         if not namespace:
             raise gdb.GdbError("No namespace for current process")
 
+        gdb.write("{:^18} {:^15} {:>9} {} {} options\n".format(
+                  "mount", "super_block", "devname", "pathname", "fstype"))
+
         for vfs in lists.list_for_each_entry(namespace['list'],
                                              mount_ptr_type, "mnt_list"):
             devname = vfs['mnt_devname'].string()
@@ -190,14 +193,10 @@ values of that process namespace"""
             m_flags = int(vfs['mnt']['mnt_flags'])
             rd = "ro" if (s_flags & constants.LX_SB_RDONLY) else "rw"
 
-            gdb.write(
-                "{} {} {} {}{}{} 0 0\n"
-                .format(devname,
-                        pathname,
-                        fstype,
-                        rd,
-                        info_opts(FS_INFO, s_flags),
-                        info_opts(MNT_INFO, m_flags)))
+            gdb.write("{} {} {} {} {} {}{}{} 0 0\n".format(
+                      vfs.format_string(), superblock.format_string(), devname,
+                      pathname, fstype, rd, info_opts(FS_INFO, s_flags),
+                      info_opts(MNT_INFO, m_flags)))
 
 
 LxMounts()
_

Patches currently in -mm which might be from riteshh@linux.ibm.com are

proc-add-struct-mount-struct-super_block-addr-in-lx-mounts-command.patch
tasks-add-headers-and-improve-spacing-format.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + tasks-add-headers-and-improve-spacing-format.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (47 preceding siblings ...)
  2020-08-18 23:56 ` + proc-add-struct-mount-struct-super_block-addr-in-lx-mounts-command.patch " Andrew Morton
@ 2020-08-18 23:56 ` Andrew Morton
  2020-08-18 23:57 ` + mm-memoryc-replace-vmf-vma-with-variable-vma.patch " Andrew Morton
                   ` (90 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-18 23:56 UTC (permalink / raw)
  To: jan.kiszka, kbingham, mm-commits, riteshh


The patch titled
     Subject: scripts/gdb/tasks: add headers and improve spacing format
has been added to the -mm tree.  Its filename is
     tasks-add-headers-and-improve-spacing-format.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/tasks-add-headers-and-improve-spacing-format.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/tasks-add-headers-and-improve-spacing-format.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Ritesh Harjani <riteshh@linux.ibm.com>
Subject: scripts/gdb/tasks: add headers and improve spacing format

With the patch.
<e.g. o/p>
      TASK          PID    COMM
0xffffffff82c2b8c0   0   swapper/0
0xffff888a0ba20040   1   systemd
0xffff888a0ba24040   2   kthreadd
0xffff888a0ba28040   3   rcu_gp

w/o
0xffffffff82c2b8c0 <init_task> 0 swapper/0
0xffff888a0ba20040 1 systemd
0xffff888a0ba24040 2 kthreadd
0xffff888a0ba28040 3 rcu_gp

Link: http://lkml.kernel.org/r/54c868c79b5fc364a8be7799891934a6fe6d1464.1597742951.git.riteshh@linux.ibm.com
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/gdb/linux/tasks.py |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

--- a/scripts/gdb/linux/tasks.py~tasks-add-headers-and-improve-spacing-format
+++ a/scripts/gdb/linux/tasks.py
@@ -73,11 +73,12 @@ class LxPs(gdb.Command):
         super(LxPs, self).__init__("lx-ps", gdb.COMMAND_DATA)
 
     def invoke(self, arg, from_tty):
+        gdb.write("{:>10} {:>12} {:>7}\n".format("TASK", "PID", "COMM"))
         for task in task_lists():
-            gdb.write("{address} {pid} {comm}\n".format(
-                address=task,
-                pid=task["pid"],
-                comm=task["comm"].string()))
+            gdb.write("{} {:^5} {}\n".format(
+                task.format_string().split()[0],
+                task["pid"].format_string(),
+                task["comm"].string()))
 
 
 LxPs()
_

Patches currently in -mm which might be from riteshh@linux.ibm.com are

proc-add-struct-mount-struct-super_block-addr-in-lx-mounts-command.patch
tasks-add-headers-and-improve-spacing-format.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-memoryc-replace-vmf-vma-with-variable-vma.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (48 preceding siblings ...)
  2020-08-18 23:56 ` + tasks-add-headers-and-improve-spacing-format.patch " Andrew Morton
@ 2020-08-18 23:57 ` Andrew Morton
  2020-08-19  1:30 ` + mm-page_reporting-drop-stale-list-head-check-in-page_reporting_cycle.patch " Andrew Morton
                   ` (89 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-18 23:57 UTC (permalink / raw)
  To: mm-commits, willy, yanfei.xu


The patch titled
     Subject: mm/memory.c: replace vmf->vma with variable vma
has been added to the -mm tree.  Its filename is
     mm-memoryc-replace-vmf-vma-with-variable-vma.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memoryc-replace-vmf-vma-with-variable-vma.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memoryc-replace-vmf-vma-with-variable-vma.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Yanfei Xu <yanfei.xu@windriver.com>
Subject: mm/memory.c: replace vmf->vma with variable vma

The code has declared a vma_struct named vma which is assigned a value of
vmf->vma.  Thus, use variable vma directly here.

Link: http://lkml.kernel.org/r/20200818084607.37616-1-yanfei.xu@windriver.com
Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/memory.c~mm-memoryc-replace-vmf-vma-with-variable-vma
+++ a/mm/memory.c
@@ -3454,7 +3454,7 @@ static vm_fault_t __do_fault(struct vm_f
 	 *				# flush A, B to clear the writeback
 	 */
 	if (pmd_none(*vmf->pmd) && !vmf->prealloc_pte) {
-		vmf->prealloc_pte = pte_alloc_one(vmf->vma->vm_mm);
+		vmf->prealloc_pte = pte_alloc_one(vma->vm_mm);
 		if (!vmf->prealloc_pte)
 			return VM_FAULT_OOM;
 		smp_wmb(); /* See comment in __pte_alloc() */
_

Patches currently in -mm which might be from yanfei.xu@windriver.com are

mm-memory-fix-typo-in-__do_fault-comment.patch
mm-memoryc-replace-vmf-vma-with-variable-vma.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-page_reporting-drop-stale-list-head-check-in-page_reporting_cycle.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (49 preceding siblings ...)
  2020-08-18 23:57 ` + mm-memoryc-replace-vmf-vma-with-variable-vma.patch " Andrew Morton
@ 2020-08-19  1:30 ` Andrew Morton
  2020-08-19  1:31 ` + checkpatch-add-kconfig-prefix.patch " Andrew Morton
                   ` (88 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  1:30 UTC (permalink / raw)
  To: alexander.h.duyck, david, mm-commits, richard.weiyang


The patch titled
     Subject: mm/page_reporting.c: drop stale list head check in page_reporting_cycle
has been added to the -mm tree.  Its filename is
     mm-page_reporting-drop-stale-list-head-check-in-page_reporting_cycle.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-page_reporting-drop-stale-list-head-check-in-page_reporting_cycle.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-page_reporting-drop-stale-list-head-check-in-page_reporting_cycle.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Wei Yang <richard.weiyang@linux.alibaba.com>
Subject: mm/page_reporting.c: drop stale list head check in page_reporting_cycle

list_for_each_entry_safe() guarantees that we will never stumble over the
list head; "&page->lru != list" will always evaluate to true.  Let's
simplify.

[david@redhat.com: Changelog refinements]
Link: http://lkml.kernel.org/r/20200818084448.33969-1-richard.weiyang@linux.alibaba.com
Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_reporting.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/page_reporting.c~mm-page_reporting-drop-stale-list-head-check-in-page_reporting_cycle
+++ a/mm/page_reporting.c
@@ -178,7 +178,7 @@ page_reporting_cycle(struct page_reporti
 		 * the new head of the free list before we release the
 		 * zone lock.
 		 */
-		if (&page->lru != list && !list_is_first(&page->lru, list))
+		if (!list_is_first(&page->lru, list))
 			list_rotate_to_front(&page->lru, list);
 
 		/* release lock before waiting on report processing */
_

Patches currently in -mm which might be from richard.weiyang@linux.alibaba.com are

mm-page_reporting-drop-stale-list-head-check-in-page_reporting_cycle.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + checkpatch-add-kconfig-prefix.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (50 preceding siblings ...)
  2020-08-19  1:30 ` + mm-page_reporting-drop-stale-list-head-check-in-page_reporting_cycle.patch " Andrew Morton
@ 2020-08-19  1:31 ` Andrew Morton
  2020-08-19  1:32 ` + mm-memory-failure-do-pgoff-calculation-before-for_each_process.patch " Andrew Morton
                   ` (87 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  1:31 UTC (permalink / raw)
  To: jerome, joe, mm-commits


The patch titled
     Subject: checkpatch: add --kconfig-prefix
has been added to the -mm tree.  Its filename is
     checkpatch-add-kconfig-prefix.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/checkpatch-add-kconfig-prefix.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/checkpatch-add-kconfig-prefix.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Jerome Forissier <jerome@forissier.org>
Subject: checkpatch: add --kconfig-prefix

Kconfig allows to customize the CONFIG_ prefix via the $CONFIG_
environment variable.  Out-of-tree projects may therefore use Kconfig with
a different prefix, or they may use a custom configuration tool which does
not use the CONFIG_ prefix at all.  Such projects may still want to adhere
to the Linux kernel coding style and run checkpatch.pl.

One example is OP-TEE [1] which does not use Kconfig but does have
configuration options prefixed with CFG_.  It also mostly follows the
kernel coding style and therefore being able to use checkpatch is quite
valuable.

To make this possible, add the --kconfig-prefix command line option.

[1] https://github.com/OP-TEE/optee_os

Link: http://lkml.kernel.org/r/20200818081732.800449-1-jerome@forissier.org
Signed-off-by: Jerome Forissier <jerome@forissier.org>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/checkpatch.pl |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

--- a/scripts/checkpatch.pl~checkpatch-add-kconfig-prefix
+++ a/scripts/checkpatch.pl
@@ -67,6 +67,7 @@ my $allow_c99_comments = 1; # Can be ove
 # git output parsing needs US English output, so first set backtick child process LANGUAGE
 my $git_command ='export LANGUAGE=en_US.UTF-8; git';
 my $tabsize = 8;
+my ${CONFIG_} = "CONFIG_";
 
 sub help {
 	my ($exitcode) = @_;
@@ -129,6 +130,8 @@ Options:
   --typedefsfile             Read additional types from this file
   --color[=WHEN]             Use colors 'always', 'never', or only when output
                              is a terminal ('auto'). Default is 'auto'.
+  --kconfig-prefix=WORD      use WORD as a prefix for Kconfig symbols (default
+                             ${CONFIG_})
   -h, --help, --version      display this help and exit
 
 When FILE is - read standard input.
@@ -237,6 +240,7 @@ GetOptions(
 	'color=s'	=> \$color,
 	'no-color'	=> \$color,	#keep old behaviors of -nocolor
 	'nocolor'	=> \$color,	#keep old behaviors of -nocolor
+	'kconfig-prefix=s'	=> \${CONFIG_},
 	'h|help'	=> \$help,
 	'version'	=> \$help
 ) or help(1);
@@ -6526,16 +6530,16 @@ sub process {
 		}
 
 # check for IS_ENABLED() without CONFIG_<FOO> ($rawline for comments too)
-		if ($rawline =~ /\bIS_ENABLED\s*\(\s*(\w+)\s*\)/ && $1 !~ /^CONFIG_/) {
+		if ($rawline =~ /\bIS_ENABLED\s*\(\s*(\w+)\s*\)/ && $1 !~ /^${CONFIG_}/) {
 			WARN("IS_ENABLED_CONFIG",
-			     "IS_ENABLED($1) is normally used as IS_ENABLED(CONFIG_$1)\n" . $herecurr);
+			     "IS_ENABLED($1) is normally used as IS_ENABLED(${CONFIG_}$1)\n" . $herecurr);
 		}
 
 # check for #if defined CONFIG_<FOO> || defined CONFIG_<FOO>_MODULE
-		if ($line =~ /^\+\s*#\s*if\s+defined(?:\s*\(?\s*|\s+)(CONFIG_[A-Z_]+)\s*\)?\s*\|\|\s*defined(?:\s*\(?\s*|\s+)\1_MODULE\s*\)?\s*$/) {
+		if ($line =~ /^\+\s*#\s*if\s+defined(?:\s*\(?\s*|\s+)(${CONFIG_}[A-Z_]+)\s*\)?\s*\|\|\s*defined(?:\s*\(?\s*|\s+)\1_MODULE\s*\)?\s*$/) {
 			my $config = $1;
 			if (WARN("PREFER_IS_ENABLED",
-				 "Prefer IS_ENABLED(<FOO>) to CONFIG_<FOO> || CONFIG_<FOO>_MODULE\n" . $herecurr) &&
+				 "Prefer IS_ENABLED(<FOO>) to ${CONFIG_}<FOO> || ${CONFIG_}<FOO>_MODULE\n" . $herecurr) &&
 			    $fix) {
 				$fixed[$fixlinenr] = "\+#if IS_ENABLED($config)";
 			}
_

Patches currently in -mm which might be from jerome@forissier.org are

checkpatch-add-kconfig-prefix.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-memory-failure-do-pgoff-calculation-before-for_each_process.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (51 preceding siblings ...)
  2020-08-19  1:31 ` + checkpatch-add-kconfig-prefix.patch " Andrew Morton
@ 2020-08-19  1:32 ` Andrew Morton
  2020-08-19  1:41 ` + hugetlb_cgroup-convert-comma-to-semicolon.patch " Andrew Morton
                   ` (86 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  1:32 UTC (permalink / raw)
  To: mm-commits, naoya.horiguchi, tian.xianting


The patch titled
     Subject: mm/memory-failure: do pgoff calculation before for_each_process()
has been added to the -mm tree.  Its filename is
     mm-memory-failure-do-pgoff-calculation-before-for_each_process.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memory-failure-do-pgoff-calculation-before-for_each_process.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memory-failure-do-pgoff-calculation-before-for_each_process.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Xianting Tian <tian.xianting@h3c.com>
Subject: mm/memory-failure: do pgoff calculation before for_each_process()

There is no need to calculate pgoff in each loop of for_each_process(), so
move it to the place before for_each_process(), which can save some CPU
cycles.

Link: http://lkml.kernel.org/r/20200818082647.34322-1-tian.xianting@h3c.com
Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory-failure.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/mm/memory-failure.c~mm-memory-failure-do-pgoff-calculation-before-for_each_process
+++ a/mm/memory-failure.c
@@ -484,11 +484,12 @@ static void collect_procs_file(struct pa
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
 	struct address_space *mapping = page->mapping;
+	pgoff_t pgoff;
 
 	i_mmap_lock_read(mapping);
 	read_lock(&tasklist_lock);
+	pgoff = page_to_pgoff(page);
 	for_each_process(tsk) {
-		pgoff_t pgoff = page_to_pgoff(page);
 		struct task_struct *t = task_early_kill(tsk, force_early);
 
 		if (!t)
_

Patches currently in -mm which might be from tian.xianting@h3c.com are

mm-memory-failure-do-pgoff-calculation-before-for_each_process.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + hugetlb_cgroup-convert-comma-to-semicolon.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (52 preceding siblings ...)
  2020-08-19  1:32 ` + mm-memory-failure-do-pgoff-calculation-before-for_each_process.patch " Andrew Morton
@ 2020-08-19  1:41 ` Andrew Morton
  2020-08-19  1:42 ` + checkpatch-move-repeated-word-test.patch " Andrew Morton
                   ` (85 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  1:41 UTC (permalink / raw)
  To: gscrivan, mm-commits, tj, vulab


The patch titled
     Subject: hugetlb_cgroup: convert comma to semicolon
has been added to the -mm tree.  Its filename is
     hugetlb_cgroup-convert-comma-to-semicolon.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/hugetlb_cgroup-convert-comma-to-semicolon.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/hugetlb_cgroup-convert-comma-to-semicolon.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Xu Wang <vulab@iscas.ac.cn>
Subject: hugetlb_cgroup: convert comma to semicolon

Replace a comma between expression statements by a semicolon.

Link: http://lkml.kernel.org/r/20200818064333.21759-1-vulab@iscas.ac.cn
Fixes: faced7e0806cf4 ("mm: hugetlb controller for cgroups v2")
Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Cc: Tejun Heo <tj@kernel.org>
Cc: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/hugetlb_cgroup.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/hugetlb_cgroup.c~hugetlb_cgroup-convert-comma-to-semicolon
+++ a/mm/hugetlb_cgroup.c
@@ -655,7 +655,7 @@ static void __init __hugetlb_cgroup_file
 	snprintf(cft->name, MAX_CFTYPE_NAME, "%s.events", buf);
 	cft->private = MEMFILE_PRIVATE(idx, 0);
 	cft->seq_show = hugetlb_events_show;
-	cft->file_offset = offsetof(struct hugetlb_cgroup, events_file[idx]),
+	cft->file_offset = offsetof(struct hugetlb_cgroup, events_file[idx]);
 	cft->flags = CFTYPE_NOT_ON_ROOT;
 
 	/* Add the events.local file */
@@ -664,7 +664,7 @@ static void __init __hugetlb_cgroup_file
 	cft->private = MEMFILE_PRIVATE(idx, 0);
 	cft->seq_show = hugetlb_events_local_show;
 	cft->file_offset = offsetof(struct hugetlb_cgroup,
-				    events_local_file[idx]),
+				    events_local_file[idx]);
 	cft->flags = CFTYPE_NOT_ON_ROOT;
 
 	/* NULL terminate the last cft */
_

Patches currently in -mm which might be from vulab@iscas.ac.cn are

hugetlb_cgroup-convert-comma-to-semicolon.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + checkpatch-move-repeated-word-test.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (53 preceding siblings ...)
  2020-08-19  1:41 ` + hugetlb_cgroup-convert-comma-to-semicolon.patch " Andrew Morton
@ 2020-08-19  1:42 ` Andrew Morton
  2020-08-19  1:55 ` + mmap-locking-api-add-mmap_lock_is_contended.patch " Andrew Morton
                   ` (84 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  1:42 UTC (permalink / raw)
  To: joe, mm-commits


The patch titled
     Subject: checkpatch: move repeated word test
has been added to the -mm tree.  Its filename is
     checkpatch-move-repeated-word-test.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/checkpatch-move-repeated-word-test.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/checkpatch-move-repeated-word-test.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Joe Perches <joe@perches.com>
Subject: checkpatch: move repeated word test

Currently this test only works on .[ch] files.

Move the test to check more file types and the commit log.

Link: http://lkml.kernel.org/r/180b3b5677771c902b2e2f7a2b7090ede65fe004.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/checkpatch.pl |   72 ++++++++++++++++++++--------------------
 1 file changed, 36 insertions(+), 36 deletions(-)

--- a/scripts/checkpatch.pl~checkpatch-move-repeated-word-test
+++ a/scripts/checkpatch.pl
@@ -2993,6 +2993,42 @@ sub process {
 			}
 		}
 
+# check for repeated words separated by a single space
+		if ($rawline =~ /^\+/ || $in_commit_log) {
+			while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
+
+				my $first = $1;
+				my $second = $2;
+
+				if ($first =~ /(?:struct|union|enum)/) {
+					pos($rawline) += length($first) + length($second) + 1;
+					next;
+				}
+
+				next if ($first ne $second);
+				next if ($first eq 'long');
+
+				if (WARN("REPEATED_WORD",
+					 "Possible repeated word: '$first'\n" . $herecurr) &&
+				    $fix) {
+					$fixed[$fixlinenr] =~ s/\b$first $second\b/$first/;
+				}
+			}
+
+			# if it's a repeated word on consecutive lines in a comment block
+			if ($prevline =~ /$;+\s*$/ &&
+			    $prevrawline =~ /($word_pattern)\s*$/) {
+				my $last_word = $1;
+				if ($rawline =~ /^\+\s*\*\s*$last_word /) {
+					if (WARN("REPEATED_WORD",
+						 "Possible repeated word: '$last_word'\n" . $hereprev) &&
+					    $fix) {
+						$fixed[$fixlinenr] =~ s/(\+\s*\*\s*)$last_word /$1/;
+					}
+				}
+			}
+		}
+
 # ignore non-hunk lines and lines being removed
 		next if (!$hunk_line || $line =~ /^-/);
 
@@ -3316,42 +3352,6 @@ sub process {
 			}
 		}
 
-# check for repeated words separated by a single space
-		if ($rawline =~ /^\+/) {
-			while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
-
-				my $first = $1;
-				my $second = $2;
-
-				if ($first =~ /(?:struct|union|enum)/) {
-					pos($rawline) += length($first) + length($second) + 1;
-					next;
-				}
-
-				next if ($first ne $second);
-				next if ($first eq 'long');
-
-				if (WARN("REPEATED_WORD",
-					 "Possible repeated word: '$first'\n" . $herecurr) &&
-				    $fix) {
-					$fixed[$fixlinenr] =~ s/\b$first $second\b/$first/;
-				}
-			}
-
-			# if it's a repeated word on consecutive lines in a comment block
-			if ($prevline =~ /$;+\s*$/ &&
-			    $prevrawline =~ /($word_pattern)\s*$/) {
-				my $last_word = $1;
-				if ($rawline =~ /^\+\s*\*\s*$last_word /) {
-					if (WARN("REPEATED_WORD",
-						 "Possible repeated word: '$last_word'\n" . $hereprev) &&
-					    $fix) {
-						$fixed[$fixlinenr] =~ s/(\+\s*\*\s*)$last_word /$1/;
-					}
-				}
-			}
-		}
-
 # check for space before tabs.
 		if ($rawline =~ /^\+/ && $rawline =~ / \t/) {
 			my $herevet = "$here\n" . cat_vet($rawline) . "\n";
_

Patches currently in -mm which might be from joe@perches.com are

checkpatch-test-git_dir-changes.patch
checkpatch-move-repeated-word-test.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mmap-locking-api-add-mmap_lock_is_contended.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (54 preceding siblings ...)
  2020-08-19  1:42 ` + checkpatch-move-repeated-word-test.patch " Andrew Morton
@ 2020-08-19  1:55 ` Andrew Morton
  2020-08-19  1:55 ` + mm-smaps-extend-smap_gather_stats-to-support-specified-beginning.patch " Andrew Morton
                   ` (83 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  1:55 UTC (permalink / raw)
  To: adobriyan, chinwen.chang, daniel.kiss, daniel.m.jordan, dbueso,
	jgg, jimmyassarsson, ldufour, matthias.bgg, mm-commits,
	songliubraving, steven.price, vbabka, walken, willy, ying.huang


The patch titled
     Subject: mmap locking API: add mmap_lock_is_contended()
has been added to the -mm tree.  Its filename is
     mmap-locking-api-add-mmap_lock_is_contended.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mmap-locking-api-add-mmap_lock_is_contended.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mmap-locking-api-add-mmap_lock_is_contended.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Chinwen Chang <chinwen.chang@mediatek.com>
Subject: mmap locking API: add mmap_lock_is_contended()

Patch series "Try to release mmap_lock temporarily in smaps_rollup", v4.

Recently, we have observed some janky issues caused by unpleasantly long
contention on mmap_lock which is held by smaps_rollup when probing large
processes.  To address the problem, we let smaps_rollup detect if anyone
wants to acquire mmap_lock for write attempts.  If yes, just release the
lock temporarily to ease the contention.

smaps_rollup is a procfs interface which allows users to summarize the
process's memory usage without the overhead of seq_* calls.  Android uses
it to sample the memory usage of various processes to balance its memory
pool sizes.  If no one wants to take the lock for write requests,
smaps_rollup with this patch will behave like the original one.

Although there are on-going mmap_lock optimizations like range-based
locks, the lock applied to smaps_rollup would be the coarse one, which is
hard to avoid the occurrence of aforementioned issues.  So the detection
and temporary release for write attempts on mmap_lock in smaps_rollup is
still necessary.


This patch (of 3):

Add new API to query if someone wants to acquire mmap_lock for write
attempts.

Using this instead of rwsem_is_contended makes it more tolerant of future
changes to the lock type.

Link: http://lkml.kernel.org/r/1597715898-3854-1-git-send-email-chinwen.chang@mediatek.com
Link: http://lkml.kernel.org/r/1597715898-3854-2-git-send-email-chinwen.chang@mediatek.com
Signed-off-by: Chinwen Chang <chinwen.chang@mediatek.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Michel Lespinasse <walken@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Daniel Kiss <daniel.kiss@arm.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jimmy Assarsson <jimmyassarsson@gmail.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mmap_lock.h |    5 +++++
 1 file changed, 5 insertions(+)

--- a/include/linux/mmap_lock.h~mmap-locking-api-add-mmap_lock_is_contended
+++ a/include/linux/mmap_lock.h
@@ -87,4 +87,9 @@ static inline void mmap_assert_write_loc
 	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
 }
 
+static inline int mmap_lock_is_contended(struct mm_struct *mm)
+{
+	return rwsem_is_contended(&mm->mmap_lock);
+}
+
 #endif /* _LINUX_MMAP_LOCK_H */
_

Patches currently in -mm which might be from chinwen.chang@mediatek.com are

mmap-locking-api-add-mmap_lock_is_contended.patch
mm-smaps-extend-smap_gather_stats-to-support-specified-beginning.patch
mm-proc-smaps_rollup-do-not-stall-write-attempts-on-mmap_lock.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-smaps-extend-smap_gather_stats-to-support-specified-beginning.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (55 preceding siblings ...)
  2020-08-19  1:55 ` + mmap-locking-api-add-mmap_lock_is_contended.patch " Andrew Morton
@ 2020-08-19  1:55 ` Andrew Morton
  2020-08-19  1:55 ` + mm-proc-smaps_rollup-do-not-stall-write-attempts-on-mmap_lock.patch " Andrew Morton
                   ` (82 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  1:55 UTC (permalink / raw)
  To: adobriyan, chinwen.chang, daniel.kiss, daniel.m.jordan, dbueso,
	jgg, jimmyassarsson, ldufour, matthias.bgg, mm-commits,
	songliubraving, steven.price, vbabka, walken, willy, ying.huang


The patch titled
     Subject: mm: smaps*: extend smap_gather_stats to support specified beginning
has been added to the -mm tree.  Its filename is
     mm-smaps-extend-smap_gather_stats-to-support-specified-beginning.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-smaps-extend-smap_gather_stats-to-support-specified-beginning.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-smaps-extend-smap_gather_stats-to-support-specified-beginning.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Chinwen Chang <chinwen.chang@mediatek.com>
Subject: mm: smaps*: extend smap_gather_stats to support specified beginning

Extend smap_gather_stats to support indicated beginning address at which
it should start gathering.  To achieve the goal, we add a new parameter
@start assigned by the caller and try to refactor it for simplicity.

If @start is 0, it will use the range of @vma for gathering.

Link: http://lkml.kernel.org/r/1597715898-3854-3-git-send-email-chinwen.chang@mediatek.com
Signed-off-by: Chinwen Chang <chinwen.chang@mediatek.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Daniel Kiss <daniel.kiss@arm.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jimmy Assarsson <jimmyassarsson@gmail.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/proc/task_mmu.c |   30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

--- a/fs/proc/task_mmu.c~mm-smaps-extend-smap_gather_stats-to-support-specified-beginning
+++ a/fs/proc/task_mmu.c
@@ -723,9 +723,21 @@ static const struct mm_walk_ops smaps_sh
 	.pte_hole		= smaps_pte_hole,
 };
 
+/*
+ * Gather mem stats from @vma with the indicated beginning
+ * address @start, and keep them in @mss.
+ *
+ * Use vm_start of @vma as the beginning address if @start is 0.
+ */
 static void smap_gather_stats(struct vm_area_struct *vma,
-			     struct mem_size_stats *mss)
+		struct mem_size_stats *mss, unsigned long start)
 {
+	const struct mm_walk_ops *ops = &smaps_walk_ops;
+
+	/* Invalid start */
+	if (start >= vma->vm_end)
+		return;
+
 #ifdef CONFIG_SHMEM
 	/* In case of smaps_rollup, reset the value from previous vma */
 	mss->check_shmem_swap = false;
@@ -742,18 +754,20 @@ static void smap_gather_stats(struct vm_
 		 */
 		unsigned long shmem_swapped = shmem_swap_usage(vma);
 
-		if (!shmem_swapped || (vma->vm_flags & VM_SHARED) ||
-					!(vma->vm_flags & VM_WRITE)) {
+		if (!start && (!shmem_swapped || (vma->vm_flags & VM_SHARED) ||
+					!(vma->vm_flags & VM_WRITE))) {
 			mss->swap += shmem_swapped;
 		} else {
 			mss->check_shmem_swap = true;
-			walk_page_vma(vma, &smaps_shmem_walk_ops, mss);
-			return;
+			ops = &smaps_shmem_walk_ops;
 		}
 	}
 #endif
 	/* mmap_lock is held in m_start */
-	walk_page_vma(vma, &smaps_walk_ops, mss);
+	if (!start)
+		walk_page_vma(vma, ops, mss);
+	else
+		walk_page_range(vma->vm_mm, start, vma->vm_end, ops, mss);
 }
 
 #define SEQ_PUT_DEC(str, val) \
@@ -805,7 +819,7 @@ static int show_smap(struct seq_file *m,
 
 	memset(&mss, 0, sizeof(mss));
 
-	smap_gather_stats(vma, &mss);
+	smap_gather_stats(vma, &mss, 0);
 
 	show_map_vma(m, vma);
 
@@ -854,7 +868,7 @@ static int show_smaps_rollup(struct seq_
 	hold_task_mempolicy(priv);
 
 	for (vma = priv->mm->mmap; vma; vma = vma->vm_next) {
-		smap_gather_stats(vma, &mss);
+		smap_gather_stats(vma, &mss, 0);
 		last_vma_end = vma->vm_end;
 	}
 
_

Patches currently in -mm which might be from chinwen.chang@mediatek.com are

mmap-locking-api-add-mmap_lock_is_contended.patch
mm-smaps-extend-smap_gather_stats-to-support-specified-beginning.patch
mm-proc-smaps_rollup-do-not-stall-write-attempts-on-mmap_lock.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-proc-smaps_rollup-do-not-stall-write-attempts-on-mmap_lock.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (56 preceding siblings ...)
  2020-08-19  1:55 ` + mm-smaps-extend-smap_gather_stats-to-support-specified-beginning.patch " Andrew Morton
@ 2020-08-19  1:55 ` Andrew Morton
  2020-08-19  2:18 ` + romfs-fix-uninitialized-memory-leak-in-romfs_dev_read.patch " Andrew Morton
                   ` (81 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  1:55 UTC (permalink / raw)
  To: adobriyan, chinwen.chang, daniel.kiss, daniel.m.jordan, dbueso,
	jgg, jimmyassarsson, ldufour, matthias.bgg, mm-commits,
	songliubraving, steven.price, vbabka, walken, willy, ying.huang


The patch titled
     Subject: mm: proc: smaps_rollup: do not stall write attempts on mmap_lock
has been added to the -mm tree.  Its filename is
     mm-proc-smaps_rollup-do-not-stall-write-attempts-on-mmap_lock.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-proc-smaps_rollup-do-not-stall-write-attempts-on-mmap_lock.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-proc-smaps_rollup-do-not-stall-write-attempts-on-mmap_lock.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Chinwen Chang <chinwen.chang@mediatek.com>
Subject: mm: proc: smaps_rollup: do not stall write attempts on mmap_lock

smaps_rollup will try to grab mmap_lock and go through the whole vma list
until it finishes the iterating.  When encountering large processes, the
mmap_lock will be held for a longer time, which may block other write
requests like mmap and munmap from progressing smoothly.

There are upcoming mmap_lock optimizations like range-based locks, but the
lock applied to smaps_rollup would be the coarse type, which doesn't avoid
the occurrence of unpleasant contention.

To solve aforementioned issue, we add a check which detects whether anyone
wants to grab mmap_lock for write attempts.

Link: http://lkml.kernel.org/r/1597715898-3854-4-git-send-email-chinwen.chang@mediatek.com
Signed-off-by: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Song Liu <songliubraving@fb.com>
Cc: Jimmy Assarsson <jimmyassarsson@gmail.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Daniel Kiss <daniel.kiss@arm.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/proc/task_mmu.c |   66 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 65 insertions(+), 1 deletion(-)

--- a/fs/proc/task_mmu.c~mm-proc-smaps_rollup-do-not-stall-write-attempts-on-mmap_lock
+++ a/fs/proc/task_mmu.c
@@ -867,9 +867,73 @@ static int show_smaps_rollup(struct seq_
 
 	hold_task_mempolicy(priv);
 
-	for (vma = priv->mm->mmap; vma; vma = vma->vm_next) {
+	for (vma = priv->mm->mmap; vma;) {
 		smap_gather_stats(vma, &mss, 0);
 		last_vma_end = vma->vm_end;
+
+		/*
+		 * Release mmap_lock temporarily if someone wants to
+		 * access it for write request.
+		 */
+		if (mmap_lock_is_contended(mm)) {
+			mmap_read_unlock(mm);
+			ret = mmap_read_lock_killable(mm);
+			if (ret) {
+				release_task_mempolicy(priv);
+				goto out_put_mm;
+			}
+
+			/*
+			 * After dropping the lock, there are four cases to
+			 * consider. See the following example for explanation.
+			 *
+			 *   +------+------+-----------+
+			 *   | VMA1 | VMA2 | VMA3      |
+			 *   +------+------+-----------+
+			 *   |      |      |           |
+			 *  4k     8k     16k         400k
+			 *
+			 * Suppose we drop the lock after reading VMA2 due to
+			 * contention, then we get:
+			 *
+			 *	last_vma_end = 16k
+			 *
+			 * 1) VMA2 is freed, but VMA3 exists:
+			 *
+			 *    find_vma(mm, 16k - 1) will return VMA3.
+			 *    In this case, just continue from VMA3.
+			 *
+			 * 2) VMA2 still exists:
+			 *
+			 *    find_vma(mm, 16k - 1) will return VMA2.
+			 *    Iterate the loop like the original one.
+			 *
+			 * 3) No more VMAs can be found:
+			 *
+			 *    find_vma(mm, 16k - 1) will return NULL.
+			 *    No more things to do, just break.
+			 *
+			 * 4) (last_vma_end - 1) is the middle of a vma (VMA'):
+			 *
+			 *    find_vma(mm, 16k - 1) will return VMA' whose range
+			 *    contains last_vma_end.
+			 *    Iterate VMA' from last_vma_end.
+			 */
+			vma = find_vma(mm, last_vma_end - 1);
+			/* Case 3 above */
+			if (!vma)
+				break;
+
+			/* Case 1 above */
+			if (vma->vm_start >= last_vma_end)
+				continue;
+
+			/* Case 4 above */
+			if (vma->vm_end > last_vma_end)
+				smap_gather_stats(vma, &mss, last_vma_end);
+		}
+		/* Case 2 above */
+		vma = vma->vm_next;
 	}
 
 	show_vma_header_prefix(m, priv->mm->mmap->vm_start,
_

Patches currently in -mm which might be from chinwen.chang@mediatek.com are

mmap-locking-api-add-mmap_lock_is_contended.patch
mm-smaps-extend-smap_gather_stats-to-support-specified-beginning.patch
mm-proc-smaps_rollup-do-not-stall-write-attempts-on-mmap_lock.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + romfs-fix-uninitialized-memory-leak-in-romfs_dev_read.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (57 preceding siblings ...)
  2020-08-19  1:55 ` + mm-proc-smaps_rollup-do-not-stall-write-attempts-on-mmap_lock.patch " Andrew Morton
@ 2020-08-19  2:18 ` Andrew Morton
  2020-08-19  2:23 ` + mm-util-update-the-kerneldoc-for-kstrdup_const.patch " Andrew Morton
                   ` (80 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  2:18 UTC (permalink / raw)
  To: dhowells, gregkh, jannh, mm-commits, stable


The patch titled
     Subject: romfs: fix uninitialized memory leak in romfs_dev_read()
has been added to the -mm tree.  Its filename is
     romfs-fix-uninitialized-memory-leak-in-romfs_dev_read.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/romfs-fix-uninitialized-memory-leak-in-romfs_dev_read.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/romfs-fix-uninitialized-memory-leak-in-romfs_dev_read.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Jann Horn <jannh@google.com>
Subject: romfs: fix uninitialized memory leak in romfs_dev_read()

romfs has a superblock field that limits the size of the filesystem; data
beyond that limit is never accessed.

romfs_dev_read() fetches a caller-supplied number of bytes from the
backing device.  It returns 0 on success or an error code on failure;
therefore, its API can't represent short reads, it's all-or-nothing.

However, when romfs_dev_read() detects that the requested operation would
cross the filesystem size limit, it currently silently truncates the
requested number of bytes.  This e.g.  means that when the content of a
file with size 0x1000 starts one byte before the filesystem size limit,
->readpage() will only fill a single byte of the supplied page while
leaving the rest uninitialized, leaking that uninitialized memory to
userspace.

Fix it by returning an error code instead of truncating the read when the
requested read operation would go beyond the end of the filesystem.

Link: http://lkml.kernel.org/r/20200818013202.2246365-1-jannh@google.com
Fixes: da4458bda237 ("NOMMU: Make it possible for RomFS to use MTD devices directly")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/romfs/storage.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- a/fs/romfs/storage.c~romfs-fix-uninitialized-memory-leak-in-romfs_dev_read
+++ a/fs/romfs/storage.c
@@ -217,10 +217,8 @@ int romfs_dev_read(struct super_block *s
 	size_t limit;
 
 	limit = romfs_maxsize(sb);
-	if (pos >= limit)
+	if (pos >= limit || buflen > limit - pos)
 		return -EIO;
-	if (buflen > limit - pos)
-		buflen = limit - pos;
 
 #ifdef CONFIG_ROMFS_ON_MTD
 	if (sb->s_mtd)
_

Patches currently in -mm which might be from jannh@google.com are

romfs-fix-uninitialized-memory-leak-in-romfs_dev_read.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-util-update-the-kerneldoc-for-kstrdup_const.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (58 preceding siblings ...)
  2020-08-19  2:18 ` + romfs-fix-uninitialized-memory-leak-in-romfs_dev_read.patch " Andrew Morton
@ 2020-08-19  2:23 ` Andrew Morton
  2020-08-19  2:39 ` + kernel-relayc-fix-memleak-on-destroy-relay-channel.patch " Andrew Morton
                   ` (79 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  2:23 UTC (permalink / raw)
  To: bgolaszewski, mm-commits


The patch titled
     Subject: mm/util.c: update the kerneldoc for kstrdup_const()
has been added to the -mm tree.  Its filename is
     mm-util-update-the-kerneldoc-for-kstrdup_const.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-util-update-the-kerneldoc-for-kstrdup_const.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-util-update-the-kerneldoc-for-kstrdup_const.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Subject: mm/util.c: update the kerneldoc for kstrdup_const()

Memory allocated with kstrdup_const() must not be passed to regular
krealloc() as it is not aware of the possibility of the chunk residing in
.rodata.  Since there are no potential users of krealloc_const() at the
moment, let's just update the doc to make it explicit.

Link: http://lkml.kernel.org/r/20200817173927.23389-1-brgl@bgdev.pl
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/util.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/mm/util.c~mm-util-update-the-kerneldoc-for-kstrdup_const
+++ a/mm/util.c
@@ -69,7 +69,8 @@ EXPORT_SYMBOL(kstrdup);
  * @s: the string to duplicate
  * @gfp: the GFP mask used in the kmalloc() call when allocating memory
  *
- * Note: Strings allocated by kstrdup_const should be freed by kfree_const.
+ * Note: Strings allocated by kstrdup_const should be freed by kfree_const and
+ * must not be passed to krealloc().
  *
  * Return: source string if it is in .rodata section otherwise
  * fallback to kstrdup.
_

Patches currently in -mm which might be from bgolaszewski@baylibre.com are

mm-util-update-the-kerneldoc-for-kstrdup_const.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + kernel-relayc-fix-memleak-on-destroy-relay-channel.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (59 preceding siblings ...)
  2020-08-19  2:23 ` + mm-util-update-the-kerneldoc-for-kstrdup_const.patch " Andrew Morton
@ 2020-08-19  2:39 ` Andrew Morton
  2020-08-19  2:44 ` + device-dax-fix-mismatches-of-request_mem_region.patch " Andrew Morton
                   ` (78 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  2:39 UTC (permalink / raw)
  To: akash.goel, chris, dja, hulkci, mm-commits, mpe, rientjes,
	stable, tglx, viro, walken, weiyongjun1


The patch titled
     Subject: kernel/relay.c: fix memleak on destroy relay channel
has been added to the -mm tree.  Its filename is
     kernel-relayc-fix-memleak-on-destroy-relay-channel.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/kernel-relayc-fix-memleak-on-destroy-relay-channel.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/kernel-relayc-fix-memleak-on-destroy-relay-channel.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Wei Yongjun <weiyongjun1@huawei.com>
Subject: kernel/relay.c: fix memleak on destroy relay channel

kmemleak report memory leak as follows:

unreferenced object 0x607ee4e5f948 (size 8):
comm "syz-executor.1", pid 2098, jiffies 4295031601 (age 288.468s)
hex dump (first 8 bytes):
00 00 00 00 00 00 00 00 ........
backtrace:
[<00000000ca1de2fa>] relay_open kernel/relay.c:583 [inline]
[<00000000ca1de2fa>] relay_open+0xb6/0x970 kernel/relay.c:563
[<0000000038ae5a4b>] do_blk_trace_setup+0x4a8/0xb20 kernel/trace/blktrace.c:557
[<00000000d5e778e9>] __blk_trace_setup+0xb6/0x150 kernel/trace/blktrace.c:597
[<0000000038fdf803>] blk_trace_ioctl+0x146/0x280 kernel/trace/blktrace.c:738
[<00000000ce25a0ca>] blkdev_ioctl+0xb2/0x6a0 block/ioctl.c:613
[<00000000579e47e0>] block_ioctl+0xe5/0x120 fs/block_dev.c:1871
[<00000000b1588c11>] vfs_ioctl fs/ioctl.c:48 [inline]
[<00000000b1588c11>] __do_sys_ioctl fs/ioctl.c:753 [inline]
[<00000000b1588c11>] __se_sys_ioctl fs/ioctl.c:739 [inline]
[<00000000b1588c11>] __x64_sys_ioctl+0x170/0x1ce fs/ioctl.c:739
[<0000000088fc9942>] do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
[<000000004f6dd57a>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

'chan->buf' is malloced in relay_open() by alloc_percpu() but not free
while destroy the relay channel. Fix it by adding free_percpu() before
return from relay_destroy_channel().

Link: http://lkml.kernel.org/r/20200817122826.48518-1-weiyongjun1@huawei.com
Fixes: 017c59c042d0 ("relay: Use per CPU constructs for the relay channel buffer pointers")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: David Rientjes <rientjes@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Akash Goel <akash.goel@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/relay.c |    1 +
 1 file changed, 1 insertion(+)

--- a/kernel/relay.c~kernel-relayc-fix-memleak-on-destroy-relay-channel
+++ a/kernel/relay.c
@@ -197,6 +197,7 @@ free_buf:
 static void relay_destroy_channel(struct kref *kref)
 {
 	struct rchan *chan = container_of(kref, struct rchan, kref);
+	free_percpu(chan->buf);
 	kfree(chan);
 }
 
_

Patches currently in -mm which might be from weiyongjun1@huawei.com are

kernel-relayc-fix-memleak-on-destroy-relay-channel.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + device-dax-fix-mismatches-of-request_mem_region.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (60 preceding siblings ...)
  2020-08-19  2:39 ` + kernel-relayc-fix-memleak-on-destroy-relay-channel.patch " Andrew Morton
@ 2020-08-19  2:44 ` Andrew Morton
  2020-08-19  2:49 ` + uprobes-__replace_page-avoid-bug-in-munlock_vma_page.patch " Andrew Morton
                   ` (77 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  2:44 UTC (permalink / raw)
  To: akpm, dan.j.williams, dave.jiang, ira.weiny, mm-commits,
	thunder.leizhen, vishal.l.verma


The patch titled
     Subject: drivers/dax/kmem.c: fix mismatches of request_mem_region()
has been added to the -mm tree.  Its filename is
     device-dax-fix-mismatches-of-request_mem_region.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/device-dax-fix-mismatches-of-request_mem_region.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/device-dax-fix-mismatches-of-request_mem_region.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Zhen Lei <thunder.leizhen@huawei.com>
Subject: drivers/dax/kmem.c: fix mismatches of request_mem_region()

For neatness's sake, The resources allocated by request_mem_region()
should be freed with release_mem_region().  These two functions are
paired.

Link: http://lkml.kernel.org/r/20200817065926.2239-1-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/dax/kmem.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

--- a/drivers/dax/kmem.c~device-dax-fix-mismatches-of-request_mem_region
+++ a/drivers/dax/kmem.c
@@ -82,8 +82,7 @@ int dev_dax_kmem_probe(struct device *de
 	rc = add_memory_driver_managed(numa_node, new_res->start,
 				       resource_size(new_res), kmem_name);
 	if (rc) {
-		release_resource(new_res);
-		kfree(new_res);
+		release_mem_region(kmem_start, kmem_size);
 		kfree(new_res_name);
 		return rc;
 	}
@@ -118,8 +117,7 @@ static int dev_dax_kmem_remove(struct de
 	}
 
 	/* Release and free dax resources */
-	release_resource(res);
-	kfree(res);
+	release_mem_region(kmem_start, kmem_size);
 	kfree(res_name);
 	dev_dax->dax_kmem_res = NULL;
 
_

Patches currently in -mm which might be from thunder.leizhen@huawei.com are

device-dax-fix-mismatches-of-request_mem_region.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + uprobes-__replace_page-avoid-bug-in-munlock_vma_page.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (61 preceding siblings ...)
  2020-08-19  2:44 ` + device-dax-fix-mismatches-of-request_mem_region.patch " Andrew Morton
@ 2020-08-19  2:49 ` Andrew Morton
  2020-08-19  2:55 ` + mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch " Andrew Morton
                   ` (76 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  2:49 UTC (permalink / raw)
  To: hughd, kirill.shutemov, mm-commits, oleg, songliubraving, srikar,
	stable, syzkaller


The patch titled
     Subject: uprobes: __replace_page() avoid BUG in munlock_vma_page()
has been added to the -mm tree.  Its filename is
     uprobes-__replace_page-avoid-bug-in-munlock_vma_page.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/uprobes-__replace_page-avoid-bug-in-munlock_vma_page.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/uprobes-__replace_page-avoid-bug-in-munlock_vma_page.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Hugh Dickins <hughd@google.com>
Subject: uprobes: __replace_page() avoid BUG in munlock_vma_page()

syzbot crashed on the VM_BUG_ON_PAGE(PageTail) in munlock_vma_page(), when
called from uprobes __replace_page().  Which of many ways to fix it? 
Settled on not calling when PageCompound (since Head and Tail are equals
in this context, PageCompound the usual check in uprobes.c, and the prior
use of FOLL_SPLIT_PMD will have cleared PageMlocked already).

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008161338360.20413@eggly.anvils
Fixes: 5a52c9df62b4 ("uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>	[5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/events/uprobes.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/events/uprobes.c~uprobes-__replace_page-avoid-bug-in-munlock_vma_page
+++ a/kernel/events/uprobes.c
@@ -205,7 +205,7 @@ static int __replace_page(struct vm_area
 		try_to_free_swap(old_page);
 	page_vma_mapped_walk_done(&pvmw);
 
-	if (vma->vm_flags & VM_LOCKED)
+	if ((vma->vm_flags & VM_LOCKED) && !PageCompound(old_page))
 		munlock_vma_page(old_page);
 	put_page(old_page);
 
_

Patches currently in -mm which might be from hughd@google.com are

uprobes-__replace_page-avoid-bug-in-munlock_vma_page.patch
khugepaged-adjust-vm_bug_on_mm-in-__khugepaged_enter.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (62 preceding siblings ...)
  2020-08-19  2:49 ` + uprobes-__replace_page-avoid-bug-in-munlock_vma_page.patch " Andrew Morton
@ 2020-08-19  2:55 ` Andrew Morton
  2020-08-19  2:55 ` + mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch " Andrew Morton
                   ` (75 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  2:55 UTC (permalink / raw)
  To: bhe, cai, david, jasowang, mhocko, mike.kravetz, mm-commits, mst,
	pankaj.gupta.linux, rppt


The patch titled
     Subject: mm/page_alloc: tweak comments in has_unmovable_pages()
has been added to the -mm tree.  Its filename is
     mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: mm/page_alloc: tweak comments in has_unmovable_pages()

Patch series "mm / virtio-mem: support ZONE_MOVABLE", v5.

When introducing virtio-mem, the semantics of ZONE_MOVABLE were rather
unclear, which is why we special-cased ZONE_MOVABLE such that partially
plugged blocks would never end up in ZONE_MOVABLE.

Now that the semantics are much clearer (and are documented in patch #6),
let's support partially plugged memory blocks in ZONE_MOVABLE, allowing
partially plugged memory blocks to be online to ZONE_MOVABLE and also
unplugging from such memory blocks.  This avoids surprises when onlining
of memory blocks suddenly fails, just because they are not completely
populated by virtio-mem (yet).

This is especially helpful for testing, but also paves the way for
virtio-mem optimizations, allowing more memory to get reliably unplugged.

Cleanup has_unmovable_pages() and set_migratetype_isolate(), providing
better documentation of how ZONE_MOVABLE interacts with different kind of
unmovable pages (memory offlining vs.  alloc_contig_range()).


This patch (of 6):

Let's move the split comment regarding bootmem allocations and memory
holes, especially in the context of ZONE_MOVABLE, to the PageReserved()
check.

Link: http://lkml.kernel.org/r/20200816125333.7434-1-david@redhat.com
Link: http://lkml.kernel.org/r/20200816125333.7434-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |   22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

--- a/mm/page_alloc.c~mm-page_alloc-tweak-comments-in-has_unmovable_pages
+++ a/mm/page_alloc.c
@@ -8219,14 +8219,6 @@ struct page *has_unmovable_pages(struct
 	unsigned long iter = 0;
 	unsigned long pfn = page_to_pfn(page);
 
-	/*
-	 * TODO we could make this much more efficient by not checking every
-	 * page in the range if we know all of them are in MOVABLE_ZONE and
-	 * that the movable zone guarantees that pages are migratable but
-	 * the later is not the case right now unfortunatelly. E.g. movablecore
-	 * can still lead to having bootmem allocations in zone_movable.
-	 */
-
 	if (is_migrate_cma_page(page)) {
 		/*
 		 * CMA allocations (alloc_contig_range) really need to mark
@@ -8245,6 +8237,12 @@ struct page *has_unmovable_pages(struct
 
 		page = pfn_to_page(pfn + iter);
 
+		/*
+		 * Both, bootmem allocations and memory holes are marked
+		 * PG_reserved and are unmovable. We can even have unmovable
+		 * allocations inside ZONE_MOVABLE, for example when
+		 * specifying "movablecore".
+		 */
 		if (PageReserved(page))
 			return page;
 
@@ -8318,14 +8316,6 @@ struct page *has_unmovable_pages(struct
 		 * it.  But now, memory offline itself doesn't call
 		 * shrink_node_slabs() and it still to be fixed.
 		 */
-		/*
-		 * If the page is not RAM, page_count()should be 0.
-		 * we don't need more check. This is an _used_ not-movable page.
-		 *
-		 * The problematic thing here is PG_reserved pages. PG_reserved
-		 * is set to both of a memory hole page and a _used_ kernel
-		 * page at boot.
-		 */
 		return page;
 	}
 	return NULL;
_

Patches currently in -mm which might be from david@redhat.com are

mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch
mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch
mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch
mm-page_isolation-cleanup-set_migratetype_isolate.patch
virtio-mem-dont-special-case-zone_movable.patch
mm-document-semantics-of-zone_movable.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (63 preceding siblings ...)
  2020-08-19  2:55 ` + mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch " Andrew Morton
@ 2020-08-19  2:55 ` Andrew Morton
  2020-08-19  2:55 ` + mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch " Andrew Morton
                   ` (74 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  2:55 UTC (permalink / raw)
  To: bhe, cai, david, jasowang, mhocko, mike.kravetz, mm-commits, mst,
	pankaj.gupta.linux, rppt


The patch titled
     Subject: mm/page_isolation: exit early when pageblock is isolated in set_migratetype_isolate()
has been added to the -mm tree.  Its filename is
     mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: mm/page_isolation: exit early when pageblock is isolated in set_migratetype_isolate()

Right now, if we have two isolations racing on a pageblock that's in the
MOVABLE zone, we would trigger the WARN_ON_ONCE().  Let's just return
directly, simplifying error handling.

The change was introduced in commit 3d680bdf60a5 ("mm/page_isolation: fix
potential warning from user").  As far as I can see, we currently don't
have alloc_contig_range() users that use the ZONE_MOVABLE (anymore), so
it's currently more a cleanup and a preparation for the future than a fix.

Link: http://lkml.kernel.org/r/20200816125333.7434-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_isolation.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

--- a/mm/page_isolation.c~mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate
+++ a/mm/page_isolation.c
@@ -29,10 +29,12 @@ static int set_migratetype_isolate(struc
 	/*
 	 * We assume the caller intended to SET migrate type to isolate.
 	 * If it is already set, then someone else must have raced and
-	 * set it before us.  Return -EBUSY
+	 * set it before us.
 	 */
-	if (is_migrate_isolate_page(page))
-		goto out;
+	if (is_migrate_isolate_page(page)) {
+		spin_unlock_irqrestore(&zone->lock, flags);
+		return -EBUSY;
+	}
 
 	/*
 	 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
@@ -52,7 +54,6 @@ static int set_migratetype_isolate(struc
 		ret = 0;
 	}
 
-out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 	if (!ret) {
 		drain_all_pages(zone);
_

Patches currently in -mm which might be from david@redhat.com are

mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch
mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch
mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch
mm-page_isolation-cleanup-set_migratetype_isolate.patch
virtio-mem-dont-special-case-zone_movable.patch
mm-document-semantics-of-zone_movable.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (64 preceding siblings ...)
  2020-08-19  2:55 ` + mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch " Andrew Morton
@ 2020-08-19  2:55 ` Andrew Morton
  2020-08-19  2:55 ` + mm-page_isolation-cleanup-set_migratetype_isolate.patch " Andrew Morton
                   ` (73 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  2:55 UTC (permalink / raw)
  To: bhe, cai, david, jasowang, mhocko, mike.kravetz, mm-commits, mst,
	pankaj.gupta.linux, rppt


The patch titled
     Subject: mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()
has been added to the -mm tree.  Its filename is
     mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()

Inside has_unmovable_pages(), we have a comment describing how unmovable
data could end up in ZONE_MOVABLE - via "movablecore".  Also, besides
checking if the first page in the pageblock is reserved, we don't perform
any further checks in case of ZONE_MOVABLE.

In case of memory offlining, we set REPORT_FAILURE, properly dump_page()
the page and handle the error gracefully.  alloc_contig_pages() users
currently never allocate from ZONE_MOVABLE.  E.g., hugetlb uses
alloc_contig_pages() for the allocation of gigantic pages only, which will
never end up on the MOVABLE zone (see htlb_alloc_mask()).

Link: http://lkml.kernel.org/r/20200816125333.7434-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_isolation.c |   15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

--- a/mm/page_isolation.c~mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate
+++ a/mm/page_isolation.c
@@ -57,15 +57,12 @@ static int set_migratetype_isolate(struc
 	spin_unlock_irqrestore(&zone->lock, flags);
 	if (!ret) {
 		drain_all_pages(zone);
-	} else {
-		WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
-
-		if ((isol_flags & REPORT_FAILURE) && unmovable)
-			/*
-			 * printk() with zone->lock held will likely trigger a
-			 * lockdep splat, so defer it here.
-			 */
-			dump_page(unmovable, "unmovable page");
+	} else if ((isol_flags & REPORT_FAILURE) && unmovable) {
+		/*
+		 * printk() with zone->lock held will likely trigger a
+		 * lockdep splat, so defer it here.
+		 */
+		dump_page(unmovable, "unmovable page");
 	}
 
 	return ret;
_

Patches currently in -mm which might be from david@redhat.com are

mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch
mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch
mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch
mm-page_isolation-cleanup-set_migratetype_isolate.patch
virtio-mem-dont-special-case-zone_movable.patch
mm-document-semantics-of-zone_movable.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-page_isolation-cleanup-set_migratetype_isolate.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (65 preceding siblings ...)
  2020-08-19  2:55 ` + mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch " Andrew Morton
@ 2020-08-19  2:55 ` Andrew Morton
  2020-08-19  2:55 ` + virtio-mem-dont-special-case-zone_movable.patch " Andrew Morton
                   ` (72 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  2:55 UTC (permalink / raw)
  To: bhe, cai, david, jasowang, mhocko, mike.kravetz, mm-commits, mst,
	pankaj.gupta.linux, rppt


The patch titled
     Subject: mm/page_isolation: cleanup set_migratetype_isolate()
has been added to the -mm tree.  Its filename is
     mm-page_isolation-cleanup-set_migratetype_isolate.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-page_isolation-cleanup-set_migratetype_isolate.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-page_isolation-cleanup-set_migratetype_isolate.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: mm/page_isolation: cleanup set_migratetype_isolate()

Let's clean it up a bit, simplifying the exit paths.

Link: http://lkml.kernel.org/r/20200816125333.7434-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_isolation.c |   17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

--- a/mm/page_isolation.c~mm-page_isolation-cleanup-set_migratetype_isolate
+++ a/mm/page_isolation.c
@@ -17,12 +17,9 @@
 
 static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
 {
-	struct page *unmovable = NULL;
-	struct zone *zone;
+	struct zone *zone = page_zone(page);
+	struct page *unmovable;
 	unsigned long flags;
-	int ret = -EBUSY;
-
-	zone = page_zone(page);
 
 	spin_lock_irqsave(&zone->lock, flags);
 
@@ -51,13 +48,13 @@ static int set_migratetype_isolate(struc
 									NULL);
 
 		__mod_zone_freepage_state(zone, -nr_pages, mt);
-		ret = 0;
+		spin_unlock_irqrestore(&zone->lock, flags);
+		drain_all_pages(zone);
+		return 0;
 	}
 
 	spin_unlock_irqrestore(&zone->lock, flags);
-	if (!ret) {
-		drain_all_pages(zone);
-	} else if ((isol_flags & REPORT_FAILURE) && unmovable) {
+	if (isol_flags & REPORT_FAILURE) {
 		/*
 		 * printk() with zone->lock held will likely trigger a
 		 * lockdep splat, so defer it here.
@@ -65,7 +62,7 @@ static int set_migratetype_isolate(struc
 		dump_page(unmovable, "unmovable page");
 	}
 
-	return ret;
+	return -EBUSY;
 }
 
 static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
_

Patches currently in -mm which might be from david@redhat.com are

mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch
mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch
mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch
mm-page_isolation-cleanup-set_migratetype_isolate.patch
virtio-mem-dont-special-case-zone_movable.patch
mm-document-semantics-of-zone_movable.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + virtio-mem-dont-special-case-zone_movable.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (66 preceding siblings ...)
  2020-08-19  2:55 ` + mm-page_isolation-cleanup-set_migratetype_isolate.patch " Andrew Morton
@ 2020-08-19  2:55 ` Andrew Morton
  2020-08-19  2:55 ` + mm-document-semantics-of-zone_movable.patch " Andrew Morton
                   ` (71 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  2:55 UTC (permalink / raw)
  To: bhe, cai, david, jasowang, mhocko, mike.kravetz, mm-commits, mst,
	pankaj.gupta.linux, rppt


The patch titled
     Subject: virtio-mem: don't special-case ZONE_MOVABLE
has been added to the -mm tree.  Its filename is
     virtio-mem-dont-special-case-zone_movable.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/virtio-mem-dont-special-case-zone_movable.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/virtio-mem-dont-special-case-zone_movable.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: virtio-mem: don't special-case ZONE_MOVABLE

When introducing virtio-mem, the semantics of ZONE_MOVABLE were rather
unclear, which is why we special-cased ZONE_MOVABLE such that partially
plugged blocks would never end up in ZONE_MOVABLE.

Now that the semantics are much clearer (and will be documented in a
follow-up patch including the new virtio-mem behavior), let's allow to
online partially plugged memory blocks to ZONE_MOVABLE and also consider
memory blocks that were onlined to ZONE_MOVABLE when unplugging memory. 
While unplugged memory pages are, in general, unmovable, they can be
skipped when offlining memory.

virtio-mem only unplugs fairly big chunks (in the megabyte range) and
rather tries to shrink the memory region than randomly choosing memory. 
In theory, if all other pages in the movable zone would be movable,
virtio-mem would only shrink that zone and not create any kind of
fragmentation.

In the future, we might want to remember the zone again and use the
information when (un)plugging memory.  For now, let's keep it simple.

Note: Support for defragmentation is planned, to deal with fragmentation
after unplug due to memory chunks within memory blocks that could not get
unplugged before (e.g., somebody pinning pages within ZONE_MOVABLE for a
longer time).

Link: http://lkml.kernel.org/r/20200816125333.7434-6-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/virtio/virtio_mem.c |   47 +++++-----------------------------
 1 file changed, 8 insertions(+), 39 deletions(-)

--- a/drivers/virtio/virtio_mem.c~virtio-mem-dont-special-case-zone_movable
+++ a/drivers/virtio/virtio_mem.c
@@ -36,18 +36,10 @@ enum virtio_mem_mb_state {
 	VIRTIO_MEM_MB_STATE_OFFLINE,
 	/* Partially plugged, fully added to Linux, offline. */
 	VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL,
-	/* Fully plugged, fully added to Linux, online (!ZONE_MOVABLE). */
+	/* Fully plugged, fully added to Linux, online. */
 	VIRTIO_MEM_MB_STATE_ONLINE,
-	/* Partially plugged, fully added to Linux, online (!ZONE_MOVABLE). */
+	/* Partially plugged, fully added to Linux, online. */
 	VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL,
-	/*
-	 * Fully plugged, fully added to Linux, online (ZONE_MOVABLE).
-	 * We are not allowed to allocate (unplug) parts of this block that
-	 * are not movable (similar to gigantic pages). We will never allow
-	 * to online OFFLINE_PARTIAL to ZONE_MOVABLE (as they would contain
-	 * unmovable parts).
-	 */
-	VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE,
 	VIRTIO_MEM_MB_STATE_COUNT
 };
 
@@ -526,21 +518,10 @@ static bool virtio_mem_owned_mb(struct v
 }
 
 static int virtio_mem_notify_going_online(struct virtio_mem *vm,
-					  unsigned long mb_id,
-					  enum zone_type zone)
+					  unsigned long mb_id)
 {
 	switch (virtio_mem_mb_get_state(vm, mb_id)) {
 	case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
-		/*
-		 * We won't allow to online a partially plugged memory block
-		 * to the MOVABLE zone - it would contain unmovable parts.
-		 */
-		if (zone == ZONE_MOVABLE) {
-			dev_warn_ratelimited(&vm->vdev->dev,
-					     "memory block has holes, MOVABLE not supported\n");
-			return NOTIFY_BAD;
-		}
-		return NOTIFY_OK;
 	case VIRTIO_MEM_MB_STATE_OFFLINE:
 		return NOTIFY_OK;
 	default:
@@ -560,7 +541,6 @@ static void virtio_mem_notify_offline(st
 					VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL);
 		break;
 	case VIRTIO_MEM_MB_STATE_ONLINE:
-	case VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE:
 		virtio_mem_mb_set_state(vm, mb_id,
 					VIRTIO_MEM_MB_STATE_OFFLINE);
 		break;
@@ -579,24 +559,17 @@ static void virtio_mem_notify_offline(st
 	virtio_mem_retry(vm);
 }
 
-static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id,
-				     enum zone_type zone)
+static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
 {
 	unsigned long nb_offline;
 
 	switch (virtio_mem_mb_get_state(vm, mb_id)) {
 	case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
-		BUG_ON(zone == ZONE_MOVABLE);
 		virtio_mem_mb_set_state(vm, mb_id,
 					VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL);
 		break;
 	case VIRTIO_MEM_MB_STATE_OFFLINE:
-		if (zone == ZONE_MOVABLE)
-			virtio_mem_mb_set_state(vm, mb_id,
-					    VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE);
-		else
-			virtio_mem_mb_set_state(vm, mb_id,
-						VIRTIO_MEM_MB_STATE_ONLINE);
+		virtio_mem_mb_set_state(vm, mb_id, VIRTIO_MEM_MB_STATE_ONLINE);
 		break;
 	default:
 		BUG();
@@ -675,7 +648,6 @@ static int virtio_mem_memory_notifier_cb
 	const unsigned long start = PFN_PHYS(mhp->start_pfn);
 	const unsigned long size = PFN_PHYS(mhp->nr_pages);
 	const unsigned long mb_id = virtio_mem_phys_to_mb_id(start);
-	enum zone_type zone;
 	int rc = NOTIFY_OK;
 
 	if (!virtio_mem_overlaps_range(vm, start, size))
@@ -717,8 +689,7 @@ static int virtio_mem_memory_notifier_cb
 			break;
 		}
 		vm->hotplug_active = true;
-		zone = page_zonenum(pfn_to_page(mhp->start_pfn));
-		rc = virtio_mem_notify_going_online(vm, mb_id, zone);
+		rc = virtio_mem_notify_going_online(vm, mb_id);
 		break;
 	case MEM_OFFLINE:
 		virtio_mem_notify_offline(vm, mb_id);
@@ -726,8 +697,7 @@ static int virtio_mem_memory_notifier_cb
 		mutex_unlock(&vm->hotplug_mutex);
 		break;
 	case MEM_ONLINE:
-		zone = page_zonenum(pfn_to_page(mhp->start_pfn));
-		virtio_mem_notify_online(vm, mb_id, zone);
+		virtio_mem_notify_online(vm, mb_id);
 		vm->hotplug_active = false;
 		mutex_unlock(&vm->hotplug_mutex);
 		break;
@@ -1906,8 +1876,7 @@ static void virtio_mem_remove(struct vir
 	if (vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE] ||
 	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL] ||
 	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE] ||
-	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL] ||
-	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE]) {
+	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL]) {
 		dev_warn(&vdev->dev, "device still has system memory added\n");
 	} else {
 		virtio_mem_delete_resource(vm);
_

Patches currently in -mm which might be from david@redhat.com are

mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch
mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch
mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch
mm-page_isolation-cleanup-set_migratetype_isolate.patch
virtio-mem-dont-special-case-zone_movable.patch
mm-document-semantics-of-zone_movable.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-document-semantics-of-zone_movable.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (67 preceding siblings ...)
  2020-08-19  2:55 ` + virtio-mem-dont-special-case-zone_movable.patch " Andrew Morton
@ 2020-08-19  2:55 ` Andrew Morton
  2020-08-19  3:09 ` + mm-gup_benchmark-use-pin_user_pages-for-foll_longterm-flag.patch " Andrew Morton
                   ` (70 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  2:55 UTC (permalink / raw)
  To: bhe, cai, david, jasowang, mhocko, mike.kravetz, mm-commits, mst,
	pankaj.gupta.linux, rppt


The patch titled
     Subject: mm: document semantics of ZONE_MOVABLE
has been added to the -mm tree.  Its filename is
     mm-document-semantics-of-zone_movable.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-document-semantics-of-zone_movable.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-document-semantics-of-zone_movable.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: mm: document semantics of ZONE_MOVABLE

Let's document what ZONE_MOVABLE means, how it's used, and which special
cases we have regarding unmovable pages (memory offlining vs.  migration /
allocations).

Link: http://lkml.kernel.org/r/20200816125333.7434-7-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mmzone.h |   35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

--- a/include/linux/mmzone.h~mm-document-semantics-of-zone_movable
+++ a/include/linux/mmzone.h
@@ -396,6 +396,41 @@ enum zone_type {
 	 */
 	ZONE_HIGHMEM,
 #endif
+	/*
+	 * ZONE_MOVABLE is similar to ZONE_NORMAL, except that it contains
+	 * movable pages with few exceptional cases described below. Main use
+	 * cases for ZONE_MOVABLE are to make memory offlining/unplug more
+	 * likely to succeed, and to locally limit unmovable allocations - e.g.,
+	 * to increase the number of THP/huge pages. Notable special cases are:
+	 *
+	 * 1. Pinned pages: (long-term) pinning of movable pages might
+	 *    essentially turn such pages unmovable. Memory offlining might
+	 *    retry a long time.
+	 * 2. memblock allocations: kernelcore/movablecore setups might create
+	 *    situations where ZONE_MOVABLE contains unmovable allocations
+	 *    after boot. Memory offlining and allocations fail early.
+	 * 3. Memory holes: kernelcore/movablecore setups might create very rare
+	 *    situations where ZONE_MOVABLE contains memory holes after boot,
+	 *    for example, if we have sections that are only partially
+	 *    populated. Memory offlining and allocations fail early.
+	 * 4. PG_hwpoison pages: while poisoned pages can be skipped during
+	 *    memory offlining, such pages cannot be allocated.
+	 * 5. Unmovable PG_offline pages: in paravirtualized environments,
+	 *    hotplugged memory blocks might only partially be managed by the
+	 *    buddy (e.g., via XEN-balloon, Hyper-V balloon, virtio-mem). The
+	 *    parts not manged by the buddy are unmovable PG_offline pages. In
+	 *    some cases (virtio-mem), such pages can be skipped during
+	 *    memory offlining, however, cannot be moved/allocated. These
+	 *    techniques might use alloc_contig_range() to hide previously
+	 *    exposed pages from the buddy again (e.g., to implement some sort
+	 *    of memory unplug in virtio-mem).
+	 *
+	 * In general, no unmovable allocations that degrade memory offlining
+	 * should end up in ZONE_MOVABLE. Allocators (like alloc_contig_range())
+	 * have to expect that migrating pages in ZONE_MOVABLE can fail (even
+	 * if has_unmovable_pages() states that there are no unmovable pages,
+	 * there can be false negatives).
+	 */
 	ZONE_MOVABLE,
 #ifdef CONFIG_ZONE_DEVICE
 	ZONE_DEVICE,
_

Patches currently in -mm which might be from david@redhat.com are

mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch
mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch
mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch
mm-page_isolation-cleanup-set_migratetype_isolate.patch
virtio-mem-dont-special-case-zone_movable.patch
mm-document-semantics-of-zone_movable.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-gup_benchmark-use-pin_user_pages-for-foll_longterm-flag.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (68 preceding siblings ...)
  2020-08-19  2:55 ` + mm-document-semantics-of-zone_movable.patch " Andrew Morton
@ 2020-08-19  3:09 ` Andrew Morton
  2020-08-19  3:13 ` + squashfs-avoid-bio_alloc-failure-with-1mbyte-blocks.patch " Andrew Morton
                   ` (69 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  3:09 UTC (permalink / raw)
  To: corbet, dan.j.williams, david, hch, jack, jgg, jglisse, jhubbard,
	mhocko, mike.kravetz, mm-commits, shuah, song.bao.hua, vbabka,
	viro, willy


The patch titled
     Subject: mm/gup_benchmark: use pin_user_pages for FOLL_LONGTERM flag
has been added to the -mm tree.  Its filename is
     mm-gup_benchmark-use-pin_user_pages-for-foll_longterm-flag.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-gup_benchmark-use-pin_user_pages-for-foll_longterm-flag.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-gup_benchmark-use-pin_user_pages-for-foll_longterm-flag.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Barry Song <song.bao.hua@hisilicon.com>
Subject: mm/gup_benchmark: use pin_user_pages for FOLL_LONGTERM flag

According to Documentation/core-api/pin_user_pages.rst, FOLL_PIN is a
prerequisite to FOLL_LONGTERM.  Another way of saying that is,
FOLL_LONGTERM is a specific case, more restrictive case of FOLL_PIN.

Almost all kernel modules are using pin_user_pages() with FOLL_LONGTERM,
mm/gup_benchmark.c seems to the only exception in which FOLL_PIN is not a
prerequisite to FOLL_LONGTERM.

Link: http://lkml.kernel.org/r/20200815122056.29508-1-song.bao.hua@hisilicon.com
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/gup_benchmark.c                         |   23 +++++++++----------
 tools/testing/selftests/vm/gup_benchmark.c |   14 +++++------
 2 files changed, 19 insertions(+), 18 deletions(-)

--- a/mm/gup_benchmark.c~mm-gup_benchmark-use-pin_user_pages-for-foll_longterm-flag
+++ a/mm/gup_benchmark.c
@@ -6,10 +6,10 @@
 #include <linux/debugfs.h>
 
 #define GUP_FAST_BENCHMARK	_IOWR('g', 1, struct gup_benchmark)
-#define GUP_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
-#define GUP_BENCHMARK		_IOWR('g', 3, struct gup_benchmark)
-#define PIN_FAST_BENCHMARK	_IOWR('g', 4, struct gup_benchmark)
-#define PIN_BENCHMARK		_IOWR('g', 5, struct gup_benchmark)
+#define GUP_BENCHMARK		_IOWR('g', 2, struct gup_benchmark)
+#define PIN_FAST_BENCHMARK	_IOWR('g', 3, struct gup_benchmark)
+#define PIN_BENCHMARK		_IOWR('g', 4, struct gup_benchmark)
+#define PIN_LONGTERM_BENCHMARK	_IOWR('g', 5, struct gup_benchmark)
 
 struct gup_benchmark {
 	__u64 get_delta_usec;
@@ -28,7 +28,6 @@ static void put_back_pages(unsigned int
 
 	switch (cmd) {
 	case GUP_FAST_BENCHMARK:
-	case GUP_LONGTERM_BENCHMARK:
 	case GUP_BENCHMARK:
 		for (i = 0; i < nr_pages; i++)
 			put_page(pages[i]);
@@ -36,6 +35,7 @@ static void put_back_pages(unsigned int
 
 	case PIN_FAST_BENCHMARK:
 	case PIN_BENCHMARK:
+	case PIN_LONGTERM_BENCHMARK:
 		unpin_user_pages(pages, nr_pages);
 		break;
 	}
@@ -50,6 +50,7 @@ static void verify_dma_pinned(unsigned i
 	switch (cmd) {
 	case PIN_FAST_BENCHMARK:
 	case PIN_BENCHMARK:
+	case PIN_LONGTERM_BENCHMARK:
 		for (i = 0; i < nr_pages; i++) {
 			page = pages[i];
 			if (WARN(!page_maybe_dma_pinned(page),
@@ -101,11 +102,6 @@ static int __gup_benchmark_ioctl(unsigne
 			nr = get_user_pages_fast(addr, nr, gup->flags,
 						 pages + i);
 			break;
-		case GUP_LONGTERM_BENCHMARK:
-			nr = get_user_pages(addr, nr,
-					    gup->flags | FOLL_LONGTERM,
-					    pages + i, NULL);
-			break;
 		case GUP_BENCHMARK:
 			nr = get_user_pages(addr, nr, gup->flags, pages + i,
 					    NULL);
@@ -118,6 +114,11 @@ static int __gup_benchmark_ioctl(unsigne
 			nr = pin_user_pages(addr, nr, gup->flags, pages + i,
 					    NULL);
 			break;
+		case PIN_LONGTERM_BENCHMARK:
+			nr = pin_user_pages(addr, nr,
+					    gup->flags | FOLL_LONGTERM,
+					    pages + i, NULL);
+			break;
 		default:
 			kvfree(pages);
 			ret = -EINVAL;
@@ -162,10 +163,10 @@ static long gup_benchmark_ioctl(struct f
 
 	switch (cmd) {
 	case GUP_FAST_BENCHMARK:
-	case GUP_LONGTERM_BENCHMARK:
 	case GUP_BENCHMARK:
 	case PIN_FAST_BENCHMARK:
 	case PIN_BENCHMARK:
+	case PIN_LONGTERM_BENCHMARK:
 		break;
 	default:
 		return -EINVAL;
--- a/tools/testing/selftests/vm/gup_benchmark.c~mm-gup_benchmark-use-pin_user_pages-for-foll_longterm-flag
+++ a/tools/testing/selftests/vm/gup_benchmark.c
@@ -15,12 +15,12 @@
 #define PAGE_SIZE sysconf(_SC_PAGESIZE)
 
 #define GUP_FAST_BENCHMARK	_IOWR('g', 1, struct gup_benchmark)
-#define GUP_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
-#define GUP_BENCHMARK		_IOWR('g', 3, struct gup_benchmark)
+#define GUP_BENCHMARK		_IOWR('g', 2, struct gup_benchmark)
 
 /* Similar to above, but use FOLL_PIN instead of FOLL_GET. */
-#define PIN_FAST_BENCHMARK	_IOWR('g', 4, struct gup_benchmark)
-#define PIN_BENCHMARK		_IOWR('g', 5, struct gup_benchmark)
+#define PIN_FAST_BENCHMARK	_IOWR('g', 3, struct gup_benchmark)
+#define PIN_BENCHMARK		_IOWR('g', 4, struct gup_benchmark)
+#define PIN_LONGTERM_BENCHMARK	_IOWR('g', 5, struct gup_benchmark)
 
 /* Just the flags we need, copied from mm.h: */
 #define FOLL_WRITE	0x01	/* check pte is writable */
@@ -52,6 +52,9 @@ int main(int argc, char **argv)
 		case 'b':
 			cmd = PIN_BENCHMARK;
 			break;
+		case 'L':
+			cmd = PIN_LONGTERM_BENCHMARK;
+			break;
 		case 'm':
 			size = atoi(optarg) * MB;
 			break;
@@ -67,9 +70,6 @@ int main(int argc, char **argv)
 		case 'T':
 			thp = 0;
 			break;
-		case 'L':
-			cmd = GUP_LONGTERM_BENCHMARK;
-			break;
 		case 'U':
 			cmd = GUP_BENCHMARK;
 			break;
_

Patches currently in -mm which might be from song.bao.hua@hisilicon.com are

mm-gup_benchmark-use-pin_user_pages-for-foll_longterm-flag.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + squashfs-avoid-bio_alloc-failure-with-1mbyte-blocks.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (69 preceding siblings ...)
  2020-08-19  3:09 ` + mm-gup_benchmark-use-pin_user_pages-for-foll_longterm-flag.patch " Andrew Morton
@ 2020-08-19  3:13 ` Andrew Morton
  2020-08-19  3:19 ` + mm-include-cma-pages-in-lowmem_reserve-at-boot.patch " Andrew Morton
                   ` (68 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  3:13 UTC (permalink / raw)
  To: adrien+dev, drosen, groeck, hch, mm-commits, nicolas.prochazka,
	phillip, pliard, shimada, stable


The patch titled
     Subject: squashfs: avoid bio_alloc() failure with 1Mbyte blocks
has been added to the -mm tree.  Its filename is
     squashfs-avoid-bio_alloc-failure-with-1mbyte-blocks.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/squashfs-avoid-bio_alloc-failure-with-1mbyte-blocks.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/squashfs-avoid-bio_alloc-failure-with-1mbyte-blocks.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Phillip Lougher <phillip@squashfs.org.uk>
Subject: squashfs: avoid bio_alloc() failure with 1Mbyte blocks

This is a regression introduced by the patch "migrate from ll_rw_block
usage to BIO".

Bio_alloc() is limited to 256 pages (1 Mbyte).  This can cause a failure
when reading 1 Mbyte block filesystems.  The problem is a datablock can be
fully (or almost uncompressed), requiring 256 pages, but, because blocks
are not aligned to page boundaries, it may require 257 pages to read.

Bio_kmalloc() can handle 1024 pages, and so use this for the edge
condition.

Link: http://lkml.kernel.org/r/20200815035637.15319-1-phillip@squashfs.org.uk
Fixes: 93e72b3c612a ("squashfs: migrate from ll_rw_block usage to BIO")
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: Nicolas Prochazka <nicolas.prochazka@gmail.com>
Reported-by: Tomoatsu Shimada <shimada@walbrix.com>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Cc: Philippe Liard <pliard@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Adrien Schildknecht <adrien+dev@schischi.me>
Cc: Daniel Rosenberg <drosen@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/squashfs/block.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/fs/squashfs/block.c~squashfs-avoid-bio_alloc-failure-with-1mbyte-blocks
+++ a/fs/squashfs/block.c
@@ -87,7 +87,11 @@ static int squashfs_bio_read(struct supe
 	int error, i;
 	struct bio *bio;
 
-	bio = bio_alloc(GFP_NOIO, page_count);
+	if (page_count <= BIO_MAX_PAGES)
+		bio = bio_alloc(GFP_NOIO, page_count);
+	else
+		bio = bio_kmalloc(GFP_NOIO, page_count);
+
 	if (!bio)
 		return -ENOMEM;
 
_

Patches currently in -mm which might be from phillip@squashfs.org.uk are

squashfs-avoid-bio_alloc-failure-with-1mbyte-blocks.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-include-cma-pages-in-lowmem_reserve-at-boot.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (70 preceding siblings ...)
  2020-08-19  3:13 ` + squashfs-avoid-bio_alloc-failure-with-1mbyte-blocks.patch " Andrew Morton
@ 2020-08-19  3:19 ` Andrew Morton
  2020-08-19  3:21 ` + mm-dmapoolc-replace-open-coded-list_for_each_entry_safe.patch " Andrew Morton
                   ` (67 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  3:19 UTC (permalink / raw)
  To: jbaron, kirill.shutemov, mhocko, mm-commits, opendmb, rientjes


The patch titled
     Subject: mm: include CMA pages in lowmem_reserve at boot
has been added to the -mm tree.  Its filename is
     mm-include-cma-pages-in-lowmem_reserve-at-boot.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-include-cma-pages-in-lowmem_reserve-at-boot.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-include-cma-pages-in-lowmem_reserve-at-boot.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Doug Berger <opendmb@gmail.com>
Subject: mm: include CMA pages in lowmem_reserve at boot

The lowmem_reserve arrays provide a means of applying pressure against
allocations from lower zones that were targeted at higher zones.  Its
values are a function of the number of pages managed by higher zones and
are assigned by a call to the setup_per_zone_lowmem_reserve() function.

The function is initially called at boot time by the function
init_per_zone_wmark_min() and may be called later by accesses of the
/proc/sys/vm/lowmem_reserve_ratio sysctl file.

The function init_per_zone_wmark_min() was moved up from a module_init to
a core_initcall to resolve a sequencing issue with khugepaged. 
Unfortunately this created a sequencing issue with CMA page accounting.

The CMA pages are added to the managed page count of a zone when
cma_init_reserved_areas() is called at boot also as a core_initcall.  This
makes it uncertain whether the CMA pages will be added to the managed page
counts of their zones before or after the call to
init_per_zone_wmark_min() as it becomes dependent on link order.  With the
current link order the pages are added to the managed count after the
lowmem_reserve arrays are initialized at boot.

This means the lowmem_reserve values at boot may be lower than the values
used later if /proc/sys/vm/lowmem_reserve_ratio is accessed even if the
ratio values are unchanged.

In many cases the difference is not significant, but for example
an ARM platform with 1GB of memory and the following memory layout
[    0.000000] cma: Reserved 256 MiB at 0x0000000030000000
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000000000-0x000000002fffffff]
[    0.000000]   Normal   empty
[    0.000000]   HighMem  [mem 0x0000000030000000-0x000000003fffffff]

would result in 0 lowmem_reserve for the DMA zone.  This would allow
userspace to deplete the DMA zone easily.  Funnily enough

$ cat /proc/sys/vm/lowmem_reserve_ratio

would fix up the situation because it forces setup_per_zone_lowmem_reserve
as a side effect.

This commit breaks the link order dependency by invoking
init_per_zone_wmark_min() as a postcore_initcall so that the CMA pages
have the chance to be properly accounted in their zone(s) and allowing the
lowmem_reserve arrays to receive consistent values.

Link: http://lkml.kernel.org/r/1597423766-27849-1-git-send-email-opendmb@gmail.com
Fixes: bc22af74f271 ("mm: update min_free_kbytes from khugepaged after core initialization")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/page_alloc.c~mm-include-cma-pages-in-lowmem_reserve-at-boot
+++ a/mm/page_alloc.c
@@ -7888,7 +7888,7 @@ int __meminit init_per_zone_wmark_min(vo
 
 	return 0;
 }
-core_initcall(init_per_zone_wmark_min)
+postcore_initcall(init_per_zone_wmark_min)
 
 /*
  * min_free_kbytes_sysctl_handler - just a wrapper around proc_dointvec() so
_

Patches currently in -mm which might be from opendmb@gmail.com are

mm-include-cma-pages-in-lowmem_reserve-at-boot.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-dmapoolc-replace-open-coded-list_for_each_entry_safe.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (71 preceding siblings ...)
  2020-08-19  3:19 ` + mm-include-cma-pages-in-lowmem_reserve-at-boot.patch " Andrew Morton
@ 2020-08-19  3:21 ` Andrew Morton
  2020-08-19  3:21 ` + mm-dmapoolc-replace-hard-coded-function-name-with-__func__.patch " Andrew Morton
                   ` (66 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  3:21 UTC (permalink / raw)
  To: andriy.shevchenko, mm-commits, willy


The patch titled
     Subject: mm/dmapool.c: replace open-coded list_for_each_entry_safe()
has been added to the -mm tree.  Its filename is
     mm-dmapoolc-replace-open-coded-list_for_each_entry_safe.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-dmapoolc-replace-open-coded-list_for_each_entry_safe.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-dmapoolc-replace-open-coded-list_for_each_entry_safe.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Subject: mm/dmapool.c: replace open-coded list_for_each_entry_safe()

There is a place in the code where open-coded version of
list_for_each_entry_safe() is used.  Replace that with the standard macro.

Link: http://lkml.kernel.org/r/20200814135055.24898-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/dmapool.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

--- a/mm/dmapool.c~mm-dmapoolc-replace-open-coded-list_for_each_entry_safe
+++ a/mm/dmapool.c
@@ -266,6 +266,7 @@ static void pool_free_page(struct dma_po
  */
 void dma_pool_destroy(struct dma_pool *pool)
 {
+	struct dma_page *page, *tmp;
 	bool empty = false;
 
 	if (unlikely(!pool))
@@ -281,10 +282,7 @@ void dma_pool_destroy(struct dma_pool *p
 		device_remove_file(pool->dev, &dev_attr_pools);
 	mutex_unlock(&pools_reg_lock);
 
-	while (!list_empty(&pool->page_list)) {
-		struct dma_page *page;
-		page = list_entry(pool->page_list.next,
-				  struct dma_page, page_list);
+	list_for_each_entry_safe(page, tmp, &pool->page_list, page_list) {
 		if (is_page_busy(page)) {
 			if (pool->dev)
 				dev_err(pool->dev,
_

Patches currently in -mm which might be from andriy.shevchenko@linux.intel.com are

mm-dmapoolc-replace-open-coded-list_for_each_entry_safe.patch
mm-dmapoolc-replace-hard-coded-function-name-with-__func__.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-dmapoolc-replace-hard-coded-function-name-with-__func__.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (72 preceding siblings ...)
  2020-08-19  3:21 ` + mm-dmapoolc-replace-open-coded-list_for_each_entry_safe.patch " Andrew Morton
@ 2020-08-19  3:21 ` Andrew Morton
  2020-08-19  3:27 ` + mm-slub-branch-optimization-in-free-slowpath.patch " Andrew Morton
                   ` (65 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  3:21 UTC (permalink / raw)
  To: andriy.shevchenko, mm-commits, willy


The patch titled
     Subject: mm/dmapool.c: replace hard coded function name with __func__
has been added to the -mm tree.  Its filename is
     mm-dmapoolc-replace-hard-coded-function-name-with-__func__.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-dmapoolc-replace-hard-coded-function-name-with-__func__.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-dmapoolc-replace-hard-coded-function-name-with-__func__.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Subject: mm/dmapool.c: replace hard coded function name with __func__

No need to hard code function name when __func__ can be used.

While here, replace specifiers for special types like dma_addr_t.

Link: http://lkml.kernel.org/r/20200814135055.24898-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/dmapool.c |   40 ++++++++++++++++++----------------------
 1 file changed, 18 insertions(+), 22 deletions(-)

--- a/mm/dmapool.c~mm-dmapoolc-replace-hard-coded-function-name-with-__func__
+++ a/mm/dmapool.c
@@ -285,11 +285,10 @@ void dma_pool_destroy(struct dma_pool *p
 	list_for_each_entry_safe(page, tmp, &pool->page_list, page_list) {
 		if (is_page_busy(page)) {
 			if (pool->dev)
-				dev_err(pool->dev,
-					"dma_pool_destroy %s, %p busy\n",
+				dev_err(pool->dev, "%s %s, %p busy\n", __func__,
 					pool->name, page->vaddr);
 			else
-				pr_err("dma_pool_destroy %s, %p busy\n",
+				pr_err("%s %s, %p busy\n", __func__,
 				       pool->name, page->vaddr);
 			/* leak the still-in-use consistent memory */
 			list_del(&page->page_list);
@@ -353,12 +352,11 @@ void *dma_pool_alloc(struct dma_pool *po
 			if (data[i] == POOL_POISON_FREED)
 				continue;
 			if (pool->dev)
-				dev_err(pool->dev,
-					"dma_pool_alloc %s, %p (corrupted)\n",
-					pool->name, retval);
+				dev_err(pool->dev, "%s %s, %p (corrupted)\n",
+					__func__, pool->name, retval);
 			else
-				pr_err("dma_pool_alloc %s, %p (corrupted)\n",
-					pool->name, retval);
+				pr_err("%s %s, %p (corrupted)\n",
+					__func__, pool->name, retval);
 
 			/*
 			 * Dump the first 4 bytes even if they are not
@@ -414,12 +412,11 @@ void dma_pool_free(struct dma_pool *pool
 	if (!page) {
 		spin_unlock_irqrestore(&pool->lock, flags);
 		if (pool->dev)
-			dev_err(pool->dev,
-				"dma_pool_free %s, %p/%lx (bad dma)\n",
-				pool->name, vaddr, (unsigned long)dma);
+			dev_err(pool->dev, "%s %s, %p/%pad (bad dma)\n",
+				__func__, pool->name, vaddr, &dma);
 		else
-			pr_err("dma_pool_free %s, %p/%lx (bad dma)\n",
-			       pool->name, vaddr, (unsigned long)dma);
+			pr_err("%s %s, %p/%pad (bad dma)\n",
+			       __func__, pool->name, vaddr, &dma);
 		return;
 	}
 
@@ -430,12 +427,11 @@ void dma_pool_free(struct dma_pool *pool
 	if ((dma - page->dma) != offset) {
 		spin_unlock_irqrestore(&pool->lock, flags);
 		if (pool->dev)
-			dev_err(pool->dev,
-				"dma_pool_free %s, %p (bad vaddr)/%pad\n",
-				pool->name, vaddr, &dma);
+			dev_err(pool->dev, "%s %s, %p (bad vaddr)/%pad\n",
+				__func__, pool->name, vaddr, &dma);
 		else
-			pr_err("dma_pool_free %s, %p (bad vaddr)/%pad\n",
-			       pool->name, vaddr, &dma);
+			pr_err("%s %s, %p (bad vaddr)/%pad\n",
+			       __func__, pool->name, vaddr, &dma);
 		return;
 	}
 	{
@@ -447,11 +443,11 @@ void dma_pool_free(struct dma_pool *pool
 			}
 			spin_unlock_irqrestore(&pool->lock, flags);
 			if (pool->dev)
-				dev_err(pool->dev, "dma_pool_free %s, dma %pad already free\n",
-					pool->name, &dma);
+				dev_err(pool->dev, "%s %s, dma %pad already free\n",
+					__func__, pool->name, &dma);
 			else
-				pr_err("dma_pool_free %s, dma %pad already free\n",
-				       pool->name, &dma);
+				pr_err("%s %s, dma %pad already free\n",
+				       __func__, pool->name, &dma);
 			return;
 		}
 	}
_

Patches currently in -mm which might be from andriy.shevchenko@linux.intel.com are

mm-dmapoolc-replace-open-coded-list_for_each_entry_safe.patch
mm-dmapoolc-replace-hard-coded-function-name-with-__func__.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-slub-branch-optimization-in-free-slowpath.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (73 preceding siblings ...)
  2020-08-19  3:21 ` + mm-dmapoolc-replace-hard-coded-function-name-with-__func__.patch " Andrew Morton
@ 2020-08-19  3:27 ` Andrew Morton
  2020-08-19  3:39 ` [to-be-updated] mm-page_alloc-keep-memoryless-cpuless-node-0-offline.patch removed from " Andrew Morton
                   ` (64 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  3:27 UTC (permalink / raw)
  To: cl, hewenliang4, hushiyuan, iamjoonsoo.kim, mm-commits, penberg,
	rientjes, wuyun.wu


The patch titled
     Subject: mm/slub.c: branch optimization in free slowpath
has been added to the -mm tree.  Its filename is
     mm-slub-branch-optimization-in-free-slowpath.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-slub-branch-optimization-in-free-slowpath.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-slub-branch-optimization-in-free-slowpath.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Abel Wu <wuyun.wu@huawei.com>
Subject: mm/slub.c: branch optimization in free slowpath

The two conditions are mutually exclusive and gcc compiler will optimise
this into if-else-like pattern.  Given that the majority of free_slowpath
is free_frozen, let's provide some hint to the compilers.

Tests (perf bench sched messaging -g 20 -l 400000, executed 10x
after reboot) are done and the summarized result:

	un-patched	patched
max.	192.316		189.851
min.	187.267		186.252
avg.	189.154		188.086
stdev.	1.37		0.99

Link: http://lkml.kernel.org/r/20200813101812.1617-1-wuyun.wu@huawei.com
Signed-off-by: Abel Wu <wuyun.wu@huawei.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hewenliang <hewenliang4@huawei.com>
Cc: Hu Shiyuan <hushiyuan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/slub.c |   23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

--- a/mm/slub.c~mm-slub-branch-optimization-in-free-slowpath
+++ a/mm/slub.c
@@ -3024,20 +3024,21 @@ static void __slab_free(struct kmem_cach
 
 	if (likely(!n)) {
 
-		/*
-		 * If we just froze the page then put it onto the
-		 * per cpu partial list.
-		 */
-		if (new.frozen && !was_frozen) {
+		if (likely(was_frozen)) {
+			/*
+			 * The list lock was not taken therefore no list
+			 * activity can be necessary.
+			 */
+			stat(s, FREE_FROZEN);
+		} else if (new.frozen) {
+			/*
+			 * If we just froze the page then put it onto the
+			 * per cpu partial list.
+			 */
 			put_cpu_partial(s, page, 1);
 			stat(s, CPU_PARTIAL_FREE);
 		}
-		/*
-		 * The list lock was not taken therefore no list
-		 * activity can be necessary.
-		 */
-		if (was_frozen)
-			stat(s, FREE_FROZEN);
+
 		return;
 	}
 
_

Patches currently in -mm which might be from wuyun.wu@huawei.com are

mm-slub-branch-optimization-in-free-slowpath.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* [to-be-updated] mm-page_alloc-keep-memoryless-cpuless-node-0-offline.patch removed from -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (74 preceding siblings ...)
  2020-08-19  3:27 ` + mm-slub-branch-optimization-in-free-slowpath.patch " Andrew Morton
@ 2020-08-19  3:39 ` Andrew Morton
  2020-08-19  3:39 ` [to-be-updated] powerpc-numa-set-numa_node-for-all-possible-cpus.patch " Andrew Morton
                   ` (63 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  3:39 UTC (permalink / raw)
  To: cl, david, ego, kirill, mgorman, mhocko, mm-commits, mpe,
	sathnaga, srikar, vbabka


The patch titled
     Subject: mm/page_alloc: keep memoryless cpuless node 0 offline
has been removed from the -mm tree.  Its filename was
     mm-page_alloc-keep-memoryless-cpuless-node-0-offline.patch

This patch was dropped because an updated version will be merged

------------------------------------------------------
From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Subject: mm/page_alloc: keep memoryless cpuless node 0 offline

Currently Linux kernel with CONFIG_NUMA on a system with multiple possible
nodes, marks node 0 as online at boot.  However in practice, there are
systems which have node 0 as memoryless and cpuless.

This can cause numa_balancing to be enabled on systems with only one node
with memory and CPUs.  The existence of this dummy node which is cpuless
and memoryless node can confuse users/scripts looking at output of lscpu /
numactl.

By marking, N_ONLINE as NODE_MASK_NONE, lets stop assuming that Node 0 is
always online.

v5.8-rc2
 available: 2 nodes (0,2)
 node 0 cpus:
 node 0 size: 0 MB
 node 0 free: 0 MB
 node 2 cpus: 0 1 2 3 4 5 6 7
 node 2 size: 32625 MB
 node 2 free: 31490 MB
 node distances:
 node   0   2
   0:  10  20
   2:  20  10

proc and sys files
------------------
 /sys/devices/system/node/online:            0,2
 /proc/sys/kernel/numa_balancing:            1
 /sys/devices/system/node/has_cpu:           2
 /sys/devices/system/node/has_memory:        2
 /sys/devices/system/node/has_normal_memory: 2
 /sys/devices/system/node/possible:          0-31

v5.8-rc2 + patch
------------------
 available: 1 nodes (2)
 node 2 cpus: 0 1 2 3 4 5 6 7
 node 2 size: 32625 MB
 node 2 free: 31487 MB
 node distances:
 node   2
   2:  10

proc and sys files
------------------
/sys/devices/system/node/online:            2
/proc/sys/kernel/numa_balancing:            0
/sys/devices/system/node/has_cpu:           2
/sys/devices/system/node/has_memory:        2
/sys/devices/system/node/has_normal_memory: 2
/sys/devices/system/node/possible:          0-31

Note: On Powerpc, cpu_to_node of possible but not present cpus would
previously return 0.  Hence this commit depends on commit ("powerpc/numa:
Set numa_node for all possible cpus") and commit ("powerpc/numa: Prefer
node id queried from vphn").  Without the 2 commits, Powerpc system might
crash.

1. User space applications like Numactl, lscpu, that parse the sysfs
   tend to believe there is an extra online node.  This tends to confuse
   users and applications.  Other user space applications start believing
   that system was not able to use all the resources (i.e missing
   resources) or the system was not setup correctly.

2. Also existence of dummy node also leads to inconsistent
   information.  The number of online nodes is inconsistent with the
   information in the device-tree and resource-dump

3. When the dummy node is present, single node non-Numa systems end up
   showing up as NUMA systems and numa_balancing gets enabled.  This will
   mean we take the hit from the unnecessary numa hinting faults.

Link: http://lkml.kernel.org/r/20200624092846.9194-4-srikar@linux.vnet.ibm.com
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Christopher Lameter <cl@linux.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/mm/page_alloc.c~mm-page_alloc-keep-memoryless-cpuless-node-0-offline
+++ a/mm/page_alloc.c
@@ -117,8 +117,10 @@ EXPORT_SYMBOL(latent_entropy);
  */
 nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
 	[N_POSSIBLE] = NODE_MASK_ALL,
+#ifdef CONFIG_NUMA
+	[N_ONLINE] = NODE_MASK_NONE,
+#else
 	[N_ONLINE] = { { [0] = 1UL } },
-#ifndef CONFIG_NUMA
 	[N_NORMAL_MEMORY] = { { [0] = 1UL } },
 #ifdef CONFIG_HIGHMEM
 	[N_HIGH_MEMORY] = { { [0] = 1UL } },
_

Patches currently in -mm which might be from srikar@linux.vnet.ibm.com are

powerpc-numa-set-numa_node-for-all-possible-cpus.patch
powerpc-numa-prefer-node-id-queried-from-vphn.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* [to-be-updated] powerpc-numa-set-numa_node-for-all-possible-cpus.patch removed from -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (75 preceding siblings ...)
  2020-08-19  3:39 ` [to-be-updated] mm-page_alloc-keep-memoryless-cpuless-node-0-offline.patch removed from " Andrew Morton
@ 2020-08-19  3:39 ` Andrew Morton
  2020-08-19  3:39 ` [to-be-updated] powerpc-numa-prefer-node-id-queried-from-vphn.patch " Andrew Morton
                   ` (62 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  3:39 UTC (permalink / raw)
  To: cl, david, ego, kirill, mgorman, mhocko, mm-commits, mpe,
	sathnaga, srikar, vbabka


The patch titled
     Subject: powerpc/numa: set numa_node for all possible cpus
has been removed from the -mm tree.  Its filename was
     powerpc-numa-set-numa_node-for-all-possible-cpus.patch

This patch was dropped because an updated version will be merged

------------------------------------------------------
From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Subject: powerpc/numa: set numa_node for all possible cpus

Patch series "Offline memoryless cpuless node 0", v5.

Linux kernel configured with CONFIG_NUMA on a system with multiple
   possible nodes, marks node 0 as online at boot.  However in practice,
   there are systems which have node 0 as memoryless and cpuless.

This can cause
1. numa_balancing to be enabled on systems with only one online node.
2. Existence of dummy (cpuless and memoryless) node which can confuse
users/scripts looking at output of lscpu / numactl.

This patchset wants to correct this anomaly.

This should only affect systems that have CONFIG_MEMORYLESS_NODES. 
   Currently there are only 2 architectures ia64 and powerpc that have
   this config.

Note: Patch 3 in this patch series depends on patches 1 and 2.  Without
   patches 1 and 2, patch 3 might crash powerpc.


This patch (of 3):

A Powerpc system with multiple possible nodes and with CONFIG_NUMA enabled
always used to have a node 0, even if node 0 does not any cpus or memory
attached to it.  As per PAPR, node affinity of a cpu is only available
once its present / online.  For all cpus that are possible but not
present, cpu_to_node() would point to node 0.

To ensure a cpuless, memoryless dummy node is not online, powerpc need to
make sure all possible but not present cpu_to_node are set to a proper
node.

Link: http://lkml.kernel.org/r/20200624092846.9194-1-srikar@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/20200624092846.9194-2-srikar@linux.vnet.ibm.com
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Christopher Lameter <cl@linux.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/powerpc/mm/numa.c |   16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

--- a/arch/powerpc/mm/numa.c~powerpc-numa-set-numa_node-for-all-possible-cpus
+++ a/arch/powerpc/mm/numa.c
@@ -506,6 +506,11 @@ static int numa_setup_cpu(unsigned long
 	int fcpu = cpu_first_thread_sibling(lcpu);
 	int nid = NUMA_NO_NODE;
 
+	if (!cpu_present(lcpu)) {
+		set_cpu_numa_node(lcpu, first_online_node);
+		return first_online_node;
+	}
+
 	/*
 	 * If a valid cpu-to-node mapping is already available, use it
 	 * directly instead of querying the firmware, since it represents
@@ -931,8 +936,17 @@ void __init mem_topology_setup(void)
 
 	reset_numa_cpu_lookup_table();
 
-	for_each_present_cpu(cpu)
+	for_each_possible_cpu(cpu) {
+		/*
+		 * Powerpc with CONFIG_NUMA always used to have a node 0,
+		 * even if it was memoryless or cpuless. For all cpus that
+		 * are possible but not present, cpu_to_node() would point
+		 * to node 0. To remove a cpuless, memoryless dummy node,
+		 * powerpc need to make sure all possible but not present
+		 * cpu_to_node are set to a proper node.
+		 */
 		numa_setup_cpu(cpu);
+	}
 }
 
 void __init initmem_init(void)
_

Patches currently in -mm which might be from srikar@linux.vnet.ibm.com are

powerpc-numa-prefer-node-id-queried-from-vphn.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* [to-be-updated] powerpc-numa-prefer-node-id-queried-from-vphn.patch removed from -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (76 preceding siblings ...)
  2020-08-19  3:39 ` [to-be-updated] powerpc-numa-set-numa_node-for-all-possible-cpus.patch " Andrew Morton
@ 2020-08-19  3:39 ` Andrew Morton
  2020-08-19  3:50 ` + mm-memcg-warning-on-memcg-after-readahead-page-charged.patch added to " Andrew Morton
                   ` (61 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  3:39 UTC (permalink / raw)
  To: cl, david, ego, kirill, mgorman, mhocko, mm-commits, mpe,
	sathnaga, srikar, vbabka


The patch titled
     Subject: powerpc/numa: prefer node id queried from vphn
has been removed from the -mm tree.  Its filename was
     powerpc-numa-prefer-node-id-queried-from-vphn.patch

This patch was dropped because an updated version will be merged

------------------------------------------------------
From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Subject: powerpc/numa: prefer node id queried from vphn

Node id queried from the static device tree may not be correct.  For
example: it may always show 0 on a shared processor.  Hence prefer the
node id queried from vphn and fallback on the device tree based node id if
vphn query fails.

Link: http://lkml.kernel.org/r/20200624092846.9194-3-srikar@linux.vnet.ibm.com
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Christopher Lameter <cl@linux.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/powerpc/mm/numa.c |   19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

--- a/arch/powerpc/mm/numa.c~powerpc-numa-prefer-node-id-queried-from-vphn
+++ a/arch/powerpc/mm/numa.c
@@ -724,21 +724,22 @@ static int __init parse_numa_properties(
 	 */
 	for_each_present_cpu(i) {
 		struct device_node *cpu;
-		int nid;
-
-		cpu = of_get_cpu_node(i, NULL);
-		BUG_ON(!cpu);
-		nid = of_node_to_nid_single(cpu);
-		of_node_put(cpu);
+		int nid = vphn_get_nid(i);
 
 		/*
 		 * Don't fall back to default_nid yet -- we will plug
 		 * cpus into nodes once the memory scan has discovered
 		 * the topology.
 		 */
-		if (nid < 0)
-			continue;
-		node_set_online(nid);
+		if (nid == NUMA_NO_NODE) {
+			cpu = of_get_cpu_node(i, NULL);
+			BUG_ON(!cpu);
+			nid = of_node_to_nid_single(cpu);
+			of_node_put(cpu);
+		}
+
+		if (likely(nid > 0))
+			node_set_online(nid);
 	}
 
 	get_n_mem_cells(&n_mem_addr_cells, &n_mem_size_cells);
_

Patches currently in -mm which might be from srikar@linux.vnet.ibm.com are



^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-memcg-warning-on-memcg-after-readahead-page-charged.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (77 preceding siblings ...)
  2020-08-19  3:39 ` [to-be-updated] powerpc-numa-prefer-node-id-queried-from-vphn.patch " Andrew Morton
@ 2020-08-19  3:50 ` Andrew Morton
  2020-08-19  3:50 ` + mm-memcg-remove-useless-check-on-page-mem_cgroup.patch " Andrew Morton
                   ` (60 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  3:50 UTC (permalink / raw)
  To: aarcange, alex.shi, guro, hannes, hughd, kirill.shutemov, mhocko,
	mhocko, mm-commits, richard.weiyang, vdavydov.dev, willy


The patch titled
     Subject: mm/memcg: warning on !memcg after readahead page charged
has been added to the -mm tree.  Its filename is
     mm-memcg-warning-on-memcg-after-readahead-page-charged.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcg-warning-on-memcg-after-readahead-page-charged.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcg-warning-on-memcg-after-readahead-page-charged.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Alex Shi <alex.shi@linux.alibaba.com>
Subject: mm/memcg: warning on !memcg after readahead page charged

Since readahead page is charged on memcg too, in theory we don't have to
check this exception now.  Before safely remove them all, add a warning
for the unexpected !memcg.

Link: http://lkml.kernel.org/r/1597144232-11370-1-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mmdebug.h |   13 +++++++++++++
 mm/memcontrol.c         |   15 ++++++++-------
 2 files changed, 21 insertions(+), 7 deletions(-)

--- a/include/linux/mmdebug.h~mm-memcg-warning-on-memcg-after-readahead-page-charged
+++ a/include/linux/mmdebug.h
@@ -37,6 +37,18 @@ void dump_mm(const struct mm_struct *mm)
 			BUG();						\
 		}							\
 	} while (0)
+#define VM_WARN_ON_ONCE_PAGE(cond, page)	({			\
+	static bool __section(.data.once) __warned;			\
+	int __ret_warn_once = !!(cond);					\
+									\
+	if (unlikely(__ret_warn_once && !__warned)) {			\
+		dump_page(page, "VM_WARN_ON_ONCE_PAGE(" __stringify(cond)")");\
+		__warned = true;					\
+		WARN_ON(1);						\
+	}								\
+	unlikely(__ret_warn_once);					\
+})
+
 #define VM_WARN_ON(cond) (void)WARN_ON(cond)
 #define VM_WARN_ON_ONCE(cond) (void)WARN_ON_ONCE(cond)
 #define VM_WARN_ONCE(cond, format...) (void)WARN_ONCE(cond, format)
@@ -48,6 +60,7 @@ void dump_mm(const struct mm_struct *mm)
 #define VM_BUG_ON_MM(cond, mm) VM_BUG_ON(cond)
 #define VM_WARN_ON(cond) BUILD_BUG_ON_INVALID(cond)
 #define VM_WARN_ON_ONCE(cond) BUILD_BUG_ON_INVALID(cond)
+#define VM_WARN_ON_ONCE_PAGE(cond, page)  BUILD_BUG_ON_INVALID(cond)
 #define VM_WARN_ONCE(cond, format...) BUILD_BUG_ON_INVALID(cond)
 #define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond)
 #endif
--- a/mm/memcontrol.c~mm-memcg-warning-on-memcg-after-readahead-page-charged
+++ a/mm/memcontrol.c
@@ -1322,10 +1322,8 @@ struct lruvec *mem_cgroup_page_lruvec(st
 	}
 
 	memcg = page->mem_cgroup;
-	/*
-	 * Swapcache readahead pages are added to the LRU - and
-	 * possibly migrated - before they are charged.
-	 */
+	/* Readahead page is charged too, to see if other page uncharged */
+	VM_WARN_ON_ONCE_PAGE(!memcg, page);
 	if (!memcg)
 		memcg = root_mem_cgroup;
 
@@ -6906,8 +6904,9 @@ void mem_cgroup_migrate(struct page *old
 	if (newpage->mem_cgroup)
 		return;
 
-	/* Swapcache readahead pages can get replaced before being charged */
 	memcg = oldpage->mem_cgroup;
+	/* Readahead page is charged too, to see if other page uncharged */
+	VM_WARN_ON_ONCE_PAGE(!memcg, oldpage);
 	if (!memcg)
 		return;
 
@@ -7104,7 +7103,8 @@ void mem_cgroup_swapout(struct page *pag
 
 	memcg = page->mem_cgroup;
 
-	/* Readahead page, never charged */
+	/* Readahead page is charged too, to see if other page uncharged */
+	VM_WARN_ON_ONCE_PAGE(!memcg, page);
 	if (!memcg)
 		return;
 
@@ -7168,7 +7168,8 @@ int mem_cgroup_try_charge_swap(struct pa
 
 	memcg = page->mem_cgroup;
 
-	/* Readahead page, never charged */
+	/* Readahead page is charged too, to see if other page uncharged */
+	VM_WARN_ON_ONCE_PAGE(!memcg, page);
 	if (!memcg)
 		return 0;
 
_

Patches currently in -mm which might be from alex.shi@linux.alibaba.com are

mm-memcg-warning-on-memcg-after-readahead-page-charged.patch
mm-memcg-remove-useless-check-on-page-mem_cgroup.patch
mm-thp-move-lru_add_page_tail-func-to-huge_memoryc.patch
mm-thp-clean-up-lru_add_page_tail.patch
mm-thp-remove-code-path-which-never-got-into.patch
mm-thp-narrow-lru-locking.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-memcg-remove-useless-check-on-page-mem_cgroup.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (78 preceding siblings ...)
  2020-08-19  3:50 ` + mm-memcg-warning-on-memcg-after-readahead-page-charged.patch added to " Andrew Morton
@ 2020-08-19  3:50 ` Andrew Morton
  2020-08-19  3:50 ` + mm-thp-move-lru_add_page_tail-func-to-huge_memoryc.patch " Andrew Morton
                   ` (59 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  3:50 UTC (permalink / raw)
  To: aarcange, alex.shi, guro, hannes, hughd, kirill.shutemov, mhocko,
	mhocko, mm-commits, richard.weiyang, vdavydov.dev, willy


The patch titled
     Subject: mm/memcg: bail out early from swap accounting when memcg is disabled
has been added to the -mm tree.  Its filename is
     mm-memcg-remove-useless-check-on-page-mem_cgroup.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcg-remove-useless-check-on-page-mem_cgroup.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcg-remove-useless-check-on-page-mem_cgroup.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Alex Shi <alex.shi@linux.alibaba.com>
Subject: mm/memcg: bail out early from swap accounting when memcg is disabled

If we disabled memcg by cgroup_disable=memory, page->memcg will be NULL
and so the charge is skipped and that will trigger a warning like below. 
Let's return from the funcs earlier.

 anon flags:0x5005b48008000d(locked|uptodate|dirty|swapbacked)
 raw: 005005b48008000d dead000000000100 dead000000000122 ffff8897c7c76ad1
 raw: 0000000000000022 0000000000000000 0000000200000000 0000000000000000
 page dumped because: VM_WARN_ON_ONCE_PAGE(!memcg)
...
 RIP: 0010:vprintk_emit+0x1f7/0x260
 Code: 00 84 d2 74 72 0f b6 15 27 58 64 01 48 c7 c0 00 d4 72 82 84 d2 74 09 f3 90 0f b6 10 84 d2 75 f7 e8 de 0d 00 00 4c 89 e7 57 9d <0f> 1f 44 00 00 e9 62 ff ff ff 80 3d 88 c9 3a 01 00 0f 85 54 fe ff
 RSP: 0018:ffffc9000faab358 EFLAGS: 00000202
 RAX: ffffffff8272d400 RBX: 000000000000005e RCX: ffff88afd80d0040
 RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000202
 RBP: ffffc9000faab3a8 R08: ffffffff8272d440 R09: 0000000000022480
 R10: 00120c77be68bfac R11: 0000000000cd7568 R12: 0000000000000202
 R13: 0057ffffc0080005 R14: ffffffff820a0130 R15: ffffc9000faab3e8
 ? vprintk_emit+0x140/0x260
 vprintk_default+0x1a/0x20
 vprintk_func+0x4f/0xc4
 ? vprintk_func+0x4f/0xc4
 printk+0x53/0x6a
 ? xas_load+0xc/0x80
 __dump_page.cold.6+0xff/0x4ee
 ? xas_init_marks+0x23/0x50
 ? xas_store+0x30/0x40
 ? free_swap_slot+0x43/0xd0
 ? put_swap_page+0x119/0x320
 ? update_load_avg+0x82/0x580
 dump_page+0x9/0xb
 mem_cgroup_try_charge_swap+0x16e/0x1d0
 get_swap_page+0x130/0x210
 add_to_swap+0x41/0xc0
 shrink_page_list+0x99e/0xdf0
 shrink_inactive_list+0x199/0x360
 shrink_lruvec+0x40d/0x650
 ? _cond_resched+0x14/0x30
 ? _cond_resched+0x14/0x30
 shrink_node+0x226/0x6e0
 do_try_to_free_pages+0xd0/0x400
 try_to_free_pages+0xef/0x130
 __alloc_pages_slowpath.constprop.127+0x38d/0xbd0
 ? ___slab_alloc+0x31d/0x6f0
 __alloc_pages_nodemask+0x27f/0x2c0
 alloc_pages_vma+0x75/0x220
 shmem_alloc_page+0x46/0x90
 ? release_pages+0x1ae/0x410
 shmem_alloc_and_acct_page+0x77/0x1c0
 shmem_getpage_gfp+0x162/0x910
 shmem_fault+0x74/0x210
 ? filemap_map_pages+0x29c/0x410
 __do_fault+0x37/0x190
 handle_mm_fault+0x120a/0x1770
 exc_page_fault+0x251/0x450
 ? asm_exc_page_fault+0x8/0x30
 asm_exc_page_fault+0x1e/0x30

Link: http://lkml.kernel.org/r/abd9276b-31f8-a51d-43d6-6c8ae93237dc@linux.alibaba.com
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/mm/memcontrol.c~mm-memcg-remove-useless-check-on-page-mem_cgroup
+++ a/mm/memcontrol.c
@@ -7098,6 +7098,9 @@ void mem_cgroup_swapout(struct page *pag
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 	VM_BUG_ON_PAGE(page_count(page), page);
 
+	if (mem_cgroup_disabled())
+		return;
+
 	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return;
 
@@ -7163,6 +7166,9 @@ int mem_cgroup_try_charge_swap(struct pa
 	struct mem_cgroup *memcg;
 	unsigned short oldid;
 
+	if (mem_cgroup_disabled())
+		return 0;
+
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return 0;
 
_

Patches currently in -mm which might be from alex.shi@linux.alibaba.com are

mm-memcg-warning-on-memcg-after-readahead-page-charged.patch
mm-memcg-remove-useless-check-on-page-mem_cgroup.patch
mm-thp-move-lru_add_page_tail-func-to-huge_memoryc.patch
mm-thp-clean-up-lru_add_page_tail.patch
mm-thp-remove-code-path-which-never-got-into.patch
mm-thp-narrow-lru-locking.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-thp-move-lru_add_page_tail-func-to-huge_memoryc.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (79 preceding siblings ...)
  2020-08-19  3:50 ` + mm-memcg-remove-useless-check-on-page-mem_cgroup.patch " Andrew Morton
@ 2020-08-19  3:50 ` Andrew Morton
  2020-08-19  3:50 ` + mm-thp-clean-up-lru_add_page_tail.patch " Andrew Morton
                   ` (58 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  3:50 UTC (permalink / raw)
  To: aarcange, alex.shi, guro, hannes, hughd, kirill.shutemov, mhocko,
	mhocko, mm-commits, richard.weiyang, vdavydov.dev, willy


The patch titled
     Subject: mm/thp: move lru_add_page_tail func to huge_memory.c
has been added to the -mm tree.  Its filename is
     mm-thp-move-lru_add_page_tail-func-to-huge_memoryc.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-thp-move-lru_add_page_tail-func-to-huge_memoryc.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-move-lru_add_page_tail-func-to-huge_memoryc.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Alex Shi <alex.shi@linux.alibaba.com>
Subject: mm/thp: move lru_add_page_tail func to huge_memory.c

The function is only used in huge_memory.c, defining it in other file with
a CONFIG_TRANSPARENT_HUGEPAGE macro restrict just looks weird.

Let's move it THP. And make it static as Hugh Dickin suggested.

Link: http://lkml.kernel.org/r/1597144232-11370-3-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/swap.h |    2 --
 mm/huge_memory.c     |   30 ++++++++++++++++++++++++++++++
 mm/swap.c            |   33 ---------------------------------
 3 files changed, 30 insertions(+), 35 deletions(-)

--- a/include/linux/swap.h~mm-thp-move-lru_add_page_tail-func-to-huge_memoryc
+++ a/include/linux/swap.h
@@ -338,8 +338,6 @@ extern void lru_note_cost(struct lruvec
 			  unsigned int nr_pages);
 extern void lru_note_cost_page(struct page *);
 extern void lru_cache_add(struct page *);
-extern void lru_add_page_tail(struct page *page, struct page *page_tail,
-			 struct lruvec *lruvec, struct list_head *head);
 extern void mark_page_accessed(struct page *);
 extern void lru_add_drain(void);
 extern void lru_add_drain_cpu(int cpu);
--- a/mm/huge_memory.c~mm-thp-move-lru_add_page_tail-func-to-huge_memoryc
+++ a/mm/huge_memory.c
@@ -2313,6 +2313,36 @@ static void remap_page(struct page *page
 	}
 }
 
+static void lru_add_page_tail(struct page *page, struct page *page_tail,
+				struct lruvec *lruvec, struct list_head *list)
+{
+	VM_BUG_ON_PAGE(!PageHead(page), page);
+	VM_BUG_ON_PAGE(PageCompound(page_tail), page);
+	VM_BUG_ON_PAGE(PageLRU(page_tail), page);
+	lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock);
+
+	if (!list)
+		SetPageLRU(page_tail);
+
+	if (likely(PageLRU(page)))
+		list_add_tail(&page_tail->lru, &page->lru);
+	else if (list) {
+		/* page reclaim is reclaiming a huge page */
+		get_page(page_tail);
+		list_add_tail(&page_tail->lru, list);
+	} else {
+		/*
+		 * Head page has not yet been counted, as an hpage,
+		 * so we must account for each subpage individually.
+		 *
+		 * Put page_tail on the list at the correct position
+		 * so they all end up in order.
+		 */
+		add_page_to_lru_list_tail(page_tail, lruvec,
+					  page_lru(page_tail));
+	}
+}
+
 static void __split_huge_page_tail(struct page *head, int tail,
 		struct lruvec *lruvec, struct list_head *list)
 {
--- a/mm/swap.c~mm-thp-move-lru_add_page_tail-func-to-huge_memoryc
+++ a/mm/swap.c
@@ -930,39 +930,6 @@ void __pagevec_release(struct pagevec *p
 }
 EXPORT_SYMBOL(__pagevec_release);
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-/* used by __split_huge_page_refcount() */
-void lru_add_page_tail(struct page *page, struct page *page_tail,
-		       struct lruvec *lruvec, struct list_head *list)
-{
-	VM_BUG_ON_PAGE(!PageHead(page), page);
-	VM_BUG_ON_PAGE(PageCompound(page_tail), page);
-	VM_BUG_ON_PAGE(PageLRU(page_tail), page);
-	lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock);
-
-	if (!list)
-		SetPageLRU(page_tail);
-
-	if (likely(PageLRU(page)))
-		list_add_tail(&page_tail->lru, &page->lru);
-	else if (list) {
-		/* page reclaim is reclaiming a huge page */
-		get_page(page_tail);
-		list_add_tail(&page_tail->lru, list);
-	} else {
-		/*
-		 * Head page has not yet been counted, as an hpage,
-		 * so we must account for each subpage individually.
-		 *
-		 * Put page_tail on the list at the correct position
-		 * so they all end up in order.
-		 */
-		add_page_to_lru_list_tail(page_tail, lruvec,
-					  page_lru(page_tail));
-	}
-}
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-
 static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
 				 void *arg)
 {
_

Patches currently in -mm which might be from alex.shi@linux.alibaba.com are

mm-memcg-warning-on-memcg-after-readahead-page-charged.patch
mm-memcg-remove-useless-check-on-page-mem_cgroup.patch
mm-thp-move-lru_add_page_tail-func-to-huge_memoryc.patch
mm-thp-clean-up-lru_add_page_tail.patch
mm-thp-remove-code-path-which-never-got-into.patch
mm-thp-narrow-lru-locking.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-thp-clean-up-lru_add_page_tail.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (80 preceding siblings ...)
  2020-08-19  3:50 ` + mm-thp-move-lru_add_page_tail-func-to-huge_memoryc.patch " Andrew Morton
@ 2020-08-19  3:50 ` Andrew Morton
  2020-08-19  3:50 ` + mm-thp-remove-code-path-which-never-got-into.patch " Andrew Morton
                   ` (57 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  3:50 UTC (permalink / raw)
  To: aarcange, alex.shi, guro, hannes, hughd, kirill.shutemov, mhocko,
	mhocko, mm-commits, richard.weiyang, vdavydov.dev, willy


The patch titled
     Subject: mm/thp: clean up lru_add_page_tail
has been added to the -mm tree.  Its filename is
     mm-thp-clean-up-lru_add_page_tail.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-thp-clean-up-lru_add_page_tail.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-clean-up-lru_add_page_tail.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Alex Shi <alex.shi@linux.alibaba.com>
Subject: mm/thp: clean up lru_add_page_tail

Since the first parameter is only used by head page, it's better to make
it explicit.

Link: http://lkml.kernel.org/r/1597144232-11370-4-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/huge_memory.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/mm/huge_memory.c~mm-thp-clean-up-lru_add_page_tail
+++ a/mm/huge_memory.c
@@ -2313,19 +2313,19 @@ static void remap_page(struct page *page
 	}
 }
 
-static void lru_add_page_tail(struct page *page, struct page *page_tail,
+static void lru_add_page_tail(struct page *head, struct page *page_tail,
 				struct lruvec *lruvec, struct list_head *list)
 {
-	VM_BUG_ON_PAGE(!PageHead(page), page);
-	VM_BUG_ON_PAGE(PageCompound(page_tail), page);
-	VM_BUG_ON_PAGE(PageLRU(page_tail), page);
+	VM_BUG_ON_PAGE(!PageHead(head), head);
+	VM_BUG_ON_PAGE(PageCompound(page_tail), head);
+	VM_BUG_ON_PAGE(PageLRU(page_tail), head);
 	lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock);
 
 	if (!list)
 		SetPageLRU(page_tail);
 
-	if (likely(PageLRU(page)))
-		list_add_tail(&page_tail->lru, &page->lru);
+	if (likely(PageLRU(head)))
+		list_add_tail(&page_tail->lru, &head->lru);
 	else if (list) {
 		/* page reclaim is reclaiming a huge page */
 		get_page(page_tail);
_

Patches currently in -mm which might be from alex.shi@linux.alibaba.com are

mm-memcg-warning-on-memcg-after-readahead-page-charged.patch
mm-memcg-remove-useless-check-on-page-mem_cgroup.patch
mm-thp-move-lru_add_page_tail-func-to-huge_memoryc.patch
mm-thp-clean-up-lru_add_page_tail.patch
mm-thp-remove-code-path-which-never-got-into.patch
mm-thp-narrow-lru-locking.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-thp-remove-code-path-which-never-got-into.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (81 preceding siblings ...)
  2020-08-19  3:50 ` + mm-thp-clean-up-lru_add_page_tail.patch " Andrew Morton
@ 2020-08-19  3:50 ` Andrew Morton
  2020-08-19  3:50 ` + mm-thp-narrow-lru-locking.patch " Andrew Morton
                   ` (56 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  3:50 UTC (permalink / raw)
  To: aarcange, alex.shi, guro, hannes, hughd, kirill.shutemov, mhocko,
	mhocko, mm-commits, richard.weiyang, vdavydov.dev, willy


The patch titled
     Subject: mm/thp: remove code path which never got into
has been added to the -mm tree.  Its filename is
     mm-thp-remove-code-path-which-never-got-into.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-thp-remove-code-path-which-never-got-into.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-remove-code-path-which-never-got-into.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Alex Shi <alex.shi@linux.alibaba.com>
Subject: mm/thp: remove code path which never got into

split_huge_page() will never call on a page which isn't on lru list, so
this code never got a chance to run, and should not be run, to add tail
pages on a lru list which head page isn't there.

Although the bug was never triggered, it'better be removed for code
correctness, and add a warn for unexpected calling.

Link: http://lkml.kernel.org/r/1597144232-11370-5-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/huge_memory.c |   13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

--- a/mm/huge_memory.c~mm-thp-remove-code-path-which-never-got-into
+++ a/mm/huge_memory.c
@@ -2330,17 +2330,8 @@ static void lru_add_page_tail(struct pag
 		/* page reclaim is reclaiming a huge page */
 		get_page(page_tail);
 		list_add_tail(&page_tail->lru, list);
-	} else {
-		/*
-		 * Head page has not yet been counted, as an hpage,
-		 * so we must account for each subpage individually.
-		 *
-		 * Put page_tail on the list at the correct position
-		 * so they all end up in order.
-		 */
-		add_page_to_lru_list_tail(page_tail, lruvec,
-					  page_lru(page_tail));
-	}
+	} else
+		VM_WARN_ON(!PageLRU(head));
 }
 
 static void __split_huge_page_tail(struct page *head, int tail,
_

Patches currently in -mm which might be from alex.shi@linux.alibaba.com are

mm-memcg-warning-on-memcg-after-readahead-page-charged.patch
mm-memcg-remove-useless-check-on-page-mem_cgroup.patch
mm-thp-move-lru_add_page_tail-func-to-huge_memoryc.patch
mm-thp-clean-up-lru_add_page_tail.patch
mm-thp-remove-code-path-which-never-got-into.patch
mm-thp-narrow-lru-locking.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-thp-narrow-lru-locking.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (82 preceding siblings ...)
  2020-08-19  3:50 ` + mm-thp-remove-code-path-which-never-got-into.patch " Andrew Morton
@ 2020-08-19  3:50 ` Andrew Morton
  2020-08-19  3:56 ` + mm-slub-fix-missing-alloc_slowpath-stat-when-bulk-alloc.patch " Andrew Morton
                   ` (55 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  3:50 UTC (permalink / raw)
  To: aarcange, alex.shi, guro, hannes, hughd, kirill.shutemov, mhocko,
	mhocko, mm-commits, richard.weiyang, vdavydov.dev, willy


The patch titled
     Subject: mm/thp: narrow lru locking
has been added to the -mm tree.  Its filename is
     mm-thp-narrow-lru-locking.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-thp-narrow-lru-locking.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-narrow-lru-locking.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Alex Shi <alex.shi@linux.alibaba.com>
Subject: mm/thp: narrow lru locking

lru_lock and page cache xa_lock have no reason with current sequence, put
them together isn't necessary.  let's narrow the lru locking, but left the
local_irq_disable to block interrupt re-entry and statistic update.

Hugh Dickins point: split_huge_page_to_list() was already silly,to be
using the _irqsave variant: it's just been taking sleeping locks, so would
already be broken if entered with interrupts enabled.  so we can save
passing flags argument down to __split_huge_page().

Link: http://lkml.kernel.org/r/1597144232-11370-6-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/huge_memory.c |   25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

--- a/mm/huge_memory.c~mm-thp-narrow-lru-locking
+++ a/mm/huge_memory.c
@@ -2397,7 +2397,7 @@ static void __split_huge_page_tail(struc
 }
 
 static void __split_huge_page(struct page *page, struct list_head *list,
-		pgoff_t end, unsigned long flags)
+			      pgoff_t end)
 {
 	struct page *head = compound_head(page);
 	pg_data_t *pgdat = page_pgdat(head);
@@ -2406,8 +2406,6 @@ static void __split_huge_page(struct pag
 	unsigned long offset = 0;
 	int i;
 
-	lruvec = mem_cgroup_page_lruvec(head, pgdat);
-
 	/* complete memcg works before add pages to LRU */
 	mem_cgroup_split_huge_fixup(head);
 
@@ -2419,6 +2417,11 @@ static void __split_huge_page(struct pag
 		xa_lock(&swap_cache->i_pages);
 	}
 
+	/* prevent PageLRU to go away from under us, and freeze lru stats */
+	spin_lock(&pgdat->lru_lock);
+
+	lruvec = mem_cgroup_page_lruvec(head, pgdat);
+
 	for (i = HPAGE_PMD_NR - 1; i >= 1; i--) {
 		__split_huge_page_tail(head, i, lruvec, list);
 		/* Some pages can be beyond i_size: drop them from page cache */
@@ -2438,6 +2441,8 @@ static void __split_huge_page(struct pag
 	}
 
 	ClearPageCompound(head);
+	spin_unlock(&pgdat->lru_lock);
+	/* Caller disabled irqs, so they are still disabled here */
 
 	split_page_owner(head, HPAGE_PMD_ORDER);
 
@@ -2455,8 +2460,7 @@ static void __split_huge_page(struct pag
 		page_ref_add(head, 2);
 		xa_unlock(&head->mapping->i_pages);
 	}
-
-	spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+	local_irq_enable();
 
 	remap_page(head);
 
@@ -2595,12 +2599,10 @@ bool can_split_huge_page(struct page *pa
 int split_huge_page_to_list(struct page *page, struct list_head *list)
 {
 	struct page *head = compound_head(page);
-	struct pglist_data *pgdata = NODE_DATA(page_to_nid(head));
 	struct deferred_split *ds_queue = get_deferred_split_queue(head);
 	struct anon_vma *anon_vma = NULL;
 	struct address_space *mapping = NULL;
 	int count, mapcount, extra_pins, ret;
-	unsigned long flags;
 	pgoff_t end;
 
 	VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
@@ -2661,9 +2663,8 @@ int split_huge_page_to_list(struct page
 	unmap_page(head);
 	VM_BUG_ON_PAGE(compound_mapcount(head), head);
 
-	/* prevent PageLRU to go away from under us, and freeze lru stats */
-	spin_lock_irqsave(&pgdata->lru_lock, flags);
-
+	/* block interrupt reentry in xa_lock and spinlock */
+	local_irq_disable();
 	if (mapping) {
 		XA_STATE(xas, &mapping->i_pages, page_index(head));
 
@@ -2693,7 +2694,7 @@ int split_huge_page_to_list(struct page
 				__dec_node_page_state(head, NR_FILE_THPS);
 		}
 
-		__split_huge_page(page, list, end, flags);
+		__split_huge_page(page, list, end);
 		if (PageSwapCache(head)) {
 			swp_entry_t entry = { .val = page_private(head) };
 
@@ -2712,7 +2713,7 @@ int split_huge_page_to_list(struct page
 		spin_unlock(&ds_queue->split_queue_lock);
 fail:		if (mapping)
 			xa_unlock(&mapping->i_pages);
-		spin_unlock_irqrestore(&pgdata->lru_lock, flags);
+		local_irq_enable();
 		remap_page(head);
 		ret = -EBUSY;
 	}
_

Patches currently in -mm which might be from alex.shi@linux.alibaba.com are

mm-memcg-warning-on-memcg-after-readahead-page-charged.patch
mm-memcg-remove-useless-check-on-page-mem_cgroup.patch
mm-thp-move-lru_add_page_tail-func-to-huge_memoryc.patch
mm-thp-clean-up-lru_add_page_tail.patch
mm-thp-remove-code-path-which-never-got-into.patch
mm-thp-narrow-lru-locking.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-slub-fix-missing-alloc_slowpath-stat-when-bulk-alloc.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (83 preceding siblings ...)
  2020-08-19  3:50 ` + mm-thp-narrow-lru-locking.patch " Andrew Morton
@ 2020-08-19  3:56 ` Andrew Morton
  2020-08-19 17:20 ` + mm-mmap-add-inline-munmap_vma_range-for-code-readability.patch " Andrew Morton
                   ` (54 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19  3:56 UTC (permalink / raw)
  To: cl, hewenliang4, hushiyuan, iamjoonsoo.kim, mm-commits, penberg,
	rientjes, wuyun.wu


The patch titled
     Subject: mm/slub: fix missing ALLOC_SLOWPATH stat when bulk alloc
has been added to the -mm tree.  Its filename is
     mm-slub-fix-missing-alloc_slowpath-stat-when-bulk-alloc.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-slub-fix-missing-alloc_slowpath-stat-when-bulk-alloc.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-slub-fix-missing-alloc_slowpath-stat-when-bulk-alloc.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Abel Wu <wuyun.wu@huawei.com>
Subject: mm/slub: fix missing ALLOC_SLOWPATH stat when bulk alloc

The ALLOC_SLOWPATH statistics is missing in bulk allocation now.  Fix it
by doing statistics in alloc slow path.

Link: http://lkml.kernel.org/r/20200811022427.1363-1-wuyun.wu@huawei.com
Signed-off-by: Abel Wu <wuyun.wu@huawei.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hewenliang <hewenliang4@huawei.com>
Cc: Hu Shiyuan <hushiyuan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/slub.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/mm/slub.c~mm-slub-fix-missing-alloc_slowpath-stat-when-bulk-alloc
+++ a/mm/slub.c
@@ -2666,6 +2666,8 @@ static void *___slab_alloc(struct kmem_c
 	void *freelist;
 	struct page *page;
 
+	stat(s, ALLOC_SLOWPATH);
+
 	page = c->page;
 	if (!page) {
 		/*
@@ -2855,7 +2857,6 @@ redo:
 	page = c->page;
 	if (unlikely(!object || !node_match(page, node))) {
 		object = __slab_alloc(s, gfpflags, node, addr, c);
-		stat(s, ALLOC_SLOWPATH);
 	} else {
 		void *next_object = get_freepointer_safe(s, object);
 
_

Patches currently in -mm which might be from wuyun.wu@huawei.com are

mm-slub-branch-optimization-in-free-slowpath.patch
mm-slub-fix-missing-alloc_slowpath-stat-when-bulk-alloc.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-mmap-add-inline-munmap_vma_range-for-code-readability.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (84 preceding siblings ...)
  2020-08-19  3:56 ` + mm-slub-fix-missing-alloc_slowpath-stat-when-bulk-alloc.patch " Andrew Morton
@ 2020-08-19 17:20 ` Andrew Morton
  2020-08-19 17:20 ` + mm-mmap-add-inline-vma_next-for-readability-of-mmap-code.patch " Andrew Morton
                   ` (53 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 17:20 UTC (permalink / raw)
  To: akpm, Liam.Howlett, mm-commits


The patch titled
     Subject: mm/mmap: add inline munmap_vma_range() for code readability
has been added to the -mm tree.  Its filename is
     mm-mmap-add-inline-munmap_vma_range-for-code-readability.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-mmap-add-inline-munmap_vma_range-for-code-readability.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-mmap-add-inline-munmap_vma_range-for-code-readability.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
Subject: mm/mmap: add inline munmap_vma_range() for code readability

There are two locations that have a block of code for munmapping a vma
range.  Change those two locations to use a function and add meaningful
comments about what happens to the arguments, which was unclear in the
previous code.

Link: http://lkml.kernel.org/r/20200818154707.2515169-2-Liam.Howlett@Oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mmap.c |   48 +++++++++++++++++++++++++++++++++---------------
 1 file changed, 33 insertions(+), 15 deletions(-)

--- a/mm/mmap.c~mm-mmap-add-inline-munmap_vma_range-for-code-readability
+++ a/mm/mmap.c
@@ -577,6 +577,33 @@ static inline struct vm_area_struct *vma
 
 	return vma->vm_next;
 }
+
+/*
+ * munmap_vma_range() - munmap VMAs that overlap a range.
+ * @mm: The mm struct
+ * @start: The start of the range.
+ * @len: The length of the range.
+ * @pprev: pointer to the pointer that will be set to previous vm_area_struct
+ * @rb_link: the rb_node
+ * @rb_parent: the parent rb_node
+ *
+ * Find all the vm_area_struct that overlap from @start to
+ * @end and munmap them.  Set @pprev to the previous vm_area_struct.
+ *
+ * Returns: -ENOMEM on munmap failure or 0 on success.
+ */
+static inline int
+munmap_vma_range(struct mm_struct *mm, unsigned long start, unsigned long len,
+		 struct vm_area_struct **pprev, struct rb_node ***link,
+		 struct rb_node **parent, struct list_head *uf)
+{
+
+	while (find_vma_links(mm, start, start + len, pprev, link, parent))
+		if (do_munmap(mm, start, len, uf))
+			return -ENOMEM;
+
+	return 0;
+}
 static unsigned long count_vma_pages_range(struct mm_struct *mm,
 		unsigned long addr, unsigned long end)
 {
@@ -1724,13 +1751,9 @@ unsigned long mmap_region(struct file *f
 			return -ENOMEM;
 	}
 
-	/* Clear old maps */
-	while (find_vma_links(mm, addr, addr + len, &prev, &rb_link,
-			      &rb_parent)) {
-		if (do_munmap(mm, addr, len, uf))
-			return -ENOMEM;
-	}
-
+	/* Clear old maps, set up prev, rb_link, rb_parent, and uf */
+	if (munmap_vma_range(mm, addr, len, &prev, &rb_link, &rb_parent, uf))
+		return -ENOMEM;
 	/*
 	 * Private writable mapping: check memory availability
 	 */
@@ -3071,14 +3094,9 @@ static int do_brk_flags(unsigned long ad
 	if (error)
 		return error;
 
-	/*
-	 * Clear old maps.  this also does some error checking for us
-	 */
-	while (find_vma_links(mm, addr, addr + len, &prev, &rb_link,
-			      &rb_parent)) {
-		if (do_munmap(mm, addr, len, uf))
-			return -ENOMEM;
-	}
+	/* Clear old maps, set up prev, rb_link, rb_parent, and uf */
+	if (munmap_vma_range(mm, addr, len, &prev, &rb_link, &rb_parent, uf))
+		return -ENOMEM;
 
 	/* Check against address space limits *after* clearing old maps... */
 	if (!may_expand_vm(mm, flags, len >> PAGE_SHIFT))
_

Patches currently in -mm which might be from Liam.Howlett@Oracle.com are

mm-mmap-add-inline-vma_next-for-readability-of-mmap-code.patch
mm-mmap-add-inline-munmap_vma_range-for-code-readability.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-mmap-add-inline-vma_next-for-readability-of-mmap-code.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (85 preceding siblings ...)
  2020-08-19 17:20 ` + mm-mmap-add-inline-munmap_vma_range-for-code-readability.patch " Andrew Morton
@ 2020-08-19 17:20 ` Andrew Morton
  2020-08-19 17:47 ` + mm-gup-dont-permit-users-to-call-get_user_pages-with-foll_longterm.patch " Andrew Morton
                   ` (52 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 17:20 UTC (permalink / raw)
  To: akpm, Liam.Howlett, mm-commits


The patch titled
     Subject: mm/mmap: add inline vma_next() for readability of mmap code
has been added to the -mm tree.  Its filename is
     mm-mmap-add-inline-vma_next-for-readability-of-mmap-code.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-mmap-add-inline-vma_next-for-readability-of-mmap-code.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-mmap-add-inline-vma_next-for-readability-of-mmap-code.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
Subject: mm/mmap: add inline vma_next() for readability of mmap code

There are three places that the next vma is required which uses the same
block of code.  Replace the block with a function and add comments on what
happens in the case where NULL is encountered.

Link: http://lkml.kernel.org/r/20200818154707.2515169-1-Liam.Howlett@Oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mmap.c |   26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

--- a/mm/mmap.c~mm-mmap-add-inline-vma_next-for-readability-of-mmap-code
+++ a/mm/mmap.c
@@ -560,6 +560,23 @@ static int find_vma_links(struct mm_stru
 	return 0;
 }
 
+/*
+ * vma_next() - Get the next VMA.
+ * @mm: The mm_struct.
+ * @vma: The current vma.
+ *
+ * If @vma is NULL, return the first vma in the mm.
+ *
+ * Returns: The next VMA after @vma.
+ */
+static inline struct vm_area_struct *vma_next(struct mm_struct *mm,
+					 struct vm_area_struct *vma)
+{
+	if (!vma)
+		return mm->mmap;
+
+	return vma->vm_next;
+}
 static unsigned long count_vma_pages_range(struct mm_struct *mm,
 		unsigned long addr, unsigned long end)
 {
@@ -1131,10 +1148,7 @@ struct vm_area_struct *vma_merge(struct
 	if (vm_flags & VM_SPECIAL)
 		return NULL;
 
-	if (prev)
-		next = prev->vm_next;
-	else
-		next = mm->mmap;
+	next = vma_next(mm, prev);
 	area = next;
 	if (area && area->vm_end == end)		/* cases 6, 7, 8 */
 		next = next->vm_next;
@@ -2640,7 +2654,7 @@ static void unmap_region(struct mm_struc
 		struct vm_area_struct *vma, struct vm_area_struct *prev,
 		unsigned long start, unsigned long end)
 {
-	struct vm_area_struct *next = prev ? prev->vm_next : mm->mmap;
+	struct vm_area_struct *next = vma_next(mm, prev);
 	struct mmu_gather tlb;
 
 	lru_add_drain();
@@ -2839,7 +2853,7 @@ int __do_munmap(struct mm_struct *mm, un
 		if (error)
 			return error;
 	}
-	vma = prev ? prev->vm_next : mm->mmap;
+	vma = vma_next(mm, prev);
 
 	if (unlikely(uf)) {
 		/*
_

Patches currently in -mm which might be from Liam.Howlett@Oracle.com are

mm-mmap-add-inline-vma_next-for-readability-of-mmap-code.patch
mm-mmap-add-inline-munmap_vma_range-for-code-readability.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-gup-dont-permit-users-to-call-get_user_pages-with-foll_longterm.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (86 preceding siblings ...)
  2020-08-19 17:20 ` + mm-mmap-add-inline-vma_next-for-readability-of-mmap-code.patch " Andrew Morton
@ 2020-08-19 17:47 ` Andrew Morton
  2020-08-19 18:20 ` + mm-memory_hotplug-inline-__offline_pages-into-offline_pages.patch " Andrew Morton
                   ` (51 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 17:47 UTC (permalink / raw)
  To: corbet, dan.j.williams, david, hch, jack, jgg, jglisse, jhubbard,
	mhocko, mike.kravetz, mm-commits, shuah, song.bao.hua, vbabka,
	viro, willy


The patch titled
     Subject: mm/gup: don't permit users to call get_user_pages with FOLL_LONGTERM
has been added to the -mm tree.  Its filename is
     mm-gup-dont-permit-users-to-call-get_user_pages-with-foll_longterm.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-gup-dont-permit-users-to-call-get_user_pages-with-foll_longterm.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-gup-dont-permit-users-to-call-get_user_pages-with-foll_longterm.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Barry Song <song.bao.hua@hisilicon.com>
Subject: mm/gup: don't permit users to call get_user_pages with FOLL_LONGTERM

gup prohibits users from calling get_user_pages() with FOLL_PIN.  But it
allows users to call get_user_pages() with FOLL_LONGTERM only.  It seems
insensible.

Since FOLL_LONGTERM is a stricter case of FOLL_PIN, we should prohibit
users from calling get_user_pages() with FOLL_LONGTERM while not with
FOLL_PIN.

mm/gup_benchmark.c used to be the only user who did this improperly.
But it has been fixed by moving to use pin_user_pages().

Link: http://lkml.kernel.org/r/20200819110100.23504-1-song.bao.hua@hisilicon.com
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/gup.c |   37 ++++++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 15 deletions(-)

--- a/mm/gup.c~mm-gup-dont-permit-users-to-call-get_user_pages-with-foll_longterm
+++ a/mm/gup.c
@@ -1789,6 +1789,25 @@ static long __get_user_pages_remote(stru
 				       gup_flags | FOLL_TOUCH | FOLL_REMOTE);
 }
 
+static bool is_valid_gup_flags(unsigned int gup_flags)
+{
+	/*
+	 * FOLL_PIN must only be set internally by the pin_user_pages*() APIs,
+	 * never directly by the caller, so enforce that with an assertion:
+	 */
+	if (WARN_ON_ONCE(gup_flags & FOLL_PIN))
+		return false;
+	/*
+	 * FOLL_PIN is a prerequisite to FOLL_LONGTERM. Another way of saying
+	 * that is, FOLL_LONGTERM is a specific case, more restrictive case of
+	 * FOLL_PIN.
+	 */
+	if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM))
+		return false;
+
+	return true;
+}
+
 /**
  * get_user_pages_remote() - pin user pages in memory
  * @mm:		mm_struct of target mm
@@ -1854,11 +1873,7 @@ long get_user_pages_remote(struct mm_str
 		unsigned int gup_flags, struct page **pages,
 		struct vm_area_struct **vmas, int *locked)
 {
-	/*
-	 * FOLL_PIN must only be set internally by the pin_user_pages*() APIs,
-	 * never directly by the caller, so enforce that with an assertion:
-	 */
-	if (WARN_ON_ONCE(gup_flags & FOLL_PIN))
+	if (!is_valid_gup_flags(gup_flags))
 		return -EINVAL;
 
 	return __get_user_pages_remote(mm, start, nr_pages, gup_flags,
@@ -1904,11 +1919,7 @@ long get_user_pages(unsigned long start,
 		unsigned int gup_flags, struct page **pages,
 		struct vm_area_struct **vmas)
 {
-	/*
-	 * FOLL_PIN must only be set internally by the pin_user_pages*() APIs,
-	 * never directly by the caller, so enforce that with an assertion:
-	 */
-	if (WARN_ON_ONCE(gup_flags & FOLL_PIN))
+	if (!is_valid_gup_flags(gup_flags))
 		return -EINVAL;
 
 	return __gup_longterm_locked(current->mm, start, nr_pages,
@@ -2810,11 +2821,7 @@ EXPORT_SYMBOL_GPL(get_user_pages_fast_on
 int get_user_pages_fast(unsigned long start, int nr_pages,
 			unsigned int gup_flags, struct page **pages)
 {
-	/*
-	 * FOLL_PIN must only be set internally by the pin_user_pages*() APIs,
-	 * never directly by the caller, so enforce that:
-	 */
-	if (WARN_ON_ONCE(gup_flags & FOLL_PIN))
+	if (!is_valid_gup_flags(gup_flags))
 		return -EINVAL;
 
 	/*
_

Patches currently in -mm which might be from song.bao.hua@hisilicon.com are

mm-gup_benchmark-use-pin_user_pages-for-foll_longterm-flag.patch
mm-gup-dont-permit-users-to-call-get_user_pages-with-foll_longterm.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-memory_hotplug-inline-__offline_pages-into-offline_pages.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (87 preceding siblings ...)
  2020-08-19 17:47 ` + mm-gup-dont-permit-users-to-call-get_user_pages-with-foll_longterm.patch " Andrew Morton
@ 2020-08-19 18:20 ` Andrew Morton
  2020-08-19 18:20 ` + mm-memory_hotplug-enforce-section-granularity-when-onlining-offlining.patch " Andrew Morton
                   ` (50 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 18:20 UTC (permalink / raw)
  To: bhe, charante, dan.j.williams, david, fenghua.yu, logang,
	mgorman, mgorman, mhocko, mm-commits, osalvador,
	pankaj.gupta.linux, richard.weiyang, rppt, tony.luck, walken,
	willy


The patch titled
     Subject: mm/memory_hotplug: inline __offline_pages() into offline_pages()
has been added to the -mm tree.  Its filename is
     mm-memory_hotplug-inline-__offline_pages-into-offline_pages.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-memory_hotplug-inline-__offline_pages-into-offline_pages.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-memory_hotplug-inline-__offline_pages-into-offline_pages.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: mm/memory_hotplug: inline __offline_pages() into offline_pages()

Patch series "mm/memory_hotplug: online_pages()/offline_pages() cleanups", v2.

These are a bunch of cleanups for online_pages()/offline_pages() and
related code, mostly getting rid of memory hole handling that is no longer
necessary.  There is only a single walk_system_ram_range() call left in
offline_pages(), to make sure we don't have any memory holes.  I had some
of these patches lying around for a longer time but didn't have time to
polish them.

In addition, the last patch marks all pageblocks of memory to get onlined
MIGRATE_ISOLATE, so pages that have just been exposed to the buddy cannot
get allocated before onlining is complete.  Once heavy lifting is done,
the pageblocks are set to MIGRATE_MOVABLE, such that allocations are
possible.

I played with DIMMs and virtio-mem on x86-64 and didn't spot any
surprises.  I verified that the numer of isolated pageblocks is correctly
handled when onlining/offlining.


This patch (of 10):

There is only a single user, offline_pages(). Let's inline, to make
it look more similar to online_pages().

Link: https://lkml.kernel.org/r/20200819175957.28465-1-david@redhat.com
Link: https://lkml.kernel.org/r/20200819175957.28465-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Charan Teja Reddy <charante@codeaurora.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory_hotplug.c |   16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

--- a/mm/memory_hotplug.c~mm-memory_hotplug-inline-__offline_pages-into-offline_pages
+++ a/mm/memory_hotplug.c
@@ -1475,11 +1475,10 @@ static int count_system_ram_pages_cb(uns
 	return 0;
 }
 
-static int __ref __offline_pages(unsigned long start_pfn,
-		  unsigned long end_pfn)
+int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 {
-	unsigned long pfn, nr_pages = 0;
-	unsigned long offlined_pages = 0;
+	const unsigned long end_pfn = start_pfn + nr_pages;
+	unsigned long pfn, system_ram_pages = 0, offlined_pages = 0;
 	int ret, node, nr_isolate_pageblock;
 	unsigned long flags;
 	struct zone *zone;
@@ -1496,9 +1495,9 @@ static int __ref __offline_pages(unsigne
 	 * memory holes PG_reserved, don't need pfn_valid() checks, and can
 	 * avoid using walk_system_ram_range() later.
 	 */
-	walk_system_ram_range(start_pfn, end_pfn - start_pfn, &nr_pages,
+	walk_system_ram_range(start_pfn, nr_pages, &system_ram_pages,
 			      count_system_ram_pages_cb);
-	if (nr_pages != end_pfn - start_pfn) {
+	if (system_ram_pages != nr_pages) {
 		ret = -EINVAL;
 		reason = "memory holes";
 		goto failed_removal;
@@ -1633,11 +1632,6 @@ failed_removal:
 	return ret;
 }
 
-int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
-{
-	return __offline_pages(start_pfn, start_pfn + nr_pages);
-}
-
 static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
 {
 	int ret = !is_memblock_offlined(mem);
_

Patches currently in -mm which might be from david@redhat.com are

mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch
mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch
mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch
mm-page_isolation-cleanup-set_migratetype_isolate.patch
virtio-mem-dont-special-case-zone_movable.patch
mm-document-semantics-of-zone_movable.patch
mm-memory_hotplug-inline-__offline_pages-into-offline_pages.patch
mm-memory_hotplug-enforce-section-granularity-when-onlining-offlining.patch
mm-memory_hotplug-simplify-page-offlining.patch
mm-page_alloc-simplify-__offline_isolated_pages.patch
mm-memory_hotplug-drop-nr_isolate_pageblock-in-offline_pages.patch
mm-page_isolation-simplify-return-value-of-start_isolate_page_range.patch
mm-memory_hotplug-simplify-page-onlining.patch
mm-page_alloc-drop-stale-pageblock-comment-in-memmap_init_zone.patch
mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone.patch
mm-memory_hotplug-mark-pageblocks-migrate_isolate-while-onlining-memory.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-memory_hotplug-enforce-section-granularity-when-onlining-offlining.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (88 preceding siblings ...)
  2020-08-19 18:20 ` + mm-memory_hotplug-inline-__offline_pages-into-offline_pages.patch " Andrew Morton
@ 2020-08-19 18:20 ` Andrew Morton
  2020-08-19 18:20 ` + mm-memory_hotplug-simplify-page-offlining.patch " Andrew Morton
                   ` (49 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 18:20 UTC (permalink / raw)
  To: bhe, charante, dan.j.williams, david, fenghua.yu, logang,
	mgorman, mgorman, mhocko, mm-commits, osalvador,
	pankaj.gupta.linux, richard.weiyang, rppt, tony.luck, walken,
	willy


The patch titled
     Subject: mm/memory_hotplug: enforce section granularity when onlining/offlining
has been added to the -mm tree.  Its filename is
     mm-memory_hotplug-enforce-section-granularity-when-onlining-offlining.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-memory_hotplug-enforce-section-granularity-when-onlining-offlining.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-memory_hotplug-enforce-section-granularity-when-onlining-offlining.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: mm/memory_hotplug: enforce section granularity when onlining/offlining

Already two people (including me) tried to offline subsections, because
the function looks like it can deal with it.  But we really can only
online/offline full sections that are properly aligned (e.g., we can only
mark full sections online/offline via SECTION_IS_ONLINE).

Add a simple safety net to document the restriction now.  Current users
(core and powernv/memtrace) respect these restrictions.

Link: https://lkml.kernel.org/r/20200819175957.28465-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Charan Teja Reddy <charante@codeaurora.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory_hotplug.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

--- a/mm/memory_hotplug.c~mm-memory_hotplug-enforce-section-granularity-when-onlining-offlining
+++ a/mm/memory_hotplug.c
@@ -801,6 +801,11 @@ int __ref online_pages(unsigned long pfn
 	int ret;
 	struct memory_notify arg;
 
+	/* We can only online full sections (e.g., SECTION_IS_ONLINE) */
+	if (WARN_ON_ONCE(!nr_pages ||
+			 !IS_ALIGNED(pfn | nr_pages, PAGES_PER_SECTION)))
+		return -EINVAL;
+
 	mem_hotplug_begin();
 
 	/* associate pfn range with the zone */
@@ -1485,6 +1490,11 @@ int __ref offline_pages(unsigned long st
 	struct memory_notify arg;
 	char *reason;
 
+	/* We can only offline full sections (e.g., SECTION_IS_ONLINE) */
+	if (WARN_ON_ONCE(!nr_pages ||
+			 !IS_ALIGNED(start_pfn | nr_pages, PAGES_PER_SECTION)))
+		return -EINVAL;
+
 	mem_hotplug_begin();
 
 	/*
_

Patches currently in -mm which might be from david@redhat.com are

mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch
mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch
mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch
mm-page_isolation-cleanup-set_migratetype_isolate.patch
virtio-mem-dont-special-case-zone_movable.patch
mm-document-semantics-of-zone_movable.patch
mm-memory_hotplug-inline-__offline_pages-into-offline_pages.patch
mm-memory_hotplug-enforce-section-granularity-when-onlining-offlining.patch
mm-memory_hotplug-simplify-page-offlining.patch
mm-page_alloc-simplify-__offline_isolated_pages.patch
mm-memory_hotplug-drop-nr_isolate_pageblock-in-offline_pages.patch
mm-page_isolation-simplify-return-value-of-start_isolate_page_range.patch
mm-memory_hotplug-simplify-page-onlining.patch
mm-page_alloc-drop-stale-pageblock-comment-in-memmap_init_zone.patch
mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone.patch
mm-memory_hotplug-mark-pageblocks-migrate_isolate-while-onlining-memory.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-memory_hotplug-simplify-page-offlining.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (89 preceding siblings ...)
  2020-08-19 18:20 ` + mm-memory_hotplug-enforce-section-granularity-when-onlining-offlining.patch " Andrew Morton
@ 2020-08-19 18:20 ` Andrew Morton
  2020-08-19 18:20 ` + mm-page_alloc-simplify-__offline_isolated_pages.patch " Andrew Morton
                   ` (48 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 18:20 UTC (permalink / raw)
  To: bhe, charante, dan.j.williams, david, fenghua.yu, logang,
	mgorman, mgorman, mhocko, mm-commits, osalvador,
	pankaj.gupta.linux, richard.weiyang, rppt, tony.luck, walken,
	willy


The patch titled
     Subject: mm/memory_hotplug: simplify page offlining
has been added to the -mm tree.  Its filename is
     mm-memory_hotplug-simplify-page-offlining.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-memory_hotplug-simplify-page-offlining.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-memory_hotplug-simplify-page-offlining.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: mm/memory_hotplug: simplify page offlining

We make sure that we cannot have any memory holes right at the beginning
of offline_pages().  We no longer need walk_system_ram_range() and can
call test_pages_isolated() and __offline_isolated_pages() directly.

offlined_pages always corresponds to nr_pages, so we can simplify that.

Link: https://lkml.kernel.org/r/20200819175957.28465-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Charan Teja Reddy <charante@codeaurora.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory_hotplug.c |   46 +++++++++---------------------------------
 1 file changed, 10 insertions(+), 36 deletions(-)

--- a/mm/memory_hotplug.c~mm-memory_hotplug-simplify-page-offlining
+++ a/mm/memory_hotplug.c
@@ -1375,28 +1375,6 @@ do_migrate_range(unsigned long start_pfn
 	return ret;
 }
 
-/* Mark all sections offline and remove all free pages from the buddy. */
-static int
-offline_isolated_pages_cb(unsigned long start, unsigned long nr_pages,
-			void *data)
-{
-	unsigned long *offlined_pages = (unsigned long *)data;
-
-	*offlined_pages += __offline_isolated_pages(start, start + nr_pages);
-	return 0;
-}
-
-/*
- * Check all pages in range, recorded as memory resource, are isolated.
- */
-static int
-check_pages_isolated_cb(unsigned long start_pfn, unsigned long nr_pages,
-			void *data)
-{
-	return test_pages_isolated(start_pfn, start_pfn + nr_pages,
-				   MEMORY_OFFLINE);
-}
-
 static int __init cmdline_parse_movable_node(char *p)
 {
 	movable_node_enabled = true;
@@ -1483,7 +1461,7 @@ static int count_system_ram_pages_cb(uns
 int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 {
 	const unsigned long end_pfn = start_pfn + nr_pages;
-	unsigned long pfn, system_ram_pages = 0, offlined_pages = 0;
+	unsigned long pfn, system_ram_pages = 0;
 	int ret, node, nr_isolate_pageblock;
 	unsigned long flags;
 	struct zone *zone;
@@ -1581,16 +1559,12 @@ int __ref offline_pages(unsigned long st
 			reason = "failure to dissolve huge pages";
 			goto failed_removal_isolated;
 		}
-		/* check again */
-		ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
-					    NULL, check_pages_isolated_cb);
-	} while (ret);
-
-	/* Ok, all of our target is isolated.
-	   We cannot do rollback at this point. */
-	walk_system_ram_range(start_pfn, end_pfn - start_pfn,
-			      &offlined_pages, offline_isolated_pages_cb);
-	pr_info("Offlined Pages %ld\n", offlined_pages);
+	} while (test_pages_isolated(start_pfn, end_pfn, MEMORY_OFFLINE));
+
+	/* Mark all sections offline and remove free pages from the buddy. */
+	__offline_isolated_pages(start_pfn, end_pfn);
+	pr_info("Offlined Pages %ld\n", nr_pages);
+
 	/*
 	 * Onlining will reset pagetype flags and makes migrate type
 	 * MOVABLE, so just need to decrease the number of isolated
@@ -1601,11 +1575,11 @@ int __ref offline_pages(unsigned long st
 	spin_unlock_irqrestore(&zone->lock, flags);
 
 	/* removal success */
-	adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages);
-	zone->present_pages -= offlined_pages;
+	adjust_managed_page_count(pfn_to_page(start_pfn), -nr_pages);
+	zone->present_pages -= nr_pages;
 
 	pgdat_resize_lock(zone->zone_pgdat, &flags);
-	zone->zone_pgdat->node_present_pages -= offlined_pages;
+	zone->zone_pgdat->node_present_pages -= nr_pages;
 	pgdat_resize_unlock(zone->zone_pgdat, &flags);
 
 	init_per_zone_wmark_min();
_

Patches currently in -mm which might be from david@redhat.com are

mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch
mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch
mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch
mm-page_isolation-cleanup-set_migratetype_isolate.patch
virtio-mem-dont-special-case-zone_movable.patch
mm-document-semantics-of-zone_movable.patch
mm-memory_hotplug-inline-__offline_pages-into-offline_pages.patch
mm-memory_hotplug-enforce-section-granularity-when-onlining-offlining.patch
mm-memory_hotplug-simplify-page-offlining.patch
mm-page_alloc-simplify-__offline_isolated_pages.patch
mm-memory_hotplug-drop-nr_isolate_pageblock-in-offline_pages.patch
mm-page_isolation-simplify-return-value-of-start_isolate_page_range.patch
mm-memory_hotplug-simplify-page-onlining.patch
mm-page_alloc-drop-stale-pageblock-comment-in-memmap_init_zone.patch
mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone.patch
mm-memory_hotplug-mark-pageblocks-migrate_isolate-while-onlining-memory.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-page_alloc-simplify-__offline_isolated_pages.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (90 preceding siblings ...)
  2020-08-19 18:20 ` + mm-memory_hotplug-simplify-page-offlining.patch " Andrew Morton
@ 2020-08-19 18:20 ` Andrew Morton
  2020-08-19 18:20 ` + mm-memory_hotplug-drop-nr_isolate_pageblock-in-offline_pages.patch " Andrew Morton
                   ` (47 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 18:20 UTC (permalink / raw)
  To: bhe, charante, dan.j.williams, david, fenghua.yu, logang,
	mgorman, mgorman, mhocko, mm-commits, osalvador,
	pankaj.gupta.linux, richard.weiyang, rppt, tony.luck, walken,
	willy


The patch titled
     Subject: mm/page_alloc: simplify __offline_isolated_pages()
has been added to the -mm tree.  Its filename is
     mm-page_alloc-simplify-__offline_isolated_pages.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-simplify-__offline_isolated_pages.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-page_alloc-simplify-__offline_isolated_pages.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: mm/page_alloc: simplify __offline_isolated_pages()

offline_pages() is the only user.  __offline_isolated_pages() never gets
called with ranges that contain memory holes and we no longer care about
the return value.  Drop the return value handling and all pfn_valid()
checks.

Update the documentation.

Link: https://lkml.kernel.org/r/20200819175957.28465-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Charan Teja Reddy <charante@codeaurora.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/memory_hotplug.h |    4 ++--
 mm/page_alloc.c                |   27 ++++-----------------------
 2 files changed, 6 insertions(+), 25 deletions(-)

--- a/include/linux/memory_hotplug.h~mm-page_alloc-simplify-__offline_isolated_pages
+++ a/include/linux/memory_hotplug.h
@@ -103,8 +103,8 @@ extern int online_pages(unsigned long pf
 			int online_type, int nid);
 extern struct zone *test_pages_in_a_zone(unsigned long start_pfn,
 					 unsigned long end_pfn);
-extern unsigned long __offline_isolated_pages(unsigned long start_pfn,
-						unsigned long end_pfn);
+extern void __offline_isolated_pages(unsigned long start_pfn,
+				     unsigned long end_pfn);
 
 typedef void (*online_page_callback_t)(struct page *page, unsigned int order);
 
--- a/mm/page_alloc.c~mm-page_alloc-simplify-__offline_isolated_pages
+++ a/mm/page_alloc.c
@@ -8685,35 +8685,21 @@ void zone_pcp_reset(struct zone *zone)
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
- * All pages in the range must be in a single zone and isolated
- * before calling this.
+ * All pages in the range must be in a single zone, must not contain holes,
+ * must span full sections, and must be isolated before calling this function.
  */
-unsigned long
-__offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
+void __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 {
+	unsigned long pfn = start_pfn;
 	struct page *page;
 	struct zone *zone;
 	unsigned int order;
-	unsigned long pfn;
 	unsigned long flags;
-	unsigned long offlined_pages = 0;
-
-	/* find the first valid pfn */
-	for (pfn = start_pfn; pfn < end_pfn; pfn++)
-		if (pfn_valid(pfn))
-			break;
-	if (pfn == end_pfn)
-		return offlined_pages;
 
 	offline_mem_sections(pfn, end_pfn);
 	zone = page_zone(pfn_to_page(pfn));
 	spin_lock_irqsave(&zone->lock, flags);
-	pfn = start_pfn;
 	while (pfn < end_pfn) {
-		if (!pfn_valid(pfn)) {
-			pfn++;
-			continue;
-		}
 		page = pfn_to_page(pfn);
 		/*
 		 * The HWPoisoned page may be not in buddy system, and
@@ -8721,7 +8707,6 @@ __offline_isolated_pages(unsigned long s
 		 */
 		if (unlikely(!PageBuddy(page) && PageHWPoison(page))) {
 			pfn++;
-			offlined_pages++;
 			continue;
 		}
 		/*
@@ -8732,20 +8717,16 @@ __offline_isolated_pages(unsigned long s
 			BUG_ON(page_count(page));
 			BUG_ON(PageBuddy(page));
 			pfn++;
-			offlined_pages++;
 			continue;
 		}
 
 		BUG_ON(page_count(page));
 		BUG_ON(!PageBuddy(page));
 		order = page_order(page);
-		offlined_pages += 1 << order;
 		del_page_from_free_list(page, zone, order);
 		pfn += (1 << order);
 	}
 	spin_unlock_irqrestore(&zone->lock, flags);
-
-	return offlined_pages;
 }
 #endif
 
_

Patches currently in -mm which might be from david@redhat.com are

mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch
mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch
mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch
mm-page_isolation-cleanup-set_migratetype_isolate.patch
virtio-mem-dont-special-case-zone_movable.patch
mm-document-semantics-of-zone_movable.patch
mm-memory_hotplug-inline-__offline_pages-into-offline_pages.patch
mm-memory_hotplug-enforce-section-granularity-when-onlining-offlining.patch
mm-memory_hotplug-simplify-page-offlining.patch
mm-page_alloc-simplify-__offline_isolated_pages.patch
mm-memory_hotplug-drop-nr_isolate_pageblock-in-offline_pages.patch
mm-page_isolation-simplify-return-value-of-start_isolate_page_range.patch
mm-memory_hotplug-simplify-page-onlining.patch
mm-page_alloc-drop-stale-pageblock-comment-in-memmap_init_zone.patch
mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone.patch
mm-memory_hotplug-mark-pageblocks-migrate_isolate-while-onlining-memory.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-memory_hotplug-drop-nr_isolate_pageblock-in-offline_pages.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (91 preceding siblings ...)
  2020-08-19 18:20 ` + mm-page_alloc-simplify-__offline_isolated_pages.patch " Andrew Morton
@ 2020-08-19 18:20 ` Andrew Morton
  2020-08-19 18:20 ` + mm-page_isolation-simplify-return-value-of-start_isolate_page_range.patch " Andrew Morton
                   ` (46 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 18:20 UTC (permalink / raw)
  To: bhe, charante, dan.j.williams, david, fenghua.yu, logang,
	mgorman, mgorman, mhocko, mm-commits, osalvador,
	pankaj.gupta.linux, richard.weiyang, rppt, tony.luck, walken,
	willy


The patch titled
     Subject: mm/memory_hotplug: drop nr_isolate_pageblock in offline_pages()
has been added to the -mm tree.  Its filename is
     mm-memory_hotplug-drop-nr_isolate_pageblock-in-offline_pages.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-memory_hotplug-drop-nr_isolate_pageblock-in-offline_pages.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-memory_hotplug-drop-nr_isolate_pageblock-in-offline_pages.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: mm/memory_hotplug: drop nr_isolate_pageblock in offline_pages()

We make sure that we cannot have any memory holes right at the beginning
of offline_pages() and we only support to online/offline full sections. 
Both, sections and pageblocks are a power of two in size, and sections
always span full pageblocks.

We can directly calculate the number of isolated pageblocks from nr_pages.

Link: https://lkml.kernel.org/r/20200819175957.28465-6-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Charan Teja Reddy <charante@codeaurora.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory_hotplug.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

--- a/mm/memory_hotplug.c~mm-memory_hotplug-drop-nr_isolate_pageblock-in-offline_pages
+++ a/mm/memory_hotplug.c
@@ -1462,10 +1462,10 @@ int __ref offline_pages(unsigned long st
 {
 	const unsigned long end_pfn = start_pfn + nr_pages;
 	unsigned long pfn, system_ram_pages = 0;
-	int ret, node, nr_isolate_pageblock;
 	unsigned long flags;
 	struct zone *zone;
 	struct memory_notify arg;
+	int ret, node;
 	char *reason;
 
 	/* We can only offline full sections (e.g., SECTION_IS_ONLINE) */
@@ -1509,7 +1509,6 @@ int __ref offline_pages(unsigned long st
 		reason = "failure to isolate range";
 		goto failed_removal;
 	}
-	nr_isolate_pageblock = ret;
 
 	arg.start_pfn = start_pfn;
 	arg.nr_pages = nr_pages;
@@ -1571,7 +1570,7 @@ int __ref offline_pages(unsigned long st
 	 * pageblocks zone counter here.
 	 */
 	spin_lock_irqsave(&zone->lock, flags);
-	zone->nr_isolate_pageblock -= nr_isolate_pageblock;
+	zone->nr_isolate_pageblock -= nr_pages / pageblock_nr_pages;
 	spin_unlock_irqrestore(&zone->lock, flags);
 
 	/* removal success */
_

Patches currently in -mm which might be from david@redhat.com are

mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch
mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch
mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch
mm-page_isolation-cleanup-set_migratetype_isolate.patch
virtio-mem-dont-special-case-zone_movable.patch
mm-document-semantics-of-zone_movable.patch
mm-memory_hotplug-inline-__offline_pages-into-offline_pages.patch
mm-memory_hotplug-enforce-section-granularity-when-onlining-offlining.patch
mm-memory_hotplug-simplify-page-offlining.patch
mm-page_alloc-simplify-__offline_isolated_pages.patch
mm-memory_hotplug-drop-nr_isolate_pageblock-in-offline_pages.patch
mm-page_isolation-simplify-return-value-of-start_isolate_page_range.patch
mm-memory_hotplug-simplify-page-onlining.patch
mm-page_alloc-drop-stale-pageblock-comment-in-memmap_init_zone.patch
mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone.patch
mm-memory_hotplug-mark-pageblocks-migrate_isolate-while-onlining-memory.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-page_isolation-simplify-return-value-of-start_isolate_page_range.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (92 preceding siblings ...)
  2020-08-19 18:20 ` + mm-memory_hotplug-drop-nr_isolate_pageblock-in-offline_pages.patch " Andrew Morton
@ 2020-08-19 18:20 ` Andrew Morton
  2020-08-19 18:20 ` + mm-memory_hotplug-simplify-page-onlining.patch " Andrew Morton
                   ` (45 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 18:20 UTC (permalink / raw)
  To: bhe, charante, dan.j.williams, david, fenghua.yu, logang,
	mgorman, mgorman, mhocko, mm-commits, osalvador,
	pankaj.gupta.linux, richard.weiyang, rppt, tony.luck, walken,
	willy


The patch titled
     Subject: mm/page_isolation: simplify return value of start_isolate_page_range()
has been added to the -mm tree.  Its filename is
     mm-page_isolation-simplify-return-value-of-start_isolate_page_range.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-page_isolation-simplify-return-value-of-start_isolate_page_range.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-page_isolation-simplify-return-value-of-start_isolate_page_range.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: mm/page_isolation: simplify return value of start_isolate_page_range()

Callers no longer need the number of isolated pageblocks.  Let's simplify.

Link: https://lkml.kernel.org/r/20200819175957.28465-7-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Charan Teja Reddy <charante@codeaurora.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory_hotplug.c |    2 +-
 mm/page_alloc.c     |    2 +-
 mm/page_isolation.c |    7 ++-----
 3 files changed, 4 insertions(+), 7 deletions(-)

--- a/mm/memory_hotplug.c~mm-page_isolation-simplify-return-value-of-start_isolate_page_range
+++ a/mm/memory_hotplug.c
@@ -1505,7 +1505,7 @@ int __ref offline_pages(unsigned long st
 	ret = start_isolate_page_range(start_pfn, end_pfn,
 				       MIGRATE_MOVABLE,
 				       MEMORY_OFFLINE | REPORT_FAILURE);
-	if (ret < 0) {
+	if (ret) {
 		reason = "failure to isolate range";
 		goto failed_removal;
 	}
--- a/mm/page_alloc.c~mm-page_isolation-simplify-return-value-of-start_isolate_page_range
+++ a/mm/page_alloc.c
@@ -8449,7 +8449,7 @@ int alloc_contig_range(unsigned long sta
 
 	ret = start_isolate_page_range(pfn_max_align_down(start),
 				       pfn_max_align_up(end), migratetype, 0);
-	if (ret < 0)
+	if (ret)
 		return ret;
 
 	/*
--- a/mm/page_isolation.c~mm-page_isolation-simplify-return-value-of-start_isolate_page_range
+++ a/mm/page_isolation.c
@@ -165,8 +165,7 @@ __first_valid_page(unsigned long pfn, un
  * pageblocks we may have modified and return -EBUSY to caller. This
  * prevents two threads from simultaneously working on overlapping ranges.
  *
- * Return: the number of isolated pageblocks on success and -EBUSY if any part
- * of range cannot be isolated.
+ * Return: 0 on success and -EBUSY if any part of range cannot be isolated.
  */
 int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 			     unsigned migratetype, int flags)
@@ -174,7 +173,6 @@ int start_isolate_page_range(unsigned lo
 	unsigned long pfn;
 	unsigned long undo_pfn;
 	struct page *page;
-	int nr_isolate_pageblock = 0;
 
 	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
 	BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
@@ -188,10 +186,9 @@ int start_isolate_page_range(unsigned lo
 				undo_pfn = pfn;
 				goto undo;
 			}
-			nr_isolate_pageblock++;
 		}
 	}
-	return nr_isolate_pageblock;
+	return 0;
 undo:
 	for (pfn = start_pfn;
 	     pfn < undo_pfn;
_

Patches currently in -mm which might be from david@redhat.com are

mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch
mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch
mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch
mm-page_isolation-cleanup-set_migratetype_isolate.patch
virtio-mem-dont-special-case-zone_movable.patch
mm-document-semantics-of-zone_movable.patch
mm-memory_hotplug-inline-__offline_pages-into-offline_pages.patch
mm-memory_hotplug-enforce-section-granularity-when-onlining-offlining.patch
mm-memory_hotplug-simplify-page-offlining.patch
mm-page_alloc-simplify-__offline_isolated_pages.patch
mm-memory_hotplug-drop-nr_isolate_pageblock-in-offline_pages.patch
mm-page_isolation-simplify-return-value-of-start_isolate_page_range.patch
mm-memory_hotplug-simplify-page-onlining.patch
mm-page_alloc-drop-stale-pageblock-comment-in-memmap_init_zone.patch
mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone.patch
mm-memory_hotplug-mark-pageblocks-migrate_isolate-while-onlining-memory.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-memory_hotplug-simplify-page-onlining.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (93 preceding siblings ...)
  2020-08-19 18:20 ` + mm-page_isolation-simplify-return-value-of-start_isolate_page_range.patch " Andrew Morton
@ 2020-08-19 18:20 ` Andrew Morton
  2020-08-19 18:20 ` + mm-page_alloc-drop-stale-pageblock-comment-in-memmap_init_zone.patch " Andrew Morton
                   ` (44 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 18:20 UTC (permalink / raw)
  To: bhe, charante, dan.j.williams, david, fenghua.yu, logang,
	mgorman, mgorman, mhocko, mm-commits, osalvador,
	pankaj.gupta.linux, richard.weiyang, rppt, tony.luck, walken,
	willy


The patch titled
     Subject: mm/memory_hotplug: simplify page onlining
has been added to the -mm tree.  Its filename is
     mm-memory_hotplug-simplify-page-onlining.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-memory_hotplug-simplify-page-onlining.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-memory_hotplug-simplify-page-onlining.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: mm/memory_hotplug: simplify page onlining

We don't allow to offline memory with holes, all boot memory is online,
and all hotplugged memory cannot have holes.

We can now simplify onlining of pages.  As we only allow to online/offline
full sections and sections always span full MAX_ORDER_NR_PAGES, we can
just process MAX_ORDER - 1 pages without further special handling.

The number of onlined pages simply corresponds to the number of pages we
were requested to online.

While at it, refine the comment regarding the callback not exposing all
pages to the buddy.

Link: https://lkml.kernel.org/r/20200819175957.28465-8-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Charan Teja Reddy <charante@codeaurora.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory_hotplug.c |   38 ++++++++++----------------------------
 1 file changed, 10 insertions(+), 28 deletions(-)

--- a/mm/memory_hotplug.c~mm-memory_hotplug-simplify-page-onlining
+++ a/mm/memory_hotplug.c
@@ -617,31 +617,22 @@ void generic_online_page(struct page *pa
 }
 EXPORT_SYMBOL_GPL(generic_online_page);
 
-static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
-			void *arg)
+static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages)
 {
 	const unsigned long end_pfn = start_pfn + nr_pages;
 	unsigned long pfn;
-	int order;
 
 	/*
-	 * Online the pages. The callback might decide to keep some pages
-	 * PG_reserved (to add them to the buddy later), but we still account
-	 * them as being online/belonging to this zone ("present").
+	 * Online the pages in MAX_ORDER - 1 aligned chunks. The callback might
+	 * decide to not expose all pages to the buddy (e.g., expose them
+	 * later). We account all pages as being online and belonging to this
+	 * zone ("present").
 	 */
-	for (pfn = start_pfn; pfn < end_pfn; pfn += 1ul << order) {
-		order = min(MAX_ORDER - 1, get_order(PFN_PHYS(end_pfn - pfn)));
-		/* __free_pages_core() wants pfns to be aligned to the order */
-		if (WARN_ON_ONCE(!IS_ALIGNED(pfn, 1ul << order)))
-			order = 0;
-		(*online_page_callback)(pfn_to_page(pfn), order);
-	}
+	for (pfn = start_pfn; pfn < end_pfn; pfn += MAX_ORDER_NR_PAGES)
+		(*online_page_callback)(pfn_to_page(pfn), MAX_ORDER - 1);
 
 	/* mark all involved sections as online */
 	online_mem_sections(start_pfn, end_pfn);
-
-	*(unsigned long *)arg += nr_pages;
-	return 0;
 }
 
 /* check which state of node_states will be changed when online memory */
@@ -795,7 +786,6 @@ int __ref online_pages(unsigned long pfn
 		       int online_type, int nid)
 {
 	unsigned long flags;
-	unsigned long onlined_pages = 0;
 	struct zone *zone;
 	int need_zonelists_rebuild = 0;
 	int ret;
@@ -831,19 +821,11 @@ int __ref online_pages(unsigned long pfn
 		setup_zone_pageset(zone);
 	}
 
-	ret = walk_system_ram_range(pfn, nr_pages, &onlined_pages,
-		online_pages_range);
-	if (ret) {
-		/* not a single memory resource was applicable */
-		if (need_zonelists_rebuild)
-			zone_pcp_reset(zone);
-		goto failed_addition;
-	}
-
-	zone->present_pages += onlined_pages;
+	online_pages_range(pfn, nr_pages);
+	zone->present_pages += nr_pages;
 
 	pgdat_resize_lock(zone->zone_pgdat, &flags);
-	zone->zone_pgdat->node_present_pages += onlined_pages;
+	zone->zone_pgdat->node_present_pages += nr_pages;
 	pgdat_resize_unlock(zone->zone_pgdat, &flags);
 
 	/*
_

Patches currently in -mm which might be from david@redhat.com are

mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch
mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch
mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch
mm-page_isolation-cleanup-set_migratetype_isolate.patch
virtio-mem-dont-special-case-zone_movable.patch
mm-document-semantics-of-zone_movable.patch
mm-memory_hotplug-inline-__offline_pages-into-offline_pages.patch
mm-memory_hotplug-enforce-section-granularity-when-onlining-offlining.patch
mm-memory_hotplug-simplify-page-offlining.patch
mm-page_alloc-simplify-__offline_isolated_pages.patch
mm-memory_hotplug-drop-nr_isolate_pageblock-in-offline_pages.patch
mm-page_isolation-simplify-return-value-of-start_isolate_page_range.patch
mm-memory_hotplug-simplify-page-onlining.patch
mm-page_alloc-drop-stale-pageblock-comment-in-memmap_init_zone.patch
mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone.patch
mm-memory_hotplug-mark-pageblocks-migrate_isolate-while-onlining-memory.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-page_alloc-drop-stale-pageblock-comment-in-memmap_init_zone.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (94 preceding siblings ...)
  2020-08-19 18:20 ` + mm-memory_hotplug-simplify-page-onlining.patch " Andrew Morton
@ 2020-08-19 18:20 ` Andrew Morton
  2020-08-19 18:21 ` + mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone.patch " Andrew Morton
                   ` (43 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 18:20 UTC (permalink / raw)
  To: bhe, charante, dan.j.williams, david, fenghua.yu, logang,
	mgorman, mgorman, mhocko, mm-commits, osalvador,
	pankaj.gupta.linux, richard.weiyang, rppt, tony.luck, walken,
	willy


The patch titled
     Subject: mm/page_alloc: drop stale pageblock comment in memmap_init_zone*()
has been added to the -mm tree.  Its filename is
     mm-page_alloc-drop-stale-pageblock-comment-in-memmap_init_zone.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-drop-stale-pageblock-comment-in-memmap_init_zone.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-page_alloc-drop-stale-pageblock-comment-in-memmap_init_zone.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: mm/page_alloc: drop stale pageblock comment in memmap_init_zone*()

Commit ac5d2539b238 ("mm: meminit: reduce number of times pageblocks are
set during struct page init") moved the actual zone range check, leaving
only the alignment check for pageblocks.

Let's drop the stale comment and make the pageblock check easier to read.

Link: https://lkml.kernel.org/r/20200819175957.28465-9-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Charan Teja Reddy <charante@codeaurora.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |   14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

--- a/mm/page_alloc.c~mm-page_alloc-drop-stale-pageblock-comment-in-memmap_init_zone
+++ a/mm/page_alloc.c
@@ -6025,13 +6025,8 @@ void __meminit memmap_init_zone(unsigned
 		 * to reserve their blocks rather than leaking throughout
 		 * the address space during boot when many long-lived
 		 * kernel allocations are made.
-		 *
-		 * bitmap is created for zone's valid pfn range. but memmap
-		 * can be created for invalid pages (for alignment)
-		 * check here not to call set_pageblock_migratetype() against
-		 * pfn out of zone.
 		 */
-		if (!(pfn & (pageblock_nr_pages - 1))) {
+		if (IS_ALIGNED(pfn, pageblock_nr_pages)) {
 			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 			cond_resched();
 		}
@@ -6094,15 +6089,10 @@ void __ref memmap_init_zone_device(struc
 		 * the address space during boot when many long-lived
 		 * kernel allocations are made.
 		 *
-		 * bitmap is created for zone's valid pfn range. but memmap
-		 * can be created for invalid pages (for alignment)
-		 * check here not to call set_pageblock_migratetype() against
-		 * pfn out of zone.
-		 *
 		 * Please note that MEMMAP_HOTPLUG path doesn't clear memmap
 		 * because this is done early in section_activate()
 		 */
-		if (!(pfn & (pageblock_nr_pages - 1))) {
+		if (IS_ALIGNED(pfn, pageblock_nr_pages)) {
 			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 			cond_resched();
 		}
_

Patches currently in -mm which might be from david@redhat.com are

mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch
mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch
mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch
mm-page_isolation-cleanup-set_migratetype_isolate.patch
virtio-mem-dont-special-case-zone_movable.patch
mm-document-semantics-of-zone_movable.patch
mm-memory_hotplug-inline-__offline_pages-into-offline_pages.patch
mm-memory_hotplug-enforce-section-granularity-when-onlining-offlining.patch
mm-memory_hotplug-simplify-page-offlining.patch
mm-page_alloc-simplify-__offline_isolated_pages.patch
mm-memory_hotplug-drop-nr_isolate_pageblock-in-offline_pages.patch
mm-page_isolation-simplify-return-value-of-start_isolate_page_range.patch
mm-memory_hotplug-simplify-page-onlining.patch
mm-page_alloc-drop-stale-pageblock-comment-in-memmap_init_zone.patch
mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone.patch
mm-memory_hotplug-mark-pageblocks-migrate_isolate-while-onlining-memory.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (95 preceding siblings ...)
  2020-08-19 18:20 ` + mm-page_alloc-drop-stale-pageblock-comment-in-memmap_init_zone.patch " Andrew Morton
@ 2020-08-19 18:21 ` Andrew Morton
  2020-08-19 18:21 ` + mm-memory_hotplug-mark-pageblocks-migrate_isolate-while-onlining-memory.patch " Andrew Morton
                   ` (42 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 18:21 UTC (permalink / raw)
  To: bhe, charante, dan.j.williams, david, fenghua.yu, logang,
	mgorman, mgorman, mhocko, mm-commits, osalvador,
	pankaj.gupta.linux, richard.weiyang, rppt, tony.luck, walken,
	willy


The patch titled
     Subject: mm: pass migratetype into memmap_init_zone() and move_pfn_range_to_zone()
has been added to the -mm tree.  Its filename is
     mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: mm: pass migratetype into memmap_init_zone() and move_pfn_range_to_zone()

On the memory onlining path, we want to start with MIGRATE_ISOLATE, to
un-isolate the pages after memory onlining is complete.  Let's allow
passing in the migratetype.

Link: https://lkml.kernel.org/r/20200819175957.28465-10-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Charan Teja Reddy <charante@codeaurora.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/ia64/mm/init.c            |    4 ++--
 include/linux/memory_hotplug.h |    3 ++-
 include/linux/mm.h             |    3 ++-
 mm/memory_hotplug.c            |   11 ++++++++---
 mm/memremap.c                  |    3 ++-
 mm/page_alloc.c                |   21 ++++++++++++---------
 6 files changed, 28 insertions(+), 17 deletions(-)

--- a/arch/ia64/mm/init.c~mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone
+++ a/arch/ia64/mm/init.c
@@ -538,7 +538,7 @@ virtual_memmap_init(u64 start, u64 end,
 	if (map_start < map_end)
 		memmap_init_zone((unsigned long)(map_end - map_start),
 				 args->nid, args->zone, page_to_pfn(map_start),
-				 MEMMAP_EARLY, NULL);
+				 MEMMAP_EARLY, NULL, MIGRATE_MOVABLE);
 	return 0;
 }
 
@@ -548,7 +548,7 @@ memmap_init (unsigned long size, int nid
 {
 	if (!vmem_map) {
 		memmap_init_zone(size, nid, zone, start_pfn, MEMMAP_EARLY,
-				NULL);
+				 NULL, MIGRATE_MOVABLE);
 	} else {
 		struct page *start;
 		struct memmap_init_callback_data args;
--- a/include/linux/memory_hotplug.h~mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone
+++ a/include/linux/memory_hotplug.h
@@ -346,7 +346,8 @@ extern int add_memory_resource(int nid,
 extern int add_memory_driver_managed(int nid, u64 start, u64 size,
 				     const char *resource_name);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap);
+				   unsigned long nr_pages,
+				   struct vmem_altmap *altmap, int migratetype);
 extern void remove_pfn_range_from_zone(struct zone *zone,
 				       unsigned long start_pfn,
 				       unsigned long nr_pages);
--- a/include/linux/mm.h~mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone
+++ a/include/linux/mm.h
@@ -2425,7 +2425,8 @@ extern int __meminit __early_pfn_to_nid(
 
 extern void set_dma_reserve(unsigned long new_dma_reserve);
 extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
-		enum memmap_context, struct vmem_altmap *);
+			    enum memmap_context, struct vmem_altmap *,
+			    int migratetype);
 extern void setup_per_zone_wmarks(void);
 extern int __meminit init_per_zone_wmark_min(void);
 extern void mem_init(void);
--- a/mm/memory_hotplug.c~mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone
+++ a/mm/memory_hotplug.c
@@ -693,9 +693,14 @@ static void __meminit resize_pgdat_range
  * Associate the pfn range with the given zone, initializing the memmaps
  * and resizing the pgdat/zone data to span the added pages. After this
  * call, all affected pages are PG_reserved.
+ *
+ * All aligned pageblocks are initialized to the specified migratetype
+ * (usually MIGRATE_MOVABLE). Besides setting the migratetype, no related
+ * zone stats (e.g., nr_isolate_pageblock) are touched.
  */
 void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap)
+				  unsigned long nr_pages,
+				  struct vmem_altmap *altmap, int migratetype)
 {
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	int nid = pgdat->node_id;
@@ -720,7 +725,7 @@ void __ref move_pfn_range_to_zone(struct
 	 * are reserved so nobody should be touching them so we should be safe
 	 */
 	memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn,
-			MEMMAP_HOTPLUG, altmap);
+			 MEMMAP_HOTPLUG, altmap, migratetype);
 
 	set_zone_contiguous(zone);
 }
@@ -800,7 +805,7 @@ int __ref online_pages(unsigned long pfn
 
 	/* associate pfn range with the zone */
 	zone = zone_for_pfn_range(online_type, nid, pfn, nr_pages);
-	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL);
+	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_MOVABLE);
 
 	arg.start_pfn = pfn;
 	arg.nr_pages = nr_pages;
--- a/mm/memremap.c~mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone
+++ a/mm/memremap.c
@@ -342,7 +342,8 @@ void *memremap_pages(struct dev_pagemap
 
 		zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE];
 		move_pfn_range_to_zone(zone, PHYS_PFN(res->start),
-				PHYS_PFN(resource_size(res)), params.altmap);
+				       PHYS_PFN(resource_size(res)),
+				       params.altmap, MIGRATE_MOVABLE);
 	}
 
 	mem_hotplug_done();
--- a/mm/page_alloc.c~mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone
+++ a/mm/page_alloc.c
@@ -5973,10 +5973,15 @@ overlap_memmap_init(unsigned long zone,
  * Initially all pages are reserved - free ones are freed
  * up by memblock_free_all() once the early boot process is
  * done. Non-atomic initialization, single-pass.
+ *
+ * All aligned pageblocks are initialized to the specified migratetype
+ * (usually MIGRATE_MOVABLE). Besides setting the migratetype, no related
+ * zone stats (e.g., nr_isolate_pageblock) are touched.
  */
 void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
-		unsigned long start_pfn, enum memmap_context context,
-		struct vmem_altmap *altmap)
+				unsigned long start_pfn,
+				enum memmap_context context,
+				struct vmem_altmap *altmap, int migratetype)
 {
 	unsigned long pfn, end_pfn = start_pfn + size;
 	struct page *page;
@@ -6020,14 +6025,12 @@ void __meminit memmap_init_zone(unsigned
 			__SetPageReserved(page);
 
 		/*
-		 * Mark the block movable so that blocks are reserved for
-		 * movable at startup. This will force kernel allocations
-		 * to reserve their blocks rather than leaking throughout
-		 * the address space during boot when many long-lived
-		 * kernel allocations are made.
+		 * Usually, we want to mark the pageblock MIGRATE_MOVABLE,
+		 * such that unmovable allocations won't be scattered all
+		 * over the place during system boot.
 		 */
 		if (IS_ALIGNED(pfn, pageblock_nr_pages)) {
-			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+			set_pageblock_migratetype(page, migratetype);
 			cond_resched();
 		}
 		pfn++;
@@ -6127,7 +6130,7 @@ void __meminit __weak memmap_init(unsign
 		if (end_pfn > start_pfn) {
 			size = end_pfn - start_pfn;
 			memmap_init_zone(size, nid, zone, start_pfn,
-					 MEMMAP_EARLY, NULL);
+					 MEMMAP_EARLY, NULL, MIGRATE_MOVABLE);
 		}
 	}
 }
_

Patches currently in -mm which might be from david@redhat.com are

mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch
mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch
mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch
mm-page_isolation-cleanup-set_migratetype_isolate.patch
virtio-mem-dont-special-case-zone_movable.patch
mm-document-semantics-of-zone_movable.patch
mm-memory_hotplug-inline-__offline_pages-into-offline_pages.patch
mm-memory_hotplug-enforce-section-granularity-when-onlining-offlining.patch
mm-memory_hotplug-simplify-page-offlining.patch
mm-page_alloc-simplify-__offline_isolated_pages.patch
mm-memory_hotplug-drop-nr_isolate_pageblock-in-offline_pages.patch
mm-page_isolation-simplify-return-value-of-start_isolate_page_range.patch
mm-memory_hotplug-simplify-page-onlining.patch
mm-page_alloc-drop-stale-pageblock-comment-in-memmap_init_zone.patch
mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone.patch
mm-memory_hotplug-mark-pageblocks-migrate_isolate-while-onlining-memory.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-memory_hotplug-mark-pageblocks-migrate_isolate-while-onlining-memory.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (96 preceding siblings ...)
  2020-08-19 18:21 ` + mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone.patch " Andrew Morton
@ 2020-08-19 18:21 ` Andrew Morton
  2020-08-19 18:31 ` + mm-migrate-avoid-possible-unnecessary-process-right-check-in-kernel_move_pages.patch " Andrew Morton
                   ` (41 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 18:21 UTC (permalink / raw)
  To: bhe, charante, dan.j.williams, david, fenghua.yu, logang,
	mgorman, mgorman, mhocko, mm-commits, osalvador,
	pankaj.gupta.linux, richard.weiyang, rppt, tony.luck, walken,
	willy


The patch titled
     Subject: mm/memory_hotplug: mark pageblocks MIGRATE_ISOLATE while onlining memory
has been added to the -mm tree.  Its filename is
     mm-memory_hotplug-mark-pageblocks-migrate_isolate-while-onlining-memory.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-memory_hotplug-mark-pageblocks-migrate_isolate-while-onlining-memory.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-memory_hotplug-mark-pageblocks-migrate_isolate-while-onlining-memory.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: mm/memory_hotplug: mark pageblocks MIGRATE_ISOLATE while onlining memory

Currently, it can happen that pages are allocated (and freed) via the
buddy before we finished basic memory onlining.

For example, pages are exposed to the buddy and can be allocated before we
actually mark the sections online.  Allocated pages could suddenly fail
pfn_to_online_page() checks.  We had similar issues with pcp handling,
when pages are allocated+freed before we reach zone_pcp_update() in
online_pages() [1].

Instead, mark all pageblocks MIGRATE_ISOLATE, such that allocations are
impossible.  Once done with the heavy lifting, use
undo_isolate_page_range() to move the pages to the MIGRATE_MOVABLE
freelist, marking them ready for allocation.  Similar to offline_pages(),
we have to manually adjust zone->nr_isolate_pageblock.

[1] https://lkml.kernel.org/r/1597150703-19003-1-git-send-email-charante@codeaurora.org

Link: https://lkml.kernel.org/r/20200819175957.28465-11-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Charan Teja Reddy <charante@codeaurora.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/Kconfig          |    2 +-
 mm/memory_hotplug.c |   32 ++++++++++++++++++++++----------
 2 files changed, 23 insertions(+), 11 deletions(-)

--- a/mm/Kconfig~mm-memory_hotplug-mark-pageblocks-migrate_isolate-while-onlining-memory
+++ a/mm/Kconfig
@@ -152,6 +152,7 @@ config HAVE_BOOTMEM_INFO_NODE
 # eventually, we can have this option just 'select SPARSEMEM'
 config MEMORY_HOTPLUG
 	bool "Allow for memory hot-add"
+	select MEMORY_ISOLATION
 	depends on SPARSEMEM || X86_64_ACPI_NUMA
 	depends on ARCH_ENABLE_MEMORY_HOTPLUG
 	depends on 64BIT || BROKEN
@@ -178,7 +179,6 @@ config MEMORY_HOTPLUG_DEFAULT_ONLINE
 
 config MEMORY_HOTREMOVE
 	bool "Allow for memory hot remove"
-	select MEMORY_ISOLATION
 	select HAVE_BOOTMEM_INFO_NODE if (X86_64 || PPC64)
 	depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
 	depends on MIGRATION
--- a/mm/memory_hotplug.c~mm-memory_hotplug-mark-pageblocks-migrate_isolate-while-onlining-memory
+++ a/mm/memory_hotplug.c
@@ -805,7 +805,7 @@ int __ref online_pages(unsigned long pfn
 
 	/* associate pfn range with the zone */
 	zone = zone_for_pfn_range(online_type, nid, pfn, nr_pages);
-	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_MOVABLE);
+	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_ISOLATE);
 
 	arg.start_pfn = pfn;
 	arg.nr_pages = nr_pages;
@@ -817,6 +817,14 @@ int __ref online_pages(unsigned long pfn
 		goto failed_addition;
 
 	/*
+	 * Fixup the number of isolated pageblocks before marking the sections
+	 * onlining, such that undo_isolate_page_range() works correctly.
+	 */
+	spin_lock_irqsave(&zone->lock, flags);
+	zone->nr_isolate_pageblock += nr_pages / pageblock_nr_pages;
+	spin_unlock_irqrestore(&zone->lock, flags);
+
+	/*
 	 * If this zone is not populated, then it is not in zonelist.
 	 * This means the page allocator ignores this zone.
 	 * So, zonelist must be updated after online.
@@ -833,21 +841,25 @@ int __ref online_pages(unsigned long pfn
 	zone->zone_pgdat->node_present_pages += nr_pages;
 	pgdat_resize_unlock(zone->zone_pgdat, &flags);
 
+	node_states_set_node(nid, &arg);
+	if (need_zonelists_rebuild)
+		build_all_zonelists(NULL);
+	zone_pcp_update(zone);
+
+	/* Basic onlining is complete, allow allocation of onlined pages. */
+	undo_isolate_page_range(pfn, pfn + nr_pages, MIGRATE_MOVABLE);
+
 	/*
 	 * When exposing larger, physically contiguous memory areas to the
 	 * buddy, shuffling in the buddy (when freeing onlined pages, putting
 	 * them either to the head or the tail of the freelist) is only helpful
 	 * for maintaining the shuffle, but not for creating the initial
 	 * shuffle. Shuffle the whole zone to make sure the just onlined pages
-	 * are properly distributed across the whole freelist.
+	 * are properly distributed across the whole freelist. Make sure to
+	 * shuffle once pageblocks are no longer isolated.
 	 */
 	shuffle_zone(zone);
 
-	node_states_set_node(nid, &arg);
-	if (need_zonelists_rebuild)
-		build_all_zonelists(NULL);
-	zone_pcp_update(zone);
-
 	init_per_zone_wmark_min();
 
 	kswapd_run(nid);
@@ -1552,9 +1564,9 @@ int __ref offline_pages(unsigned long st
 	pr_info("Offlined Pages %ld\n", nr_pages);
 
 	/*
-	 * Onlining will reset pagetype flags and makes migrate type
-	 * MOVABLE, so just need to decrease the number of isolated
-	 * pageblocks zone counter here.
+	 * The memory sections are marked offline, and the pageblock flags
+	 * effectively stale; nobody should be touching them. Fixup the number
+	 * of isolated pageblocks, memory onlining will properly revert this.
 	 */
 	spin_lock_irqsave(&zone->lock, flags);
 	zone->nr_isolate_pageblock -= nr_pages / pageblock_nr_pages;
_

Patches currently in -mm which might be from david@redhat.com are

mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch
mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch
mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch
mm-page_isolation-cleanup-set_migratetype_isolate.patch
virtio-mem-dont-special-case-zone_movable.patch
mm-document-semantics-of-zone_movable.patch
mm-memory_hotplug-inline-__offline_pages-into-offline_pages.patch
mm-memory_hotplug-enforce-section-granularity-when-onlining-offlining.patch
mm-memory_hotplug-simplify-page-offlining.patch
mm-page_alloc-simplify-__offline_isolated_pages.patch
mm-memory_hotplug-drop-nr_isolate_pageblock-in-offline_pages.patch
mm-page_isolation-simplify-return-value-of-start_isolate_page_range.patch
mm-memory_hotplug-simplify-page-onlining.patch
mm-page_alloc-drop-stale-pageblock-comment-in-memmap_init_zone.patch
mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone.patch
mm-memory_hotplug-mark-pageblocks-migrate_isolate-while-onlining-memory.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-migrate-avoid-possible-unnecessary-process-right-check-in-kernel_move_pages.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (97 preceding siblings ...)
  2020-08-19 18:21 ` + mm-memory_hotplug-mark-pageblocks-migrate_isolate-while-onlining-memory.patch " Andrew Morton
@ 2020-08-19 18:31 ` Andrew Morton
  2020-08-19 18:34 ` + mm-fix-missing-function-declaration.patch " Andrew Morton
                   ` (40 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 18:31 UTC (permalink / raw)
  To: cl, linmiaohe, louhongxiang, mm-commits, willy


The patch titled
     Subject: mm/migrate: avoid possible unnecessary process right check in kernel_move_pages()
has been added to the -mm tree.  Its filename is
     mm-migrate-avoid-possible-unnecessary-process-right-check-in-kernel_move_pages.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-migrate-avoid-possible-unnecessary-process-right-check-in-kernel_move_pages.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-migrate-avoid-possible-unnecessary-process-right-check-in-kernel_move_pages.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/migrate: avoid possible unnecessary process right check in kernel_move_pages()

There is no need to check if this process has the right to modify the
specified process when they are same.  And we could also skip the security
hook call if a process is modifying its own pages.  Add helper function to
handle these.

Link: https://lkml.kernel.org/r/20200819083331.19012-1-linmiaohe@huawei.com
Signed-off-by: Hongxiang Lou <louhongxiang@huawei.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Cc: Christopher Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/migrate.c |   71 +++++++++++++++++++++++++++++--------------------
 1 file changed, 43 insertions(+), 28 deletions(-)

--- a/mm/migrate.c~mm-migrate-avoid-possible-unnecessary-process-right-check-in-kernel_move_pages
+++ a/mm/migrate.c
@@ -1864,33 +1864,27 @@ static int do_pages_stat(struct mm_struc
 	return nr_pages ? -EFAULT : 0;
 }
 
-/*
- * Move a list of pages in the address space of the currently executing
- * process.
- */
-static int kernel_move_pages(pid_t pid, unsigned long nr_pages,
-			     const void __user * __user *pages,
-			     const int __user *nodes,
-			     int __user *status, int flags)
+static struct mm_struct *find_mm_struct(pid_t pid, nodemask_t *mem_nodes)
 {
 	struct task_struct *task;
 	struct mm_struct *mm;
-	int err;
-	nodemask_t task_nodes;
-
-	/* Check flags */
-	if (flags & ~(MPOL_MF_MOVE|MPOL_MF_MOVE_ALL))
-		return -EINVAL;
 
-	if ((flags & MPOL_MF_MOVE_ALL) && !capable(CAP_SYS_NICE))
-		return -EPERM;
+	/*
+	 * There is no need to check if current process has the right to modify
+	 * the specified process when they are same.
+	 */
+	if (!pid) {
+		mmget(current->mm);
+		*mem_nodes = cpuset_mems_allowed(current);
+		return current->mm;
+	}
 
 	/* Find the mm_struct */
 	rcu_read_lock();
-	task = pid ? find_task_by_vpid(pid) : current;
+	task = find_task_by_vpid(pid);
 	if (!task) {
 		rcu_read_unlock();
-		return -ESRCH;
+		return ERR_PTR(-ESRCH);
 	}
 	get_task_struct(task);
 
@@ -1900,22 +1894,47 @@ static int kernel_move_pages(pid_t pid,
 	 */
 	if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) {
 		rcu_read_unlock();
-		err = -EPERM;
+		mm = ERR_PTR(-EPERM);
 		goto out;
 	}
 	rcu_read_unlock();
 
- 	err = security_task_movememory(task);
- 	if (err)
+	mm = ERR_PTR(security_task_movememory(task));
+	if (IS_ERR(mm))
 		goto out;
-
-	task_nodes = cpuset_mems_allowed(task);
+	*mem_nodes = cpuset_mems_allowed(task);
 	mm = get_task_mm(task);
+out:
 	put_task_struct(task);
-
 	if (!mm)
+		mm = ERR_PTR(-EINVAL);
+	return mm;
+}
+
+/*
+ * Move a list of pages in the address space of the currently executing
+ * process.
+ */
+static int kernel_move_pages(pid_t pid, unsigned long nr_pages,
+			     const void __user * __user *pages,
+			     const int __user *nodes,
+			     int __user *status, int flags)
+{
+	struct mm_struct *mm;
+	int err;
+	nodemask_t task_nodes;
+
+	/* Check flags */
+	if (flags & ~(MPOL_MF_MOVE|MPOL_MF_MOVE_ALL))
 		return -EINVAL;
 
+	if ((flags & MPOL_MF_MOVE_ALL) && !capable(CAP_SYS_NICE))
+		return -EPERM;
+
+	mm = find_mm_struct(pid, &task_nodes);
+	if (IS_ERR(mm))
+		return PTR_ERR(mm);
+
 	if (nodes)
 		err = do_pages_move(mm, task_nodes, nr_pages, pages,
 				    nodes, status, flags);
@@ -1924,10 +1943,6 @@ static int kernel_move_pages(pid_t pid,
 
 	mmput(mm);
 	return err;
-
-out:
-	put_task_struct(task);
-	return err;
 }
 
 SYSCALL_DEFINE6(move_pages, pid_t, pid, unsigned long, nr_pages,
_

Patches currently in -mm which might be from linmiaohe@huawei.com are

mm-migrate-avoid-possible-unnecessary-process-right-check-in-kernel_move_pages.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-fix-missing-function-declaration.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (98 preceding siblings ...)
  2020-08-19 18:31 ` + mm-migrate-avoid-possible-unnecessary-process-right-check-in-kernel_move_pages.patch " Andrew Morton
@ 2020-08-19 18:34 ` Andrew Morton
  2020-08-19 18:36 ` + ia64-fix-build-error-with-coredump.patch " Andrew Morton
                   ` (39 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 18:34 UTC (permalink / raw)
  To: anshuman.khandual, leonro, mm-commits


The patch titled
     Subject: mm/rodata_test.c: fix missing function declaration
has been added to the -mm tree.  Its filename is
     mm-fix-missing-function-declaration.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-fix-missing-function-declaration.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-fix-missing-function-declaration.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Leon Romanovsky <leonro@nvidia.com>
Subject: mm/rodata_test.c: fix missing function declaration

The compilation with CONFIG_DEBUG_RODATA_TEST set produces the following
warning due to the missing include.

 mm/rodata_test.c:15:6: warning: no previous prototype for 'rodata_test' [-Wmissing-prototypes]
    15 | void rodata_test(void)
      |      ^~~~~~~~~~~

Link: https://lkml.kernel.org/r/20200819080026.918134-1-leon@kernel.org
Fixes: 2959a5f726f6 ("mm: add arch-independent testcases for RODATA")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/rodata_test.c |    1 +
 1 file changed, 1 insertion(+)

--- a/mm/rodata_test.c~mm-fix-missing-function-declaration
+++ a/mm/rodata_test.c
@@ -7,6 +7,7 @@
  */
 #define pr_fmt(fmt) "rodata_test: " fmt
 
+#include <linux/rodata_test.h>
 #include <linux/uaccess.h>
 #include <asm/sections.h>
 
_

Patches currently in -mm which might be from leonro@nvidia.com are

mm-fix-missing-function-declaration.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + ia64-fix-build-error-with-coredump.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (99 preceding siblings ...)
  2020-08-19 18:34 ` + mm-fix-missing-function-declaration.patch " Andrew Morton
@ 2020-08-19 18:36 ` Andrew Morton
  2020-08-19 19:01 ` + mm-debug-do-not-dereference-i_ino-blindly.patch " Andrew Morton
                   ` (38 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 18:36 UTC (permalink / raw)
  To: fenghua.yu, krzk, lkp, mm-commits, stable, tony.luck


The patch titled
     Subject: ia64: fix build error with !COREDUMP
has been added to the -mm tree.  Its filename is
     ia64-fix-build-error-with-coredump.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/ia64-fix-build-error-with-coredump.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/ia64-fix-build-error-with-coredump.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Krzysztof Kozlowski <krzk@kernel.org>
Subject: ia64: fix build error with !COREDUMP

Fix linkage error when CONFIG_BINFMT_ELF is selected but CONFIG_COREDUMP
is not:

    ia64-linux-ld: arch/ia64/kernel/elfcore.o: in function `elf_core_write_extra_phdrs':
    elfcore.c:(.text+0x172): undefined reference to `dump_emit'
    ia64-linux-ld: arch/ia64/kernel/elfcore.o: in function `elf_core_write_extra_data':
    elfcore.c:(.text+0x2b2): undefined reference to `dump_emit'

Link: https://lkml.kernel.org/r/20200819064146.12529-1-krzk@kernel.org
Fixes: 1fcccbac89f5 ("elf coredump: replace ELF_CORE_EXTRA_* macros by functions")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/ia64/kernel/Makefile |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/ia64/kernel/Makefile~ia64-fix-build-error-with-coredump
+++ a/arch/ia64/kernel/Makefile
@@ -41,7 +41,7 @@ obj-y				+= esi_stub.o	# must be in kern
 endif
 obj-$(CONFIG_INTEL_IOMMU)	+= pci-dma.o
 
-obj-$(CONFIG_BINFMT_ELF)	+= elfcore.o
+obj-$(CONFIG_ELF_CORE)		+= elfcore.o
 
 # fp_emulate() expects f2-f5,f16-f31 to contain the user-level state.
 CFLAGS_traps.o  += -mfixed-range=f2-f5,f16-f31
_

Patches currently in -mm which might be from krzk@kernel.org are

ia64-fix-build-error-with-coredump.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-debug-do-not-dereference-i_ino-blindly.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (100 preceding siblings ...)
  2020-08-19 18:36 ` + ia64-fix-build-error-with-coredump.patch " Andrew Morton
@ 2020-08-19 19:01 ` Andrew Morton
  2020-08-19 19:02 ` + mm-highmem-clean-up-endif-comments.patch " Andrew Morton
                   ` (37 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:01 UTC (permalink / raw)
  To: jhubbard, mm-commits, rppt, vbabka, willy


The patch titled
     Subject: mm/debug.c: do not dereference i_ino blindly
has been added to the -mm tree.  Its filename is
     mm-debug-do-not-dereference-i_ino-blindly.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-debug-do-not-dereference-i_ino-blindly.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-debug-do-not-dereference-i_ino-blindly.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm/debug.c: do not dereference i_ino blindly

__dump_page() checks i_dentry is fetchable and i_ino is earlier in the
struct than i_ino, so it ought to work fine, but it's possible that struct
randomisation has reordered i_ino after i_dentry and the pointer is just
wild enough that i_dentry is fetchable and i_ino isn't.

Also print the inode number if the dentry is invalid.

Link: https://lkml.kernel.org/r/20200819185710.28180-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/debug.c |   12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

--- a/mm/debug.c~mm-debug-do-not-dereference-i_ino-blindly
+++ a/mm/debug.c
@@ -120,6 +120,7 @@ void __dump_page(struct page *page, cons
 		struct hlist_node *dentry_first;
 		struct dentry *dentry_ptr;
 		struct dentry dentry;
+		unsigned long ino;
 
 		/*
 		 * mapping can be invalid pointer and we don't want to crash
@@ -136,21 +137,22 @@ void __dump_page(struct page *page, cons
 			goto out_mapping;
 		}
 
-		if (get_kernel_nofault(dentry_first, &host->i_dentry.first)) {
+		if (get_kernel_nofault(dentry_first, &host->i_dentry.first) ||
+		    get_kernel_nofault(ino, &host->i_ino)) {
 			pr_warn("aops:%ps with invalid host inode %px\n",
 					a_ops, host);
 			goto out_mapping;
 		}
 
 		if (!dentry_first) {
-			pr_warn("aops:%ps ino:%lx\n", a_ops, host->i_ino);
+			pr_warn("aops:%ps ino:%lx\n", a_ops, ino);
 			goto out_mapping;
 		}
 
 		dentry_ptr = container_of(dentry_first, struct dentry, d_u.d_alias);
 		if (get_kernel_nofault(dentry, dentry_ptr)) {
-			pr_warn("aops:%ps with invalid dentry %px\n", a_ops,
-					dentry_ptr);
+			pr_warn("aops:%ps ino:%lx with invalid dentry %px\n",
+					a_ops, ino, dentry_ptr);
 		} else {
 			/*
 			 * if dentry is corrupted, the %pd handler may still
@@ -158,7 +160,7 @@ void __dump_page(struct page *page, cons
 			 * corrupted struct page
 			 */
 			pr_warn("aops:%ps ino:%lx dentry name:\"%pd\"\n",
-					a_ops, host->i_ino, &dentry);
+					a_ops, ino, &dentry);
 		}
 	}
 out_mapping:
_

Patches currently in -mm which might be from willy@infradead.org are

mm-debug-do-not-dereference-i_ino-blindly.patch
mm-account-pmd-tables-like-pte-tables.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-highmem-clean-up-endif-comments.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (101 preceding siblings ...)
  2020-08-19 19:01 ` + mm-debug-do-not-dereference-i_ino-blindly.patch " Andrew Morton
@ 2020-08-19 19:02 ` Andrew Morton
  2020-08-19 19:27 ` + kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch " Andrew Morton
                   ` (36 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:02 UTC (permalink / raw)
  To: akpm, ira.weiny, mm-commits


The patch titled
     Subject: mm/highmem.c: clean up endif comments
has been added to the -mm tree.  Its filename is
     mm-highmem-clean-up-endif-comments.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-highmem-clean-up-endif-comments.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-highmem-clean-up-endif-comments.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Ira Weiny <ira.weiny@intel.com>
Subject: mm/highmem.c: clean up endif comments

The #endif at the end of the file matches up with the '#if
defined(HASHED_PAGE_VIRTUAL)' on line 374.  Not the CONFIG_HIGHMEM #if
earlier.

Fix comments on both of the #endif's to indicate the correct end of blocks
for each.

Link: https://lkml.kernel.org/r/20200819184635.112579-1-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/highmem.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/highmem.c~mm-highmem-clean-up-endif-comments
+++ a/mm/highmem.c
@@ -369,7 +369,7 @@ void kunmap_high(struct page *page)
 }
 
 EXPORT_SYMBOL(kunmap_high);
-#endif
+#endif	/* CONFIG_HIGHMEM */
 
 #if defined(HASHED_PAGE_VIRTUAL)
 
@@ -481,4 +481,4 @@ void __init page_address_init(void)
 	}
 }
 
-#endif	/* defined(CONFIG_HIGHMEM) && !defined(WANT_PAGE_VIRTUAL) */
+#endif	/* defined(HASHED_PAGE_VIRTUAL) */
_

Patches currently in -mm which might be from ira.weiny@intel.com are

mm-highmem-clean-up-endif-comments.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (102 preceding siblings ...)
  2020-08-19 19:02 ` + mm-highmem-clean-up-endif-comments.patch " Andrew Morton
@ 2020-08-19 19:27 ` Andrew Morton
  2020-08-19 19:27 ` + dma-contiguous-simplify-cma_early_percent_memory.patch " Andrew Morton
                   ` (35 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:27 UTC (permalink / raw)
  To: benh, bhe, bp, catalin.marinas, dave.hansen, dja, hbathini, hch,
	jcmvbkbc, Jonathan.Cameron, kernel, linux, luto, m.szyprowski,
	miguel.ojeda.sandonis, mingo, mingo, mm-commits, monstr, mpe,
	palmer, paul.walmsley, paulus, peterz, rppt, shorne, tglx,
	tsbogend, will, ysato


The patch titled
     Subject: KVM: PPC: Book3S HV: simplify kvm_cma_reserve()
has been added to the -mm tree.  Its filename is
     kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mike Rapoport <rppt@linux.ibm.com>
Subject: KVM: PPC: Book3S HV: simplify kvm_cma_reserve()

Patch series "memblock: seasonal cleaning^w cleanup", v3.

These patches simplify several uses of memblock iterators and hide some of
the memblock implementation details from the rest of the system.


This patch (of 17):

The memory size calculation in kvm_cma_reserve() traverses memblock.memory
rather than simply call memblock_phys_mem_size().  The comment in that
function suggests that at some point there should have been call to
memblock_analyze() before memblock_phys_mem_size() could be used.  As of
now, there is no memblock_analyze() at all and memblock_phys_mem_size()
can be used as soon as cold-plug memory is registered with memblock.

Replace loop over memblock.memory with a call to memblock_phys_mem_size().

Link: https://lkml.kernel.org/r/20200818151634.14343-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20200818151634.14343-2-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/powerpc/kvm/book3s_hv_builtin.c |   12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

--- a/arch/powerpc/kvm/book3s_hv_builtin.c~kvm-ppc-book3s-hv-simplify-kvm_cma_reserve
+++ a/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -95,23 +95,15 @@ EXPORT_SYMBOL_GPL(kvm_free_hpt_cma);
 void __init kvm_cma_reserve(void)
 {
 	unsigned long align_size;
-	struct memblock_region *reg;
-	phys_addr_t selected_size = 0;
+	phys_addr_t selected_size;
 
 	/*
 	 * We need CMA reservation only when we are in HV mode
 	 */
 	if (!cpu_has_feature(CPU_FTR_HVMODE))
 		return;
-	/*
-	 * We cannot use memblock_phys_mem_size() here, because
-	 * memblock_analyze() has not been called yet.
-	 */
-	for_each_memblock(memory, reg)
-		selected_size += memblock_region_memory_end_pfn(reg) -
-				 memblock_region_memory_base_pfn(reg);
 
-	selected_size = (selected_size * kvm_cma_resv_ratio / 100) << PAGE_SHIFT;
+	selected_size = PAGE_ALIGN(memblock_phys_mem_size() * kvm_cma_resv_ratio / 100);
 	if (selected_size) {
 		pr_info("%s: reserving %ld MiB for global area\n", __func__,
 			 (unsigned long)selected_size / SZ_1M);
_

Patches currently in -mm which might be from rppt@linux.ibm.com are

kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch
dma-contiguous-simplify-cma_early_percent_memory.patch
arm-xtensa-simplify-initialization-of-high-memory-pages.patch
arm64-numa-simplify-dummy_numa_init.patch
h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch
riscv-drop-unneeded-node-initialization.patch
mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch
memblock-make-for_each_memblock_type-iterator-private.patch
memblock-make-memblock_debug-and-related-functionality-private.patch
memblock-reduce-number-of-parameters-in-for_each_mem_range.patch
arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch
arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch
x86-setup-simplify-initrd-relocation-and-reservation.patch
x86-setup-simplify-reserve_crashkernel.patch
memblock-remove-unused-memblock_mem_size.patch
memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch
memblock-use-separate-iterators-for-memory-and-reserved-regions.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + dma-contiguous-simplify-cma_early_percent_memory.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (103 preceding siblings ...)
  2020-08-19 19:27 ` + kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch " Andrew Morton
@ 2020-08-19 19:27 ` Andrew Morton
  2020-08-19 19:27 ` + arm-xtensa-simplify-initialization-of-high-memory-pages.patch " Andrew Morton
                   ` (34 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:27 UTC (permalink / raw)
  To: benh, bhe, bp, catalin.marinas, dave.hansen, dja, hbathini, hch,
	jcmvbkbc, Jonathan.Cameron, kernel, linux, luto, m.szyprowski,
	miguel.ojeda.sandonis, mingo, mingo, mm-commits, monstr, mpe,
	palmer, paul.walmsley, paulus, peterz, rppt, shorne, tglx,
	tsbogend, will, ysato


The patch titled
     Subject: dma-contiguous: simplify cma_early_percent_memory()
has been added to the -mm tree.  Its filename is
     dma-contiguous-simplify-cma_early_percent_memory.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/dma-contiguous-simplify-cma_early_percent_memory.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/dma-contiguous-simplify-cma_early_percent_memory.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mike Rapoport <rppt@linux.ibm.com>
Subject: dma-contiguous: simplify cma_early_percent_memory()

The memory size calculation in cma_early_percent_memory() traverses
memblock.memory rather than simply call memblock_phys_mem_size().  The
comment in that function suggests that at some point there should have
been call to memblock_analyze() before memblock_phys_mem_size() could be
used.  As of now, there is no memblock_analyze() at all and
memblock_phys_mem_size() can be used as soon as cold-plug memory is
registered with memblock.

Replace loop over memblock.memory with a call to memblock_phys_mem_size().

Link: https://lkml.kernel.org/r/20200818151634.14343-3-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/dma/contiguous.c |   11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

--- a/kernel/dma/contiguous.c~dma-contiguous-simplify-cma_early_percent_memory
+++ a/kernel/dma/contiguous.c
@@ -73,16 +73,7 @@ early_param("cma", early_cma);
 
 static phys_addr_t __init __maybe_unused cma_early_percent_memory(void)
 {
-	struct memblock_region *reg;
-	unsigned long total_pages = 0;
-
-	/*
-	 * We cannot use memblock_phys_mem_size() here, because
-	 * memblock_analyze() has not been called yet.
-	 */
-	for_each_memblock(memory, reg)
-		total_pages += memblock_region_memory_end_pfn(reg) -
-			       memblock_region_memory_base_pfn(reg);
+	unsigned long total_pages = PHYS_PFN(memblock_phys_mem_size());
 
 	return (total_pages * CONFIG_CMA_SIZE_PERCENTAGE / 100) << PAGE_SHIFT;
 }
_

Patches currently in -mm which might be from rppt@linux.ibm.com are

kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch
dma-contiguous-simplify-cma_early_percent_memory.patch
arm-xtensa-simplify-initialization-of-high-memory-pages.patch
arm64-numa-simplify-dummy_numa_init.patch
h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch
riscv-drop-unneeded-node-initialization.patch
mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch
memblock-make-for_each_memblock_type-iterator-private.patch
memblock-make-memblock_debug-and-related-functionality-private.patch
memblock-reduce-number-of-parameters-in-for_each_mem_range.patch
arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch
arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch
x86-setup-simplify-initrd-relocation-and-reservation.patch
x86-setup-simplify-reserve_crashkernel.patch
memblock-remove-unused-memblock_mem_size.patch
memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch
memblock-use-separate-iterators-for-memory-and-reserved-regions.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + arm-xtensa-simplify-initialization-of-high-memory-pages.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (104 preceding siblings ...)
  2020-08-19 19:27 ` + dma-contiguous-simplify-cma_early_percent_memory.patch " Andrew Morton
@ 2020-08-19 19:27 ` Andrew Morton
  2020-08-19 19:27 ` + arm64-numa-simplify-dummy_numa_init.patch " Andrew Morton
                   ` (33 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:27 UTC (permalink / raw)
  To: benh, bhe, bp, catalin.marinas, dave.hansen, dja, hbathini, hch,
	jcmvbkbc, Jonathan.Cameron, kernel, linux, luto, m.szyprowski,
	miguel.ojeda.sandonis, mingo, mingo, mm-commits, monstr, mpe,
	palmer, paul.walmsley, paulus, peterz, rppt, shorne, tglx,
	tsbogend, will, ysato


The patch titled
     Subject: arm, xtensa: simplify initialization of high memory pages
has been added to the -mm tree.  Its filename is
     arm-xtensa-simplify-initialization-of-high-memory-pages.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/arm-xtensa-simplify-initialization-of-high-memory-pages.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/arm-xtensa-simplify-initialization-of-high-memory-pages.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mike Rapoport <rppt@linux.ibm.com>
Subject: arm, xtensa: simplify initialization of high memory pages

free_highpages() in both arm and xtensa essentially open-code
for_each_free_mem_range() loop to detect high memory pages that were not
reserved and that should be initialized and passed to the buddy allocator.

Replace open-coded implementation of for_each_free_mem_range() with usage
of memblock API to simplify the code.

Link: https://lkml.kernel.org/r/20200818151634.14343-4-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>	[xtensa]
Tested-by: Max Filippov <jcmvbkbc@gmail.com>	[xtensa]
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/arm/mm/init.c    |   48 +++++-----------------------------
 arch/xtensa/mm/init.c |   55 +++++++---------------------------------
 2 files changed, 18 insertions(+), 85 deletions(-)

--- a/arch/arm/mm/init.c~arm-xtensa-simplify-initialization-of-high-memory-pages
+++ a/arch/arm/mm/init.c
@@ -347,61 +347,29 @@ static void __init free_unused_memmap(vo
 #endif
 }
 
-#ifdef CONFIG_HIGHMEM
-static inline void free_area_high(unsigned long pfn, unsigned long end)
-{
-	for (; pfn < end; pfn++)
-		free_highmem_page(pfn_to_page(pfn));
-}
-#endif
-
 static void __init free_highpages(void)
 {
 #ifdef CONFIG_HIGHMEM
 	unsigned long max_low = max_low_pfn;
-	struct memblock_region *mem, *res;
+	phys_addr_t range_start, range_end;
+	u64 i;
 
 	/* set highmem page free */
-	for_each_memblock(memory, mem) {
-		unsigned long start = memblock_region_memory_base_pfn(mem);
-		unsigned long end = memblock_region_memory_end_pfn(mem);
+	for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE,
+				&range_start, &range_end, NULL) {
+		unsigned long start = PHYS_PFN(range_start);
+		unsigned long end = PHYS_PFN(range_end);
 
 		/* Ignore complete lowmem entries */
 		if (end <= max_low)
 			continue;
 
-		if (memblock_is_nomap(mem))
-			continue;
-
 		/* Truncate partial highmem entries */
 		if (start < max_low)
 			start = max_low;
 
-		/* Find and exclude any reserved regions */
-		for_each_memblock(reserved, res) {
-			unsigned long res_start, res_end;
-
-			res_start = memblock_region_reserved_base_pfn(res);
-			res_end = memblock_region_reserved_end_pfn(res);
-
-			if (res_end < start)
-				continue;
-			if (res_start < start)
-				res_start = start;
-			if (res_start > end)
-				res_start = end;
-			if (res_end > end)
-				res_end = end;
-			if (res_start != start)
-				free_area_high(start, res_start);
-			start = res_end;
-			if (start == end)
-				break;
-		}
-
-		/* And now free anything which remains */
-		if (start < end)
-			free_area_high(start, end);
+		for (; start < end; start++)
+			free_highmem_page(pfn_to_page(start));
 	}
 #endif
 }
--- a/arch/xtensa/mm/init.c~arm-xtensa-simplify-initialization-of-high-memory-pages
+++ a/arch/xtensa/mm/init.c
@@ -79,67 +79,32 @@ void __init zones_init(void)
 	free_area_init(max_zone_pfn);
 }
 
-#ifdef CONFIG_HIGHMEM
-static void __init free_area_high(unsigned long pfn, unsigned long end)
-{
-	for (; pfn < end; pfn++)
-		free_highmem_page(pfn_to_page(pfn));
-}
-
 static void __init free_highpages(void)
 {
+#ifdef CONFIG_HIGHMEM
 	unsigned long max_low = max_low_pfn;
-	struct memblock_region *mem, *res;
+	phys_addr_t range_start, range_end;
+	u64 i;
 
-	reset_all_zones_managed_pages();
 	/* set highmem page free */
-	for_each_memblock(memory, mem) {
-		unsigned long start = memblock_region_memory_base_pfn(mem);
-		unsigned long end = memblock_region_memory_end_pfn(mem);
+	for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE,
+				&range_start, &range_end, NULL) {
+		unsigned long start = PHYS_PFN(range_start);
+		unsigned long end = PHYS_PFN(range_end);
 
 		/* Ignore complete lowmem entries */
 		if (end <= max_low)
 			continue;
 
-		if (memblock_is_nomap(mem))
-			continue;
-
 		/* Truncate partial highmem entries */
 		if (start < max_low)
 			start = max_low;
 
-		/* Find and exclude any reserved regions */
-		for_each_memblock(reserved, res) {
-			unsigned long res_start, res_end;
-
-			res_start = memblock_region_reserved_base_pfn(res);
-			res_end = memblock_region_reserved_end_pfn(res);
-
-			if (res_end < start)
-				continue;
-			if (res_start < start)
-				res_start = start;
-			if (res_start > end)
-				res_start = end;
-			if (res_end > end)
-				res_end = end;
-			if (res_start != start)
-				free_area_high(start, res_start);
-			start = res_end;
-			if (start == end)
-				break;
-		}
-
-		/* And now free anything which remains */
-		if (start < end)
-			free_area_high(start, end);
+		for (; start < end; start++)
+			free_highmem_page(pfn_to_page(start));
 	}
-}
-#else
-static void __init free_highpages(void)
-{
-}
 #endif
+}
 
 /*
  * Initialize memory pages.
_

Patches currently in -mm which might be from rppt@linux.ibm.com are

kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch
dma-contiguous-simplify-cma_early_percent_memory.patch
arm-xtensa-simplify-initialization-of-high-memory-pages.patch
arm64-numa-simplify-dummy_numa_init.patch
h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch
riscv-drop-unneeded-node-initialization.patch
mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch
memblock-make-for_each_memblock_type-iterator-private.patch
memblock-make-memblock_debug-and-related-functionality-private.patch
memblock-reduce-number-of-parameters-in-for_each_mem_range.patch
arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch
arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch
x86-setup-simplify-initrd-relocation-and-reservation.patch
x86-setup-simplify-reserve_crashkernel.patch
memblock-remove-unused-memblock_mem_size.patch
memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch
memblock-use-separate-iterators-for-memory-and-reserved-regions.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + arm64-numa-simplify-dummy_numa_init.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (105 preceding siblings ...)
  2020-08-19 19:27 ` + arm-xtensa-simplify-initialization-of-high-memory-pages.patch " Andrew Morton
@ 2020-08-19 19:27 ` Andrew Morton
  2020-08-19 19:27 ` + h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch " Andrew Morton
                   ` (32 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:27 UTC (permalink / raw)
  To: benh, bhe, bp, catalin.marinas, dave.hansen, dja, hbathini, hch,
	jcmvbkbc, Jonathan.Cameron, kernel, linux, luto, m.szyprowski,
	miguel.ojeda.sandonis, mingo, mingo, mm-commits, monstr, mpe,
	palmer, paul.walmsley, paulus, peterz, rppt, shorne, tglx,
	tsbogend, will, ysato


The patch titled
     Subject: arm64: numa: simplify dummy_numa_init()
has been added to the -mm tree.  Its filename is
     arm64-numa-simplify-dummy_numa_init.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/arm64-numa-simplify-dummy_numa_init.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/arm64-numa-simplify-dummy_numa_init.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mike Rapoport <rppt@linux.ibm.com>
Subject: arm64: numa: simplify dummy_numa_init()

dummy_numa_init() loops over memblock.memory and passes nid=0 to
numa_add_memblk() which essentially wraps memblock_set_node().  However,
memblock_set_node() can cope with entire memory span itself, so the loop
over memblock.memory regions is redundant.

Using a single call to memblock_set_node() rather than a loop also fixes
an issue with a buggy ACPI firmware in which the SRAT table covers some
but not all of the memory in the EFI memory map.

Jonathan Cameron says:

  This issue can be easily triggered by having an SRAT table which fails
  to cover all elements of the EFI memory map.

  This firmware error is detected and a warning printed. e.g.
  "NUMA: Warning: invalid memblk node 64 [mem 0x240000000-0x27fffffff]"
  At that point we fall back to dummy_numa_init().

  However, the failed ACPI init has left us with our memblocks all broken
  up as we split them when trying to assign them to NUMA nodes.

  We then iterate over the memblocks and add them to node 0.

  numa_add_memblk() calls memblock_set_node() which merges regions that
  were previously split up during the earlier attempt to add them to
  different nodes during parsing of SRAT.

  This means elements are moved in the memblock array and we can end up
  in a different memblock after the call to numa_add_memblk().
  Result is:

  Unable to handle kernel paging request at virtual address 0000000000003a40
  Mem abort info:
    ESR = 0x96000004
    EC = 0x25: DABT (current EL), IL = 32 bits
    SET = 0, FnV = 0
    EA = 0, S1PTW = 0
  Data abort info:
    ISV = 0, ISS = 0x00000004
    CM = 0, WnR = 0
  [0000000000003a40] user address but active_mm is swapper
  Internal error: Oops: 96000004 [#1] PREEMPT SMP

  ...

  Call trace:
    sparse_init_nid+0x5c/0x2b0
    sparse_init+0x138/0x170
    bootmem_init+0x80/0xe0
    setup_arch+0x2a0/0x5fc
    start_kernel+0x8c/0x648

Replace the loop with a single call to memblock_set_node() to the entire
memory.

Link: https://lkml.kernel.org/r/20200818151634.14343-5-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/arm64/mm/numa.c |   13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

--- a/arch/arm64/mm/numa.c~arm64-numa-simplify-dummy_numa_init
+++ a/arch/arm64/mm/numa.c
@@ -423,19 +423,16 @@ out_free_distance:
  */
 static int __init dummy_numa_init(void)
 {
+	phys_addr_t start = memblock_start_of_DRAM();
+	phys_addr_t end = memblock_end_of_DRAM();
 	int ret;
-	struct memblock_region *mblk;
 
 	if (numa_off)
 		pr_info("NUMA disabled\n"); /* Forced off on command line. */
-	pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
-		memblock_start_of_DRAM(), memblock_end_of_DRAM() - 1);
-
-	for_each_memblock(memory, mblk) {
-		ret = numa_add_memblk(0, mblk->base, mblk->base + mblk->size);
-		if (!ret)
-			continue;
+	pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n", start, end - 1);
 
+	ret = numa_add_memblk(0, start, end);
+	if (ret) {
 		pr_err("NUMA init failed\n");
 		return ret;
 	}
_

Patches currently in -mm which might be from rppt@linux.ibm.com are

kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch
dma-contiguous-simplify-cma_early_percent_memory.patch
arm-xtensa-simplify-initialization-of-high-memory-pages.patch
arm64-numa-simplify-dummy_numa_init.patch
h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch
riscv-drop-unneeded-node-initialization.patch
mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch
memblock-make-for_each_memblock_type-iterator-private.patch
memblock-make-memblock_debug-and-related-functionality-private.patch
memblock-reduce-number-of-parameters-in-for_each_mem_range.patch
arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch
arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch
x86-setup-simplify-initrd-relocation-and-reservation.patch
x86-setup-simplify-reserve_crashkernel.patch
memblock-remove-unused-memblock_mem_size.patch
memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch
memblock-use-separate-iterators-for-memory-and-reserved-regions.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (106 preceding siblings ...)
  2020-08-19 19:27 ` + arm64-numa-simplify-dummy_numa_init.patch " Andrew Morton
@ 2020-08-19 19:27 ` Andrew Morton
  2020-08-19 19:27 ` + riscv-drop-unneeded-node-initialization.patch " Andrew Morton
                   ` (31 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:27 UTC (permalink / raw)
  To: benh, bhe, bp, catalin.marinas, dave.hansen, dja, hbathini, hch,
	jcmvbkbc, Jonathan.Cameron, kernel, linux, luto, m.szyprowski,
	miguel.ojeda.sandonis, mingo, mingo, mm-commits, monstr, mpe,
	palmer, paul.walmsley, paulus, peterz, rppt, shorne, tglx,
	tsbogend, will, ysato


The patch titled
     Subject: h8300, nds32, openrisc: simplify detection of memory extents
has been added to the -mm tree.  Its filename is
     h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mike Rapoport <rppt@linux.ibm.com>
Subject: h8300, nds32, openrisc: simplify detection of memory extents

Instead of traversing memblock.memory regions to find memory_start and
memory_end, simply query memblock_{start,end}_of_DRAM().

Link: https://lkml.kernel.org/r/20200818151634.14343-6-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Stafford Horne <shorne@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/h8300/kernel/setup.c    |    8 +++-----
 arch/nds32/kernel/setup.c    |    8 ++------
 arch/openrisc/kernel/setup.c |    9 ++-------
 3 files changed, 7 insertions(+), 18 deletions(-)

--- a/arch/h8300/kernel/setup.c~h8300-nds32-openrisc-simplify-detection-of-memory-extents
+++ a/arch/h8300/kernel/setup.c
@@ -74,17 +74,15 @@ static void __init bootmem_init(void)
 	memory_end = memory_start = 0;
 
 	/* Find main memory where is the kernel */
-	for_each_memblock(memory, region) {
-		memory_start = region->base;
-		memory_end = region->base + region->size;
-	}
+	memory_start = memblock_start_of_DRAM();
+	memory_end = memblock_end_of_DRAM();
 
 	if (!memory_end)
 		panic("No memory!");
 
 	/* setup bootmem globals (we use no_bootmem, but mm still depends on this) */
 	min_low_pfn = PFN_UP(memory_start);
-	max_low_pfn = PFN_DOWN(memblock_end_of_DRAM());
+	max_low_pfn = PFN_DOWN(memory_end);
 	max_pfn = max_low_pfn;
 
 	memblock_reserve(__pa(_stext), _end - _stext);
--- a/arch/nds32/kernel/setup.c~h8300-nds32-openrisc-simplify-detection-of-memory-extents
+++ a/arch/nds32/kernel/setup.c
@@ -249,12 +249,8 @@ static void __init setup_memory(void)
 	memory_end = memory_start = 0;
 
 	/* Find main memory where is the kernel */
-	for_each_memblock(memory, region) {
-		memory_start = region->base;
-		memory_end = region->base + region->size;
-		pr_info("%s: Memory: 0x%x-0x%x\n", __func__,
-			memory_start, memory_end);
-	}
+	memory_start = memblock_start_of_DRAM();
+	memory_end = memblock_end_of_DRAM();
 
 	if (!memory_end) {
 		panic("No memory!");
--- a/arch/openrisc/kernel/setup.c~h8300-nds32-openrisc-simplify-detection-of-memory-extents
+++ a/arch/openrisc/kernel/setup.c
@@ -48,17 +48,12 @@ static void __init setup_memory(void)
 	unsigned long ram_start_pfn;
 	unsigned long ram_end_pfn;
 	phys_addr_t memory_start, memory_end;
-	struct memblock_region *region;
 
 	memory_end = memory_start = 0;
 
 	/* Find main memory where is the kernel, we assume its the only one */
-	for_each_memblock(memory, region) {
-		memory_start = region->base;
-		memory_end = region->base + region->size;
-		printk(KERN_INFO "%s: Memory: 0x%x-0x%x\n", __func__,
-		       memory_start, memory_end);
-	}
+	memory_start = memblock_start_of_DRAM();
+	memory_end = memblock_end_of_DRAM();
 
 	if (!memory_end) {
 		panic("No memory!");
_

Patches currently in -mm which might be from rppt@linux.ibm.com are

kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch
dma-contiguous-simplify-cma_early_percent_memory.patch
arm-xtensa-simplify-initialization-of-high-memory-pages.patch
arm64-numa-simplify-dummy_numa_init.patch
h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch
riscv-drop-unneeded-node-initialization.patch
mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch
memblock-make-for_each_memblock_type-iterator-private.patch
memblock-make-memblock_debug-and-related-functionality-private.patch
memblock-reduce-number-of-parameters-in-for_each_mem_range.patch
arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch
arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch
x86-setup-simplify-initrd-relocation-and-reservation.patch
x86-setup-simplify-reserve_crashkernel.patch
memblock-remove-unused-memblock_mem_size.patch
memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch
memblock-use-separate-iterators-for-memory-and-reserved-regions.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + riscv-drop-unneeded-node-initialization.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (107 preceding siblings ...)
  2020-08-19 19:27 ` + h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch " Andrew Morton
@ 2020-08-19 19:27 ` Andrew Morton
  2020-08-19 19:27 ` + mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch " Andrew Morton
                   ` (30 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:27 UTC (permalink / raw)
  To: benh, bhe, bp, catalin.marinas, dave.hansen, dja, hbathini, hch,
	jcmvbkbc, Jonathan.Cameron, kernel, linux, luto, m.szyprowski,
	miguel.ojeda.sandonis, mingo, mingo, mm-commits, monstr, mpe,
	palmer, paul.walmsley, paulus, peterz, rppt, shorne, tglx,
	tsbogend, will, ysato


The patch titled
     Subject: riscv: drop unneeded node initialization
has been added to the -mm tree.  Its filename is
     riscv-drop-unneeded-node-initialization.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/riscv-drop-unneeded-node-initialization.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/riscv-drop-unneeded-node-initialization.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mike Rapoport <rppt@linux.ibm.com>
Subject: riscv: drop unneeded node initialization

RISC-V does not (yet) support NUMA and for UMA architectures node 0 is
used implicitly during early memory initialization.

There is no need to call memblock_set_node(), remove this call and the
surrounding code.

Link: https://lkml.kernel.org/r/20200818151634.14343-7-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/riscv/mm/init.c |    9 ---------
 1 file changed, 9 deletions(-)

--- a/arch/riscv/mm/init.c~riscv-drop-unneeded-node-initialization
+++ a/arch/riscv/mm/init.c
@@ -191,15 +191,6 @@ void __init setup_bootmem(void)
 	early_init_fdt_scan_reserved_mem();
 	memblock_allow_resize();
 	memblock_dump_all();
-
-	for_each_memblock(memory, reg) {
-		unsigned long start_pfn = memblock_region_memory_base_pfn(reg);
-		unsigned long end_pfn = memblock_region_memory_end_pfn(reg);
-
-		memblock_set_node(PFN_PHYS(start_pfn),
-				  PFN_PHYS(end_pfn - start_pfn),
-				  &memblock.memory, 0);
-	}
 }
 
 #ifdef CONFIG_MMU
_

Patches currently in -mm which might be from rppt@linux.ibm.com are

kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch
dma-contiguous-simplify-cma_early_percent_memory.patch
arm-xtensa-simplify-initialization-of-high-memory-pages.patch
arm64-numa-simplify-dummy_numa_init.patch
h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch
riscv-drop-unneeded-node-initialization.patch
mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch
memblock-make-for_each_memblock_type-iterator-private.patch
memblock-make-memblock_debug-and-related-functionality-private.patch
memblock-reduce-number-of-parameters-in-for_each_mem_range.patch
arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch
arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch
x86-setup-simplify-initrd-relocation-and-reservation.patch
x86-setup-simplify-reserve_crashkernel.patch
memblock-remove-unused-memblock_mem_size.patch
memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch
memblock-use-separate-iterators-for-memory-and-reserved-regions.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (108 preceding siblings ...)
  2020-08-19 19:27 ` + riscv-drop-unneeded-node-initialization.patch " Andrew Morton
@ 2020-08-19 19:27 ` Andrew Morton
  2020-08-19 19:27 ` + memblock-make-for_each_memblock_type-iterator-private.patch " Andrew Morton
                   ` (29 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:27 UTC (permalink / raw)
  To: benh, bhe, bp, catalin.marinas, dave.hansen, dja, hbathini, hch,
	jcmvbkbc, Jonathan.Cameron, kernel, linux, luto, m.szyprowski,
	miguel.ojeda.sandonis, mingo, mingo, mm-commits, monstr, mpe,
	palmer, paul.walmsley, paulus, peterz, rppt, shorne, tglx,
	tsbogend, will, ysato


The patch titled
     Subject: mircoblaze: drop unneeded NUMA and sparsemem initializations
has been added to the -mm tree.  Its filename is
     mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mike Rapoport <rppt@linux.ibm.com>
Subject: mircoblaze: drop unneeded NUMA and sparsemem initializations

microblaze does not support neither NUMA not SPARSMEM, so there is no
point to call memblock_set_node() and
sparse_memory_present_with_active_regions() functions during microblaze
memory initialization.

Remove these calls and the surrounding code.

Link: https://lkml.kernel.org/r/20200818151634.14343-8-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/microblaze/mm/init.c |   14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

--- a/arch/microblaze/mm/init.c~mircoblaze-drop-unneeded-numa-and-sparsemem-initializations
+++ a/arch/microblaze/mm/init.c
@@ -105,9 +105,8 @@ static void __init paging_init(void)
 
 void __init setup_memory(void)
 {
-	struct memblock_region *reg;
-
 #ifndef CONFIG_MMU
+	struct memblock_region *reg;
 	u32 kernel_align_start, kernel_align_size;
 
 	/* Find main memory where is the kernel */
@@ -161,17 +160,6 @@ void __init setup_memory(void)
 	pr_info("%s: max_low_pfn: %#lx\n", __func__, max_low_pfn);
 	pr_info("%s: max_pfn: %#lx\n", __func__, max_pfn);
 
-	/* Add active regions with valid PFNs */
-	for_each_memblock(memory, reg) {
-		unsigned long start_pfn, end_pfn;
-
-		start_pfn = memblock_region_memory_base_pfn(reg);
-		end_pfn = memblock_region_memory_end_pfn(reg);
-		memblock_set_node(start_pfn << PAGE_SHIFT,
-				  (end_pfn - start_pfn) << PAGE_SHIFT,
-				  &memblock.memory, 0);
-	}
-
 	paging_init();
 }
 
_

Patches currently in -mm which might be from rppt@linux.ibm.com are

kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch
dma-contiguous-simplify-cma_early_percent_memory.patch
arm-xtensa-simplify-initialization-of-high-memory-pages.patch
arm64-numa-simplify-dummy_numa_init.patch
h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch
riscv-drop-unneeded-node-initialization.patch
mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch
memblock-make-for_each_memblock_type-iterator-private.patch
memblock-make-memblock_debug-and-related-functionality-private.patch
memblock-reduce-number-of-parameters-in-for_each_mem_range.patch
arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch
arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch
x86-setup-simplify-initrd-relocation-and-reservation.patch
x86-setup-simplify-reserve_crashkernel.patch
memblock-remove-unused-memblock_mem_size.patch
memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch
memblock-use-separate-iterators-for-memory-and-reserved-regions.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + memblock-make-for_each_memblock_type-iterator-private.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (109 preceding siblings ...)
  2020-08-19 19:27 ` + mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch " Andrew Morton
@ 2020-08-19 19:27 ` Andrew Morton
  2020-08-19 19:27 ` + memblock-make-memblock_debug-and-related-functionality-private.patch " Andrew Morton
                   ` (28 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:27 UTC (permalink / raw)
  To: benh, bhe, bp, catalin.marinas, dave.hansen, dja, hbathini, hch,
	jcmvbkbc, Jonathan.Cameron, kernel, linux, luto, m.szyprowski,
	miguel.ojeda.sandonis, mingo, mingo, mm-commits, monstr, mpe,
	palmer, paul.walmsley, paulus, peterz, rppt, shorne, tglx,
	tsbogend, will, ysato


The patch titled
     Subject: memblock: make for_each_memblock_type() iterator private
has been added to the -mm tree.  Its filename is
     memblock-make-for_each_memblock_type-iterator-private.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/memblock-make-for_each_memblock_type-iterator-private.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/memblock-make-for_each_memblock_type-iterator-private.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mike Rapoport <rppt@linux.ibm.com>
Subject: memblock: make for_each_memblock_type() iterator private

for_each_memblock_type() is not used outside mm/memblock.c, move it there
from include/linux/memblock.h

Link: https://lkml.kernel.org/r/20200818151634.14343-9-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/memblock.h |    5 -----
 mm/memblock.c            |    5 +++++
 2 files changed, 5 insertions(+), 5 deletions(-)

--- a/include/linux/memblock.h~memblock-make-for_each_memblock_type-iterator-private
+++ a/include/linux/memblock.h
@@ -552,11 +552,6 @@ static inline unsigned long memblock_reg
 	     region < (memblock.memblock_type.regions + memblock.memblock_type.cnt);	\
 	     region++)
 
-#define for_each_memblock_type(i, memblock_type, rgn)			\
-	for (i = 0, rgn = &memblock_type->regions[0];			\
-	     i < memblock_type->cnt;					\
-	     i++, rgn = &memblock_type->regions[i])
-
 extern void *alloc_large_system_hash(const char *tablename,
 				     unsigned long bucketsize,
 				     unsigned long numentries,
--- a/mm/memblock.c~memblock-make-for_each_memblock_type-iterator-private
+++ a/mm/memblock.c
@@ -132,6 +132,11 @@ struct memblock_type physmem = {
 };
 #endif
 
+#define for_each_memblock_type(i, memblock_type, rgn)			\
+	for (i = 0, rgn = &memblock_type->regions[0];			\
+	     i < memblock_type->cnt;					\
+	     i++, rgn = &memblock_type->regions[i])
+
 int memblock_debug __initdata_memblock;
 static bool system_has_some_mirror __initdata_memblock = false;
 static int memblock_can_resize __initdata_memblock;
_

Patches currently in -mm which might be from rppt@linux.ibm.com are

kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch
dma-contiguous-simplify-cma_early_percent_memory.patch
arm-xtensa-simplify-initialization-of-high-memory-pages.patch
arm64-numa-simplify-dummy_numa_init.patch
h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch
riscv-drop-unneeded-node-initialization.patch
mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch
memblock-make-for_each_memblock_type-iterator-private.patch
memblock-make-memblock_debug-and-related-functionality-private.patch
memblock-reduce-number-of-parameters-in-for_each_mem_range.patch
arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch
arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch
x86-setup-simplify-initrd-relocation-and-reservation.patch
x86-setup-simplify-reserve_crashkernel.patch
memblock-remove-unused-memblock_mem_size.patch
memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch
memblock-use-separate-iterators-for-memory-and-reserved-regions.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + memblock-make-memblock_debug-and-related-functionality-private.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (110 preceding siblings ...)
  2020-08-19 19:27 ` + memblock-make-for_each_memblock_type-iterator-private.patch " Andrew Morton
@ 2020-08-19 19:27 ` Andrew Morton
  2020-08-19 19:27 ` + memblock-make-memblock_debug-and-related-functionality-private-fix.patch " Andrew Morton
                   ` (27 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:27 UTC (permalink / raw)
  To: benh, bhe, bp, catalin.marinas, dave.hansen, dja, hbathini, hch,
	jcmvbkbc, Jonathan.Cameron, kernel, linux, luto, m.szyprowski,
	miguel.ojeda.sandonis, mingo, mingo, mm-commits, monstr, mpe,
	palmer, paul.walmsley, paulus, peterz, rppt, shorne, tglx,
	tsbogend, will, ysato


The patch titled
     Subject: memblock: make memblock_debug and related functionality private
has been added to the -mm tree.  Its filename is
     memblock-make-memblock_debug-and-related-functionality-private.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/memblock-make-memblock_debug-and-related-functionality-private.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/memblock-make-memblock_debug-and-related-functionality-private.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mike Rapoport <rppt@linux.ibm.com>
Subject: memblock: make memblock_debug and related functionality private

The only user of memblock_dbg() outside memblock was s390 setup code and
it is converted to use pr_debug() instead.  This allows to stop exposing
memblock_debug and memblock_dbg() to the rest of the kernel.

Link: https://lkml.kernel.org/r/20200818151634.14343-10-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/s390/kernel/setup.c |    4 ++--
 include/linux/memblock.h |   12 +-----------
 mm/memblock.c            |   13 +++++++++++--
 3 files changed, 14 insertions(+), 15 deletions(-)

--- a/arch/s390/kernel/setup.c~memblock-make-memblock_debug-and-related-functionality-private
+++ a/arch/s390/kernel/setup.c
@@ -776,8 +776,8 @@ static void __init memblock_add_mem_dete
 	unsigned long start, end;
 	int i;
 
-	memblock_dbg("physmem info source: %s (%hhd)\n",
-		     get_mem_info_source(), mem_detect.info_source);
+	pr_debug("physmem info source: %s (%hhd)\n",
+		 get_mem_info_source(), mem_detect.info_source);
 	/* keep memblock lists close to the kernel */
 	memblock_set_bottom_up(true);
 	for_each_mem_detect_block(i, &start, &end) {
--- a/include/linux/memblock.h~memblock-make-memblock_debug-and-related-functionality-private
+++ a/include/linux/memblock.h
@@ -86,7 +86,6 @@ struct memblock {
 };
 
 extern struct memblock memblock;
-extern int memblock_debug;
 
 #ifndef CONFIG_ARCH_KEEP_MEMBLOCK
 #define __init_memblock __meminit
@@ -98,9 +97,6 @@ void memblock_discard(void);
 static inline void memblock_discard(void) {}
 #endif
 
-#define memblock_dbg(fmt, ...) \
-	if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
-
 phys_addr_t memblock_find_in_range(phys_addr_t start, phys_addr_t end,
 				   phys_addr_t size, phys_addr_t align);
 void memblock_allow_resize(void);
@@ -476,13 +472,7 @@ bool memblock_is_region_memory(phys_addr
 bool memblock_is_reserved(phys_addr_t addr);
 bool memblock_is_region_reserved(phys_addr_t base, phys_addr_t size);
 
-extern void __memblock_dump_all(void);
-
-static inline void memblock_dump_all(void)
-{
-	if (memblock_debug)
-		__memblock_dump_all();
-}
+void memblock_dump_all(void);
 
 /**
  * memblock_set_current_limit - Set the current allocation limit to allow
--- a/mm/memblock.c~memblock-make-memblock_debug-and-related-functionality-private
+++ a/mm/memblock.c
@@ -137,7 +137,10 @@ struct memblock_type physmem = {
 	     i < memblock_type->cnt;					\
 	     i++, rgn = &memblock_type->regions[i])
 
-int memblock_debug __initdata_memblock;
+#define memblock_dbg(fmt, ...) \
+	if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
+
+static int memblock_debug __initdata_memblock;
 static bool system_has_some_mirror __initdata_memblock = false;
 static int memblock_can_resize __initdata_memblock;
 static int memblock_memory_in_slab __initdata_memblock = 0;
@@ -1920,7 +1923,7 @@ static void __init_memblock memblock_dum
 	}
 }
 
-void __init_memblock __memblock_dump_all(void)
+static void __init_memblock __memblock_dump_all(void)
 {
 	pr_info("MEMBLOCK configuration:\n");
 	pr_info(" memory size = %pa reserved size = %pa\n",
@@ -1934,6 +1937,12 @@ void __init_memblock __memblock_dump_all
 #endif
 }
 
+void __init_memblock memblock_dump_all(void)
+{
+	if (memblock_debug)
+		__memblock_dump_all();
+}
+
 void __init memblock_allow_resize(void)
 {
 	memblock_can_resize = 1;
_

Patches currently in -mm which might be from rppt@linux.ibm.com are

kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch
dma-contiguous-simplify-cma_early_percent_memory.patch
arm-xtensa-simplify-initialization-of-high-memory-pages.patch
arm64-numa-simplify-dummy_numa_init.patch
h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch
riscv-drop-unneeded-node-initialization.patch
mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch
memblock-make-for_each_memblock_type-iterator-private.patch
memblock-make-memblock_debug-and-related-functionality-private.patch
memblock-reduce-number-of-parameters-in-for_each_mem_range.patch
arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch
arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch
x86-setup-simplify-initrd-relocation-and-reservation.patch
x86-setup-simplify-reserve_crashkernel.patch
memblock-remove-unused-memblock_mem_size.patch
memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch
memblock-use-separate-iterators-for-memory-and-reserved-regions.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + memblock-make-memblock_debug-and-related-functionality-private-fix.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (111 preceding siblings ...)
  2020-08-19 19:27 ` + memblock-make-memblock_debug-and-related-functionality-private.patch " Andrew Morton
@ 2020-08-19 19:27 ` Andrew Morton
  2020-08-19 19:27 ` + memblock-reduce-number-of-parameters-in-for_each_mem_range.patch " Andrew Morton
                   ` (26 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:27 UTC (permalink / raw)
  To: akpm, mm-commits, rppt


The patch titled
     Subject: memblock-make-memblock_debug-and-related-functionality-private-fix
has been added to the -mm tree.  Its filename is
     memblock-make-memblock_debug-and-related-functionality-private-fix.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/memblock-make-memblock_debug-and-related-functionality-private-fix.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/memblock-make-memblock_debug-and-related-functionality-private-fix.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Andrew Morton <akpm@linux-foundation.org>
Subject: memblock-make-memblock_debug-and-related-functionality-private-fix

make memblock_dbg() safer and neater

Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memblock.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/mm/memblock.c~memblock-make-memblock_debug-and-related-functionality-private-fix
+++ a/mm/memblock.c
@@ -137,8 +137,11 @@ struct memblock_type physmem = {
 	     i < memblock_type->cnt;					\
 	     i++, rgn = &memblock_type->regions[i])
 
-#define memblock_dbg(fmt, ...) \
-	if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
+#define memblock_dbg(fmt, ...)						\
+	do {								\
+		if (memblock_debug)					\
+			pr_info(fmt, ##__VA_ARGS__);			\
+	} while (0)
 
 static int memblock_debug __initdata_memblock;
 static bool system_has_some_mirror __initdata_memblock = false;
_

Patches currently in -mm which might be from akpm@linux-foundation.org are

mm.patch
memblock-make-memblock_debug-and-related-functionality-private-fix.patch
mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings-fix-2.patch
kernel-forkc-export-kernel_thread-to-modules.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + memblock-reduce-number-of-parameters-in-for_each_mem_range.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (112 preceding siblings ...)
  2020-08-19 19:27 ` + memblock-make-memblock_debug-and-related-functionality-private-fix.patch " Andrew Morton
@ 2020-08-19 19:27 ` Andrew Morton
  2020-08-19 19:27 ` + arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch " Andrew Morton
                   ` (25 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:27 UTC (permalink / raw)
  To: benh, bhe, bp, catalin.marinas, dave.hansen, dja, hbathini, hch,
	jcmvbkbc, Jonathan.Cameron, kernel, linux, luto, m.szyprowski,
	miguel.ojeda.sandonis, mingo, mingo, mm-commits, monstr, mpe,
	palmer, paul.walmsley, paulus, peterz, rppt, shorne, tglx,
	tsbogend, will, ysato


The patch titled
     Subject: memblock: reduce number of parameters in for_each_mem_range()
has been added to the -mm tree.  Its filename is
     memblock-reduce-number-of-parameters-in-for_each_mem_range.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/memblock-reduce-number-of-parameters-in-for_each_mem_range.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/memblock-reduce-number-of-parameters-in-for_each_mem_range.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mike Rapoport <rppt@linux.ibm.com>
Subject: memblock: reduce number of parameters in for_each_mem_range()

Currently for_each_mem_range() and for_each_mem_range_rev() iterators are
the most generic way to traverse memblock regions.  As such, they have 8
parameters and they are hardly convenient to users.  Most users choose to
utilize one of their wrappers and the only user that actually needs most
of the parameters is memblock itself.

To avoid yet another naming for memblock iterators, rename the existing
for_each_mem_range[_rev]() to __for_each_mem_range[_rev]() and add a new
for_each_mem_range[_rev]() wrappers with only index, start and end
parameters.

The new wrapper nicely fits into init_unavailable_mem() and will be used
in upcoming changes to simplify memblock traversals.

Link: https://lkml.kernel.org/r/20200818151634.14343-11-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>	[MIPS]
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 .clang-format                          |    2 +
 arch/arm64/kernel/machine_kexec_file.c |    6 +--
 arch/powerpc/kexec/file_load_64.c      |    6 +--
 include/linux/memblock.h               |   41 +++++++++++++++++------
 mm/page_alloc.c                        |    3 -
 5 files changed, 38 insertions(+), 20 deletions(-)

--- a/arch/arm64/kernel/machine_kexec_file.c~memblock-reduce-number-of-parameters-in-for_each_mem_range
+++ a/arch/arm64/kernel/machine_kexec_file.c
@@ -215,8 +215,7 @@ static int prepare_elf_headers(void **ad
 	phys_addr_t start, end;
 
 	nr_ranges = 1; /* for exclusion of crashkernel region */
-	for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
-					MEMBLOCK_NONE, &start, &end, NULL)
+	for_each_mem_range(i, &start, &end)
 		nr_ranges++;
 
 	cmem = kmalloc(struct_size(cmem, ranges, nr_ranges), GFP_KERNEL);
@@ -225,8 +224,7 @@ static int prepare_elf_headers(void **ad
 
 	cmem->max_nr_ranges = nr_ranges;
 	cmem->nr_ranges = 0;
-	for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
-					MEMBLOCK_NONE, &start, &end, NULL) {
+	for_each_mem_range(i, &start, &end) {
 		cmem->ranges[cmem->nr_ranges].start = start;
 		cmem->ranges[cmem->nr_ranges].end = end - 1;
 		cmem->nr_ranges++;
--- a/arch/powerpc/kexec/file_load_64.c~memblock-reduce-number-of-parameters-in-for_each_mem_range
+++ a/arch/powerpc/kexec/file_load_64.c
@@ -250,8 +250,7 @@ static int __locate_mem_hole_top_down(st
 	phys_addr_t start, end;
 	u64 i;
 
-	for_each_mem_range_rev(i, &memblock.memory, NULL, NUMA_NO_NODE,
-			       MEMBLOCK_NONE, &start, &end, NULL) {
+	for_each_mem_range_rev(i, &start, &end) {
 		/*
 		 * memblock uses [start, end) convention while it is
 		 * [start, end] here. Fix the off-by-one to have the
@@ -350,8 +349,7 @@ static int __locate_mem_hole_bottom_up(s
 	phys_addr_t start, end;
 	u64 i;
 
-	for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
-			   MEMBLOCK_NONE, &start, &end, NULL) {
+	for_each_mem_range(i, &start, &end) {
 		/*
 		 * memblock uses [start, end) convention while it is
 		 * [start, end] here. Fix the off-by-one to have the
--- a/.clang-format~memblock-reduce-number-of-parameters-in-for_each_mem_range
+++ a/.clang-format
@@ -205,7 +205,9 @@ ForEachMacros:
   - 'for_each_memblock_type'
   - 'for_each_memcg_cache_index'
   - 'for_each_mem_pfn_range'
+  - '__for_each_mem_range'
   - 'for_each_mem_range'
+  - '__for_each_mem_range_rev'
   - 'for_each_mem_range_rev'
   - 'for_each_migratetype_order'
   - 'for_each_msi_entry'
--- a/include/linux/memblock.h~memblock-reduce-number-of-parameters-in-for_each_mem_range
+++ a/include/linux/memblock.h
@@ -162,7 +162,7 @@ static inline void __next_physmem_range(
 #endif /* CONFIG_HAVE_MEMBLOCK_PHYS_MAP */
 
 /**
- * for_each_mem_range - iterate through memblock areas from type_a and not
+ * __for_each_mem_range - iterate through memblock areas from type_a and not
  * included in type_b. Or just type_a if type_b is NULL.
  * @i: u64 used as loop variable
  * @type_a: ptr to memblock_type to iterate
@@ -173,7 +173,7 @@ static inline void __next_physmem_range(
  * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
  * @p_nid: ptr to int for nid of the range, can be %NULL
  */
-#define for_each_mem_range(i, type_a, type_b, nid, flags,		\
+#define __for_each_mem_range(i, type_a, type_b, nid, flags,		\
 			   p_start, p_end, p_nid)			\
 	for (i = 0, __next_mem_range(&i, nid, flags, type_a, type_b,	\
 				     p_start, p_end, p_nid);		\
@@ -182,7 +182,7 @@ static inline void __next_physmem_range(
 			      p_start, p_end, p_nid))
 
 /**
- * for_each_mem_range_rev - reverse iterate through memblock areas from
+ * __for_each_mem_range_rev - reverse iterate through memblock areas from
  * type_a and not included in type_b. Or just type_a if type_b is NULL.
  * @i: u64 used as loop variable
  * @type_a: ptr to memblock_type to iterate
@@ -193,16 +193,37 @@ static inline void __next_physmem_range(
  * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
  * @p_nid: ptr to int for nid of the range, can be %NULL
  */
-#define for_each_mem_range_rev(i, type_a, type_b, nid, flags,		\
-			       p_start, p_end, p_nid)			\
+#define __for_each_mem_range_rev(i, type_a, type_b, nid, flags,		\
+				 p_start, p_end, p_nid)			\
 	for (i = (u64)ULLONG_MAX,					\
-		     __next_mem_range_rev(&i, nid, flags, type_a, type_b,\
+		     __next_mem_range_rev(&i, nid, flags, type_a, type_b, \
 					  p_start, p_end, p_nid);	\
 	     i != (u64)ULLONG_MAX;					\
 	     __next_mem_range_rev(&i, nid, flags, type_a, type_b,	\
 				  p_start, p_end, p_nid))
 
 /**
+ * for_each_mem_range - iterate through memory areas.
+ * @i: u64 used as loop variable
+ * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
+ * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
+ */
+#define for_each_mem_range(i, p_start, p_end) \
+	__for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,	\
+			     MEMBLOCK_NONE, p_start, p_end, NULL)
+
+/**
+ * for_each_mem_range_rev - reverse iterate through memblock areas from
+ * type_a and not included in type_b. Or just type_a if type_b is NULL.
+ * @i: u64 used as loop variable
+ * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
+ * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
+ */
+#define for_each_mem_range_rev(i, p_start, p_end)			\
+	__for_each_mem_range_rev(i, &memblock.memory, NULL, NUMA_NO_NODE, \
+				 MEMBLOCK_NONE, p_start, p_end, NULL)
+
+/**
  * for_each_reserved_mem_region - iterate over all reserved memblock areas
  * @i: u64 used as loop variable
  * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
@@ -307,8 +328,8 @@ int __init deferred_page_init_max_thread
  * soon as memblock is initialized.
  */
 #define for_each_free_mem_range(i, nid, flags, p_start, p_end, p_nid)	\
-	for_each_mem_range(i, &memblock.memory, &memblock.reserved,	\
-			   nid, flags, p_start, p_end, p_nid)
+	__for_each_mem_range(i, &memblock.memory, &memblock.reserved,	\
+			     nid, flags, p_start, p_end, p_nid)
 
 /**
  * for_each_free_mem_range_reverse - rev-iterate through free memblock areas
@@ -324,8 +345,8 @@ int __init deferred_page_init_max_thread
  */
 #define for_each_free_mem_range_reverse(i, nid, flags, p_start, p_end,	\
 					p_nid)				\
-	for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved,	\
-			       nid, flags, p_start, p_end, p_nid)
+	__for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved, \
+				 nid, flags, p_start, p_end, p_nid)
 
 int memblock_set_node(phys_addr_t base, phys_addr_t size,
 		      struct memblock_type *type, int nid);
--- a/mm/page_alloc.c~memblock-reduce-number-of-parameters-in-for_each_mem_range
+++ a/mm/page_alloc.c
@@ -6984,8 +6984,7 @@ static void __init init_unavailable_mem(
 	 * Loop through unavailable ranges not covered by memblock.memory.
 	 */
 	pgcnt = 0;
-	for_each_mem_range(i, &memblock.memory, NULL,
-			NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) {
+	for_each_mem_range(i, &start, &end) {
 		if (next < start)
 			pgcnt += init_unavailable_range(PFN_DOWN(next),
 							PFN_UP(start));
_

Patches currently in -mm which might be from rppt@linux.ibm.com are

kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch
dma-contiguous-simplify-cma_early_percent_memory.patch
arm-xtensa-simplify-initialization-of-high-memory-pages.patch
arm64-numa-simplify-dummy_numa_init.patch
h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch
riscv-drop-unneeded-node-initialization.patch
mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch
memblock-make-for_each_memblock_type-iterator-private.patch
memblock-make-memblock_debug-and-related-functionality-private.patch
memblock-reduce-number-of-parameters-in-for_each_mem_range.patch
arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch
arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch
x86-setup-simplify-initrd-relocation-and-reservation.patch
x86-setup-simplify-reserve_crashkernel.patch
memblock-remove-unused-memblock_mem_size.patch
memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch
memblock-use-separate-iterators-for-memory-and-reserved-regions.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (113 preceding siblings ...)
  2020-08-19 19:27 ` + memblock-reduce-number-of-parameters-in-for_each_mem_range.patch " Andrew Morton
@ 2020-08-19 19:27 ` Andrew Morton
  2020-08-19 19:27 ` + arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch " Andrew Morton
                   ` (24 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:27 UTC (permalink / raw)
  To: benh, bhe, bp, catalin.marinas, dave.hansen, dja, hbathini, hch,
	jcmvbkbc, Jonathan.Cameron, kernel, linux, luto, m.szyprowski,
	miguel.ojeda.sandonis, mingo, mingo, mm-commits, monstr, mpe,
	palmer, paul.walmsley, paulus, peterz, rppt, shorne, tglx,
	tsbogend, will, ysato


The patch titled
     Subject: arch, mm: replace for_each_memblock() with for_each_mem_pfn_range()
has been added to the -mm tree.  Its filename is
     arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mike Rapoport <rppt@linux.ibm.com>
Subject: arch, mm: replace for_each_memblock() with for_each_mem_pfn_range()

There are several occurrences of the following pattern:

	for_each_memblock(memory, reg) {
		start_pfn = memblock_region_memory_base_pfn(reg);
		end_pfn = memblock_region_memory_end_pfn(reg);

		/* do something with start_pfn and end_pfn */
	}

Rather than iterate over all memblock.memory regions and each time query
for their start and end PFNs, use for_each_mem_pfn_range() iterator to get
simpler and clearer code.

Link: https://lkml.kernel.org/r/20200818151634.14343-12-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>	[.clang-format]
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/arm/mm/init.c           |   11 ++++-------
 arch/arm64/mm/init.c         |   11 ++++-------
 arch/powerpc/kernel/fadump.c |   11 ++++++-----
 arch/powerpc/mm/mem.c        |   15 ++++++++-------
 arch/powerpc/mm/numa.c       |    7 ++-----
 arch/s390/mm/page-states.c   |    6 ++----
 arch/sh/mm/init.c            |    9 +++------
 mm/memblock.c                |    6 ++----
 mm/sparse.c                  |   10 ++++------
 9 files changed, 35 insertions(+), 51 deletions(-)

--- a/arch/arm64/mm/init.c~arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range
+++ a/arch/arm64/mm/init.c
@@ -471,12 +471,10 @@ static inline void free_memmap(unsigned
  */
 static void __init free_unused_memmap(void)
 {
-	unsigned long start, prev_end = 0;
-	struct memblock_region *reg;
-
-	for_each_memblock(memory, reg) {
-		start = __phys_to_pfn(reg->base);
+	unsigned long start, end, prev_end = 0;
+	int i;
 
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, NULL) {
 #ifdef CONFIG_SPARSEMEM
 		/*
 		 * Take care not to free memmap entries that don't exist due
@@ -496,8 +494,7 @@ static void __init free_unused_memmap(vo
 		 * memmap entries are valid from the bank end aligned to
 		 * MAX_ORDER_NR_PAGES.
 		 */
-		prev_end = ALIGN(__phys_to_pfn(reg->base + reg->size),
-				 MAX_ORDER_NR_PAGES);
+		prev_end = ALIGN(end, MAX_ORDER_NR_PAGES);
 	}
 
 #ifdef CONFIG_SPARSEMEM
--- a/arch/arm/mm/init.c~arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range
+++ a/arch/arm/mm/init.c
@@ -299,16 +299,14 @@ free_memmap(unsigned long start_pfn, uns
  */
 static void __init free_unused_memmap(void)
 {
-	unsigned long start, prev_end = 0;
-	struct memblock_region *reg;
+	unsigned long start, end, prev_end = 0;
+	int i;
 
 	/*
 	 * This relies on each bank being in address order.
 	 * The banks are sorted previously in bootmem_init().
 	 */
-	for_each_memblock(memory, reg) {
-		start = memblock_region_memory_base_pfn(reg);
-
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, NULL) {
 #ifdef CONFIG_SPARSEMEM
 		/*
 		 * Take care not to free memmap entries that don't exist
@@ -336,8 +334,7 @@ static void __init free_unused_memmap(vo
 		 * memmap entries are valid from the bank end aligned to
 		 * MAX_ORDER_NR_PAGES.
 		 */
-		prev_end = ALIGN(memblock_region_memory_end_pfn(reg),
-				 MAX_ORDER_NR_PAGES);
+		prev_end = ALIGN(end, MAX_ORDER_NR_PAGES);
 	}
 
 #ifdef CONFIG_SPARSEMEM
--- a/arch/powerpc/kernel/fadump.c~arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range
+++ a/arch/powerpc/kernel/fadump.c
@@ -1242,14 +1242,15 @@ static void fadump_free_reserved_memory(
  */
 static void fadump_release_reserved_area(u64 start, u64 end)
 {
-	u64 tstart, tend, spfn, epfn;
-	struct memblock_region *reg;
+	u64 tstart, tend, spfn, epfn, reg_spfn, reg_epfn, i;
 
 	spfn = PHYS_PFN(start);
 	epfn = PHYS_PFN(end);
-	for_each_memblock(memory, reg) {
-		tstart = max_t(u64, spfn, memblock_region_memory_base_pfn(reg));
-		tend   = min_t(u64, epfn, memblock_region_memory_end_pfn(reg));
+
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &reg_spfn, &reg_epfn, NULL) {
+		tstart = max_t(u64, spfn, reg_spfn);
+		tend   = min_t(u64, epfn, reg_epfn);
+
 		if (tstart < tend) {
 			fadump_free_reserved_memory(tstart, tend);
 
--- a/arch/powerpc/mm/mem.c~arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range
+++ a/arch/powerpc/mm/mem.c
@@ -184,15 +184,16 @@ void __init initmem_init(void)
 /* mark pages that don't exist as nosave */
 static int __init mark_nonram_nosave(void)
 {
-	struct memblock_region *reg, *prev = NULL;
+	unsigned long spfn, epfn, prev = 0;
+	int i;
 
-	for_each_memblock(memory, reg) {
-		if (prev &&
-		    memblock_region_memory_end_pfn(prev) < memblock_region_memory_base_pfn(reg))
-			register_nosave_region(memblock_region_memory_end_pfn(prev),
-					       memblock_region_memory_base_pfn(reg));
-		prev = reg;
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &spfn, &epfn, NULL) {
+		if (prev && prev < spfn)
+			register_nosave_region(prev, spfn);
+
+		prev = epfn;
 	}
+
 	return 0;
 }
 #else /* CONFIG_NEED_MULTIPLE_NODES */
--- a/arch/powerpc/mm/numa.c~arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range
+++ a/arch/powerpc/mm/numa.c
@@ -804,17 +804,14 @@ static void __init setup_nonnuma(void)
 	unsigned long total_ram = memblock_phys_mem_size();
 	unsigned long start_pfn, end_pfn;
 	unsigned int nid = 0;
-	struct memblock_region *reg;
+	int i;
 
 	printk(KERN_DEBUG "Top of RAM: 0x%lx, Total RAM: 0x%lx\n",
 	       top_of_ram, total_ram);
 	printk(KERN_DEBUG "Memory hole size: %ldMB\n",
 	       (top_of_ram - total_ram) >> 20);
 
-	for_each_memblock(memory, reg) {
-		start_pfn = memblock_region_memory_base_pfn(reg);
-		end_pfn = memblock_region_memory_end_pfn(reg);
-
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, NULL) {
 		fake_numa_create_new_node(end_pfn, &nid);
 		memblock_set_node(PFN_PHYS(start_pfn),
 				  PFN_PHYS(end_pfn - start_pfn),
--- a/arch/s390/mm/page-states.c~arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range
+++ a/arch/s390/mm/page-states.c
@@ -183,9 +183,9 @@ static void mark_kernel_pgd(void)
 
 void __init cmma_init_nodat(void)
 {
-	struct memblock_region *reg;
 	struct page *page;
 	unsigned long start, end, ix;
+	int i;
 
 	if (cmma_flag < 2)
 		return;
@@ -193,9 +193,7 @@ void __init cmma_init_nodat(void)
 	mark_kernel_pgd();
 
 	/* Set all kernel pages not used for page tables to stable/no-dat */
-	for_each_memblock(memory, reg) {
-		start = memblock_region_memory_base_pfn(reg);
-		end = memblock_region_memory_end_pfn(reg);
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, NULL) {
 		page = pfn_to_page(start);
 		for (ix = start; ix < end; ix++, page++) {
 			if (__test_and_clear_bit(PG_arch_1, &page->flags))
--- a/arch/sh/mm/init.c~arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range
+++ a/arch/sh/mm/init.c
@@ -226,15 +226,12 @@ void __init allocate_pgdat(unsigned int
 
 static void __init do_init_bootmem(void)
 {
-	struct memblock_region *reg;
+	unsigned long start_pfn, end_pfn;
+	int i;
 
 	/* Add active regions with valid PFNs. */
-	for_each_memblock(memory, reg) {
-		unsigned long start_pfn, end_pfn;
-		start_pfn = memblock_region_memory_base_pfn(reg);
-		end_pfn = memblock_region_memory_end_pfn(reg);
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, NULL)
 		__add_active_range(0, start_pfn, end_pfn);
-	}
 
 	/* All of system RAM sits in node 0 for the non-NUMA case */
 	allocate_pgdat(0);
--- a/mm/memblock.c~arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range
+++ a/mm/memblock.c
@@ -1663,12 +1663,10 @@ phys_addr_t __init_memblock memblock_res
 phys_addr_t __init memblock_mem_size(unsigned long limit_pfn)
 {
 	unsigned long pages = 0;
-	struct memblock_region *r;
 	unsigned long start_pfn, end_pfn;
+	int i;
 
-	for_each_memblock(memory, r) {
-		start_pfn = memblock_region_memory_base_pfn(r);
-		end_pfn = memblock_region_memory_end_pfn(r);
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, NULL) {
 		start_pfn = min_t(unsigned long, start_pfn, limit_pfn);
 		end_pfn = min_t(unsigned long, end_pfn, limit_pfn);
 		pages += end_pfn - start_pfn;
--- a/mm/sparse.c~arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range
+++ a/mm/sparse.c
@@ -291,13 +291,11 @@ static void __init memory_present(int ni
  */
 static void __init memblocks_present(void)
 {
-	struct memblock_region *reg;
+	unsigned long start, end;
+	int i, nid;
 
-	for_each_memblock(memory, reg) {
-		memory_present(memblock_get_region_node(reg),
-			       memblock_region_memory_base_pfn(reg),
-			       memblock_region_memory_end_pfn(reg));
-	}
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid)
+		memory_present(nid, start, end);
 }
 
 /*
_

Patches currently in -mm which might be from rppt@linux.ibm.com are

kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch
dma-contiguous-simplify-cma_early_percent_memory.patch
arm-xtensa-simplify-initialization-of-high-memory-pages.patch
arm64-numa-simplify-dummy_numa_init.patch
h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch
riscv-drop-unneeded-node-initialization.patch
mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch
memblock-make-for_each_memblock_type-iterator-private.patch
memblock-make-memblock_debug-and-related-functionality-private.patch
memblock-reduce-number-of-parameters-in-for_each_mem_range.patch
arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch
arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch
x86-setup-simplify-initrd-relocation-and-reservation.patch
x86-setup-simplify-reserve_crashkernel.patch
memblock-remove-unused-memblock_mem_size.patch
memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch
memblock-use-separate-iterators-for-memory-and-reserved-regions.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (114 preceding siblings ...)
  2020-08-19 19:27 ` + arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch " Andrew Morton
@ 2020-08-19 19:27 ` Andrew Morton
  2020-08-19 19:28 ` + x86-setup-simplify-initrd-relocation-and-reservation.patch " Andrew Morton
                   ` (23 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:27 UTC (permalink / raw)
  To: benh, bhe, bp, catalin.marinas, dave.hansen, dja, hbathini, hch,
	jcmvbkbc, Jonathan.Cameron, kernel, linux, luto, m.szyprowski,
	miguel.ojeda.sandonis, mingo, mingo, mm-commits, monstr, mpe,
	palmer, paul.walmsley, paulus, peterz, rppt, shorne, tglx,
	tsbogend, will, ysato


The patch titled
     Subject: arch, drivers: replace for_each_membock() with for_each_mem_range()
has been added to the -mm tree.  Its filename is
     arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mike Rapoport <rppt@linux.ibm.com>
Subject: arch, drivers: replace for_each_membock() with for_each_mem_range()

There are several occurrences of the following pattern:

	for_each_memblock(memory, reg) {
		start = __pfn_to_phys(memblock_region_memory_base_pfn(reg);
		end = __pfn_to_phys(memblock_region_memory_end_pfn(reg));

		/* do something with start and end */
	}

Using for_each_mem_range() iterator is more appropriate in such cases and
allows simpler and cleaner code.

Link: https://lkml.kernel.org/r/20200818151634.14343-13-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/arm/kernel/setup.c                  |   18 +++++--
 arch/arm/mm/mmu.c                        |   39 +++++-----------
 arch/arm/mm/pmsa-v7.c                    |   23 ++++-----
 arch/arm/mm/pmsa-v8.c                    |   17 +++----
 arch/arm/xen/mm.c                        |    7 +-
 arch/arm64/mm/kasan_init.c               |   10 ++--
 arch/arm64/mm/mmu.c                      |   11 +---
 arch/c6x/kernel/setup.c                  |    9 ++-
 arch/microblaze/mm/init.c                |    9 ++-
 arch/mips/cavium-octeon/dma-octeon.c     |   12 ++---
 arch/mips/kernel/setup.c                 |   31 ++++++-------
 arch/openrisc/mm/init.c                  |    8 ++-
 arch/powerpc/kernel/fadump.c             |   50 +++++++++------------
 arch/powerpc/kexec/file_load_64.c        |   10 +---
 arch/powerpc/mm/book3s64/hash_utils.c    |   16 +++---
 arch/powerpc/mm/book3s64/radix_pgtable.c |   10 ++--
 arch/powerpc/mm/kasan/kasan_init_32.c    |    8 +--
 arch/powerpc/mm/mem.c                    |   16 ++++--
 arch/powerpc/mm/pgtable_32.c             |    8 +--
 arch/riscv/mm/init.c                     |   25 ++++------
 arch/riscv/mm/kasan_init.c               |   10 ++--
 arch/s390/kernel/setup.c                 |   23 ++++++---
 arch/s390/mm/vmem.c                      |    7 +-
 arch/sparc/mm/init_64.c                  |   12 +----
 drivers/bus/mvebu-mbus.c                 |   12 ++---
 25 files changed, 194 insertions(+), 207 deletions(-)

--- a/arch/arm64/mm/kasan_init.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/arm64/mm/kasan_init.c
@@ -212,8 +212,8 @@ void __init kasan_init(void)
 {
 	u64 kimg_shadow_start, kimg_shadow_end;
 	u64 mod_shadow_start, mod_shadow_end;
-	struct memblock_region *reg;
-	int i;
+	phys_addr_t pa_start, pa_end;
+	u64 i;
 
 	kimg_shadow_start = (u64)kasan_mem_to_shadow(_text) & PAGE_MASK;
 	kimg_shadow_end = PAGE_ALIGN((u64)kasan_mem_to_shadow(_end));
@@ -246,9 +246,9 @@ void __init kasan_init(void)
 		kasan_populate_early_shadow((void *)mod_shadow_end,
 					    (void *)kimg_shadow_start);
 
-	for_each_memblock(memory, reg) {
-		void *start = (void *)__phys_to_virt(reg->base);
-		void *end = (void *)__phys_to_virt(reg->base + reg->size);
+	for_each_mem_range(i, &pa_start, &pa_end) {
+		void *start = (void *)__phys_to_virt(pa_start);
+		void *end = (void *)__phys_to_virt(pa_end);
 
 		if (start >= end)
 			break;
--- a/arch/arm64/mm/mmu.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/arm64/mm/mmu.c
@@ -462,8 +462,9 @@ static void __init map_mem(pgd_t *pgdp)
 {
 	phys_addr_t kernel_start = __pa_symbol(_text);
 	phys_addr_t kernel_end = __pa_symbol(__init_begin);
-	struct memblock_region *reg;
+	phys_addr_t start, end;
 	int flags = 0;
+	u64 i;
 
 	if (rodata_full || debug_pagealloc_enabled())
 		flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
@@ -482,15 +483,9 @@ static void __init map_mem(pgd_t *pgdp)
 #endif
 
 	/* map all the memory banks */
-	for_each_memblock(memory, reg) {
-		phys_addr_t start = reg->base;
-		phys_addr_t end = start + reg->size;
-
+	for_each_mem_range(i, &start, &end) {
 		if (start >= end)
 			break;
-		if (memblock_is_nomap(reg))
-			continue;
-
 		__map_memblock(pgdp, start, end, PAGE_KERNEL, flags);
 	}
 
--- a/arch/arm/kernel/setup.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/arm/kernel/setup.c
@@ -843,20 +843,26 @@ early_param("mem", early_mem);
 
 static void __init request_standard_resources(const struct machine_desc *mdesc)
 {
-	struct memblock_region *region;
+	phys_addr_t start, end, res_end;
 	struct resource *res;
+	u64 i;
 
 	kernel_code.start   = virt_to_phys(_text);
 	kernel_code.end     = virt_to_phys(__init_begin - 1);
 	kernel_data.start   = virt_to_phys(_sdata);
 	kernel_data.end     = virt_to_phys(_end - 1);
 
-	for_each_memblock(memory, region) {
-		phys_addr_t start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
-		phys_addr_t end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
+	for_each_mem_range(i, &start, &end) {
 		unsigned long boot_alias_start;
 
 		/*
+		 * In memblock, end points to the first byte after the
+		 * range while in resourses, end points to the last byte in
+		 * the range.
+		 */
+		res_end = end - 1;
+
+		/*
 		 * Some systems have a special memory alias which is only
 		 * used for booting.  We need to advertise this region to
 		 * kexec-tools so they know where bootable RAM is located.
@@ -869,7 +875,7 @@ static void __init request_standard_reso
 				      __func__, sizeof(*res));
 			res->name = "System RAM (boot alias)";
 			res->start = boot_alias_start;
-			res->end = phys_to_idmap(end);
+			res->end = phys_to_idmap(res_end);
 			res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
 			request_resource(&iomem_resource, res);
 		}
@@ -880,7 +886,7 @@ static void __init request_standard_reso
 			      sizeof(*res));
 		res->name  = "System RAM";
 		res->start = start;
-		res->end = end;
+		res->end = res_end;
 		res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
 
 		request_resource(&iomem_resource, res);
--- a/arch/arm/mm/mmu.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/arm/mm/mmu.c
@@ -1154,9 +1154,8 @@ phys_addr_t arm_lowmem_limit __initdata
 
 void __init adjust_lowmem_bounds(void)
 {
-	phys_addr_t memblock_limit = 0;
-	u64 vmalloc_limit;
-	struct memblock_region *reg;
+	phys_addr_t block_start, block_end, memblock_limit = 0;
+	u64 vmalloc_limit, i;
 	phys_addr_t lowmem_limit = 0;
 
 	/*
@@ -1172,26 +1171,18 @@ void __init adjust_lowmem_bounds(void)
 	 * The first usable region must be PMD aligned. Mark its start
 	 * as MEMBLOCK_NOMAP if it isn't
 	 */
-	for_each_memblock(memory, reg) {
-		if (!memblock_is_nomap(reg)) {
-			if (!IS_ALIGNED(reg->base, PMD_SIZE)) {
-				phys_addr_t len;
+	for_each_mem_range(i, &block_start, &block_end) {
+		if (!IS_ALIGNED(block_start, PMD_SIZE)) {
+			phys_addr_t len;
 
-				len = round_up(reg->base, PMD_SIZE) - reg->base;
-				memblock_mark_nomap(reg->base, len);
-			}
-			break;
+			len = round_up(block_start, PMD_SIZE) - block_start;
+			memblock_mark_nomap(block_start, len);
 		}
+		break;
 	}
 
-	for_each_memblock(memory, reg) {
-		phys_addr_t block_start = reg->base;
-		phys_addr_t block_end = reg->base + reg->size;
-
-		if (memblock_is_nomap(reg))
-			continue;
-
-		if (reg->base < vmalloc_limit) {
+	for_each_mem_range(i, &block_start, &block_end) {
+		if (block_start < vmalloc_limit) {
 			if (block_end > lowmem_limit)
 				/*
 				 * Compare as u64 to ensure vmalloc_limit does
@@ -1440,19 +1431,15 @@ static void __init kmap_init(void)
 
 static void __init map_lowmem(void)
 {
-	struct memblock_region *reg;
 	phys_addr_t kernel_x_start = round_down(__pa(KERNEL_START), SECTION_SIZE);
 	phys_addr_t kernel_x_end = round_up(__pa(__init_end), SECTION_SIZE);
+	phys_addr_t start, end;
+	u64 i;
 
 	/* Map all the lowmem memory banks. */
-	for_each_memblock(memory, reg) {
-		phys_addr_t start = reg->base;
-		phys_addr_t end = start + reg->size;
+	for_each_mem_range(i, &start, &end) {
 		struct map_desc map;
 
-		if (memblock_is_nomap(reg))
-			continue;
-
 		if (end > arm_lowmem_limit)
 			end = arm_lowmem_limit;
 		if (start >= end)
--- a/arch/arm/mm/pmsa-v7.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/arm/mm/pmsa-v7.c
@@ -231,12 +231,12 @@ static int __init allocate_region(phys_a
 void __init pmsav7_adjust_lowmem_bounds(void)
 {
 	phys_addr_t  specified_mem_size = 0, total_mem_size = 0;
-	struct memblock_region *reg;
-	bool first = true;
 	phys_addr_t mem_start;
 	phys_addr_t mem_end;
+	phys_addr_t reg_start, reg_end;
 	unsigned int mem_max_regions;
-	int num, i;
+	int num;
+	u64 i;
 
 	/* Free-up PMSAv7_PROBE_REGION */
 	mpu_min_region_order = __mpu_min_region_order();
@@ -262,20 +262,19 @@ void __init pmsav7_adjust_lowmem_bounds(
 	mem_max_regions -= num;
 #endif
 
-	for_each_memblock(memory, reg) {
-		if (first) {
+	for_each_mem_range(i, &reg_start, &reg_end) {
+		if (i == 0) {
 			phys_addr_t phys_offset = PHYS_OFFSET;
 
 			/*
 			 * Initially only use memory continuous from
 			 * PHYS_OFFSET */
-			if (reg->base != phys_offset)
+			if (reg_start != phys_offset)
 				panic("First memory bank must be contiguous from PHYS_OFFSET");
 
-			mem_start = reg->base;
-			mem_end = reg->base + reg->size;
-			specified_mem_size = reg->size;
-			first = false;
+			mem_start = reg_start;
+			mem_end = reg_end
+			specified_mem_size = mem_end - mem_start;
 		} else {
 			/*
 			 * memblock auto merges contiguous blocks, remove
@@ -283,8 +282,8 @@ void __init pmsav7_adjust_lowmem_bounds(
 			 * blocks separately while iterating)
 			 */
 			pr_notice("Ignoring RAM after %pa, memory at %pa ignored\n",
-				  &mem_end, &reg->base);
-			memblock_remove(reg->base, 0 - reg->base);
+				  &mem_end, &reg_start);
+			memblock_remove(reg_start, 0 - reg_start);
 			break;
 		}
 	}
--- a/arch/arm/mm/pmsa-v8.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/arm/mm/pmsa-v8.c
@@ -94,20 +94,19 @@ static __init bool is_region_fixed(int n
 void __init pmsav8_adjust_lowmem_bounds(void)
 {
 	phys_addr_t mem_end;
-	struct memblock_region *reg;
-	bool first = true;
+	phys_addr_t reg_start, reg_end;
+	u64 i;
 
-	for_each_memblock(memory, reg) {
-		if (first) {
+	for_each_mem_range(i, &reg_start, &reg_end) {
+		if (i == 0) {
 			phys_addr_t phys_offset = PHYS_OFFSET;
 
 			/*
 			 * Initially only use memory continuous from
 			 * PHYS_OFFSET */
-			if (reg->base != phys_offset)
+			if (reg_start != phys_offset)
 				panic("First memory bank must be contiguous from PHYS_OFFSET");
-			mem_end = reg->base + reg->size;
-			first = false;
+			mem_end = reg_end;
 		} else {
 			/*
 			 * memblock auto merges contiguous blocks, remove
@@ -115,8 +114,8 @@ void __init pmsav8_adjust_lowmem_bounds(
 			 * blocks separately while iterating)
 			 */
 			pr_notice("Ignoring RAM after %pa, memory at %pa ignored\n",
-				  &mem_end, &reg->base);
-			memblock_remove(reg->base, 0 - reg->base);
+				  &mem_end, &reg_start);
+			memblock_remove(reg_start, 0 - reg_start);
 			break;
 		}
 	}
--- a/arch/arm/xen/mm.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/arm/xen/mm.c
@@ -25,11 +25,12 @@
 
 unsigned long xen_get_swiotlb_free_pages(unsigned int order)
 {
-	struct memblock_region *reg;
+	phys_addr_t base;
 	gfp_t flags = __GFP_NOWARN|__GFP_KSWAPD_RECLAIM;
+	u64 i;
 
-	for_each_memblock(memory, reg) {
-		if (reg->base < (phys_addr_t)0xffffffff) {
+	for_each_mem_range(i, &base, NULL) {
+		if (base < (phys_addr_t)0xffffffff) {
 			if (IS_ENABLED(CONFIG_ZONE_DMA32))
 				flags |= __GFP_DMA32;
 			else
--- a/arch/c6x/kernel/setup.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/c6x/kernel/setup.c
@@ -287,7 +287,8 @@ notrace void __init machine_init(unsigne
 
 void __init setup_arch(char **cmdline_p)
 {
-	struct memblock_region *reg;
+	phys_addr_t start, end;
+	u64 i;
 
 	printk(KERN_INFO "Initializing kernel\n");
 
@@ -351,9 +352,9 @@ void __init setup_arch(char **cmdline_p)
 	disable_caching(ram_start, ram_end - 1);
 
 	/* Set caching of external RAM used by Linux */
-	for_each_memblock(memory, reg)
-		enable_caching(CACHE_REGION_START(reg->base),
-			       CACHE_REGION_START(reg->base + reg->size - 1));
+	for_each_mem_range(i, &start, &end)
+		enable_caching(CACHE_REGION_START(start),
+			       CACHE_REGION_START(end - 1));
 
 #ifdef CONFIG_BLK_DEV_INITRD
 	/*
--- a/arch/microblaze/mm/init.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/microblaze/mm/init.c
@@ -106,13 +106,14 @@ static void __init paging_init(void)
 void __init setup_memory(void)
 {
 #ifndef CONFIG_MMU
-	struct memblock_region *reg;
 	u32 kernel_align_start, kernel_align_size;
+	phys_addr_t start, end;
+	u64 i;
 
 	/* Find main memory where is the kernel */
-	for_each_memblock(memory, reg) {
-		memory_start = (u32)reg->base;
-		lowmem_size = reg->size;
+	for_each_mem_range(i, &start, &end) {
+		memory_start = start;
+		lowmem_size = end - start;
 		if ((memory_start <= (u32)_text) &&
 			((u32)_text <= (memory_start + lowmem_size - 1))) {
 			memory_size = lowmem_size;
--- a/arch/mips/cavium-octeon/dma-octeon.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/mips/cavium-octeon/dma-octeon.c
@@ -190,25 +190,25 @@ char *octeon_swiotlb;
 
 void __init plat_swiotlb_setup(void)
 {
-	struct memblock_region *mem;
+	phys_addr_t start, end;
 	phys_addr_t max_addr;
 	phys_addr_t addr_size;
 	size_t swiotlbsize;
 	unsigned long swiotlb_nslabs;
+	u64 i;
 
 	max_addr = 0;
 	addr_size = 0;
 
-	for_each_memblock(memory, mem) {
+	for_each_mem_range(i, &start, &end) {
 		/* These addresses map low for PCI. */
 		if (mem->base > 0x410000000ull && !OCTEON_IS_OCTEON2())
 			continue;
 
-		addr_size += mem->size;
-
-		if (max_addr < mem->base + mem->size)
-			max_addr = mem->base + mem->size;
+		addr_size += (end - start);
 
+		if (max_addr < end)
+			max_addr = end;
 	}
 
 	swiotlbsize = PAGE_SIZE;
--- a/arch/mips/kernel/setup.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/mips/kernel/setup.c
@@ -300,8 +300,9 @@ static void __init bootmem_init(void)
 
 static void __init bootmem_init(void)
 {
-	struct memblock_region *mem;
 	phys_addr_t ramstart, ramend;
+	phys_addr_t start, end;
+	u64 i;
 
 	ramstart = memblock_start_of_DRAM();
 	ramend = memblock_end_of_DRAM();
@@ -338,18 +339,13 @@ static void __init bootmem_init(void)
 
 	min_low_pfn = ARCH_PFN_OFFSET;
 	max_pfn = PFN_DOWN(ramend);
-	for_each_memblock(memory, mem) {
-		unsigned long start = memblock_region_memory_base_pfn(mem);
-		unsigned long end = memblock_region_memory_end_pfn(mem);
-
+	for_each_mem_range(i, &start, &end) {
 		/*
 		 * Skip highmem here so we get an accurate max_low_pfn if low
 		 * memory stops short of high memory.
 		 * If the region overlaps HIGHMEM_START, end is clipped so
 		 * max_pfn excludes the highmem portion.
 		 */
-		if (memblock_is_nomap(mem))
-			continue;
 		if (start >= PFN_DOWN(HIGHMEM_START))
 			continue;
 		if (end > PFN_DOWN(HIGHMEM_START))
@@ -450,13 +446,12 @@ early_param("memmap", early_parse_memmap
 unsigned long setup_elfcorehdr, setup_elfcorehdr_size;
 static int __init early_parse_elfcorehdr(char *p)
 {
-	struct memblock_region *mem;
+	phys_addr_t start, end;
+	u64 i;
 
 	setup_elfcorehdr = memparse(p, &p);
 
-	 for_each_memblock(memory, mem) {
-		unsigned long start = mem->base;
-		unsigned long end = start + mem->size;
+	for_each_mem_range(i, &start, &end) {
 		if (setup_elfcorehdr >= start && setup_elfcorehdr < end) {
 			/*
 			 * Reserve from the elf core header to the end of
@@ -720,7 +715,8 @@ static void __init arch_mem_init(char **
 
 static void __init resource_init(void)
 {
-	struct memblock_region *region;
+	phys_addr_t start, end;
+	u64 i;
 
 	if (UNCAC_BASE != IO_BASE)
 		return;
@@ -732,9 +728,7 @@ static void __init resource_init(void)
 	bss_resource.start = __pa_symbol(&__bss_start);
 	bss_resource.end = __pa_symbol(&__bss_stop) - 1;
 
-	for_each_memblock(memory, region) {
-		phys_addr_t start = PFN_PHYS(memblock_region_memory_base_pfn(region));
-		phys_addr_t end = PFN_PHYS(memblock_region_memory_end_pfn(region)) - 1;
+	for_each_mem_range(i, &start, &end) {
 		struct resource *res;
 
 		res = memblock_alloc(sizeof(struct resource), SMP_CACHE_BYTES);
@@ -743,7 +737,12 @@ static void __init resource_init(void)
 			      sizeof(struct resource));
 
 		res->start = start;
-		res->end = end;
+		/*
+		 * In memblock, end points to the first byte after the
+		 * range while in resourses, end points to the last byte in
+		 * the range.
+		 */
+		res->end = end - 1;
 		res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
 		res->name = "System RAM";
 
--- a/arch/openrisc/mm/init.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/openrisc/mm/init.c
@@ -64,6 +64,7 @@ extern const char _s_kernel_ro[], _e_ker
  */
 static void __init map_ram(void)
 {
+	phys_addr_t start, end;
 	unsigned long v, p, e;
 	pgprot_t prot;
 	pgd_t *pge;
@@ -71,6 +72,7 @@ static void __init map_ram(void)
 	pud_t *pue;
 	pmd_t *pme;
 	pte_t *pte;
+	u64 i;
 	/* These mark extents of read-only kernel pages...
 	 * ...from vmlinux.lds.S
 	 */
@@ -78,9 +80,9 @@ static void __init map_ram(void)
 
 	v = PAGE_OFFSET;
 
-	for_each_memblock(memory, region) {
-		p = (u32) region->base & PAGE_MASK;
-		e = p + (u32) region->size;
+	for_each_mem_range(i, &start, &end) {
+		p = (u32) start & PAGE_MASK;
+		e = (u32) end;
 
 		v = (u32) __va(p);
 		pge = pgd_offset_k(v);
--- a/arch/powerpc/kernel/fadump.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/powerpc/kernel/fadump.c
@@ -191,13 +191,13 @@ int is_fadump_active(void)
  */
 static bool is_fadump_mem_area_contiguous(u64 d_start, u64 d_end)
 {
-	struct memblock_region *reg;
+	phys_addr_t reg_start, reg_end;
 	bool ret = false;
-	u64 start, end;
+	u64 i, start, end;
 
-	for_each_memblock(memory, reg) {
-		start = max_t(u64, d_start, reg->base);
-		end = min_t(u64, d_end, (reg->base + reg->size));
+	for_each_mem_range(i, &reg_start, &reg_end) {
+		start = max_t(u64, d_start, reg_start);
+		end = min_t(u64, d_end, reg_end);
 		if (d_start < end) {
 			/* Memory hole from d_start to start */
 			if (start > d_start)
@@ -422,34 +422,34 @@ static int __init add_boot_mem_regions(u
 
 static int __init fadump_get_boot_mem_regions(void)
 {
-	unsigned long base, size, cur_size, hole_size, last_end;
+	unsigned long size, cur_size, hole_size, last_end;
 	unsigned long mem_size = fw_dump.boot_memory_size;
-	struct memblock_region *reg;
+	phys_addr_t reg_start, reg_end;
 	int ret = 1;
+	u64 i;
 
 	fw_dump.boot_mem_regs_cnt = 0;
 
 	last_end = 0;
 	hole_size = 0;
 	cur_size = 0;
-	for_each_memblock(memory, reg) {
-		base = reg->base;
-		size = reg->size;
-		hole_size += (base - last_end);
+	for_each_mem_range(i, &reg_start, &reg_end) {
+		size = reg_end - reg_start;
+		hole_size += (reg_start - last_end);
 
 		if ((cur_size + size) >= mem_size) {
 			size = (mem_size - cur_size);
-			ret = add_boot_mem_regions(base, size);
+			ret = add_boot_mem_regions(reg_start, size);
 			break;
 		}
 
 		mem_size -= size;
 		cur_size += size;
-		ret = add_boot_mem_regions(base, size);
+		ret = add_boot_mem_regions(reg_start, size);
 		if (!ret)
 			break;
 
-		last_end = base + size;
+		last_end = reg_end;
 	}
 	fw_dump.boot_mem_top = PAGE_ALIGN(fw_dump.boot_memory_size + hole_size);
 
@@ -985,9 +985,8 @@ static int fadump_init_elfcore_header(ch
  */
 static int fadump_setup_crash_memory_ranges(void)
 {
-	struct memblock_region *reg;
-	u64 start, end;
-	int i, ret;
+	u64 i, start, end;
+	int ret;
 
 	pr_debug("Setup crash memory ranges.\n");
 	crash_mrange_info.mem_range_cnt = 0;
@@ -1005,10 +1004,7 @@ static int fadump_setup_crash_memory_ran
 			return ret;
 	}
 
-	for_each_memblock(memory, reg) {
-		start = (u64)reg->base;
-		end = start + (u64)reg->size;
-
+	for_each_mem_range(i, &start, &end) {
 		/*
 		 * skip the memory chunk that is already added
 		 * (0 through boot_memory_top).
@@ -1242,7 +1238,9 @@ static void fadump_free_reserved_memory(
  */
 static void fadump_release_reserved_area(u64 start, u64 end)
 {
-	u64 tstart, tend, spfn, epfn, reg_spfn, reg_epfn, i;
+	unsigned long reg_spfn, reg_epfn;
+	u64 tstart, tend, spfn, epfn;
+	int i;
 
 	spfn = PHYS_PFN(start);
 	epfn = PHYS_PFN(end);
@@ -1685,12 +1683,10 @@ int __init fadump_reserve_mem(void)
 /* Preserve everything above the base address */
 static void __init fadump_reserve_crash_area(u64 base)
 {
-	struct memblock_region *reg;
-	u64 mstart, msize;
+	u64 i, mstart, mend, msize;
 
-	for_each_memblock(memory, reg) {
-		mstart = reg->base;
-		msize  = reg->size;
+	for_each_mem_range(i, &mstart, &mend) {
+		msize  = mend - mstart;
 
 		if ((mstart + msize) < base)
 			continue;
--- a/arch/powerpc/kexec/file_load_64.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/powerpc/kexec/file_load_64.c
@@ -138,15 +138,13 @@ out:
  */
 static int get_crash_memory_ranges(struct crash_mem **mem_ranges)
 {
-	struct memblock_region *reg;
+	phys_addr_t base, end;
 	struct crash_mem *tmem;
+	u64 i;
 	int ret;
 
-	for_each_memblock(memory, reg) {
-		u64 base, size;
-
-		base = (u64)reg->base;
-		size = (u64)reg->size;
+	for_each_mem_range(i, &base, &end) {
+		u64 size = end - base;
 
 		/* Skip backup memory region, which needs a separate entry */
 		if (base == BACKUP_SRC_START) {
--- a/arch/powerpc/mm/book3s64/hash_utils.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/powerpc/mm/book3s64/hash_utils.c
@@ -7,7 +7,7 @@
  *
  * SMP scalability work:
  *    Copyright (C) 2001 Anton Blanchard <anton@au.ibm.com>, IBM
- * 
+ *
  *    Module name: htab.c
  *
  *    Description:
@@ -865,8 +865,8 @@ static void __init htab_initialize(void)
 	unsigned long table;
 	unsigned long pteg_count;
 	unsigned long prot;
-	unsigned long base = 0, size = 0;
-	struct memblock_region *reg;
+	phys_addr_t base = 0, size = 0, end;
+	u64 i;
 
 	DBG(" -> htab_initialize()\n");
 
@@ -882,7 +882,7 @@ static void __init htab_initialize(void)
 	/*
 	 * Calculate the required size of the htab.  We want the number of
 	 * PTEGs to equal one half the number of real pages.
-	 */ 
+	 */
 	htab_size_bytes = htab_get_table_size();
 	pteg_count = htab_size_bytes >> 7;
 
@@ -892,7 +892,7 @@ static void __init htab_initialize(void)
 	    firmware_has_feature(FW_FEATURE_PS3_LV1)) {
 		/* Using a hypervisor which owns the htab */
 		htab_address = NULL;
-		_SDR1 = 0; 
+		_SDR1 = 0;
 #ifdef CONFIG_FA_DUMP
 		/*
 		 * If firmware assisted dump is active firmware preserves
@@ -958,9 +958,9 @@ static void __init htab_initialize(void)
 #endif /* CONFIG_DEBUG_PAGEALLOC */
 
 	/* create bolted the linear mapping in the hash table */
-	for_each_memblock(memory, reg) {
-		base = (unsigned long)__va(reg->base);
-		size = reg->size;
+	for_each_mem_range(i, &base, &end) {
+		size = end - base;
+		base = (unsigned long)__va(base);
 
 		DBG("creating mapping for region: %lx..%lx (prot: %lx)\n",
 		    base, size, prot);
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -329,7 +329,8 @@ static int __meminit create_physical_map
 static void __init radix_init_pgtable(void)
 {
 	unsigned long rts_field;
-	struct memblock_region *reg;
+	phys_addr_t start, end;
+	u64 i;
 
 	/* We don't support slb for radix */
 	mmu_slb_size = 0;
@@ -337,20 +338,19 @@ static void __init radix_init_pgtable(vo
 	/*
 	 * Create the linear mapping
 	 */
-	for_each_memblock(memory, reg) {
+	for_each_mem_range(i, &start, &end) {
 		/*
 		 * The memblock allocator  is up at this point, so the
 		 * page tables will be allocated within the range. No
 		 * need or a node (which we don't have yet).
 		 */
 
-		if ((reg->base + reg->size) >= RADIX_VMALLOC_START) {
+		if (end >= RADIX_VMALLOC_START) {
 			pr_warn("Outside the supported range\n");
 			continue;
 		}
 
-		WARN_ON(create_physical_mapping(reg->base,
-						reg->base + reg->size,
+		WARN_ON(create_physical_mapping(start, end,
 						radix_mem_block_size,
 						-1, PAGE_KERNEL));
 	}
--- a/arch/powerpc/mm/kasan/kasan_init_32.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -138,11 +138,11 @@ void __init kasan_mmu_init(void)
 
 void __init kasan_init(void)
 {
-	struct memblock_region *reg;
+	phys_addr_t base, end;
+	u64 i;
 
-	for_each_memblock(memory, reg) {
-		phys_addr_t base = reg->base;
-		phys_addr_t top = min(base + reg->size, total_lowmem);
+	for_each_mem_range(i, &base, &end) {
+		phys_addr_t top = min(end, total_lowmem);
 		int ret;
 
 		if (base >= top)
--- a/arch/powerpc/mm/mem.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/powerpc/mm/mem.c
@@ -585,20 +585,24 @@ void flush_icache_user_page(struct vm_ar
  */
 static int __init add_system_ram_resources(void)
 {
-	struct memblock_region *reg;
+	phys_addr_t start, end;
+	u64 i;
 
-	for_each_memblock(memory, reg) {
+	for_each_mem_range(i, &start, &end) {
 		struct resource *res;
-		unsigned long base = reg->base;
-		unsigned long size = reg->size;
 
 		res = kzalloc(sizeof(struct resource), GFP_KERNEL);
 		WARN_ON(!res);
 
 		if (res) {
 			res->name = "System RAM";
-			res->start = base;
-			res->end = base + size - 1;
+			res->start = start;
+			/*
+			 * In memblock, end points to the first byte after
+			 * the range while in resourses, end points to the
+			 * last byte in the range.
+			 */
+			res->end = end - 1;
 			res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
 			WARN_ON(request_resource(&iomem_resource, res) < 0);
 		}
--- a/arch/powerpc/mm/pgtable_32.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/powerpc/mm/pgtable_32.c
@@ -123,11 +123,11 @@ static void __init __mapin_ram_chunk(uns
 
 void __init mapin_ram(void)
 {
-	struct memblock_region *reg;
+	phys_addr_t base, end;
+	u64 i;
 
-	for_each_memblock(memory, reg) {
-		phys_addr_t base = reg->base;
-		phys_addr_t top = min(base + reg->size, total_lowmem);
+	for_each_mem_range(i, &base, &end) {
+		phys_addr_t top = min(end, total_lowmem);
 
 		if (base >= top)
 			continue;
--- a/arch/riscv/mm/init.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/riscv/mm/init.c
@@ -145,21 +145,21 @@ static phys_addr_t dtb_early_pa __initda
 
 void __init setup_bootmem(void)
 {
-	struct memblock_region *reg;
 	phys_addr_t mem_size = 0;
 	phys_addr_t total_mem = 0;
-	phys_addr_t mem_start, end = 0;
+	phys_addr_t mem_start, start, end = 0;
 	phys_addr_t vmlinux_end = __pa_symbol(&_end);
 	phys_addr_t vmlinux_start = __pa_symbol(&_start);
+	u64 i;
 
 	/* Find the memory region containing the kernel */
-	for_each_memblock(memory, reg) {
-		end = reg->base + reg->size;
+	for_each_mem_range(i, &start, &end) {
+		phys_addr_t size = end - start;
 		if (!total_mem)
-			mem_start = reg->base;
-		if (reg->base <= vmlinux_start && vmlinux_end <= end)
-			BUG_ON(reg->size == 0);
-		total_mem = total_mem + reg->size;
+			mem_start = start;
+		if (start <= vmlinux_start && vmlinux_end <= end)
+			BUG_ON(size == 0);
+		total_mem = total_mem + size;
 	}
 
 	/*
@@ -456,7 +456,7 @@ static void __init setup_vm_final(void)
 {
 	uintptr_t va, map_size;
 	phys_addr_t pa, start, end;
-	struct memblock_region *reg;
+	u64 i;
 
 	/* Set mmu_enabled flag */
 	mmu_enabled = true;
@@ -467,14 +467,9 @@ static void __init setup_vm_final(void)
 			   PGDIR_SIZE, PAGE_TABLE);
 
 	/* Map all memory banks */
-	for_each_memblock(memory, reg) {
-		start = reg->base;
-		end = start + reg->size;
-
+	for_each_mem_range(i, &start, &end) {
 		if (start >= end)
 			break;
-		if (memblock_is_nomap(reg))
-			continue;
 		if (start <= __pa(PAGE_OFFSET) &&
 		    __pa(PAGE_OFFSET) < end)
 			start = __pa(PAGE_OFFSET);
--- a/arch/riscv/mm/kasan_init.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/riscv/mm/kasan_init.c
@@ -85,16 +85,16 @@ static void __init populate(void *start,
 
 void __init kasan_init(void)
 {
-	struct memblock_region *reg;
-	unsigned long i;
+	phys_addr_t _start, _end;
+	u64 i;
 
 	kasan_populate_early_shadow((void *)KASAN_SHADOW_START,
 				    (void *)kasan_mem_to_shadow((void *)
 								VMALLOC_END));
 
-	for_each_memblock(memory, reg) {
-		void *start = (void *)__va(reg->base);
-		void *end = (void *)__va(reg->base + reg->size);
+	for_each_mem_range(i, &_start, &_end) {
+		void *start = (void *)_start;
+		void *end = (void *)_end;
 
 		if (start >= end)
 			break;
--- a/arch/s390/kernel/setup.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/s390/kernel/setup.c
@@ -484,8 +484,9 @@ static struct resource __initdata *stand
 static void __init setup_resources(void)
 {
 	struct resource *res, *std_res, *sub_res;
-	struct memblock_region *reg;
+	phys_addr_t start, end;
 	int j;
+	u64 i;
 
 	code_resource.start = (unsigned long) _text;
 	code_resource.end = (unsigned long) _etext - 1;
@@ -494,7 +495,7 @@ static void __init setup_resources(void)
 	bss_resource.start = (unsigned long) __bss_start;
 	bss_resource.end = (unsigned long) __bss_stop - 1;
 
-	for_each_memblock(memory, reg) {
+	for_each_mem_range(i, &start, &end) {
 		res = memblock_alloc(sizeof(*res), 8);
 		if (!res)
 			panic("%s: Failed to allocate %zu bytes align=0x%x\n",
@@ -502,8 +503,13 @@ static void __init setup_resources(void)
 		res->flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM;
 
 		res->name = "System RAM";
-		res->start = reg->base;
-		res->end = reg->base + reg->size - 1;
+		res->start = start;
+		/*
+		 * In memblock, end points to the first byte after the
+		 * range while in resourses, end points to the last byte in
+		 * the range.
+		 */
+		res->end = end - 1;
 		request_resource(&iomem_resource, res);
 
 		for (j = 0; j < ARRAY_SIZE(standard_resources); j++) {
@@ -819,14 +825,15 @@ static void __init reserve_kernel(void)
 
 static void __init setup_memory(void)
 {
-	struct memblock_region *reg;
+	phys_addr_t start, end;
+	u64 i;
 
 	/*
 	 * Init storage key for present memory
 	 */
-	for_each_memblock(memory, reg) {
-		storage_key_init_range(reg->base, reg->base + reg->size);
-	}
+	for_each_mem_range(i, &start, &end)
+		storage_key_init_range(start, end);
+
 	psw_set_key(PAGE_DEFAULT_KEY);
 
 	/* Only cosmetics */
--- a/arch/s390/mm/vmem.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/s390/mm/vmem.c
@@ -554,10 +554,11 @@ int vmem_add_mapping(unsigned long start
  */
 void __init vmem_map_init(void)
 {
-	struct memblock_region *reg;
+	phys_addr_t base, end;
+	u64 i;
 
-	for_each_memblock(memory, reg)
-		vmem_add_range(reg->base, reg->size);
+	for_each_mem_range(i, &base, &end)
+		vmem_add_range(base, end - base);
 	__set_memory((unsigned long)_stext,
 		     (unsigned long)(_etext - _stext) >> PAGE_SHIFT,
 		     SET_MEMORY_RO | SET_MEMORY_X);
--- a/arch/sparc/mm/init_64.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/arch/sparc/mm/init_64.c
@@ -1192,18 +1192,14 @@ int of_node_to_nid(struct device_node *d
 
 static void __init add_node_ranges(void)
 {
-	struct memblock_region *reg;
+	phys_addr_t start, end;
 	unsigned long prev_max;
+	u64 i;
 
 memblock_resized:
 	prev_max = memblock.memory.max;
 
-	for_each_memblock(memory, reg) {
-		unsigned long size = reg->size;
-		unsigned long start, end;
-
-		start = reg->base;
-		end = start + size;
+	for_each_mem_range(i, &start, &end) {
 		while (start < end) {
 			unsigned long this_end;
 			int nid;
@@ -1211,7 +1207,7 @@ memblock_resized:
 			this_end = memblock_nid_range(start, end, &nid);
 
 			numadbg("Setting memblock NUMA node nid[%d] "
-				"start[%lx] end[%lx]\n",
+				"start[%llx] end[%lx]\n",
 				nid, start, this_end);
 
 			memblock_set_node(start, this_end - start,
--- a/drivers/bus/mvebu-mbus.c~arch-drivers-replace-for_each_membock-with-for_each_mem_range
+++ a/drivers/bus/mvebu-mbus.c
@@ -610,23 +610,23 @@ static unsigned int armada_xp_mbus_win_r
 static void __init
 mvebu_mbus_find_bridge_hole(uint64_t *start, uint64_t *end)
 {
-	struct memblock_region *r;
-	uint64_t s = 0;
+	phys_addr_t reg_start, reg_end;
+	uint64_t i, s = 0;
 
-	for_each_memblock(memory, r) {
+	for_each_mem_range(i, &reg_start, &reg_end) {
 		/*
 		 * This part of the memory is above 4 GB, so we don't
 		 * care for the MBus bridge hole.
 		 */
-		if (r->base >= 0x100000000ULL)
+		if (reg_start >= 0x100000000ULL)
 			continue;
 
 		/*
 		 * The MBus bridge hole is at the end of the RAM under
 		 * the 4 GB limit.
 		 */
-		if (r->base + r->size > s)
-			s = r->base + r->size;
+		if (reg_end > s)
+			s = reg_end;
 	}
 
 	*start = s;
_

Patches currently in -mm which might be from rppt@linux.ibm.com are

kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch
dma-contiguous-simplify-cma_early_percent_memory.patch
arm-xtensa-simplify-initialization-of-high-memory-pages.patch
arm64-numa-simplify-dummy_numa_init.patch
h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch
riscv-drop-unneeded-node-initialization.patch
mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch
memblock-make-for_each_memblock_type-iterator-private.patch
memblock-make-memblock_debug-and-related-functionality-private.patch
memblock-reduce-number-of-parameters-in-for_each_mem_range.patch
arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch
arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch
x86-setup-simplify-initrd-relocation-and-reservation.patch
x86-setup-simplify-reserve_crashkernel.patch
memblock-remove-unused-memblock_mem_size.patch
memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch
memblock-use-separate-iterators-for-memory-and-reserved-regions.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + x86-setup-simplify-initrd-relocation-and-reservation.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (115 preceding siblings ...)
  2020-08-19 19:27 ` + arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch " Andrew Morton
@ 2020-08-19 19:28 ` Andrew Morton
  2020-08-19 19:28 ` + x86-setup-simplify-reserve_crashkernel.patch " Andrew Morton
                   ` (22 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:28 UTC (permalink / raw)
  To: benh, bhe, bp, catalin.marinas, dave.hansen, dja, hbathini, hch,
	jcmvbkbc, Jonathan.Cameron, kernel, linux, luto, m.szyprowski,
	miguel.ojeda.sandonis, mingo, mingo, mm-commits, monstr, mpe,
	palmer, paul.walmsley, paulus, peterz, rppt, shorne, tglx,
	tsbogend, will, ysato


The patch titled
     Subject: x86/setup: simplify initrd relocation and reservation
has been added to the -mm tree.  Its filename is
     x86-setup-simplify-initrd-relocation-and-reservation.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/x86-setup-simplify-initrd-relocation-and-reservation.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/x86-setup-simplify-initrd-relocation-and-reservation.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mike Rapoport <rppt@linux.ibm.com>
Subject: x86/setup: simplify initrd relocation and reservation

Currently, initrd image is reserved very early during setup and then it
might be relocated and re-reserved after the initial physical memory
mapping is created.  The "late" reservation of memblock verifies that
mapped memory size exceeds the size of initrd, then checks whether the
relocation required and, if yes, relocates inirtd to a new memory
allocated from memblock and frees the old location.

The check for memory size is excessive as memblock allocation will anyway
fail if there is not enough memory.  Besides, there is no point to
allocate memory from memblock using memblock_find_in_range() +
memblock_reserve() when there exists memblock_phys_alloc_range() with
required functionality.

Remove the redundant check and simplify memblock allocation.

Link: https://lkml.kernel.org/r/20200818151634.14343-14-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86/kernel/setup.c |   16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)

--- a/arch/x86/kernel/setup.c~x86-setup-simplify-initrd-relocation-and-reservation
+++ a/arch/x86/kernel/setup.c
@@ -263,16 +263,12 @@ static void __init relocate_initrd(void)
 	u64 area_size     = PAGE_ALIGN(ramdisk_size);
 
 	/* We need to move the initrd down into directly mapped mem */
-	relocated_ramdisk = memblock_find_in_range(0, PFN_PHYS(max_pfn_mapped),
-						   area_size, PAGE_SIZE);
-
+	relocated_ramdisk = memblock_phys_alloc_range(area_size, PAGE_SIZE, 0,
+						      PFN_PHYS(max_pfn_mapped));
 	if (!relocated_ramdisk)
 		panic("Cannot find place for new RAMDISK of size %lld\n",
 		      ramdisk_size);
 
-	/* Note: this includes all the mem currently occupied by
-	   the initrd, we rely on that fact to keep the data intact. */
-	memblock_reserve(relocated_ramdisk, area_size);
 	initrd_start = relocated_ramdisk + PAGE_OFFSET;
 	initrd_end   = initrd_start + ramdisk_size;
 	printk(KERN_INFO "Allocated new RAMDISK: [mem %#010llx-%#010llx]\n",
@@ -299,13 +295,13 @@ static void __init early_reserve_initrd(
 
 	memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
 }
+
 static void __init reserve_initrd(void)
 {
 	/* Assume only end is not page aligned */
 	u64 ramdisk_image = get_ramdisk_image();
 	u64 ramdisk_size  = get_ramdisk_size();
 	u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
-	u64 mapped_size;
 
 	if (!boot_params.hdr.type_of_loader ||
 	    !ramdisk_image || !ramdisk_size)
@@ -313,12 +309,6 @@ static void __init reserve_initrd(void)
 
 	initrd_start = 0;
 
-	mapped_size = memblock_mem_size(max_pfn_mapped);
-	if (ramdisk_size >= (mapped_size>>1))
-		panic("initrd too large to handle, "
-		       "disabling initrd (%lld needed, %lld available)\n",
-		       ramdisk_size, mapped_size>>1);
-
 	printk(KERN_INFO "RAMDISK: [mem %#010llx-%#010llx]\n", ramdisk_image,
 			ramdisk_end - 1);
 
_

Patches currently in -mm which might be from rppt@linux.ibm.com are

kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch
dma-contiguous-simplify-cma_early_percent_memory.patch
arm-xtensa-simplify-initialization-of-high-memory-pages.patch
arm64-numa-simplify-dummy_numa_init.patch
h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch
riscv-drop-unneeded-node-initialization.patch
mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch
memblock-make-for_each_memblock_type-iterator-private.patch
memblock-make-memblock_debug-and-related-functionality-private.patch
memblock-reduce-number-of-parameters-in-for_each_mem_range.patch
arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch
arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch
x86-setup-simplify-initrd-relocation-and-reservation.patch
x86-setup-simplify-reserve_crashkernel.patch
memblock-remove-unused-memblock_mem_size.patch
memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch
memblock-use-separate-iterators-for-memory-and-reserved-regions.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + x86-setup-simplify-reserve_crashkernel.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (116 preceding siblings ...)
  2020-08-19 19:28 ` + x86-setup-simplify-initrd-relocation-and-reservation.patch " Andrew Morton
@ 2020-08-19 19:28 ` Andrew Morton
  2020-08-19 19:28 ` + memblock-remove-unused-memblock_mem_size.patch " Andrew Morton
                   ` (21 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:28 UTC (permalink / raw)
  To: benh, bhe, bp, catalin.marinas, dave.hansen, dja, hbathini, hch,
	jcmvbkbc, Jonathan.Cameron, kernel, linux, luto, m.szyprowski,
	miguel.ojeda.sandonis, mingo, mingo, mm-commits, monstr, mpe,
	palmer, paul.walmsley, paulus, peterz, rppt, shorne, tglx,
	tsbogend, will, ysato


The patch titled
     Subject: x86/setup: simplify reserve_crashkernel()
has been added to the -mm tree.  Its filename is
     x86-setup-simplify-reserve_crashkernel.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/x86-setup-simplify-reserve_crashkernel.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/x86-setup-simplify-reserve_crashkernel.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mike Rapoport <rppt@linux.ibm.com>
Subject: x86/setup: simplify reserve_crashkernel()

* Replace magic numbers with defines
* Replace memblock_find_in_range() + memblock_reserve() with
  memblock_phys_alloc_range()
* Stop checking for low memory size in reserve_crashkernel_low(). The
  allocation from limited range will anyway fail if there is no enough
  memory, so there is no need for extra traversal of memblock.memory

Link: https://lkml.kernel.org/r/20200818151634.14343-15-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86/kernel/setup.c |   40 +++++++++++++-------------------------
 1 file changed, 14 insertions(+), 26 deletions(-)

--- a/arch/x86/kernel/setup.c~x86-setup-simplify-reserve_crashkernel
+++ a/arch/x86/kernel/setup.c
@@ -420,13 +420,13 @@ static int __init reserve_crashkernel_lo
 {
 #ifdef CONFIG_X86_64
 	unsigned long long base, low_base = 0, low_size = 0;
-	unsigned long total_low_mem;
+	unsigned long low_mem_limit;
 	int ret;
 
-	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
+	low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
 
 	/* crashkernel=Y,low */
-	ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size, &base);
+	ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
 	if (ret) {
 		/*
 		 * two parts from kernel/dma/swiotlb.c:
@@ -444,23 +444,17 @@ static int __init reserve_crashkernel_lo
 			return 0;
 	}
 
-	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
+	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
 	if (!low_base) {
 		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
 		       (unsigned long)(low_size >> 20));
 		return -ENOMEM;
 	}
 
-	ret = memblock_reserve(low_base, low_size);
-	if (ret) {
-		pr_err("%s: Error reserving crashkernel low memblock.\n", __func__);
-		return ret;
-	}
-
-	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
+	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
 		(unsigned long)(low_size >> 20),
 		(unsigned long)(low_base >> 20),
-		(unsigned long)(total_low_mem >> 20));
+		(unsigned long)(low_mem_limit >> 20));
 
 	crashk_low_res.start = low_base;
 	crashk_low_res.end   = low_base + low_size - 1;
@@ -504,13 +498,13 @@ static void __init reserve_crashkernel(v
 		 * unless "crashkernel=size[KMG],high" is specified.
 		 */
 		if (!high)
-			crash_base = memblock_find_in_range(CRASH_ALIGN,
-						CRASH_ADDR_LOW_MAX,
-						crash_size, CRASH_ALIGN);
+			crash_base = memblock_phys_alloc_range(crash_size,
+						CRASH_ALIGN, CRASH_ALIGN,
+						CRASH_ADDR_LOW_MAX);
 		if (!crash_base)
-			crash_base = memblock_find_in_range(CRASH_ALIGN,
-						CRASH_ADDR_HIGH_MAX,
-						crash_size, CRASH_ALIGN);
+			crash_base = memblock_phys_alloc_range(crash_size,
+						CRASH_ALIGN, CRASH_ALIGN,
+						CRASH_ADDR_HIGH_MAX);
 		if (!crash_base) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
 			return;
@@ -518,19 +512,13 @@ static void __init reserve_crashkernel(v
 	} else {
 		unsigned long long start;
 
-		start = memblock_find_in_range(crash_base,
-					       crash_base + crash_size,
-					       crash_size, 1 << 20);
+		start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
+						  crash_base + crash_size);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
 			return;
 		}
 	}
-	ret = memblock_reserve(crash_base, crash_size);
-	if (ret) {
-		pr_err("%s: Error reserving crashkernel memblock.\n", __func__);
-		return;
-	}
 
 	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
 		memblock_free(crash_base, crash_size);
_

Patches currently in -mm which might be from rppt@linux.ibm.com are

kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch
dma-contiguous-simplify-cma_early_percent_memory.patch
arm-xtensa-simplify-initialization-of-high-memory-pages.patch
arm64-numa-simplify-dummy_numa_init.patch
h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch
riscv-drop-unneeded-node-initialization.patch
mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch
memblock-make-for_each_memblock_type-iterator-private.patch
memblock-make-memblock_debug-and-related-functionality-private.patch
memblock-reduce-number-of-parameters-in-for_each_mem_range.patch
arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch
arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch
x86-setup-simplify-initrd-relocation-and-reservation.patch
x86-setup-simplify-reserve_crashkernel.patch
memblock-remove-unused-memblock_mem_size.patch
memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch
memblock-use-separate-iterators-for-memory-and-reserved-regions.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + memblock-remove-unused-memblock_mem_size.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (117 preceding siblings ...)
  2020-08-19 19:28 ` + x86-setup-simplify-reserve_crashkernel.patch " Andrew Morton
@ 2020-08-19 19:28 ` Andrew Morton
  2020-08-19 19:28 ` + memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch " Andrew Morton
                   ` (20 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:28 UTC (permalink / raw)
  To: benh, bhe, bp, catalin.marinas, dave.hansen, dja, hbathini, hch,
	jcmvbkbc, Jonathan.Cameron, kernel, linux, luto, m.szyprowski,
	miguel.ojeda.sandonis, mingo, mingo, mm-commits, monstr, mpe,
	palmer, paul.walmsley, paulus, peterz, rppt, shorne, tglx,
	tsbogend, will, ysato


The patch titled
     Subject: memblock: remove unused memblock_mem_size()
has been added to the -mm tree.  Its filename is
     memblock-remove-unused-memblock_mem_size.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/memblock-remove-unused-memblock_mem_size.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/memblock-remove-unused-memblock_mem_size.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mike Rapoport <rppt@linux.ibm.com>
Subject: memblock: remove unused memblock_mem_size()

The only user of memblock_mem_size() was x86 setup code, it is gone now
and memblock_mem_size() funciton can be removed.

Link: https://lkml.kernel.org/r/20200818151634.14343-16-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/memblock.h |    1 -
 mm/memblock.c            |   15 ---------------
 2 files changed, 16 deletions(-)

--- a/include/linux/memblock.h~memblock-remove-unused-memblock_mem_size
+++ a/include/linux/memblock.h
@@ -481,7 +481,6 @@ static inline bool memblock_bottom_up(vo
 
 phys_addr_t memblock_phys_mem_size(void);
 phys_addr_t memblock_reserved_size(void);
-phys_addr_t memblock_mem_size(unsigned long limit_pfn);
 phys_addr_t memblock_start_of_DRAM(void);
 phys_addr_t memblock_end_of_DRAM(void);
 void memblock_enforce_memory_limit(phys_addr_t memory_limit);
--- a/mm/memblock.c~memblock-remove-unused-memblock_mem_size
+++ a/mm/memblock.c
@@ -1660,21 +1660,6 @@ phys_addr_t __init_memblock memblock_res
 	return memblock.reserved.total_size;
 }
 
-phys_addr_t __init memblock_mem_size(unsigned long limit_pfn)
-{
-	unsigned long pages = 0;
-	unsigned long start_pfn, end_pfn;
-	int i;
-
-	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, NULL) {
-		start_pfn = min_t(unsigned long, start_pfn, limit_pfn);
-		end_pfn = min_t(unsigned long, end_pfn, limit_pfn);
-		pages += end_pfn - start_pfn;
-	}
-
-	return PFN_PHYS(pages);
-}
-
 /* lowest address */
 phys_addr_t __init_memblock memblock_start_of_DRAM(void)
 {
_

Patches currently in -mm which might be from rppt@linux.ibm.com are

kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch
dma-contiguous-simplify-cma_early_percent_memory.patch
arm-xtensa-simplify-initialization-of-high-memory-pages.patch
arm64-numa-simplify-dummy_numa_init.patch
h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch
riscv-drop-unneeded-node-initialization.patch
mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch
memblock-make-for_each_memblock_type-iterator-private.patch
memblock-make-memblock_debug-and-related-functionality-private.patch
memblock-reduce-number-of-parameters-in-for_each_mem_range.patch
arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch
arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch
x86-setup-simplify-initrd-relocation-and-reservation.patch
x86-setup-simplify-reserve_crashkernel.patch
memblock-remove-unused-memblock_mem_size.patch
memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch
memblock-use-separate-iterators-for-memory-and-reserved-regions.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (118 preceding siblings ...)
  2020-08-19 19:28 ` + memblock-remove-unused-memblock_mem_size.patch " Andrew Morton
@ 2020-08-19 19:28 ` Andrew Morton
  2020-08-19 19:28 ` + memblock-use-separate-iterators-for-memory-and-reserved-regions.patch " Andrew Morton
                   ` (19 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:28 UTC (permalink / raw)
  To: benh, bhe, bp, catalin.marinas, dave.hansen, dja, hbathini, hch,
	jcmvbkbc, Jonathan.Cameron, kernel, linux, luto, m.szyprowski,
	miguel.ojeda.sandonis, mingo, mingo, mm-commits, monstr, mpe,
	palmer, paul.walmsley, paulus, peterz, rppt, shorne, tglx,
	tsbogend, will, ysato


The patch titled
     Subject: memblock: implement for_each_reserved_mem_region() using __next_mem_region()
has been added to the -mm tree.  Its filename is
     memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mike Rapoport <rppt@linux.ibm.com>
Subject: memblock: implement for_each_reserved_mem_region() using __next_mem_region()

Iteration over memblock.reserved with for_each_reserved_mem_region() used
__next_reserved_mem_region() that implemented a subset of
__next_mem_region().

Use __for_each_mem_range() and, essentially, __next_mem_region() with
appropriate parameters to reduce code duplication.

While on it, rename for_each_reserved_mem_region() to
for_each_reserved_mem_range() for consistency.

Link: https://lkml.kernel.org/r/20200818151634.14343-17-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>	[.clang-format]
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 .clang-format                    |    2 -
 arch/arm64/kernel/setup.c        |    2 -
 drivers/irqchip/irq-gic-v3-its.c |    2 -
 include/linux/memblock.h         |   12 ++----
 mm/memblock.c                    |   56 ++++++++++-------------------
 5 files changed, 27 insertions(+), 47 deletions(-)

--- a/arch/arm64/kernel/setup.c~memblock-implement-for_each_reserved_mem_region-using-__next_mem_region
+++ a/arch/arm64/kernel/setup.c
@@ -257,7 +257,7 @@ static int __init reserve_memblock_reser
 		if (!memblock_is_region_reserved(mem->start, mem_size))
 			continue;
 
-		for_each_reserved_mem_region(j, &r_start, &r_end) {
+		for_each_reserved_mem_range(j, &r_start, &r_end) {
 			resource_size_t start, end;
 
 			start = max(PFN_PHYS(PFN_DOWN(r_start)), mem->start);
--- a/.clang-format~memblock-implement-for_each_reserved_mem_region-using-__next_mem_region
+++ a/.clang-format
@@ -267,7 +267,7 @@ ForEachMacros:
   - 'for_each_process_thread'
   - 'for_each_property_of_node'
   - 'for_each_registered_fb'
-  - 'for_each_reserved_mem_region'
+  - 'for_each_reserved_mem_range'
   - 'for_each_rtd_codec_dais'
   - 'for_each_rtd_codec_dais_rollback'
   - 'for_each_rtd_components'
--- a/drivers/irqchip/irq-gic-v3-its.c~memblock-implement-for_each_reserved_mem_region-using-__next_mem_region
+++ a/drivers/irqchip/irq-gic-v3-its.c
@@ -2192,7 +2192,7 @@ static bool gic_check_reserved_range(phy
 
 	addr_end = addr + size - 1;
 
-	for_each_reserved_mem_region(i, &start, &end) {
+	for_each_reserved_mem_range(i, &start, &end) {
 		if (addr >= start && addr_end <= end)
 			return true;
 	}
--- a/include/linux/memblock.h~memblock-implement-for_each_reserved_mem_region-using-__next_mem_region
+++ a/include/linux/memblock.h
@@ -132,9 +132,6 @@ void __next_mem_range_rev(u64 *idx, int
 			  struct memblock_type *type_b, phys_addr_t *out_start,
 			  phys_addr_t *out_end, int *out_nid);
 
-void __next_reserved_mem_region(u64 *idx, phys_addr_t *out_start,
-				phys_addr_t *out_end);
-
 void __memblock_free_late(phys_addr_t base, phys_addr_t size);
 
 #ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
@@ -224,7 +221,7 @@ static inline void __next_physmem_range(
 				 MEMBLOCK_NONE, p_start, p_end, NULL)
 
 /**
- * for_each_reserved_mem_region - iterate over all reserved memblock areas
+ * for_each_reserved_mem_range - iterate over all reserved memblock areas
  * @i: u64 used as loop variable
  * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
  * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
@@ -232,10 +229,9 @@ static inline void __next_physmem_range(
  * Walks over reserved areas of memblock. Available as soon as memblock
  * is initialized.
  */
-#define for_each_reserved_mem_region(i, p_start, p_end)			\
-	for (i = 0UL, __next_reserved_mem_region(&i, p_start, p_end);	\
-	     i != (u64)ULLONG_MAX;					\
-	     __next_reserved_mem_region(&i, p_start, p_end))
+#define for_each_reserved_mem_range(i, p_start, p_end)			\
+	__for_each_mem_range(i, &memblock.reserved, NULL, NUMA_NO_NODE,	\
+			     MEMBLOCK_NONE, p_start, p_end, NULL)
 
 static inline bool memblock_is_hotpluggable(struct memblock_region *m)
 {
--- a/mm/memblock.c~memblock-implement-for_each_reserved_mem_region-using-__next_mem_region
+++ a/mm/memblock.c
@@ -132,6 +132,14 @@ struct memblock_type physmem = {
 };
 #endif
 
+/*
+ * keep a pointer to &memblock.memory in the text section to use it in
+ * __next_mem_range() and its helpers.
+ *  For architectures that do not keep memblock data after init, this
+ * pointer will be reset to NULL at memblock_discard()
+ */
+static __refdata struct memblock_type *memblock_memory = &memblock.memory;
+
 #define for_each_memblock_type(i, memblock_type, rgn)			\
 	for (i = 0, rgn = &memblock_type->regions[0];			\
 	     i < memblock_type->cnt;					\
@@ -402,6 +410,8 @@ void __init memblock_discard(void)
 				  memblock.memory.max);
 		__memblock_free_late(addr, size);
 	}
+
+	memblock_memory = NULL;
 }
 #endif
 
@@ -952,42 +962,16 @@ int __init_memblock memblock_clear_nomap
 	return memblock_setclr_flag(base, size, 0, MEMBLOCK_NOMAP);
 }
 
-/**
- * __next_reserved_mem_region - next function for for_each_reserved_region()
- * @idx: pointer to u64 loop variable
- * @out_start: ptr to phys_addr_t for start address of the region, can be %NULL
- * @out_end: ptr to phys_addr_t for end address of the region, can be %NULL
- *
- * Iterate over all reserved memory regions.
- */
-void __init_memblock __next_reserved_mem_region(u64 *idx,
-					   phys_addr_t *out_start,
-					   phys_addr_t *out_end)
-{
-	struct memblock_type *type = &memblock.reserved;
-
-	if (*idx < type->cnt) {
-		struct memblock_region *r = &type->regions[*idx];
-		phys_addr_t base = r->base;
-		phys_addr_t size = r->size;
-
-		if (out_start)
-			*out_start = base;
-		if (out_end)
-			*out_end = base + size - 1;
-
-		*idx += 1;
-		return;
-	}
-
-	/* signal end of iteration */
-	*idx = ULLONG_MAX;
-}
-
-static bool should_skip_region(struct memblock_region *m, int nid, int flags)
+static bool should_skip_region(struct memblock_type *type,
+			       struct memblock_region *m,
+			       int nid, int flags)
 {
 	int m_nid = memblock_get_region_node(m);
 
+	/* we never skip regions when iterating memblock.reserved or physmem */
+	if (type != memblock_memory)
+		return false;
+
 	/* only memory regions are associated with nodes, check it */
 	if (nid != NUMA_NO_NODE && nid != m_nid)
 		return true;
@@ -1052,7 +1036,7 @@ void __next_mem_range(u64 *idx, int nid,
 		phys_addr_t m_end = m->base + m->size;
 		int	    m_nid = memblock_get_region_node(m);
 
-		if (should_skip_region(m, nid, flags))
+		if (should_skip_region(type_a, m, nid, flags))
 			continue;
 
 		if (!type_b) {
@@ -1156,7 +1140,7 @@ void __init_memblock __next_mem_range_re
 		phys_addr_t m_end = m->base + m->size;
 		int m_nid = memblock_get_region_node(m);
 
-		if (should_skip_region(m, nid, flags))
+		if (should_skip_region(type_a, m, nid, flags))
 			continue;
 
 		if (!type_b) {
@@ -1981,7 +1965,7 @@ static unsigned long __init free_low_mem
 
 	memblock_clear_hotplug(0, -1);
 
-	for_each_reserved_mem_region(i, &start, &end)
+	for_each_reserved_mem_range(i, &start, &end)
 		reserve_bootmem_region(start, end);
 
 	/*
_

Patches currently in -mm which might be from rppt@linux.ibm.com are

kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch
dma-contiguous-simplify-cma_early_percent_memory.patch
arm-xtensa-simplify-initialization-of-high-memory-pages.patch
arm64-numa-simplify-dummy_numa_init.patch
h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch
riscv-drop-unneeded-node-initialization.patch
mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch
memblock-make-for_each_memblock_type-iterator-private.patch
memblock-make-memblock_debug-and-related-functionality-private.patch
memblock-reduce-number-of-parameters-in-for_each_mem_range.patch
arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch
arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch
x86-setup-simplify-initrd-relocation-and-reservation.patch
x86-setup-simplify-reserve_crashkernel.patch
memblock-remove-unused-memblock_mem_size.patch
memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch
memblock-use-separate-iterators-for-memory-and-reserved-regions.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + memblock-use-separate-iterators-for-memory-and-reserved-regions.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (119 preceding siblings ...)
  2020-08-19 19:28 ` + memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch " Andrew Morton
@ 2020-08-19 19:28 ` Andrew Morton
  2020-08-19 19:31 ` + fs-ocfs2-delete-repeated-words-in-comments.patch " Andrew Morton
                   ` (18 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:28 UTC (permalink / raw)
  To: benh, bhe, bp, catalin.marinas, dave.hansen, dja, hbathini, hch,
	jcmvbkbc, Jonathan.Cameron, kernel, linux, luto, m.szyprowski,
	miguel.ojeda.sandonis, mingo, mingo, mm-commits, monstr, mpe,
	palmer, paul.walmsley, paulus, peterz, rppt, shorne, tglx,
	tsbogend, will, ysato


The patch titled
     Subject: memblock: use separate iterators for memory and reserved regions
has been added to the -mm tree.  Its filename is
     memblock-use-separate-iterators-for-memory-and-reserved-regions.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/memblock-use-separate-iterators-for-memory-and-reserved-regions.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/memblock-use-separate-iterators-for-memory-and-reserved-regions.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mike Rapoport <rppt@linux.ibm.com>
Subject: memblock: use separate iterators for memory and reserved regions

for_each_memblock() is used to iterate over memblock.memory in a few
places that use data from memblock_region rather than the memory ranges.

Introduce separate for_each_mem_region() and
for_each_reserved_mem_region() to improve encapsulation of memblock
internals from its users.

Link: https://lkml.kernel.org/r/20200818151634.14343-18-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>			[x86]
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>	[MIPS]
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>	[.clang-format]
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 .clang-format                  |    3 ++-
 arch/arm64/kernel/setup.c      |    2 +-
 arch/arm64/mm/numa.c           |    2 +-
 arch/mips/netlogic/xlp/setup.c |    2 +-
 arch/riscv/mm/init.c           |    2 +-
 arch/x86/mm/numa.c             |    2 +-
 include/linux/memblock.h       |   19 ++++++++++++++++---
 mm/memblock.c                  |    4 ++--
 mm/page_alloc.c                |    8 ++++----
 9 files changed, 29 insertions(+), 15 deletions(-)

--- a/arch/arm64/kernel/setup.c~memblock-use-separate-iterators-for-memory-and-reserved-regions
+++ a/arch/arm64/kernel/setup.c
@@ -217,7 +217,7 @@ static void __init request_standard_reso
 	if (!standard_resources)
 		panic("%s: Failed to allocate %zu bytes\n", __func__, res_size);
 
-	for_each_memblock(memory, region) {
+	for_each_mem_region(region) {
 		res = &standard_resources[i++];
 		if (memblock_is_nomap(region)) {
 			res->name  = "reserved";
--- a/arch/arm64/mm/numa.c~memblock-use-separate-iterators-for-memory-and-reserved-regions
+++ a/arch/arm64/mm/numa.c
@@ -350,7 +350,7 @@ static int __init numa_register_nodes(vo
 	struct memblock_region *mblk;
 
 	/* Check that valid nid is set to memblks */
-	for_each_memblock(memory, mblk) {
+	for_each_mem_region(mblk) {
 		int mblk_nid = memblock_get_region_node(mblk);
 
 		if (mblk_nid == NUMA_NO_NODE || mblk_nid >= MAX_NUMNODES) {
--- a/arch/mips/netlogic/xlp/setup.c~memblock-use-separate-iterators-for-memory-and-reserved-regions
+++ a/arch/mips/netlogic/xlp/setup.c
@@ -70,7 +70,7 @@ static void nlm_fixup_mem(void)
 	const int pref_backup = 512;
 	struct memblock_region *mem;
 
-	for_each_memblock(memory, mem) {
+	for_each_mem_region(mem) {
 		memblock_remove(mem->base + mem->size - pref_backup,
 			pref_backup);
 	}
--- a/arch/riscv/mm/init.c~memblock-use-separate-iterators-for-memory-and-reserved-regions
+++ a/arch/riscv/mm/init.c
@@ -531,7 +531,7 @@ static void __init resource_init(void)
 {
 	struct memblock_region *region;
 
-	for_each_memblock(memory, region) {
+	for_each_mem_region(region) {
 		struct resource *res;
 
 		res = memblock_alloc(sizeof(struct resource), SMP_CACHE_BYTES);
--- a/arch/x86/mm/numa.c~memblock-use-separate-iterators-for-memory-and-reserved-regions
+++ a/arch/x86/mm/numa.c
@@ -516,7 +516,7 @@ static void __init numa_clear_kernel_nod
 	 *   memory ranges, because quirks such as trim_snb_memory()
 	 *   reserve specific pages for Sandy Bridge graphics. ]
 	 */
-	for_each_memblock(reserved, mb_region) {
+	for_each_reserved_mem_region(mb_region) {
 		int nid = memblock_get_region_node(mb_region);
 
 		if (nid != MAX_NUMNODES)
--- a/.clang-format~memblock-use-separate-iterators-for-memory-and-reserved-regions
+++ a/.clang-format
@@ -201,7 +201,7 @@ ForEachMacros:
   - 'for_each_matching_node'
   - 'for_each_matching_node_and_match'
   - 'for_each_member'
-  - 'for_each_memblock'
+  - 'for_each_mem_region'
   - 'for_each_memblock_type'
   - 'for_each_memcg_cache_index'
   - 'for_each_mem_pfn_range'
@@ -268,6 +268,7 @@ ForEachMacros:
   - 'for_each_property_of_node'
   - 'for_each_registered_fb'
   - 'for_each_reserved_mem_range'
+  - 'for_each_reserved_mem_region'
   - 'for_each_rtd_codec_dais'
   - 'for_each_rtd_codec_dais_rollback'
   - 'for_each_rtd_components'
--- a/include/linux/memblock.h~memblock-use-separate-iterators-for-memory-and-reserved-regions
+++ a/include/linux/memblock.h
@@ -553,9 +553,22 @@ static inline unsigned long memblock_reg
 	return PFN_UP(reg->base + reg->size);
 }
 
-#define for_each_memblock(memblock_type, region)					\
-	for (region = memblock.memblock_type.regions;					\
-	     region < (memblock.memblock_type.regions + memblock.memblock_type.cnt);	\
+/**
+ * for_each_mem_region - itereate over memory regions
+ * @region: loop variable
+ */
+#define for_each_mem_region(region)					\
+	for (region = memblock.memory.regions;				\
+	     region < (memblock.memory.regions + memblock.memory.cnt);	\
+	     region++)
+
+/**
+ * for_each_reserved_mem_region - itereate over reserved memory regions
+ * @region: loop variable
+ */
+#define for_each_reserved_mem_region(region)				\
+	for (region = memblock.reserved.regions;			\
+	     region < (memblock.reserved.regions + memblock.reserved.cnt); \
 	     region++)
 
 extern void *alloc_large_system_hash(const char *tablename,
--- a/mm/memblock.c~memblock-use-separate-iterators-for-memory-and-reserved-regions
+++ a/mm/memblock.c
@@ -1667,7 +1667,7 @@ static phys_addr_t __init_memblock __fin
 	 * the memory memblock regions, if the @limit exceeds the total size
 	 * of those regions, max_addr will keep original value PHYS_ADDR_MAX
 	 */
-	for_each_memblock(memory, r) {
+	for_each_mem_region(r) {
 		if (limit <= r->size) {
 			max_addr = r->base + limit;
 			break;
@@ -1837,7 +1837,7 @@ void __init_memblock memblock_trim_memor
 	phys_addr_t start, end, orig_start, orig_end;
 	struct memblock_region *r;
 
-	for_each_memblock(memory, r) {
+	for_each_mem_region(r) {
 		orig_start = r->base;
 		orig_end = r->base + r->size;
 		start = round_up(orig_start, align);
--- a/mm/page_alloc.c~memblock-use-separate-iterators-for-memory-and-reserved-regions
+++ a/mm/page_alloc.c
@@ -5955,7 +5955,7 @@ overlap_memmap_init(unsigned long zone,
 
 	if (mirrored_kernelcore && zone == ZONE_MOVABLE) {
 		if (!r || *pfn >= memblock_region_memory_end_pfn(r)) {
-			for_each_memblock(memory, r) {
+			for_each_mem_region(r) {
 				if (*pfn < memblock_region_memory_end_pfn(r))
 					break;
 			}
@@ -6540,7 +6540,7 @@ static unsigned long __init zone_absent_
 		unsigned long start_pfn, end_pfn;
 		struct memblock_region *r;
 
-		for_each_memblock(memory, r) {
+		for_each_mem_region(r) {
 			start_pfn = clamp(memblock_region_memory_base_pfn(r),
 					  zone_start_pfn, zone_end_pfn);
 			end_pfn = clamp(memblock_region_memory_end_pfn(r),
@@ -7134,7 +7134,7 @@ static void __init find_zone_movable_pfn
 	 * options.
 	 */
 	if (movable_node_is_enabled()) {
-		for_each_memblock(memory, r) {
+		for_each_mem_region(r) {
 			if (!memblock_is_hotpluggable(r))
 				continue;
 
@@ -7155,7 +7155,7 @@ static void __init find_zone_movable_pfn
 	if (mirrored_kernelcore) {
 		bool mem_below_4gb_not_mirrored = false;
 
-		for_each_memblock(memory, r) {
+		for_each_mem_region(r) {
 			if (memblock_is_mirror(r))
 				continue;
 
_

Patches currently in -mm which might be from rppt@linux.ibm.com are

kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch
dma-contiguous-simplify-cma_early_percent_memory.patch
arm-xtensa-simplify-initialization-of-high-memory-pages.patch
arm64-numa-simplify-dummy_numa_init.patch
h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch
riscv-drop-unneeded-node-initialization.patch
mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch
memblock-make-for_each_memblock_type-iterator-private.patch
memblock-make-memblock_debug-and-related-functionality-private.patch
memblock-reduce-number-of-parameters-in-for_each_mem_range.patch
arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch
arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch
x86-setup-simplify-initrd-relocation-and-reservation.patch
x86-setup-simplify-reserve_crashkernel.patch
memblock-remove-unused-memblock_mem_size.patch
memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch
memblock-use-separate-iterators-for-memory-and-reserved-regions.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + fs-ocfs2-delete-repeated-words-in-comments.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (120 preceding siblings ...)
  2020-08-19 19:28 ` + memblock-use-separate-iterators-for-memory-and-reserved-regions.patch " Andrew Morton
@ 2020-08-19 19:31 ` Andrew Morton
  2020-08-19 19:32 ` + fs-configfs-delete-repeated-words-in-comments.patch " Andrew Morton
                   ` (17 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:31 UTC (permalink / raw)
  To: jlbec, joseph.qi, mark, mm-commits, rdunlap


The patch titled
     Subject: ocfs2: delete repeated words in comments
has been added to the -mm tree.  Its filename is
     fs-ocfs2-delete-repeated-words-in-comments.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/fs-ocfs2-delete-repeated-words-in-comments.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/fs-ocfs2-delete-repeated-words-in-comments.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Randy Dunlap <rdunlap@infradead.org>
Subject: ocfs2: delete repeated words in comments

Drop duplicated words {the, and} in comments.

Link: https://lkml.kernel.org/r/20200811021845.25134-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/ocfs2/alloc.c      |    2 +-
 fs/ocfs2/localalloc.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/fs/ocfs2/alloc.c~fs-ocfs2-delete-repeated-words-in-comments
+++ a/fs/ocfs2/alloc.c
@@ -6013,7 +6013,7 @@ int __ocfs2_flush_truncate_log(struct oc
 		goto out;
 	}
 
-	/* Appending truncate log(TA) and and flushing truncate log(TF) are
+	/* Appending truncate log(TA) and flushing truncate log(TF) are
 	 * two separated transactions. They can be both committed but not
 	 * checkpointed. If crash occurs then, both two transaction will be
 	 * replayed with several already released to global bitmap clusters.
--- a/fs/ocfs2/localalloc.c~fs-ocfs2-delete-repeated-words-in-comments
+++ a/fs/ocfs2/localalloc.c
@@ -677,7 +677,7 @@ int ocfs2_reserve_local_alloc_bits(struc
 		/*
 		 * Under certain conditions, the window slide code
 		 * might have reduced the number of bits available or
-		 * disabled the the local alloc entirely. Re-check
+		 * disabled the local alloc entirely. Re-check
 		 * here and return -ENOSPC if necessary.
 		 */
 		status = -ENOSPC;
_

Patches currently in -mm which might be from rdunlap@infradead.org are

fs-ocfs2-delete-repeated-words-in-comments.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + fs-configfs-delete-repeated-words-in-comments.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (121 preceding siblings ...)
  2020-08-19 19:31 ` + fs-ocfs2-delete-repeated-words-in-comments.patch " Andrew Morton
@ 2020-08-19 19:32 ` Andrew Morton
  2020-08-19 19:37 ` + mm-slub-make-add_full-condition-more-explicit.patch " Andrew Morton
                   ` (16 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:32 UTC (permalink / raw)
  To: hch, jlbec, mm-commits, rdunlap


The patch titled
     Subject: fs: configfs: delete repeated words in comments
has been added to the -mm tree.  Its filename is
     fs-configfs-delete-repeated-words-in-comments.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/fs-configfs-delete-repeated-words-in-comments.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/fs-configfs-delete-repeated-words-in-comments.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Randy Dunlap <rdunlap@infradead.org>
Subject: fs: configfs: delete repeated words in comments

Drop duplicated words {the, that} in comments.

Link: https://lkml.kernel.org/r/20200811021826.25032-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/configfs/dir.c  |    2 +-
 fs/configfs/file.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/fs/configfs/dir.c~fs-configfs-delete-repeated-words-in-comments
+++ a/fs/configfs/dir.c
@@ -1168,7 +1168,7 @@ EXPORT_SYMBOL(configfs_depend_item);
 
 /*
  * Release the dependent linkage.  This is much simpler than
- * configfs_depend_item() because we know that that the client driver is
+ * configfs_depend_item() because we know that the client driver is
  * pinned, thus the subsystem is pinned, and therefore configfs is pinned.
  */
 void configfs_undepend_item(struct config_item *target)
--- a/fs/configfs/file.c~fs-configfs-delete-repeated-words-in-comments
+++ a/fs/configfs/file.c
@@ -267,7 +267,7 @@ flush_write_buffer(struct file *file, st
  *	There is no easy way for us to know if userspace is only doing a partial
  *	write, so we don't support them. We expect the entire buffer to come
  *	on the first write.
- *	Hint: if you're writing a value, first read the file, modify only the
+ *	Hint: if you're writing a value, first read the file, modify only
  *	the value you're changing, then write entire buffer back.
  */
 
_

Patches currently in -mm which might be from rdunlap@infradead.org are

fs-ocfs2-delete-repeated-words-in-comments.patch
fs-configfs-delete-repeated-words-in-comments.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-slub-make-add_full-condition-more-explicit.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (122 preceding siblings ...)
  2020-08-19 19:32 ` + fs-configfs-delete-repeated-words-in-comments.patch " Andrew Morton
@ 2020-08-19 19:37 ` Andrew Morton
  2020-08-19 19:39 ` + memremap-convert-devmap-static-branch-to-incdec.patch " Andrew Morton
                   ` (15 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:37 UTC (permalink / raw)
  To: cl, iamjoonsoo.kim, liu.xiang6, mm-commits, penberg, rientjes, wuyun.wu


The patch titled
     Subject: mm/slub: make add_full() condition more explicit
has been added to the -mm tree.  Its filename is
     mm-slub-make-add_full-condition-more-explicit.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-slub-make-add_full-condition-more-explicit.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-slub-make-add_full-condition-more-explicit.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Abel Wu <wuyun.wu@huawei.com>
Subject: mm/slub: make add_full() condition more explicit

The commit below is incomplete, as it didn't handle the add_full() part. 
commit a4d3f8916c65 ("slub: remove useless kmem_cache_debug() before
remove_full()")

This patch checks for SLAB_STORE_USER instead of kmem_cache_debug(), since
that should be the only context in which we need the list_lock for
add_full().

Link: https://lkml.kernel.org/r/20200811020240.1231-1-wuyun.wu@huawei.com
Signed-off-by: Abel Wu <wuyun.wu@huawei.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Liu Xiang <liu.xiang6@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/slub.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/mm/slub.c~mm-slub-make-add_full-condition-more-explicit
+++ a/mm/slub.c
@@ -2250,7 +2250,8 @@ redo:
 		}
 	} else {
 		m = M_FULL;
-		if (kmem_cache_debug(s) && !lock) {
+#ifdef CONFIG_SLUB_DEBUG
+		if ((s->flags & SLAB_STORE_USER) && !lock) {
 			lock = 1;
 			/*
 			 * This also ensures that the scanning of full
@@ -2259,6 +2260,7 @@ redo:
 			 */
 			spin_lock(&n->list_lock);
 		}
+#endif
 	}
 
 	if (l != m) {
_

Patches currently in -mm which might be from wuyun.wu@huawei.com are

mm-slub-branch-optimization-in-free-slowpath.patch
mm-slub-fix-missing-alloc_slowpath-stat-when-bulk-alloc.patch
mm-slub-make-add_full-condition-more-explicit.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + memremap-convert-devmap-static-branch-to-incdec.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (123 preceding siblings ...)
  2020-08-19 19:37 ` + mm-slub-make-add_full-condition-more-explicit.patch " Andrew Morton
@ 2020-08-19 19:39 ` Andrew Morton
  2020-08-19 19:53 ` + scripts-tagssh-exclude-tools-directory-from-tags-generation.patch " Andrew Morton
                   ` (14 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:39 UTC (permalink / raw)
  To: dan.j.williams, ira.weiny, mm-commits, vishal.l.verma, william.kucharski


The patch titled
     Subject: mm/memremap.c: convert devmap static branch to {inc,dec}
has been added to the -mm tree.  Its filename is
     memremap-convert-devmap-static-branch-to-incdec.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/memremap-convert-devmap-static-branch-to-incdec.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/memremap-convert-devmap-static-branch-to-incdec.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Ira Weiny <ira.weiny@intel.com>
Subject: mm/memremap.c: convert devmap static branch to {inc,dec}

While reviewing Protection Key Supervisor support it was pointed out that
using a counter to track static branch enable was an anti-pattern which
was better solved using the provided static_branch_{inc,dec} functions.[1]

Fix up devmap_managed_key to work the same way.  Also this should be safer
because there is a very small (very unlikely) race when multiple callers
try to enable at the same time.

[1] https://lore.kernel.org/lkml/20200714194031.GI5523@worktop.programming.kicks-ass.net/

Link: https://lkml.kernel.org/r/20200810235319.2796597-1-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memremap.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

--- a/mm/memremap.c~memremap-convert-devmap-static-branch-to-incdec
+++ a/mm/memremap.c
@@ -40,12 +40,10 @@ EXPORT_SYMBOL_GPL(memremap_compat_align)
 #ifdef CONFIG_DEV_PAGEMAP_OPS
 DEFINE_STATIC_KEY_FALSE(devmap_managed_key);
 EXPORT_SYMBOL(devmap_managed_key);
-static atomic_t devmap_managed_enable;
 
 static void devmap_managed_enable_put(void)
 {
-	if (atomic_dec_and_test(&devmap_managed_enable))
-		static_branch_disable(&devmap_managed_key);
+	static_branch_dec(&devmap_managed_key);
 }
 
 static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
@@ -56,8 +54,7 @@ static int devmap_managed_enable_get(str
 		return -EINVAL;
 	}
 
-	if (atomic_inc_return(&devmap_managed_enable) == 1)
-		static_branch_enable(&devmap_managed_key);
+	static_branch_inc(&devmap_managed_key);
 	return 0;
 }
 #else
_

Patches currently in -mm which might be from ira.weiny@intel.com are

memremap-convert-devmap-static-branch-to-incdec.patch
mm-highmem-clean-up-endif-comments.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + scripts-tagssh-exclude-tools-directory-from-tags-generation.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (124 preceding siblings ...)
  2020-08-19 19:39 ` + memremap-convert-devmap-static-branch-to-incdec.patch " Andrew Morton
@ 2020-08-19 19:53 ` Andrew Morton
  2020-08-19 19:54 ` + docs-vm-fix-mm_count-vs-mm_users-counter-confusion.patch " Andrew Morton
                   ` (13 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:53 UTC (permalink / raw)
  To: corbet, gregkh, masahiroy, mm-commits, rkovhaev, xujialu


The patch titled
     Subject: scripts/tags.sh: exclude tools directory from tags generation
has been added to the -mm tree.  Its filename is
     scripts-tagssh-exclude-tools-directory-from-tags-generation.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/scripts-tagssh-exclude-tools-directory-from-tags-generation.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/scripts-tagssh-exclude-tools-directory-from-tags-generation.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Rustam Kovhaev <rkovhaev@gmail.com>
Subject: scripts/tags.sh: exclude tools directory from tags generation

When COMPILED_SOURCE is set, running 'make ARCH=x86_64 COMPILED_SOURCE=1
cscope tags' in KBUILD_OUTPUT directory produces lots of "No such file or
directory" warnings:

...
realpath: sigchain.h: No such file or directory
realpath: orc_gen.c: No such file or directory
realpath: objtool.c: No such file or directory
...

Let's exclude tools directory from tags generation

Link: https://lkml.kernel.org/r/20200810153650.1822316-1-rkovhaev@gmail.com
Link: https://lore.kernel.org/lkml/20200809210056.GA1344537@thinkpad
Fixes: 4f491bb6ea2a ("scripts/tags.sh: collect compiled source precisely")
Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jialu Xu <xujialu@vimux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/tags.sh |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- a/scripts/tags.sh~scripts-tagssh-exclude-tools-directory-from-tags-generation
+++ a/scripts/tags.sh
@@ -26,7 +26,11 @@ else
 fi
 
 # ignore userspace tools
-ignore="$ignore ( -path ${tree}tools ) -prune -o"
+if [ -n "$COMPILED_SOURCE" ]; then
+	ignore="$ignore ( -path ./tools ) -prune -o"
+else
+	ignore="$ignore ( -path ${tree}tools ) -prune -o"
+fi
 
 # Detect if ALLSOURCE_ARCHS is set. If not, we assume SRCARCH
 if [ "${ALLSOURCE_ARCHS}" = "" ]; then
@@ -92,7 +96,7 @@ all_sources()
 all_compiled_sources()
 {
 	realpath -es $([ -z "$KBUILD_ABS_SRCTREE" ] && echo --relative-to=.) \
-		include/generated/autoconf.h $(find -name "*.cmd" -exec \
+		include/generated/autoconf.h $(find $ignore -name "*.cmd" -exec \
 		grep -Poh '(?(?=^source_.* \K).*|(?=^  \K\S).*(?= \\))' {} \+ |
 		awk '!a[$0]++') | sort -u
 }
_

Patches currently in -mm which might be from rkovhaev@gmail.com are

scripts-tagssh-exclude-tools-directory-from-tags-generation.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + docs-vm-fix-mm_count-vs-mm_users-counter-confusion.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (125 preceding siblings ...)
  2020-08-19 19:53 ` + scripts-tagssh-exclude-tools-directory-from-tags-generation.patch " Andrew Morton
@ 2020-08-19 19:54 ` Andrew Morton
  2020-08-19 20:08 ` + mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch " Andrew Morton
                   ` (12 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 19:54 UTC (permalink / raw)
  To: agordeev, corbet, mm-commits


The patch titled
     Subject: docs/vm: fix 'mm_count' vs 'mm_users' counter confusion
has been added to the -mm tree.  Its filename is
     docs-vm-fix-mm_count-vs-mm_users-counter-confusion.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/docs-vm-fix-mm_count-vs-mm_users-counter-confusion.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/docs-vm-fix-mm_count-vs-mm_users-counter-confusion.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Alexander Gordeev <agordeev@linux.ibm.com>
Subject: docs/vm: fix 'mm_count' vs 'mm_users' counter confusion

In the context of the anonymous address space lifespan description the
'mm_users' reference counter is confused with 'mm_count'.  I.e a "zombie"
mm gets released when "mm_count" becomes zero, not "mm_users".

Link: https://lkml.kernel.org/r/1597040695-32633-1-git-send-email-agordeev@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/vm/active_mm.rst |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/Documentation/vm/active_mm.rst~docs-vm-fix-mm_count-vs-mm_users-counter-confusion
+++ a/Documentation/vm/active_mm.rst
@@ -64,7 +64,7 @@ Active MM
  actually get cases where you have a address space that is _only_ used by
  lazy users. That is often a short-lived state, because once that thread
  gets scheduled away in favour of a real thread, the "zombie" mm gets
- released because "mm_users" becomes zero.
+ released because "mm_count" becomes zero.
 
  Also, a new rule is that _nobody_ ever has "init_mm" as a real MM any
  more. "init_mm" should be considered just a "lazy context when no other
_

Patches currently in -mm which might be from agordeev@linux.ibm.com are

docs-vm-fix-mm_count-vs-mm_users-counter-confusion.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (126 preceding siblings ...)
  2020-08-19 19:54 ` + docs-vm-fix-mm_count-vs-mm_users-counter-confusion.patch " Andrew Morton
@ 2020-08-19 20:08 ` Andrew Morton
  2020-08-19 20:14 ` + mm-mmap-rename-__vma_unlink_common-to-__vma_unlink.patch " Andrew Morton
                   ` (11 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 20:08 UTC (permalink / raw)
  To: hsiangkao, mm-commits, stable, ying.huang


The patch titled
     Subject: mm, THP, swap: fix allocating cluster for swapfile by mistake
has been added to the -mm tree.  Its filename is
     mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Gao Xiang <hsiangkao@redhat.com>
Subject: mm, THP, swap: fix allocating cluster for swapfile by mistake

SWP_FS doesn't mean the device is file-backed swap device, which just
means each writeback request should go through fs by DIO.  Or it'll just
use extents added by .swap_activate(), but it also works as file-backed
swap device.

So in order to achieve the goal of the original patch, SWP_BLKDEV should
be used instead.

FS corruption can be observed with SSD device + XFS + fragmented swapfile
due to CONFIG_THP_SWAP=y.

I reproduced the issue with the following details:

Environment:
QEMU + upstream kernel + buildroot + NVMe (2 GB)

Kernel config:
CONFIG_BLK_DEV_NVME=y
CONFIG_THP_SWAP=y

Some reproducable steps:
mkfs.xfs -f /dev/nvme0n1
mkdir /tmp/mnt
mount /dev/nvme0n1 /tmp/mnt
bs="32k"
sz="1024m"    # doesn't matter too much, I also tried 16m
xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
xfs_io -f -c "pwrite -F -S 0 -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fsync" /tmp/mnt/sw

mkswap /tmp/mnt/sw
swapon /tmp/mnt/sw

stress --vm 2 --vm-bytes 600M   # doesn't matter too much as well

Symptoms:
 - FS corruption (e.g. checksum failure)
 - memory corruption at: 0xd2808010
 - segfault

Link: https://lkml.kernel.org/r/20200819195613.24269-1-hsiangkao@redhat.com
Fixes: f0eea189e8e9 ("mm, THP, swap: Don't allocate huge cluster for file backed swap device")
Fixes: 38d8b4e6bdc8 ("mm, THP, swap: delay splitting THP during swap out")
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/swapfile.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/swapfile.c~mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake
+++ a/mm/swapfile.c
@@ -1078,7 +1078,7 @@ start_over:
 			goto nextsi;
 		}
 		if (size == SWAPFILE_CLUSTER) {
-			if (!(si->flags & SWP_FS))
+			if (si->flags & SWP_BLKDEV)
 				n_ret = swap_alloc_cluster(si, swp_entries);
 		} else
 			n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
_

Patches currently in -mm which might be from hsiangkao@redhat.com are

mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-mmap-rename-__vma_unlink_common-to-__vma_unlink.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (127 preceding siblings ...)
  2020-08-19 20:08 ` + mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch " Andrew Morton
@ 2020-08-19 20:14 ` Andrew Morton
  2020-08-19 20:14 ` + mm-mmap-leverage-vma_rb_erase_ignore-to-implement-vma_rb_erase.patch " Andrew Morton
                   ` (10 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 20:14 UTC (permalink / raw)
  To: akpm, mm-commits, richard.weiyang


The patch titled
     Subject: mm/mmap: rename __vma_unlink_common() to __vma_unlink()
has been added to the -mm tree.  Its filename is
     mm-mmap-rename-__vma_unlink_common-to-__vma_unlink.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-mmap-rename-__vma_unlink_common-to-__vma_unlink.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-mmap-rename-__vma_unlink_common-to-__vma_unlink.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Wei Yang <richard.weiyang@linux.alibaba.com>
Subject: mm/mmap: rename __vma_unlink_common() to __vma_unlink()

__vma_unlink_common() and __vma_unlink() are counterparts.  Since there is
no function named __vma_unlink(), let's rename __vma_unlink_common() to
__vma_unlink() to make the code more self-explanatory and easy for
audience to understand.

Otherwise we may expect there are several variants of vma_unlink() and
__vma_unlink_common() is used by them.

Link: https://lkml.kernel.org/r/20200809232057.23477-1-richard.weiyang@linux.alibaba.com
Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mmap.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/mm/mmap.c~mm-mmap-rename-__vma_unlink_common-to-__vma_unlink
+++ a/mm/mmap.c
@@ -677,7 +677,7 @@ static void __insert_vm_struct(struct mm
 	mm->map_count++;
 }
 
-static __always_inline void __vma_unlink_common(struct mm_struct *mm,
+static __always_inline void __vma_unlink(struct mm_struct *mm,
 						struct vm_area_struct *vma,
 						struct vm_area_struct *ignore)
 {
@@ -859,7 +859,7 @@ again:
 		 * us to remove next before dropping the locks.
 		 */
 		if (remove_next != 3)
-			__vma_unlink_common(mm, next, next);
+			__vma_unlink(mm, next, next);
 		else
 			/*
 			 * vma is not before next if they've been
@@ -870,7 +870,7 @@ again:
 			 * "next" (which is stored in post-swap()
 			 * "vma").
 			 */
-			__vma_unlink_common(mm, next, vma);
+			__vma_unlink(mm, next, vma);
 		if (file)
 			__remove_shared_vm_struct(next, file, mapping);
 	} else if (insert) {
_

Patches currently in -mm which might be from richard.weiyang@linux.alibaba.com are

mm-mmap-rename-__vma_unlink_common-to-__vma_unlink.patch
mm-mmap-leverage-vma_rb_erase_ignore-to-implement-vma_rb_erase.patch
mm-page_reporting-drop-stale-list-head-check-in-page_reporting_cycle.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-mmap-leverage-vma_rb_erase_ignore-to-implement-vma_rb_erase.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (128 preceding siblings ...)
  2020-08-19 20:14 ` + mm-mmap-rename-__vma_unlink_common-to-__vma_unlink.patch " Andrew Morton
@ 2020-08-19 20:14 ` Andrew Morton
  2020-08-19 20:19 ` + mm-slub-re-initialize-randomized-freelist-sequence-in-calculate_sizes.patch " Andrew Morton
                   ` (9 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 20:14 UTC (permalink / raw)
  To: akpm, mm-commits, richard.weiyang


The patch titled
     Subject: mm/mmap: leverage vma_rb_erase_ignore() to implement vma_rb_erase()
has been added to the -mm tree.  Its filename is
     mm-mmap-leverage-vma_rb_erase_ignore-to-implement-vma_rb_erase.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-mmap-leverage-vma_rb_erase_ignore-to-implement-vma_rb_erase.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-mmap-leverage-vma_rb_erase_ignore-to-implement-vma_rb_erase.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Wei Yang <richard.weiyang@linux.alibaba.com>
Subject: mm/mmap: leverage vma_rb_erase_ignore() to implement vma_rb_erase()

These two functions share the same logic except ignore a different vma.

Let's reuse the code.

Link: https://lkml.kernel.org/r/20200809232057.23477-2-richard.weiyang@linux.alibaba.com
Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mmap.c |   16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

--- a/mm/mmap.c~mm-mmap-leverage-vma_rb_erase_ignore-to-implement-vma_rb_erase
+++ a/mm/mmap.c
@@ -474,8 +474,12 @@ static __always_inline void vma_rb_erase
 {
 	/*
 	 * All rb_subtree_gap values must be consistent prior to erase,
-	 * with the possible exception of the "next" vma being erased if
-	 * next->vm_start was reduced.
+	 * with the possible exception of
+	 *
+	 * a. the "next" vma being erased if next->vm_start was reduced in
+	 *    __vma_adjust() -> __vma_unlink()
+	 * b. the vma being erased in detach_vmas_to_be_unmapped() ->
+	 *    vma_rb_erase()
 	 */
 	validate_mm_rb(root, ignore);
 
@@ -485,13 +489,7 @@ static __always_inline void vma_rb_erase
 static __always_inline void vma_rb_erase(struct vm_area_struct *vma,
 					 struct rb_root *root)
 {
-	/*
-	 * All rb_subtree_gap values must be consistent prior to erase,
-	 * with the possible exception of the vma being erased.
-	 */
-	validate_mm_rb(root, vma);
-
-	__vma_rb_erase(vma, root);
+	vma_rb_erase_ignore(vma, root, vma);
 }
 
 /*
_

Patches currently in -mm which might be from richard.weiyang@linux.alibaba.com are

mm-mmap-rename-__vma_unlink_common-to-__vma_unlink.patch
mm-mmap-leverage-vma_rb_erase_ignore-to-implement-vma_rb_erase.patch
mm-page_reporting-drop-stale-list-head-check-in-page_reporting_cycle.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-slub-re-initialize-randomized-freelist-sequence-in-calculate_sizes.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (129 preceding siblings ...)
  2020-08-19 20:14 ` + mm-mmap-leverage-vma_rb_erase_ignore-to-implement-vma_rb_erase.patch " Andrew Morton
@ 2020-08-19 20:19 ` Andrew Morton
  2020-08-19 20:32 ` + mm-dump_page-rename-head_mapcount-head_compound_mapcount.patch " Andrew Morton
                   ` (8 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 20:19 UTC (permalink / raw)
  To: ari-pekka.verta, cl, iamjoonsoo.kim, keun-o.park, mm-commits,
	penberg, rientjes, thgarnie, timo.simola


The patch titled
     Subject: mm: slub: re-initialize randomized freelist sequence in calculate_sizes
has been added to the -mm tree.  Its filename is
     mm-slub-re-initialize-randomized-freelist-sequence-in-calculate_sizes.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-slub-re-initialize-randomized-freelist-sequence-in-calculate_sizes.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-slub-re-initialize-randomized-freelist-sequence-in-calculate_sizes.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Sahara <keun-o.park@digital14.com>
Subject: mm: slub: re-initialize randomized freelist sequence in calculate_sizes

Slab cache flags are exported to sysfs and are allowed to get modified
from userspace.  Some of those may call calculate_sizes function because
the changed flag can take an effect on slab object size and layout, which
means kmem_cache may have different order and objects.  The freelist
pointer corruption occurs if some slab flags are modified while
CONFIG_SLAB_FREELIST_RANDOM is turned on.

 $ echo 0 > /sys/kernel/slab/zs_handle/store_user
 $ echo 0 > /sys/kernel/slab/zspage/store_user
 $ mkswap /dev/block/zram0
 $ swapon /dev/block/zram0 -p 32758

 =============================================================================
 BUG zs_handle (Not tainted): Freepointer corrupt
 -----------------------------------------------------------------------------

 Disabling lock debugging due to kernel taint
 INFO: Slab 0xffffffbf29603600 objects=102 used=102 fp=0x0000000000000000 flags=0x0200
 INFO: Object 0xffffffca580d8d78 @offset=3448 fp=0xffffffca580d8ed0

 Redzone 00000000f3cddd6c: bb bb bb bb bb bb bb bb                          ........
 Object 0000000082d5d74e: 6b 6b 6b 6b 6b 6b 6b a5                          kkkkkkk.
 Redzone 000000008fd80359: bb bb bb bb bb bb bb bb                          ........
 Padding 00000000c7f56047: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ

In this example, an Android device tries to use zram as a swap and to turn
off store_user in order to reduce the slub object size.  When
calculate_sizes is called in kmem_cache_open, size, order and objects for
zs_handle is:

 size:360, order:0, objects:22
However, if the SLAB_STORE_USER bit is cleared in store_user_store:
 size: 56, order:1, objects:73

All the size, order, and objects is changed by calculate_sizes(), but the
size of the random_seq array is still old value(22).  As a result,
out-of-bound array access can occur at shuffle_freelist() when slab
allocation is requested.

This patch fixes the problem by re-allocating the random_seq array with
re-calculated correct objects value.

Link: https://lkml.kernel.org/r/20200808095030.13368-1-kpark3469@gmail.com
Fixes: 210e7a43fa905 ("mm: SLUB freelist randomization")
Reported-by: Ari-Pekka Verta <ari-pekka.verta@digital14.com>
Reported-by: Timo Simola <timo.simola@digital14.com>
Signed-off-by: Sahara <keun-o.park@digital14.com>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/slub.c |   23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

--- a/mm/slub.c~mm-slub-re-initialize-randomized-freelist-sequence-in-calculate_sizes
+++ a/mm/slub.c
@@ -3781,7 +3781,22 @@ static int calculate_sizes(struct kmem_c
 	if (oo_objects(s->oo) > oo_objects(s->max))
 		s->max = s->oo;
 
-	return !!oo_objects(s->oo);
+	if (!oo_objects(s->oo))
+		return 0;
+
+	/*
+	 * Initialize the pre-computed randomized freelist if slab is up.
+	 * If the randomized freelist random_seq is already initialized,
+	 * free and re-initialize it with re-calculated value.
+	 */
+	if (slab_state >= UP) {
+		if (s->random_seq)
+			cache_random_seq_destroy(s);
+		if (init_cache_random_seq(s))
+			return 0;
+	}
+
+	return 1;
 }
 
 static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
@@ -3825,12 +3840,6 @@ static int kmem_cache_open(struct kmem_c
 	s->remote_node_defrag_ratio = 1000;
 #endif
 
-	/* Initialize the pre-computed randomized freelist if slab is up */
-	if (slab_state >= UP) {
-		if (init_cache_random_seq(s))
-			goto error;
-	}
-
 	if (!init_kmem_cache_nodes(s))
 		goto error;
 
_

Patches currently in -mm which might be from keun-o.park@digital14.com are

mm-slub-re-initialize-randomized-freelist-sequence-in-calculate_sizes.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-dump_page-rename-head_mapcount-head_compound_mapcount.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (130 preceding siblings ...)
  2020-08-19 20:19 ` + mm-slub-re-initialize-randomized-freelist-sequence-in-calculate_sizes.patch " Andrew Morton
@ 2020-08-19 20:32 ` Andrew Morton
  2020-08-19 20:35 ` + bitops-simplify-get_count_order_long.patch " Andrew Morton
                   ` (7 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 20:32 UTC (permalink / raw)
  To: cai, jhubbard, kirill.shutemov, mm-commits, rppt, vbabka,
	william.kucharski, willy


The patch titled
     Subject: mm, dump_page: rename head_mapcount() --> head_compound_mapcount()
has been added to the -mm tree.  Its filename is
     mm-dump_page-rename-head_mapcount-head_compound_mapcount.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-dump_page-rename-head_mapcount-head_compound_mapcount.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-dump_page-rename-head_mapcount-head_compound_mapcount.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: John Hubbard <jhubbard@nvidia.com>
Subject: mm, dump_page: rename head_mapcount() --> head_compound_mapcount()

Rename head_pincount() --> head_compound_pincount().  These names are more
accurate (or less misleading) than the original ones.

Link: https://lkml.kernel.org/r/20200807183358.105097-1-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mm.h |    8 ++++----
 mm/debug.c         |    6 +++---
 2 files changed, 7 insertions(+), 7 deletions(-)

--- a/include/linux/mm.h~mm-dump_page-rename-head_mapcount-head_compound_mapcount
+++ a/include/linux/mm.h
@@ -776,7 +776,7 @@ static inline void *kvcalloc(size_t n, s
 extern void kvfree(const void *addr);
 extern void kvfree_sensitive(const void *addr, size_t len);
 
-static inline int head_mapcount(struct page *head)
+static inline int head_compound_mapcount(struct page *head)
 {
 	return atomic_read(compound_mapcount_ptr(head)) + 1;
 }
@@ -790,7 +790,7 @@ static inline int compound_mapcount(stru
 {
 	VM_BUG_ON_PAGE(!PageCompound(page), page);
 	page = compound_head(page);
-	return head_mapcount(page);
+	return head_compound_mapcount(page);
 }
 
 /*
@@ -903,7 +903,7 @@ static inline bool hpage_pincount_availa
 	return PageCompound(page) && compound_order(page) > 1;
 }
 
-static inline int head_pincount(struct page *head)
+static inline int head_compound_pincount(struct page *head)
 {
 	return atomic_read(compound_pincount_ptr(head));
 }
@@ -912,7 +912,7 @@ static inline int compound_pincount(stru
 {
 	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
 	page = compound_head(page);
-	return head_pincount(page);
+	return head_compound_pincount(page);
 }
 
 static inline void set_compound_order(struct page *page, unsigned int order)
--- a/mm/debug.c~mm-dump_page-rename-head_mapcount-head_compound_mapcount
+++ a/mm/debug.c
@@ -102,12 +102,12 @@ void __dump_page(struct page *page, cons
 		if (hpage_pincount_available(page)) {
 			pr_warn("head:%p order:%u compound_mapcount:%d compound_pincount:%d\n",
 					head, compound_order(head),
-					head_mapcount(head),
-					head_pincount(head));
+					head_compound_mapcount(head),
+					head_compound_pincount(head));
 		} else {
 			pr_warn("head:%p order:%u compound_mapcount:%d\n",
 					head, compound_order(head),
-					head_mapcount(head));
+					head_compound_mapcount(head));
 		}
 	}
 	if (PageKsm(page))
_

Patches currently in -mm which might be from jhubbard@nvidia.com are

mm-dump_page-rename-head_mapcount-head_compound_mapcount.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + bitops-simplify-get_count_order_long.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (131 preceding siblings ...)
  2020-08-19 20:32 ` + mm-dump_page-rename-head_mapcount-head_compound_mapcount.patch " Andrew Morton
@ 2020-08-19 20:35 ` Andrew Morton
  2020-08-19 20:35 ` + bitops-use-the-same-mechanism-for-get_count_order.patch " Andrew Morton
                   ` (6 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 20:35 UTC (permalink / raw)
  To: andriy.shevchenko, christian.brauner, mm-commits, richard.weiyang


The patch titled
     Subject: bitops: simplify get_count_order_long()
has been added to the -mm tree.  Its filename is
     bitops-simplify-get_count_order_long.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/bitops-simplify-get_count_order_long.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/bitops-simplify-get_count_order_long.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Wei Yang <richard.weiyang@linux.alibaba.com>
Subject: bitops: simplify get_count_order_long()

These two cases could be unified into one.

Link: https://lkml.kernel.org/r/20200807085837.11697-2-richard.weiyang@linux.alibaba.com
Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/bitops.h |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/include/linux/bitops.h~bitops-simplify-get_count_order_long
+++ a/include/linux/bitops.h
@@ -206,10 +206,7 @@ static inline int get_count_order_long(u
 {
 	if (l == 0UL)
 		return -1;
-	else if (l & (l - 1UL))
-		return (int)fls_long(l);
-	else
-		return (int)fls_long(l) - 1;
+	return (int)fls_long(--l);
 }
 
 /**
_

Patches currently in -mm which might be from richard.weiyang@linux.alibaba.com are

mm-mmap-rename-__vma_unlink_common-to-__vma_unlink.patch
mm-mmap-leverage-vma_rb_erase_ignore-to-implement-vma_rb_erase.patch
mm-page_reporting-drop-stale-list-head-check-in-page_reporting_cycle.patch
bitops-simplify-get_count_order_long.patch
bitops-use-the-same-mechanism-for-get_count_order.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + bitops-use-the-same-mechanism-for-get_count_order.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (132 preceding siblings ...)
  2020-08-19 20:35 ` + bitops-simplify-get_count_order_long.patch " Andrew Morton
@ 2020-08-19 20:35 ` Andrew Morton
  2020-08-19 21:14 ` + panic-dump-registers-on-panic_on_warn.patch " Andrew Morton
                   ` (5 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 20:35 UTC (permalink / raw)
  To: andriy.shevchenko, christian.brauner, mm-commits, richard.weiyang


The patch titled
     Subject: bitops: use the same mechanism for get_count_order[_long]
has been added to the -mm tree.  Its filename is
     bitops-use-the-same-mechanism-for-get_count_order.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/bitops-use-the-same-mechanism-for-get_count_order.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/bitops-use-the-same-mechanism-for-get_count_order.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Wei Yang <richard.weiyang@linux.alibaba.com>
Subject: bitops: use the same mechanism for get_count_order[_long]

These two functions share the same logic.

Link: https://lkml.kernel.org/r/20200807085837.11697-3-richard.weiyang@linux.alibaba.com
Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/bitops.h |    8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

--- a/include/linux/bitops.h~bitops-use-the-same-mechanism-for-get_count_order
+++ a/include/linux/bitops.h
@@ -188,12 +188,10 @@ static inline unsigned fls_long(unsigned
 
 static inline int get_count_order(unsigned int count)
 {
-	int order;
+	if (count == 0)
+		return -1;
 
-	order = fls(count) - 1;
-	if (count & (count - 1))
-		order++;
-	return order;
+	return fls(--count);
 }
 
 /**
_

Patches currently in -mm which might be from richard.weiyang@linux.alibaba.com are

mm-mmap-rename-__vma_unlink_common-to-__vma_unlink.patch
mm-mmap-leverage-vma_rb_erase_ignore-to-implement-vma_rb_erase.patch
mm-page_reporting-drop-stale-list-head-check-in-page_reporting_cycle.patch
bitops-simplify-get_count_order_long.patch
bitops-use-the-same-mechanism-for-get_count_order.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + panic-dump-registers-on-panic_on_warn.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (133 preceding siblings ...)
  2020-08-19 20:35 ` + bitops-use-the-same-mechanism-for-get_count_order.patch " Andrew Morton
@ 2020-08-19 21:14 ` Andrew Morton
  2020-08-19 21:29 ` + mm-slub-re-initialize-randomized-freelist-sequence-in-calculate_sizes-fix.patch " Andrew Morton
                   ` (4 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 21:14 UTC (permalink / raw)
  To: aik, aquini, dianders, keescook, mingo, mm-commits, npiggin, tglx, will


The patch titled
     Subject: panic: dump registers on panic_on_warn
has been added to the -mm tree.  Its filename is
     panic-dump-registers-on-panic_on_warn.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/panic-dump-registers-on-panic_on_warn.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/panic-dump-registers-on-panic_on_warn.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Alexey Kardashevskiy <aik@ozlabs.ru>
Subject: panic: dump registers on panic_on_warn

Currently we print stack and registers for ordinary warnings but we do not
for panic_on_warn which looks as oversight - panic() will reboot the
machine but won't print registers.

This moves printing of registers and modules earlier.

This does not move the stack dumping as panic() dumps it.

Link: https://lkml.kernel.org/r/20200804095054.68724-1-aik@ozlabs.ru
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/panic.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/kernel/panic.c~panic-dump-registers-on-panic_on_warn
+++ a/kernel/panic.c
@@ -589,6 +589,11 @@ void __warn(const char *file, int line,
 	if (args)
 		vprintk(args->fmt, args->args);
 
+	print_modules();
+
+	if (regs)
+		show_regs(regs);
+
 	if (panic_on_warn) {
 		/*
 		 * This thread may hit another WARN() in the panic path.
@@ -600,12 +605,7 @@ void __warn(const char *file, int line,
 		panic("panic_on_warn set ...\n");
 	}
 
-	print_modules();
-
-	if (regs)
-		show_regs(regs);
-	else
-		dump_stack();
+	dump_stack();
 
 	print_irqtrace_events(current);
 
_

Patches currently in -mm which might be from aik@ozlabs.ru are

panic-dump-registers-on-panic_on_warn.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-slub-re-initialize-randomized-freelist-sequence-in-calculate_sizes-fix.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (134 preceding siblings ...)
  2020-08-19 21:14 ` + panic-dump-registers-on-panic_on_warn.patch " Andrew Morton
@ 2020-08-19 21:29 ` Andrew Morton
  2020-08-19 21:31 ` + checkpatch-add-test-for-comma-use-that-should-be-semicolon.patch " Andrew Morton
                   ` (3 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 21:29 UTC (permalink / raw)
  To: akpm, keun-o.park, mm-commits, thgarnie


The patch titled
     Subject: mm-slub-re-initialize-randomized-freelist-sequence-in-calculate_sizes-fix
has been added to the -mm tree.  Its filename is
     mm-slub-re-initialize-randomized-freelist-sequence-in-calculate_sizes-fix.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-slub-re-initialize-randomized-freelist-sequence-in-calculate_sizes-fix.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-slub-re-initialize-randomized-freelist-sequence-in-calculate_sizes-fix.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm-slub-re-initialize-randomized-freelist-sequence-in-calculate_sizes-fix

simplification, per Thomas

Cc: Sahara <keun-o.park@digital14.com>
Cc: Thomas Garnier <thgarnie@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/slub.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/mm/slub.c~mm-slub-re-initialize-randomized-freelist-sequence-in-calculate_sizes-fix
+++ a/mm/slub.c
@@ -3786,8 +3786,7 @@ static int calculate_sizes(struct kmem_c
 	 * free and re-initialize it with re-calculated value.
 	 */
 	if (slab_state >= UP) {
-		if (s->random_seq)
-			cache_random_seq_destroy(s);
+		cache_random_seq_destroy(s);
 		if (init_cache_random_seq(s))
 			return 0;
 	}
_

Patches currently in -mm which might be from akpm@linux-foundation.org are

mm-slub-re-initialize-randomized-freelist-sequence-in-calculate_sizes-fix.patch
mm.patch
memblock-make-memblock_debug-and-related-functionality-private-fix.patch
mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings-fix-2.patch
kernel-forkc-export-kernel_thread-to-modules.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + checkpatch-add-test-for-comma-use-that-should-be-semicolon.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (135 preceding siblings ...)
  2020-08-19 21:29 ` + mm-slub-re-initialize-randomized-freelist-sequence-in-calculate_sizes-fix.patch " Andrew Morton
@ 2020-08-19 21:31 ` Andrew Morton
  2020-08-19 21:43 ` + mm-memcontrol-use-flex_array_size-helper-in-memcpy.patch " Andrew Morton
                   ` (2 subsequent siblings)
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 21:31 UTC (permalink / raw)
  To: joe, mm-commits


The patch titled
     Subject: checkpatch: add test for comma use that should be semicolon
has been added to the -mm tree.  Its filename is
     checkpatch-add-test-for-comma-use-that-should-be-semicolon.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/checkpatch-add-test-for-comma-use-that-should-be-semicolon.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/checkpatch-add-test-for-comma-use-that-should-be-semicolon.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Joe Perches <joe@perches.com>
Subject: checkpatch: add test for comma use that should be semicolon

There are commas used as statement terminations that should typically have
used semicolons instead.  Only direct assignments or use of a single
function or value on a single line are detected by this test.

e.g.:
	foo = bar(),		/* typical use is semicolon not comma */
	bar = baz();

Add an imperfect test to detect these comma uses.

No false positives were found in testing, but many types of false
negatives are possible.

e.g.:
	foo = bar() + 1,	/* comma use, but not direct assignment */
	bar = baz();

Link: https://lkml.kernel.org/r/3bf27caf462007dfa75647b040ab3191374a59de.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/checkpatch.pl |   11 +++++++++++
 1 file changed, 11 insertions(+)

--- a/scripts/checkpatch.pl~checkpatch-add-test-for-comma-use-that-should-be-semicolon
+++ a/scripts/checkpatch.pl
@@ -4942,6 +4942,17 @@ sub process {
 			}
 		}
 
+# check if a statement with a comma should be two statements like:
+#	foo = bar(),	/* comma should be semicolon */
+#	bar = baz();
+		if (defined($stat) &&
+		    $stat =~ /^\+\s*(?:$Lval\s*$Assignment\s*)?$FuncArg\s*,\s*(?:$Lval\s*$Assignment\s*)?$FuncArg\s*;\s*$/) {
+			my $cnt = statement_rawlines($stat);
+			my $herectx = get_stat_here($linenr, $cnt, $here);
+			WARN("SUSPECT_COMMA_SEMICOLON",
+			     "Possible comma where semicolon could be used\n" . $herectx);
+		}
+
 # return is not a function
 		if (defined($stat) && $stat =~ /^.\s*return(\s*)\(/s) {
 			my $spacing = $1;
_

Patches currently in -mm which might be from joe@perches.com are

checkpatch-test-git_dir-changes.patch
checkpatch-move-repeated-word-test.patch
checkpatch-add-test-for-comma-use-that-should-be-semicolon.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-memcontrol-use-flex_array_size-helper-in-memcpy.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (136 preceding siblings ...)
  2020-08-19 21:31 ` + checkpatch-add-test-for-comma-use-that-should-be-semicolon.patch " Andrew Morton
@ 2020-08-19 21:43 ` Andrew Morton
  2020-08-19 21:43 ` + mm-memcontrol-use-the-preferred-form-for-passing-the-size-of-a-structure-type.patch " Andrew Morton
  2020-08-19 23:09 ` mmotm 2020-08-19-16-09 uploaded Andrew Morton
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 21:43 UTC (permalink / raw)
  To: gustavoars, hannes, mhocko, mm-commits, vdavydov.dev


The patch titled
     Subject: mm: memcontrol: use flex_array_size() helper in memcpy()
has been added to the -mm tree.  Its filename is
     mm-memcontrol-use-flex_array_size-helper-in-memcpy.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-use-flex_array_size-helper-in-memcpy.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-use-flex_array_size-helper-in-memcpy.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Subject: mm: memcontrol: use flex_array_size() helper in memcpy()

Make use of the flex_array_size() helper to calculate the size of a
flexible array member within an enclosing structure.

This helper offers defense-in-depth against potential integer overflows,
while at the same time makes it explicitly clear that we are dealing with
a flexible array member.

Also, remove unnecessary braces.

Link: https://lkml.kernel.org/r/ddd60dae2d9aea1ccdd2be66634815c93696125e.1596214831.git.gustavoars@kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

--- a/mm/memcontrol.c~mm-memcontrol-use-flex_array_size-helper-in-memcpy
+++ a/mm/memcontrol.c
@@ -4253,10 +4253,9 @@ static int __mem_cgroup_usage_register_e
 	new->size = size;
 
 	/* Copy thresholds (if any) to new array */
-	if (thresholds->primary) {
-		memcpy(new->entries, thresholds->primary->entries, (size - 1) *
-				sizeof(struct mem_cgroup_threshold));
-	}
+	if (thresholds->primary)
+		memcpy(new->entries, thresholds->primary->entries,
+		       flex_array_size(new, entries, size - 1));
 
 	/* Add new threshold */
 	new->entries[size - 1].eventfd = eventfd;
_

Patches currently in -mm which might be from gustavoars@kernel.org are

mm-memcontrol-use-flex_array_size-helper-in-memcpy.patch
mm-memcontrol-use-the-preferred-form-for-passing-the-size-of-a-structure-type.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* + mm-memcontrol-use-the-preferred-form-for-passing-the-size-of-a-structure-type.patch added to -mm tree
  2020-08-15  0:29 incoming Andrew Morton
                   ` (137 preceding siblings ...)
  2020-08-19 21:43 ` + mm-memcontrol-use-flex_array_size-helper-in-memcpy.patch " Andrew Morton
@ 2020-08-19 21:43 ` Andrew Morton
  2020-08-19 23:09 ` mmotm 2020-08-19-16-09 uploaded Andrew Morton
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 21:43 UTC (permalink / raw)
  To: gustavoars, hannes, mhocko, mm-commits, vdavydov.dev


The patch titled
     Subject: mm: memcontrol: Use the preferred form for passing the size of a structure type
has been added to the -mm tree.  Its filename is
     mm-memcontrol-use-the-preferred-form-for-passing-the-size-of-a-structure-type.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-use-the-preferred-form-for-passing-the-size-of-a-structure-type.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-use-the-preferred-form-for-passing-the-size-of-a-structure-type.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Subject: mm: memcontrol: Use the preferred form for passing the size of a structure type

Use the preferred form for passing the size of a structure type.  The
alternative form where the structure type is spelled out hurts readability
and introduces an opportunity for a bug when the object type is changed
but the corresponding object identifier to which the sizeof operator is
applied is not.

Link: https://lkml.kernel.org/r/773e013ff2f07fe2a0b47153f14dea054c0c04f1.1596214831.git.gustavoars@kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/memcontrol.c~mm-memcontrol-use-the-preferred-form-for-passing-the-size-of-a-structure-type
+++ a/mm/memcontrol.c
@@ -4262,7 +4262,7 @@ static int __mem_cgroup_usage_register_e
 	new->entries[size - 1].threshold = threshold;
 
 	/* Sort thresholds. Registering of new threshold isn't time-critical */
-	sort(new->entries, size, sizeof(struct mem_cgroup_threshold),
+	sort(new->entries, size, sizeof(*new->entries),
 			compare_thresholds, NULL);
 
 	/* Find current threshold */
_

Patches currently in -mm which might be from gustavoars@kernel.org are

mm-memcontrol-use-flex_array_size-helper-in-memcpy.patch
mm-memcontrol-use-the-preferred-form-for-passing-the-size-of-a-structure-type.patch


^ permalink raw reply	[flat|nested] 285+ messages in thread

* mmotm 2020-08-19-16-09 uploaded
  2020-08-15  0:29 incoming Andrew Morton
                   ` (138 preceding siblings ...)
  2020-08-19 21:43 ` + mm-memcontrol-use-the-preferred-form-for-passing-the-size-of-a-structure-type.patch " Andrew Morton
@ 2020-08-19 23:09 ` Andrew Morton
  139 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-08-19 23:09 UTC (permalink / raw)
  To: broonie, linux-fsdevel, linux-kernel, linux-mm, linux-next,
	mhocko, mm-commits, sfr

The mm-of-the-moment snapshot 2020-08-19-16-09 has been uploaded to

   https://www.ozlabs.org/~akpm/mmotm/

mmotm-readme.txt says

README for mm-of-the-moment:

https://www.ozlabs.org/~akpm/mmotm/

This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
more than once a week.

You will need quilt to apply these patches to the latest Linus release (5.x
or 5.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
https://ozlabs.org/~akpm/mmotm/series

The file broken-out.tar.gz contains two datestamp files: .DATE and
.DATE-yyyy-mm-dd-hh-mm-ss.  Both contain the string yyyy-mm-dd-hh-mm-ss,
followed by the base kernel version against which this patch series is to
be applied.

This tree is partially included in linux-next.  To see which patches are
included in linux-next, consult the `series' file.  Only the patches
within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
linux-next.


A full copy of the full kernel tree with the linux-next and mmotm patches
already applied is available through git within an hour of the mmotm
release.  Individual mmotm releases are tagged.  The master branch always
points to the latest release, so it's constantly rebasing.

	https://github.com/hnaz/linux-mm

The directory https://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second)
contains daily snapshots of the -mm tree.  It is updated more frequently
than mmotm, and is untested.

A git copy of this tree is also available at

	https://github.com/hnaz/linux-mm



This mmotm tree contains the following patches against 5.9-rc1:
(patches marked "*" will be included in linux-next)

  origin.patch
* mailmap-add-andi-kleen.patch
* hugetlb_cgroup-convert-comma-to-semicolon.patch
* khugepaged-adjust-vm_bug_on_mm-in-__khugepaged_enter.patch
* mm-vunmap-add-cond_resched-in-vunmap_pmd_range.patch
* mm-slub-fix-conversion-of-freelist_corrupted.patch
* proc-kpageflags-prevent-an-integer-overflow-in-stable_page_flags.patch
* proc-kpageflags-do-not-use-uninitialized-struct-pages.patch
* mm-fix-missing-function-declaration.patch
* fork-silence-a-false-postive-warning-in-__mmdrop.patch
* romfs-fix-uninitialized-memory-leak-in-romfs_dev_read.patch
* kernel-relayc-fix-memleak-on-destroy-relay-channel.patch
* uprobes-__replace_page-avoid-bug-in-munlock_vma_page.patch
* squashfs-avoid-bio_alloc-failure-with-1mbyte-blocks.patch
* mm-include-cma-pages-in-lowmem_reserve-at-boot.patch
* mm-page_alloc-fix-core-hung-in-free_pcppages_bulk.patch
* mm-slub-re-initialize-randomized-freelist-sequence-in-calculate_sizes.patch
* mm-slub-re-initialize-randomized-freelist-sequence-in-calculate_sizes-fix.patch
* mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch
* checkpatch-test-git_dir-changes.patch
* scripts-tagssh-exclude-tools-directory-from-tags-generation.patch
* fs-ocfs2-delete-repeated-words-in-comments.patch
* ocfs2-clear-links-count-in-ocfs2_mknod-if-an-error-occurs.patch
* ocfs2-fix-ocfs2-corrupt-when-iputting-an-inode.patch
* ramfs-support-o_tmpfile.patch
* kernel-watchdog-flush-all-printk-nmi-buffers-when-hardlockup-detected.patch
  mm.patch
* mm-slub-branch-optimization-in-free-slowpath.patch
* mm-slub-fix-missing-alloc_slowpath-stat-when-bulk-alloc.patch
* mm-slub-make-add_full-condition-more-explicit.patch
* device-dax-fix-mismatches-of-request_mem_region.patch
* mm-debug-do-not-dereference-i_ino-blindly.patch
* mm-dump_page-rename-head_mapcount-head_compound_mapcount.patch
* mm-gup_benchmark-use-pin_user_pages-for-foll_longterm-flag.patch
* mm-gup-dont-permit-users-to-call-get_user_pages-with-foll_longterm.patch
* mm-remove-activate_page-from-unuse_pte.patch
* mm-remove-superfluous-__clearpageactive.patch
* mm-remove-superfluous-__clearpagewaiters.patch
* memremap-convert-devmap-static-branch-to-incdec.patch
* mm-memcg-warning-on-memcg-after-readahead-page-charged.patch
* mm-memcg-remove-useless-check-on-page-mem_cgroup.patch
* mm-thp-move-lru_add_page_tail-func-to-huge_memoryc.patch
* mm-thp-clean-up-lru_add_page_tail.patch
* mm-thp-remove-code-path-which-never-got-into.patch
* mm-thp-narrow-lru-locking.patch
* mm-memcontrol-use-flex_array_size-helper-in-memcpy.patch
* mm-memcontrol-use-the-preferred-form-for-passing-the-size-of-a-structure-type.patch
* mm-account-pmd-tables-like-pte-tables.patch
* mm-memory-fix-typo-in-__do_fault-comment.patch
* mm-memoryc-replace-vmf-vma-with-variable-vma.patch
* mm-mmap-rename-__vma_unlink_common-to-__vma_unlink.patch
* mm-mmap-leverage-vma_rb_erase_ignore-to-implement-vma_rb_erase.patch
* mmap-locking-api-add-mmap_lock_is_contended.patch
* mm-smaps-extend-smap_gather_stats-to-support-specified-beginning.patch
* mm-proc-smaps_rollup-do-not-stall-write-attempts-on-mmap_lock.patch
* mm-mmap-fix-the-adjusted-length-error.patch
* mm-dmapoolc-replace-open-coded-list_for_each_entry_safe.patch
* mm-dmapoolc-replace-hard-coded-function-name-with-__func__.patch
* mm-memory-failure-do-pgoff-calculation-before-for_each_process.patch
* docs-vm-fix-mm_count-vs-mm_users-counter-confusion.patch
* mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch
* mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch
* mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch
* mm-page_isolation-cleanup-set_migratetype_isolate.patch
* virtio-mem-dont-special-case-zone_movable.patch
* mm-document-semantics-of-zone_movable.patch
* mm-huge_memoryc-update-tlb-entry-if-pmd-is-changed.patch
* mips-do-not-call-flush_tlb_all-when-setting-pmd-entry.patch
* kvm-ppc-book3s-hv-simplify-kvm_cma_reserve.patch
* dma-contiguous-simplify-cma_early_percent_memory.patch
* arm-xtensa-simplify-initialization-of-high-memory-pages.patch
* arm64-numa-simplify-dummy_numa_init.patch
* h8300-nds32-openrisc-simplify-detection-of-memory-extents.patch
* riscv-drop-unneeded-node-initialization.patch
* mircoblaze-drop-unneeded-numa-and-sparsemem-initializations.patch
* memblock-make-for_each_memblock_type-iterator-private.patch
* memblock-make-memblock_debug-and-related-functionality-private.patch
* memblock-make-memblock_debug-and-related-functionality-private-fix.patch
* memblock-reduce-number-of-parameters-in-for_each_mem_range.patch
* arch-mm-replace-for_each_memblock-with-for_each_mem_pfn_range.patch
* arch-drivers-replace-for_each_membock-with-for_each_mem_range.patch
* x86-setup-simplify-initrd-relocation-and-reservation.patch
* x86-setup-simplify-reserve_crashkernel.patch
* memblock-remove-unused-memblock_mem_size.patch
* memblock-implement-for_each_reserved_mem_region-using-__next_mem_region.patch
* memblock-use-separate-iterators-for-memory-and-reserved-regions.patch
* mmhwpoison-cleanup-unused-pagehuge-check.patch
* mm-hwpoison-remove-recalculating-hpage.patch
* mmhwpoison-inject-dont-pin-for-hwpoison_filter.patch
* mmhwpoison-un-export-get_hwpoison_page-and-make-it-static.patch
* mmhwpoison-kill-put_hwpoison_page.patch
* mmhwpoison-unify-thp-handling-for-hard-and-soft-offline.patch
* mmhwpoison-rework-soft-offline-for-free-pages.patch
* mmhwpoison-rework-soft-offline-for-in-use-pages.patch
* mmhwpoison-refactor-soft_offline_huge_page-and-__soft_offline_page.patch
* mmhwpoison-return-0-if-the-page-is-already-poisoned-in-soft-offline.patch
* mmhwpoison-introduce-mf_msg_unsplit_thp.patch
* mmhwpoison-double-check-page-count-in-__get_any_page.patch
* mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings.patch
* mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings-fix.patch
* mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings-fix-2.patch
* mm-util-update-the-kerneldoc-for-kstrdup_const.patch
* mm-memory_hotplug-inline-__offline_pages-into-offline_pages.patch
* mm-memory_hotplug-enforce-section-granularity-when-onlining-offlining.patch
* mm-memory_hotplug-simplify-page-offlining.patch
* mm-page_alloc-simplify-__offline_isolated_pages.patch
* mm-memory_hotplug-drop-nr_isolate_pageblock-in-offline_pages.patch
* mm-page_isolation-simplify-return-value-of-start_isolate_page_range.patch
* mm-memory_hotplug-simplify-page-onlining.patch
* mm-page_alloc-drop-stale-pageblock-comment-in-memmap_init_zone.patch
* mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone.patch
* mm-memory_hotplug-mark-pageblocks-migrate_isolate-while-onlining-memory.patch
* mm-slab-remove-duplicate-include.patch
* mm-page_reporting-drop-stale-list-head-check-in-page_reporting_cycle.patch
* mm-highmem-clean-up-endif-comments.patch
* info-task-hung-in-generic_file_write_iter.patch
* info-task-hung-in-generic_file_write-fix.patch
* kernel-hung_taskc-monitor-killed-tasks.patch
* proc-sysctl-make-protected_-world-readable.patch
* fs-configfs-delete-repeated-words-in-comments.patch
* bitops-simplify-get_count_order_long.patch
* bitops-use-the-same-mechanism-for-get_count_order.patch
* checkpatch-add-kconfig-prefix.patch
* checkpatch-move-repeated-word-test.patch
* checkpatch-add-test-for-comma-use-that-should-be-semicolon.patch
* panic-dump-registers-on-panic_on_warn.patch
* aio-simplify-read_events.patch
* proc-add-struct-mount-struct-super_block-addr-in-lx-mounts-command.patch
* tasks-add-headers-and-improve-spacing-format.patch
* romfs-support-inode-blocks-calculation.patch
  linux-next.patch
* ia64-fix-build-error-with-coredump.patch
* mm-madvise-pass-task-and-mm-to-do_madvise.patch
* pid-move-pidfd_get_pid-to-pidc.patch
* mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api.patch
* mm-madvise-introduce-process_madvise-syscall-an-external-memory-hinting-api-fix.patch
* mm-madvise-check-fatal-signal-pending-of-target-process.patch
* mm-memory-failure-remove-a-wrapper-for-alloc_migration_target.patch
* mm-memory_hotplug-remove-a-wrapper-for-alloc_migration_target.patch
* mm-migrate-avoid-possible-unnecessary-process-right-check-in-kernel_move_pages.patch
* mm-mmap-add-inline-vma_next-for-readability-of-mmap-code.patch
* mm-mmap-add-inline-munmap_vma_range-for-code-readability.patch
  make-sure-nobodys-leaking-resources.patch
  releasing-resources-with-children.patch
  mutex-subsystem-synchro-test-module.patch
  kernel-forkc-export-kernel_thread-to-modules.patch
  workaround-for-a-pci-restoring-bug.patch

^ permalink raw reply	[flat|nested] 285+ messages in thread

* Re: incoming
  2021-01-12 23:48 incoming Andrew Morton
@ 2021-01-15 23:32 ` Linus Torvalds
  0 siblings, 0 replies; 285+ messages in thread
From: Linus Torvalds @ 2021-01-15 23:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux-MM, mm-commits

On Tue, Jan 12, 2021 at 3:48 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> 10 patches, based on e609571b5ffa3528bf85292de1ceaddac342bc1c.

Whee. I had completely dropped the ball on this - I had built my usual
"akpm" branch with the patches, but then had completely forgotten
about it after doing my basic build tests.

I tend to leave it for a while to see if people send belated ACK/NAK's
for the patches, but that "for a while" is typically "overnight", not
several days.

So if you ever notice that I haven't merged your patch submission, and
you haven't seen me comment on them, feel free to ping me to remind
me.

Because it might just have gotten lost in the shuffle for some random
reason. Admittedly it's rare - I think this is the first time I just
randomly noticed three days later that I'd never done the actual merge
of the patch-series).

               Linus

^ permalink raw reply	[flat|nested] 285+ messages in thread

* incoming
@ 2021-01-12 23:48 Andrew Morton
  2021-01-15 23:32 ` incoming Linus Torvalds
  0 siblings, 1 reply; 285+ messages in thread
From: Andrew Morton @ 2021-01-12 23:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits

10 patches, based on e609571b5ffa3528bf85292de1ceaddac342bc1c.

Subsystems affected by this patch series:

  mm/slub
  mm/pagealloc
  mm/memcg
  mm/kasan
  mm/vmalloc
  mm/migration
  mm/hugetlb
  MAINTAINERS
  mm/memory-failure
  mm/process_vm_access

Subsystem: mm/slub

    Jann Horn <jannh@google.com>:
      mm, slub: consider rest of partial list if acquire_slab() fails

Subsystem: mm/pagealloc

    Hailong liu <liu.hailong6@zte.com.cn>:
      mm/page_alloc: add a missing mm_page_alloc_zone_locked() tracepoint

Subsystem: mm/memcg

    Hugh Dickins <hughd@google.com>:
      mm/memcontrol: fix warning in mem_cgroup_page_lruvec()

Subsystem: mm/kasan

    Hailong Liu <liu.hailong6@zte.com.cn>:
      arm/kasan: fix the array size of kasan_early_shadow_pte[]

Subsystem: mm/vmalloc

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/vmalloc.c: fix potential memory leak

Subsystem: mm/migration

    Jan Stancek <jstancek@redhat.com>:
      mm: migrate: initialize err in do_migrate_pages

Subsystem: mm/hugetlb

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/hugetlb: fix potential missing huge page size info

Subsystem: MAINTAINERS

    Vlastimil Babka <vbabka@suse.cz>:
      MAINTAINERS: add Vlastimil as slab allocators maintainer

Subsystem: mm/memory-failure

    Oscar Salvador <osalvador@suse.de>:
      mm,hwpoison: fix printing of page flags

Subsystem: mm/process_vm_access

    Andrew Morton <akpm@linux-foundation.org>:
      mm/process_vm_access.c: include compat.h

 MAINTAINERS                |    1 +
 include/linux/kasan.h      |    6 +++++-
 include/linux/memcontrol.h |    2 +-
 mm/hugetlb.c               |    2 +-
 mm/kasan/init.c            |    3 ++-
 mm/memory-failure.c        |    2 +-
 mm/mempolicy.c             |    2 +-
 mm/page_alloc.c            |   31 ++++++++++++++++---------------
 mm/process_vm_access.c     |    1 +
 mm/slub.c                  |    2 +-
 mm/vmalloc.c               |    4 +++-
 11 files changed, 33 insertions(+), 23 deletions(-)


^ permalink raw reply	[flat|nested] 285+ messages in thread

* incoming
@ 2020-12-29 23:13 Andrew Morton
  0 siblings, 0 replies; 285+ messages in thread
From: Andrew Morton @ 2020-12-29 23:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits

16 patches, based on dea8dcf2a9fa8cc540136a6cd885c3beece16ec3.

Subsystems affected by this patch series:

  mm/selftests
  mm/hugetlb
  kbuild
  checkpatch
  mm/pagecache
  mm/mremap
  mm/kasan
  misc
  lib
  mm/slub

Subsystem: mm/selftests

    Harish <harish@linux.ibm.com>:
      selftests/vm: fix building protection keys test

Subsystem: mm/hugetlb

    Mike Kravetz <mike.kravetz@oracle.com>:
      mm/hugetlb: fix deadlock in hugetlb_cow error path

Subsystem: kbuild

    Masahiro Yamada <masahiroy@kernel.org>:
      Revert "kbuild: avoid static_assert for genksyms"

Subsystem: checkpatch

    Joe Perches <joe@perches.com>:
      checkpatch: prefer strscpy to strlcpy

Subsystem: mm/pagecache

    Souptick Joarder <jrdr.linux@gmail.com>:
      mm: add prototype for __add_to_page_cache_locked()

    Baoquan He <bhe@redhat.com>:
      mm: memmap defer init doesn't work as expected

Subsystem: mm/mremap

    Kalesh Singh <kaleshsingh@google.com>:
      mm/mremap.c: fix extent calculation

    Nicholas Piggin <npiggin@gmail.com>:
      mm: generalise COW SMC TLB flushing race comment

Subsystem: mm/kasan

    Walter Wu <walter-zh.wu@mediatek.com>:
      kasan: fix null pointer dereference in kasan_record_aux_stack

Subsystem: misc

    Randy Dunlap <rdunlap@infradead.org>:
      local64.h: make <asm/local64.h> mandatory

    Huang Shijie <sjhuang@iluvatar.ai>:
      sizes.h: add SZ_8G/SZ_16G/SZ_32G macros

    Josh Poimboeuf <jpoimboe@redhat.com>:
      kdev_t: always inline major/minor helper functions

Subsystem: lib

    Huang Shijie <sjhuang@iluvatar.ai>:
      lib/genalloc: fix the overflow when size is too big

    Ilya Leoshkevich <iii@linux.ibm.com>:
      lib/zlib: fix inflating zlib streams on s390

    Randy Dunlap <rdunlap@infradead.org>:
      zlib: move EXPORT_SYMBOL() and MODULE_LICENSE() out of dfltcc_syms.c

Subsystem: mm/slub

    Roman Gushchin <guro@fb.com>:
      mm: slub: call account_slab_page() after slab page initialization

 arch/alpha/include/asm/local64.h    |    1 -
 arch/arc/include/asm/Kbuild         |    1 -
 arch/arm/include/asm/Kbuild         |    1 -
 arch/arm64/include/asm/Kbuild       |    1 -
 arch/csky/include/asm/Kbuild        |    1 -
 arch/h8300/include/asm/Kbuild       |    1 -
 arch/hexagon/include/asm/Kbuild     |    1 -
 arch/ia64/include/asm/local64.h     |    1 -
 arch/ia64/mm/init.c                 |    4 ++--
 arch/m68k/include/asm/Kbuild        |    1 -
 arch/microblaze/include/asm/Kbuild  |    1 -
 arch/mips/include/asm/Kbuild        |    1 -
 arch/nds32/include/asm/Kbuild       |    1 -
 arch/openrisc/include/asm/Kbuild    |    1 -
 arch/parisc/include/asm/Kbuild      |    1 -
 arch/powerpc/include/asm/Kbuild     |    1 -
 arch/riscv/include/asm/Kbuild       |    1 -
 arch/s390/include/asm/Kbuild        |    1 -
 arch/sh/include/asm/Kbuild          |    1 -
 arch/sparc/include/asm/Kbuild       |    1 -
 arch/x86/include/asm/local64.h      |    1 -
 arch/xtensa/include/asm/Kbuild      |    1 -
 include/asm-generic/Kbuild          |    1 +
 include/linux/build_bug.h           |    5 -----
 include/linux/kdev_t.h              |   22 +++++++++++-----------
 include/linux/mm.h                  |   12 ++++++++++--
 include/linux/sizes.h               |    3 +++
 lib/genalloc.c                      |   25 +++++++++++++------------
 lib/zlib_dfltcc/Makefile            |    2 +-
 lib/zlib_dfltcc/dfltcc.c            |    6 +++++-
 lib/zlib_dfltcc/dfltcc_deflate.c    |    3 +++
 lib/zlib_dfltcc/dfltcc_inflate.c    |    4 ++--
 lib/zlib_dfltcc/dfltcc_syms.c       |   17 -----------------
 mm/hugetlb.c                        |   22 +++++++++++++++++++++-
 mm/kasan/generic.c                  |    2 ++
 mm/memory.c                         |    8 +++++---
 mm/memory_hotplug.c                 |    2 +-
 mm/mremap.c                         |    4 +++-
 mm/page_alloc.c                     |    8 +++++---
 mm/slub.c                           |    5 ++---
 scripts/checkpatch.pl               |    6 ++++++
 tools/testing/selftests/vm/Makefile |   10 +++++-----
 42 files changed, 101 insertions(+), 91 deletions(-)


^ permalink raw reply	[flat|nested] 285+ messages in thread

* Re: incoming
  2020-12-22 19:58 incoming Andrew Morton
@ 2020-12-22 21:43 ` Linus Torvalds
  0 siblings, 0 replies; 285+ messages in thread
From: Linus Torvalds @ 2020-12-22 21:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux-MM, mm-commits

On Tue, Dec 22, 2020 at 11:58 AM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> 60 patches, based on 8653b778e454a7708847aeafe689bce07aeeb94e.

I see that you enabled renaming in the patches. Lovely.

Can you also enable it in the diffstat?

>  74 files changed, 2869 insertions(+), 1553 deletions(-)

With -M in the diffstat, you should have seen

 72 files changed, 2775 insertions(+), 1460 deletions(-)

and if you add "--summary", you'll also see the rename part ofthe file
create/delete summary:

 rename mm/kasan/{tags_report.c => report_sw_tags.c} (78%)

which is often nice to see in addition to the line stats..

           Linus

^ permalink raw reply	[flat|nested] 285+ messages in thread

* incoming
@ 2020-12-22 19:58 Andrew Morton
  2020-12-22 21:43 ` incoming Linus Torvalds
  0 siblings, 1 reply; 285+ messages in thread
From: Andrew Morton @ 2020-12-22 19:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits


60 patches, based on 8653b778e454a7708847aeafe689bce07aeeb94e.

Subsystems affected by this patch series:

  mm/kasan

Subsystem: mm/kasan

    Andrey Konovalov <andreyknvl@google.com>:
    Patch series "kasan: add hardware tag-based mode for arm64", v11:
      kasan: drop unnecessary GPL text from comment headers
      kasan: KASAN_VMALLOC depends on KASAN_GENERIC
      kasan: group vmalloc code
      kasan: shadow declarations only for software modes
      kasan: rename (un)poison_shadow to (un)poison_range
      kasan: rename KASAN_SHADOW_* to KASAN_GRANULE_*
      kasan: only build init.c for software modes
      kasan: split out shadow.c from common.c
      kasan: define KASAN_MEMORY_PER_SHADOW_PAGE
      kasan: rename report and tags files
      kasan: don't duplicate config dependencies
      kasan: hide invalid free check implementation
      kasan: decode stack frame only with KASAN_STACK_ENABLE
      kasan, arm64: only init shadow for software modes
      kasan, arm64: only use kasan_depth for software modes
      kasan, arm64: move initialization message
      kasan, arm64: rename kasan_init_tags and mark as __init
      kasan: rename addr_has_shadow to addr_has_metadata
      kasan: rename print_shadow_for_address to print_memory_metadata
      kasan: rename SHADOW layout macros to META
      kasan: separate metadata_fetch_row for each mode
      kasan: introduce CONFIG_KASAN_HW_TAGS

    Vincenzo Frascino <vincenzo.frascino@arm.com>:
      arm64: enable armv8.5-a asm-arch option
      arm64: mte: add in-kernel MTE helpers
      arm64: mte: reset the page tag in page->flags
      arm64: mte: add in-kernel tag fault handler
      arm64: kasan: allow enabling in-kernel MTE
      arm64: mte: convert gcr_user into an exclude mask
      arm64: mte: switch GCR_EL1 in kernel entry and exit
      kasan, mm: untag page address in free_reserved_area

    Andrey Konovalov <andreyknvl@google.com>:
      arm64: kasan: align allocations for HW_TAGS
      arm64: kasan: add arch layer for memory tagging helpers
      kasan: define KASAN_GRANULE_SIZE for HW_TAGS
      kasan, x86, s390: update undef CONFIG_KASAN
      kasan, arm64: expand CONFIG_KASAN checks
      kasan, arm64: implement HW_TAGS runtime
      kasan, arm64: print report from tag fault handler
      kasan, mm: reset tags when accessing metadata
      kasan, arm64: enable CONFIG_KASAN_HW_TAGS
      kasan: add documentation for hardware tag-based mode

    Vincenzo Frascino <vincenzo.frascino@arm.com>:
      kselftest/arm64: check GCR_EL1 after context switch

    Andrey Konovalov <andreyknvl@google.com>:
    Patch series "kasan: boot parameters for hardware tag-based mode", v4:
      kasan: simplify quarantine_put call site
      kasan: rename get_alloc/free_info
      kasan: introduce set_alloc_info
      kasan, arm64: unpoison stack only with CONFIG_KASAN_STACK
      kasan: allow VMAP_STACK for HW_TAGS mode
      kasan: remove __kasan_unpoison_stack
      kasan: inline kasan_reset_tag for tag-based modes
      kasan: inline random_tag for HW_TAGS
      kasan: open-code kasan_unpoison_slab
      kasan: inline (un)poison_range and check_invalid_free
      kasan: add and integrate kasan boot parameters
      kasan, mm: check kasan_enabled in annotations
      kasan, mm: rename kasan_poison_kfree
      kasan: don't round_up too much
      kasan: simplify assign_tag and set_tag calls
      kasan: clarify comment in __kasan_kfree_large
      kasan: sanitize objects when metadata doesn't fit
      kasan, mm: allow cache merging with no metadata
      kasan: update documentation

 Documentation/dev-tools/kasan.rst                         |  274 ++-
 arch/Kconfig                                              |    8 
 arch/arm64/Kconfig                                        |    9 
 arch/arm64/Makefile                                       |    7 
 arch/arm64/include/asm/assembler.h                        |    2 
 arch/arm64/include/asm/cache.h                            |    3 
 arch/arm64/include/asm/esr.h                              |    1 
 arch/arm64/include/asm/kasan.h                            |   17 
 arch/arm64/include/asm/memory.h                           |   15 
 arch/arm64/include/asm/mte-def.h                          |   16 
 arch/arm64/include/asm/mte-kasan.h                        |   67 
 arch/arm64/include/asm/mte.h                              |   22 
 arch/arm64/include/asm/processor.h                        |    2 
 arch/arm64/include/asm/string.h                           |    5 
 arch/arm64/include/asm/uaccess.h                          |   23 
 arch/arm64/kernel/asm-offsets.c                           |    3 
 arch/arm64/kernel/cpufeature.c                            |    3 
 arch/arm64/kernel/entry.S                                 |   41 
 arch/arm64/kernel/head.S                                  |    2 
 arch/arm64/kernel/hibernate.c                             |    5 
 arch/arm64/kernel/image-vars.h                            |    2 
 arch/arm64/kernel/kaslr.c                                 |    3 
 arch/arm64/kernel/module.c                                |    6 
 arch/arm64/kernel/mte.c                                   |  124 +
 arch/arm64/kernel/setup.c                                 |    2 
 arch/arm64/kernel/sleep.S                                 |    2 
 arch/arm64/kernel/smp.c                                   |    2 
 arch/arm64/lib/mte.S                                      |   16 
 arch/arm64/mm/copypage.c                                  |    9 
 arch/arm64/mm/fault.c                                     |   59 
 arch/arm64/mm/kasan_init.c                                |   41 
 arch/arm64/mm/mteswap.c                                   |    9 
 arch/arm64/mm/proc.S                                      |   23 
 arch/arm64/mm/ptdump.c                                    |    6 
 arch/s390/boot/string.c                                   |    1 
 arch/x86/boot/compressed/misc.h                           |    1 
 arch/x86/kernel/acpi/wakeup_64.S                          |    2 
 include/linux/kasan-checks.h                              |    2 
 include/linux/kasan.h                                     |  423 ++++-
 include/linux/mm.h                                        |   24 
 include/linux/moduleloader.h                              |    3 
 include/linux/page-flags-layout.h                         |    2 
 include/linux/sched.h                                     |    2 
 include/linux/string.h                                    |    2 
 init/init_task.c                                          |    2 
 kernel/fork.c                                             |    4 
 lib/Kconfig.kasan                                         |   71 
 lib/test_kasan.c                                          |    2 
 lib/test_kasan_module.c                                   |    2 
 mm/kasan/Makefile                                         |   33 
 mm/kasan/common.c                                         | 1006 +++-----------
 mm/kasan/generic.c                                        |   72 -
 mm/kasan/generic_report.c                                 |   13 
 mm/kasan/hw_tags.c                                        |  276 +++
 mm/kasan/init.c                                           |   25 
 mm/kasan/kasan.h                                          |  195 ++
 mm/kasan/quarantine.c                                     |   35 
 mm/kasan/report.c                                         |  363 +----
 mm/kasan/report_generic.c                                 |  169 ++
 mm/kasan/report_hw_tags.c                                 |   44 
 mm/kasan/report_sw_tags.c                                 |   22 
 mm/kasan/shadow.c                                         |  528 +++++++
 mm/kasan/sw_tags.c                                        |   34 
 mm/kasan/tags.c                                           |    7 
 mm/kasan/tags_report.c                                    |    7 
 mm/mempool.c                                              |    4 
 mm/page_alloc.c                                           |    9 
 mm/page_poison.c                                          |    2 
 mm/ptdump.c                                               |   13 
 mm/slab_common.c                                          |    5 
 mm/slub.c                                                 |   29 
 scripts/Makefile.lib                                      |    2 
 tools/testing/selftests/arm64/mte/Makefile                |    2 
 tools/testing/selftests/arm64/mte/check_gcr_el1_cswitch.c |  155 ++
 74 files changed, 2869 insertions(+), 1553 deletions(-)