mm-commits.vger.kernel.org archive mirror
* incoming
@ 2021-01-24  5:00 Andrew Morton
  2021-01-24  5:00 ` [patch 01/19] x86/setup: don't remove E820_TYPE_RAM for pfn 0 Andrew Morton
                   ` (18 more replies)
  0 siblings, 19 replies; 22+ messages in thread
From: Andrew Morton @ 2021-01-24  5:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits

19 patches, based on e1ae4b0be15891faf46d390e9f3dc9bd71a8cae1.

Subsystems affected by this patch series:

  mm/pagealloc
  mm/memcg
  mm/kasan
  ubsan
  mm/memory-failure
  mm/highmem
  proc
  MAINTAINERS

Subsystem: mm/pagealloc

    Mike Rapoport <rppt@linux.ibm.com>:
    Patch series "mm: fix initialization of struct page for holes in memory layout", v3:
      x86/setup: don't remove E820_TYPE_RAM for pfn 0
      mm: fix initialization of struct page for holes in memory layout

Subsystem: mm/memcg

    Roman Gushchin <guro@fb.com>:
      mm: memcg/slab: optimize objcg stock draining

    Shakeel Butt <shakeelb@google.com>:
      mm: memcg: fix memcg file_dirty numa stat
      mm: fix numa stats for thp migration

    Johannes Weiner <hannes@cmpxchg.org>:
      mm: memcontrol: prevent starvation when writing memory.high

Subsystem: mm/kasan

    Lecopzer Chen <lecopzer@gmail.com>:
      kasan: fix unaligned address is unhandled in kasan_remove_zero_shadow
      kasan: fix incorrect arguments passing in kasan_add_zero_shadow

    Andrey Konovalov <andreyknvl@google.com>:
      kasan: fix HW_TAGS boot parameters
      kasan, mm: fix conflicts with init_on_alloc/free
      kasan, mm: fix resetting page_alloc tags for HW_TAGS

Subsystem: ubsan

    Arnd Bergmann <arnd@arndb.de>:
      ubsan: disable unsigned-overflow check for i386

Subsystem: mm/memory-failure

    Dan Williams <dan.j.williams@intel.com>:
      mm: fix page reference leak in soft_offline_page()

Subsystem: mm/highmem

    Thomas Gleixner <tglx@linutronix.de>:
    Patch series "mm/highmem: Fix fallout from generic kmap_local conversions":
      sparc/mm/highmem: flush cache and TLB
      mm/highmem: prepare for overriding set_pte_at()
      mips/mm/highmem: use set_pte() for kmap_local()
      powerpc/mm/highmem: use __set_pte_at() for kmap_local()

Subsystem: proc

    Xiaoming Ni <nixiaoming@huawei.com>:
      proc_sysctl: fix oops caused by incorrect command parameters

Subsystem: MAINTAINERS

    Nathan Chancellor <natechancellor@gmail.com>:
      MAINTAINERS: add a couple more files to the Clang/LLVM section

 Documentation/dev-tools/kasan.rst  |   27 ++---------
 MAINTAINERS                        |    2 
 arch/mips/include/asm/highmem.h    |    1 
 arch/powerpc/include/asm/highmem.h |    2 
 arch/sparc/include/asm/highmem.h   |    9 ++-
 arch/x86/kernel/setup.c            |   20 +++-----
 fs/proc/proc_sysctl.c              |    7 ++-
 lib/Kconfig.ubsan                  |    1 
 mm/highmem.c                       |    7 ++-
 mm/kasan/hw_tags.c                 |   77 +++++++++++++--------------------
 mm/kasan/init.c                    |   23 +++++----
 mm/memcontrol.c                    |   11 +---
 mm/memory-failure.c                |   20 ++++++--
 mm/migrate.c                       |   27 ++++++-----
 mm/page_alloc.c                    |   86 ++++++++++++++++++++++---------------
 mm/slub.c                          |    7 +--
 16 files changed, 173 insertions(+), 154 deletions(-)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 01/19] x86/setup: don't remove E820_TYPE_RAM for pfn 0
  2021-01-24  5:00 incoming Andrew Morton
@ 2021-01-24  5:00 ` Andrew Morton
  2021-01-24  5:01 ` [patch 02/19] mm: fix initialization of struct page for holes in memory layout Andrew Morton
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2021-01-24  5:00 UTC (permalink / raw)
  To: akpm, bhe, bp, cai, david, hpa, linux-mm, mgorman, mhocko, mingo,
	mm-commits, rppt, stable, tglx, torvalds, vbabka

From: Mike Rapoport <rppt@linux.ibm.com>
Subject: x86/setup: don't remove E820_TYPE_RAM for pfn 0

Patch series "mm: fix initialization of struct page for holes in memory layout", v3.

Commit 73a6e474cb37 ("mm: memmap_init: iterate over
memblock regions rather that check each PFN") exposed several issues with
the memory map initialization and these patches fix those issues.

Initially there were crashes during compaction that Qian Cai reported back
in April [1]. It seemed back then that the problem was fixed, but a few
weeks ago Andrea Arcangeli hit the same bug [2] and there was an additional
discussion at [3].

[1] https://lore.kernel.org/lkml/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw
[2] https://lore.kernel.org/lkml/20201121194506.13464-1-aarcange@redhat.com
[3] https://lore.kernel.org/mm-commits/20201206005401.qKuAVgOXr%akpm@linux-foundation.org


This patch (of 2):

The first 4Kb of memory is a BIOS-owned area and, to avoid allocating it
for the kernel, it was not listed in the e820 tables as memory.  As a
result, pfn 0 was never recognised by the generic memory management and it
is part of neither node 0 nor ZONE_DMA.

If set_pfnblock_flags_mask() were ever called for the pageblock
corresponding to the first 2Mbytes of memory, having pfn 0 outside of
ZONE_DMA would trigger

	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);

Along with reserving the first 4Kb in the e820 tables, the first several
pages are reserved with memblock in several places during setup_arch().
These reservations are enough to ensure the kernel does not touch the BIOS
area, so it is not necessary to remove E820_TYPE_RAM for pfn 0.

Remove the e820 table update that changes the type of pfn 0 and move the
comment describing why it was done to trim_low_memory_range(), which
reserves the beginning of memory.

Link: https://lkml.kernel.org/r/20210111194017.22696-2-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86/kernel/setup.c |   20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

--- a/arch/x86/kernel/setup.c~x86-setup-dont-remove-e820_type_ram-for-pfn-0
+++ a/arch/x86/kernel/setup.c
@@ -661,17 +661,6 @@ static void __init trim_platform_memory_
 static void __init trim_bios_range(void)
 {
 	/*
-	 * A special case is the first 4Kb of memory;
-	 * This is a BIOS owned area, not kernel ram, but generally
-	 * not listed as such in the E820 table.
-	 *
-	 * This typically reserves additional memory (64KiB by default)
-	 * since some BIOSes are known to corrupt low memory.  See the
-	 * Kconfig help text for X86_RESERVE_LOW.
-	 */
-	e820__range_update(0, PAGE_SIZE, E820_TYPE_RAM, E820_TYPE_RESERVED);
-
-	/*
 	 * special case: Some BIOSes report the PC BIOS
 	 * area (640Kb -> 1Mb) as RAM even though it is not.
 	 * take them out.
@@ -728,6 +717,15 @@ early_param("reservelow", parse_reservel
 
 static void __init trim_low_memory_range(void)
 {
+	/*
+	 * A special case is the first 4Kb of memory;
+	 * This is a BIOS owned area, not kernel ram, but generally
+	 * not listed as such in the E820 table.
+	 *
+	 * This typically reserves additional memory (64KiB by default)
+	 * since some BIOSes are known to corrupt low memory.  See the
+	 * Kconfig help text for X86_RESERVE_LOW.
+	 */
 	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
 }
 	
_

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 02/19] mm: fix initialization of struct page for holes in memory layout
  2021-01-24  5:00 incoming Andrew Morton
  2021-01-24  5:00 ` [patch 01/19] x86/setup: don't remove E820_TYPE_RAM for pfn 0 Andrew Morton
@ 2021-01-24  5:01 ` Andrew Morton
  2021-01-24  5:01 ` [patch 03/19] mm: memcg/slab: optimize objcg stock draining Andrew Morton
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2021-01-24  5:01 UTC (permalink / raw)
  To: aarcange, akpm, bhe, bp, cai, david, hpa, linux-mm, mgorman,
	mhocko, mingo, mm-commits, rppt, stable, tglx, torvalds, vbabka

From: Mike Rapoport <rppt@linux.ibm.com>
Subject: mm: fix initialization of struct page for holes in memory layout

There could be struct pages that are not backed by actual physical memory.
This can happen when the size of the actual memory bank is not a multiple
of SECTION_SIZE or when an architecture does not register memory holes
reserved by the firmware as memblock.memory.

Such pages are currently initialized using the init_unavailable_mem()
function, which iterates through PFNs in holes in memblock.memory and, if
there is a struct page corresponding to a PFN, sets the fields of this
page to default values and marks the page as Reserved.

init_unavailable_mem() does not take into account the zone and node the
page belongs to and sets both zone and node links in struct page to zero.

On a system that has firmware-reserved holes in a zone above ZONE_DMA, for
instance in the configuration below:

	# grep -A1 E820 /proc/iomem
	7a17b000-7a216fff : Unknown E820 type
	7a217000-7bffffff : System RAM

an unset zone link in struct page will trigger

	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);

because there are pages in both ZONE_DMA32 and ZONE_DMA (with an unset
zone link in struct page) in the same pageblock.
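
The assertion fires because the zone recorded in struct page must span the
page's pfn.  A rough sketch of that check (simplified, not the exact
kernel source):

	/* sketch: true only if pfn lies inside the zone's pfn range */
	static inline bool zone_spans_pfn(const struct zone *zone,
					  unsigned long pfn)
	{
		return zone->zone_start_pfn <= pfn &&
		       pfn < zone->zone_start_pfn + zone->spanned_pages;
	}

With the zone link left at zero, page_zone(page) resolves to the first
zone (ZONE_DMA here), which does not span the pfn of a page that actually
lies in ZONE_DMA32.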

Update init_unavailable_mem() to use the zone constraints defined by an
architecture to properly set up the zone link, and use the node ID of the
adjacent range in memblock.memory to set the node link.

Link: https://lkml.kernel.org/r/20210111194017.22696-3-rppt@kernel.org
Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |   84 +++++++++++++++++++++++++++-------------------
 1 file changed, 50 insertions(+), 34 deletions(-)

--- a/mm/page_alloc.c~mm-fix-initialization-of-struct-page-for-holes-in-memory-layout
+++ a/mm/page_alloc.c
@@ -7078,23 +7078,26 @@ void __init free_area_init_memoryless_no
  * Initialize all valid struct pages in the range [spfn, epfn) and mark them
  * PageReserved(). Return the number of struct pages that were initialized.
  */
-static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn)
+static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn,
+					 int zone, int nid)
 {
-	unsigned long pfn;
+	unsigned long pfn, zone_spfn, zone_epfn;
 	u64 pgcnt = 0;
 
+	zone_spfn = arch_zone_lowest_possible_pfn[zone];
+	zone_epfn = arch_zone_highest_possible_pfn[zone];
+
+	spfn = clamp(spfn, zone_spfn, zone_epfn);
+	epfn = clamp(epfn, zone_spfn, zone_epfn);
+
 	for (pfn = spfn; pfn < epfn; pfn++) {
 		if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
 			pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
 				+ pageblock_nr_pages - 1;
 			continue;
 		}
-		/*
-		 * Use a fake node/zone (0) for now. Some of these pages
-		 * (in memblock.reserved but not in memblock.memory) will
-		 * get re-initialized via reserve_bootmem_region() later.
-		 */
-		__init_single_page(pfn_to_page(pfn), pfn, 0, 0);
+
+		__init_single_page(pfn_to_page(pfn), pfn, zone, nid);
 		__SetPageReserved(pfn_to_page(pfn));
 		pgcnt++;
 	}
@@ -7103,51 +7106,64 @@ static u64 __init init_unavailable_range
 }
 
 /*
- * Only struct pages that are backed by physical memory are zeroed and
- * initialized by going through __init_single_page(). But, there are some
- * struct pages which are reserved in memblock allocator and their fields
- * may be accessed (for example page_to_pfn() on some configuration accesses
- * flags). We must explicitly initialize those struct pages.
+ * Only struct pages that correspond to ranges defined by memblock.memory
+ * are zeroed and initialized by going through __init_single_page() during
+ * memmap_init().
  *
- * This function also addresses a similar issue where struct pages are left
- * uninitialized because the physical address range is not covered by
- * memblock.memory or memblock.reserved. That could happen when memblock
- * layout is manually configured via memmap=, or when the highest physical
- * address (max_pfn) does not end on a section boundary.
+ * But, there could be struct pages that correspond to holes in
+ * memblock.memory. This can happen because of the following reasons:
+ * - physical memory bank size is not necessarily the exact multiple of the
+ *   arbitrary section size
+ * - early reserved memory may not be listed in memblock.memory
+ * - memory layouts defined with memmap= kernel parameter may not align
+ *   nicely with memmap sections
+ *
+ * Explicitly initialize those struct pages so that:
+ * - PG_Reserved is set
+ * - zone link is set according to the architecture constraints
+ * - node is set to node id of the next populated region except for the
+ *   trailing hole where last node id is used
  */
-static void __init init_unavailable_mem(void)
+static void __init init_zone_unavailable_mem(int zone)
 {
-	phys_addr_t start, end;
-	u64 i, pgcnt;
-	phys_addr_t next = 0;
+	unsigned long start, end;
+	int i, nid;
+	u64 pgcnt;
+	unsigned long next = 0;
 
 	/*
-	 * Loop through unavailable ranges not covered by memblock.memory.
+	 * Loop through holes in memblock.memory and initialize struct
+	 * pages corresponding to these holes
 	 */
 	pgcnt = 0;
-	for_each_mem_range(i, &start, &end) {
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid) {
 		if (next < start)
-			pgcnt += init_unavailable_range(PFN_DOWN(next),
-							PFN_UP(start));
+			pgcnt += init_unavailable_range(next, start, zone, nid);
 		next = end;
 	}
 
 	/*
-	 * Early sections always have a fully populated memmap for the whole
-	 * section - see pfn_valid(). If the last section has holes at the
-	 * end and that section is marked "online", the memmap will be
-	 * considered initialized. Make sure that memmap has a well defined
-	 * state.
+	 * Last section may surpass the actual end of memory (e.g. we can
+	 * have 1Gb section and 512Mb of RAM populated).
+	 * Make sure that memmap has a well defined state in this case.
 	 */
-	pgcnt += init_unavailable_range(PFN_DOWN(next),
-					round_up(max_pfn, PAGES_PER_SECTION));
+	end = round_up(max_pfn, PAGES_PER_SECTION);
+	pgcnt += init_unavailable_range(next, end, zone, nid);
 
 	/*
 	 * Struct pages that do not have backing memory. This could be because
 	 * firmware is using some of this memory, or for some other reasons.
 	 */
 	if (pgcnt)
-		pr_info("Zeroed struct page in unavailable ranges: %lld pages", pgcnt);
+		pr_info("Zone %s: zeroed struct page in unavailable ranges: %lld pages", zone_names[zone], pgcnt);
+}
+
+static void __init init_unavailable_mem(void)
+{
+	int zone;
+
+	for (zone = 0; zone < ZONE_MOVABLE; zone++)
+		init_zone_unavailable_mem(zone);
 }
 #else
 static inline void __init init_unavailable_mem(void)
_

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 03/19] mm: memcg/slab: optimize objcg stock draining
  2021-01-24  5:00 incoming Andrew Morton
  2021-01-24  5:00 ` [patch 01/19] x86/setup: don't remove E820_TYPE_RAM for pfn 0 Andrew Morton
  2021-01-24  5:01 ` [patch 02/19] mm: fix initialization of struct page for holes in memory layout Andrew Morton
@ 2021-01-24  5:01 ` Andrew Morton
  2021-01-24  5:01 ` [patch 04/19] mm: memcg: fix memcg file_dirty numa stat Andrew Morton
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2021-01-24  5:01 UTC (permalink / raw)
  To: akpm, guro, hannes, imran.f.khan, linux-mm, mkoutny, mm-commits,
	shakeelb, stable, torvalds

From: Roman Gushchin <guro@fb.com>
Subject: mm: memcg/slab: optimize objcg stock draining

Imran Khan reported a 16% regression in hackbench results caused by
commit f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects
instead of pages").  The regression is noticeable in the case of
consecutive allocations of several relatively large slab objects, e.g.
skb's.  As soon as the amount of stocked bytes exceeds PAGE_SIZE,
drain_obj_stock() and __memcg_kmem_uncharge() are called, which leads
to a number of atomic operations in page_counter_uncharge().

The corresponding call graph is below (provided by Imran Khan):
  |__alloc_skb
  |    |
  |    |__kmalloc_reserve.isra.61
  |    |    |
  |    |    |__kmalloc_node_track_caller
  |    |    |    |
  |    |    |    |slab_pre_alloc_hook.constprop.88
  |    |    |     obj_cgroup_charge
  |    |    |    |    |
  |    |    |    |    |__memcg_kmem_charge
  |    |    |    |    |    |
  |    |    |    |    |    |page_counter_try_charge
  |    |    |    |    |
  |    |    |    |    |refill_obj_stock
  |    |    |    |    |    |
  |    |    |    |    |    |drain_obj_stock.isra.68
  |    |    |    |    |    |    |
  |    |    |    |    |    |    |__memcg_kmem_uncharge
  |    |    |    |    |    |    |    |
  |    |    |    |    |    |    |    |page_counter_uncharge
  |    |    |    |    |    |    |    |    |
  |    |    |    |    |    |    |    |    |page_counter_cancel
  |    |    |    |
  |    |    |    |
  |    |    |    |__slab_alloc
  |    |    |    |    |
  |    |    |    |    |___slab_alloc
  |    |    |    |    |
  |    |    |    |slab_post_alloc_hook

Instead of directly uncharging the accounted kernel memory, it's possible
to refill the generic page-sized per-cpu stock.  That is a much faster
operation, especially on the default hierarchy.  As a bonus,
__memcg_kmem_uncharge_page() will also get faster, so the freeing of
page-sized kernel allocations (e.g. large kmallocs) will become faster.

A similar change was done earlier for socket memory by commit
475d0487a2ad ("mm: memcontrol: use per-cpu stocks for socket memory
uncharging").
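
The batching idea behind the per-cpu stock, as a rough sketch (hypothetical
helper names and batch size, not the actual memcontrol.c code):

	#define STOCK_BATCH_SKETCH	64	/* assumed batch size */

	struct stock_sketch {
		struct mem_cgroup *cached;
		unsigned int nr_pages;
	};

	static void drain_stock_sketch(struct stock_sketch *stock)
	{
		if (stock->nr_pages) {
			/* one batched update of the shared atomic counter */
			page_counter_uncharge(&stock->cached->memory,
					      stock->nr_pages);
			stock->nr_pages = 0;
		}
	}

	static void refill_stock_sketch(struct stock_sketch *stock,
					struct mem_cgroup *memcg,
					unsigned int nr_pages)
	{
		if (stock->cached != memcg) {
			drain_stock_sketch(stock);	/* flush old memcg's pages */
			stock->cached = memcg;
		}
		stock->nr_pages += nr_pages;
		if (stock->nr_pages > STOCK_BATCH_SKETCH)
			drain_stock_sketch(stock);
	}

A burst of uncharges then mostly touches per-cpu data; the shared
page_counter is updated once per batch instead of once per call.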

Link: https://lkml.kernel.org/r/20210106042239.2860107-1-guro@fb.com
Fixes: f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects instead of pages")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Imran Khan <imran.f.khan@oracle.com>
Tested-by: Imran Khan <imran.f.khan@oracle.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- a/mm/memcontrol.c~mm-memcg-slab-optimize-objcg-stock-draining
+++ a/mm/memcontrol.c
@@ -3115,9 +3115,7 @@ void __memcg_kmem_uncharge(struct mem_cg
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		page_counter_uncharge(&memcg->kmem, nr_pages);
 
-	page_counter_uncharge(&memcg->memory, nr_pages);
-	if (do_memsw_account())
-		page_counter_uncharge(&memcg->memsw, nr_pages);
+	refill_stock(memcg, nr_pages);
 }
 
 /**
_

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 04/19] mm: memcg: fix memcg file_dirty numa stat
  2021-01-24  5:00 incoming Andrew Morton
                   ` (2 preceding siblings ...)
  2021-01-24  5:01 ` [patch 03/19] mm: memcg/slab: optimize objcg stock draining Andrew Morton
@ 2021-01-24  5:01 ` Andrew Morton
  2021-01-24  5:01 ` [patch 05/19] mm: fix numa stats for thp migration Andrew Morton
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2021-01-24  5:01 UTC (permalink / raw)
  To: akpm, guro, hannes, linux-mm, mhocko, mm-commits, shakeelb,
	shy828301, songmuchun, stable, torvalds

From: Shakeel Butt <shakeelb@google.com>
Subject: mm: memcg: fix memcg file_dirty numa stat

The kernel updates the per-node NR_FILE_DIRTY stats on page migration but
not the memcg numa stats.  That was not an issue until the recent commit
5f9a4f4a7096 ("mm: memcontrol: add the missing numa_stat interface for
cgroup v2") exposed numa stats for the memcg.  So fix the file_dirty
per-memcg numa stat.

Link: https://lkml.kernel.org/r/20210108155813.2914586-1-shakeelb@google.com
Fixes: 5f9a4f4a7096 ("mm: memcontrol: add the missing numa_stat interface for cgroup v2")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/migrate.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/migrate.c~mm-memcg-fix-memcg-file_dirty-numa-stat
+++ a/mm/migrate.c
@@ -500,9 +500,9 @@ int migrate_page_move_mapping(struct add
 			__inc_lruvec_state(new_lruvec, NR_SHMEM);
 		}
 		if (dirty && mapping_can_writeback(mapping)) {
-			__dec_node_state(oldzone->zone_pgdat, NR_FILE_DIRTY);
+			__dec_lruvec_state(old_lruvec, NR_FILE_DIRTY);
 			__dec_zone_state(oldzone, NR_ZONE_WRITE_PENDING);
-			__inc_node_state(newzone->zone_pgdat, NR_FILE_DIRTY);
+			__inc_lruvec_state(new_lruvec, NR_FILE_DIRTY);
 			__inc_zone_state(newzone, NR_ZONE_WRITE_PENDING);
 		}
 	}
_

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 05/19] mm: fix numa stats for thp migration
  2021-01-24  5:00 incoming Andrew Morton
                   ` (3 preceding siblings ...)
  2021-01-24  5:01 ` [patch 04/19] mm: memcg: fix memcg file_dirty numa stat Andrew Morton
@ 2021-01-24  5:01 ` Andrew Morton
  2021-01-24  5:01 ` [patch 06/19] mm: memcontrol: prevent starvation when writing memory.high Andrew Morton
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2021-01-24  5:01 UTC (permalink / raw)
  To: akpm, guro, hannes, linux-mm, mhocko, mm-commits, shakeelb,
	shy828301, songmuchun, stable, torvalds

From: Shakeel Butt <shakeelb@google.com>
Subject: mm: fix numa stats for thp migration

Currently the kernel is not correctly updating the numa stats for
NR_FILE_PAGES and NR_SHMEM on THP migration.  Fix that.  For NR_FILE_DIRTY
and NR_ZONE_WRITE_PENDING there is at the moment no need to handle THP
migration, as the kernel still does not have write support for file THP,
but to be more future-proof this patch adds THP support for those stats
as well.

Link: https://lkml.kernel.org/r/20210108155813.2914586-2-shakeelb@google.com
Fixes: e71769ae52609 ("mm: enable thp migration for shmem thp")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/migrate.c |   23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

--- a/mm/migrate.c~mm-fix-numa-stats-for-thp-migration
+++ a/mm/migrate.c
@@ -402,6 +402,7 @@ int migrate_page_move_mapping(struct add
 	struct zone *oldzone, *newzone;
 	int dirty;
 	int expected_count = expected_page_refs(mapping, page) + extra_count;
+	int nr = thp_nr_pages(page);
 
 	if (!mapping) {
 		/* Anonymous page without mapping */
@@ -437,7 +438,7 @@ int migrate_page_move_mapping(struct add
 	 */
 	newpage->index = page->index;
 	newpage->mapping = page->mapping;
-	page_ref_add(newpage, thp_nr_pages(page)); /* add cache reference */
+	page_ref_add(newpage, nr); /* add cache reference */
 	if (PageSwapBacked(page)) {
 		__SetPageSwapBacked(newpage);
 		if (PageSwapCache(page)) {
@@ -459,7 +460,7 @@ int migrate_page_move_mapping(struct add
 	if (PageTransHuge(page)) {
 		int i;
 
-		for (i = 1; i < HPAGE_PMD_NR; i++) {
+		for (i = 1; i < nr; i++) {
 			xas_next(&xas);
 			xas_store(&xas, newpage);
 		}
@@ -470,7 +471,7 @@ int migrate_page_move_mapping(struct add
 	 * to one less reference.
 	 * We know this isn't the last reference.
 	 */
-	page_ref_unfreeze(page, expected_count - thp_nr_pages(page));
+	page_ref_unfreeze(page, expected_count - nr);
 
 	xas_unlock(&xas);
 	/* Leave irq disabled to prevent preemption while updating stats */
@@ -493,17 +494,17 @@ int migrate_page_move_mapping(struct add
 		old_lruvec = mem_cgroup_lruvec(memcg, oldzone->zone_pgdat);
 		new_lruvec = mem_cgroup_lruvec(memcg, newzone->zone_pgdat);
 
-		__dec_lruvec_state(old_lruvec, NR_FILE_PAGES);
-		__inc_lruvec_state(new_lruvec, NR_FILE_PAGES);
+		__mod_lruvec_state(old_lruvec, NR_FILE_PAGES, -nr);
+		__mod_lruvec_state(new_lruvec, NR_FILE_PAGES, nr);
 		if (PageSwapBacked(page) && !PageSwapCache(page)) {
-			__dec_lruvec_state(old_lruvec, NR_SHMEM);
-			__inc_lruvec_state(new_lruvec, NR_SHMEM);
+			__mod_lruvec_state(old_lruvec, NR_SHMEM, -nr);
+			__mod_lruvec_state(new_lruvec, NR_SHMEM, nr);
 		}
 		if (dirty && mapping_can_writeback(mapping)) {
-			__dec_lruvec_state(old_lruvec, NR_FILE_DIRTY);
-			__dec_zone_state(oldzone, NR_ZONE_WRITE_PENDING);
-			__inc_lruvec_state(new_lruvec, NR_FILE_DIRTY);
-			__inc_zone_state(newzone, NR_ZONE_WRITE_PENDING);
+			__mod_lruvec_state(old_lruvec, NR_FILE_DIRTY, -nr);
+			__mod_zone_page_state(oldzone, NR_ZONE_WRITE_PENDING, -nr);
+			__mod_lruvec_state(new_lruvec, NR_FILE_DIRTY, nr);
+			__mod_zone_page_state(newzone, NR_ZONE_WRITE_PENDING, nr);
 		}
 	}
 	local_irq_enable();
_

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 06/19] mm: memcontrol: prevent starvation when writing memory.high
  2021-01-24  5:00 incoming Andrew Morton
                   ` (4 preceding siblings ...)
  2021-01-24  5:01 ` [patch 05/19] mm: fix numa stats for thp migration Andrew Morton
@ 2021-01-24  5:01 ` Andrew Morton
  2021-01-24 18:01   ` Shakeel Butt
  2021-01-24  5:01 ` [patch 07/19] kasan: fix unaligned address is unhandled in kasan_remove_zero_shadow Andrew Morton
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2021-01-24  5:01 UTC (permalink / raw)
  To: akpm, guro, hannes, linux-mm, mhocko, mkoutny, mm-commits,
	shakeelb, stable, tj, torvalds

From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: prevent starvation when writing memory.high

When a value is written to a cgroup's memory.high control file, the
write() context first tries to reclaim the cgroup to size before putting
the limit in place for the workload.  Concurrent charges from the workload
can keep such a write() looping in reclaim indefinitely.

In the past, a write to memory.high would first put the limit in place for
the workload, then do targeted reclaim until the new limit has been met -
similar to how we do it for memory.max.  This wasn't prone to the
described starvation issue.  However, this sequence could cause excessive
latencies in the workload, when allocating threads could be put into long
penalty sleeps on the sudden memory.high overage created by the write(),
before that had a chance to work it off.

Now that memory_high_write() performs reclaim before enforcing the new
limit, reflect that the cgroup may well fail to converge due to concurrent
workload activity.  Bail out of the loop after a few tries.
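
For illustration, the interface whose write-side behaviour changes here
(path assumes a standard cgroup v2 mount and an example group name):

	# Lower the limit.  With this change the write returns after a
	# bounded number of reclaim retries, even if concurrent charges
	# keep the cgroup above the new value.
	echo 100M > /sys/fs/cgroup/workload/memory.high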

Link: https://lkml.kernel.org/r/20210112163011.127833-1-hannes@cmpxchg.org
Fixes: 536d3bf261a2 ("mm: memcontrol: avoid workload stalls when lowering memory.high")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reported-by: Tejun Heo <tj@kernel.org>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>	[5.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

--- a/mm/memcontrol.c~mm-memcontrol-prevent-starvation-when-writing-memoryhigh
+++ a/mm/memcontrol.c
@@ -6273,7 +6273,6 @@ static ssize_t memory_high_write(struct
 
 	for (;;) {
 		unsigned long nr_pages = page_counter_read(&memcg->memory);
-		unsigned long reclaimed;
 
 		if (nr_pages <= high)
 			break;
@@ -6287,10 +6286,10 @@ static ssize_t memory_high_write(struct
 			continue;
 		}
 
-		reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
-							 GFP_KERNEL, true);
+		try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
+					     GFP_KERNEL, true);
 
-		if (!reclaimed && !nr_retries--)
+		if (!nr_retries--)
 			break;
 	}
 
_

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 07/19] kasan: fix unaligned address is unhandled in kasan_remove_zero_shadow
  2021-01-24  5:00 incoming Andrew Morton
                   ` (5 preceding siblings ...)
  2021-01-24  5:01 ` [patch 06/19] mm: memcontrol: prevent starvation when writing memory.high Andrew Morton
@ 2021-01-24  5:01 ` Andrew Morton
  2021-01-24  5:01 ` [patch 08/19] kasan: fix incorrect arguments passing in kasan_add_zero_shadow Andrew Morton
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2021-01-24  5:01 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, dan.j.williams, dvyukov, glider,
	lecopzer.chen, lecopzer, linux-mm, mm-commits, torvalds,
	yj.chiang

From: Lecopzer Chen <lecopzer@gmail.com>
Subject: kasan: fix unaligned address is unhandled in kasan_remove_zero_shadow

While testing kasan_populate_early_shadow and kasan_remove_zero_shadow,
it turned out that if the shadow start and end addresses in
kasan_remove_zero_shadow() are not aligned to PMD_SIZE, the remaining
unaligned PTEs won't be removed.

In the test case for kasan_remove_zero_shadow():
    shadow_start: 0xffffffb802000000, shadow end: 0xffffffbfbe000000
    3-level page table:
      PUD_SIZE: 0x40000000 PMD_SIZE: 0x200000 PAGE_SIZE: 4K
0xffffffbf80000000 ~ 0xffffffbfbdf80000 will not be removed because
in kasan_remove_pud_table(), kasan_pmd_table(*pud) is true but the
next address is 0xffffffbfbdf80000 which is not aligned to PUD_SIZE.

In this case the code should fall back to the next level,
kasan_remove_pmd_table(), but the condition flow always continues and
skips the unaligned part.

Fix this by correcting the condition for the case when neither next nor
addr is aligned.

Link: https://lkml.kernel.org/r/20210103135621.83129-1-lecopzer@gmail.com
Fixes: 0207df4fa1a86 ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN")
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: YJ Chiang <yj.chiang@mediatek.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/kasan/init.c |   20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

--- a/mm/kasan/init.c~kasan-fix-unaligned-address-is-unhandled-in-kasan_remove_zero_shadow
+++ a/mm/kasan/init.c
@@ -373,9 +373,10 @@ static void kasan_remove_pmd_table(pmd_t
 
 		if (kasan_pte_table(*pmd)) {
 			if (IS_ALIGNED(addr, PMD_SIZE) &&
-			    IS_ALIGNED(next, PMD_SIZE))
+			    IS_ALIGNED(next, PMD_SIZE)) {
 				pmd_clear(pmd);
-			continue;
+				continue;
+			}
 		}
 		pte = pte_offset_kernel(pmd, addr);
 		kasan_remove_pte_table(pte, addr, next);
@@ -398,9 +399,10 @@ static void kasan_remove_pud_table(pud_t
 
 		if (kasan_pmd_table(*pud)) {
 			if (IS_ALIGNED(addr, PUD_SIZE) &&
-			    IS_ALIGNED(next, PUD_SIZE))
+			    IS_ALIGNED(next, PUD_SIZE)) {
 				pud_clear(pud);
-			continue;
+				continue;
+			}
 		}
 		pmd = pmd_offset(pud, addr);
 		pmd_base = pmd_offset(pud, 0);
@@ -424,9 +426,10 @@ static void kasan_remove_p4d_table(p4d_t
 
 		if (kasan_pud_table(*p4d)) {
 			if (IS_ALIGNED(addr, P4D_SIZE) &&
-			    IS_ALIGNED(next, P4D_SIZE))
+			    IS_ALIGNED(next, P4D_SIZE)) {
 				p4d_clear(p4d);
-			continue;
+				continue;
+			}
 		}
 		pud = pud_offset(p4d, addr);
 		kasan_remove_pud_table(pud, addr, next);
@@ -457,9 +460,10 @@ void kasan_remove_zero_shadow(void *star
 
 		if (kasan_p4d_table(*pgd)) {
 			if (IS_ALIGNED(addr, PGDIR_SIZE) &&
-			    IS_ALIGNED(next, PGDIR_SIZE))
+			    IS_ALIGNED(next, PGDIR_SIZE)) {
 				pgd_clear(pgd);
-			continue;
+				continue;
+			}
 		}
 
 		p4d = p4d_offset(pgd, addr);
_

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 08/19] kasan: fix incorrect arguments passing in kasan_add_zero_shadow
  2021-01-24  5:00 incoming Andrew Morton
                   ` (6 preceding siblings ...)
  2021-01-24  5:01 ` [patch 07/19] kasan: fix unaligned address is unhandled in kasan_remove_zero_shadow Andrew Morton
@ 2021-01-24  5:01 ` Andrew Morton
  2021-01-24  5:01 ` [patch 09/19] kasan: fix HW_TAGS boot parameters Andrew Morton
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2021-01-24  5:01 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, dan.j.williams, dvyukov, glider,
	lecopzer.chen, lecopzer, linux-mm, mm-commits, torvalds

From: Lecopzer Chen <lecopzer@gmail.com>
Subject: kasan: fix incorrect arguments passing in kasan_add_zero_shadow

kasan_remove_zero_shadow() should be passed the original virtual address
and size (start and size), not the shadow address.

Link: https://lkml.kernel.org/r/20210103063847.5963-1-lecopzer@gmail.com
Fixes: 0207df4fa1a86 ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN")
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/kasan/init.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/mm/kasan/init.c~kasan-fix-incorrect-arguments-passing-in-kasan_add_zero_shadow
+++ a/mm/kasan/init.c
@@ -486,7 +486,6 @@ int kasan_add_zero_shadow(void *start, u
 
 	ret = kasan_populate_early_shadow(shadow_start, shadow_end);
 	if (ret)
-		kasan_remove_zero_shadow(shadow_start,
-					size >> KASAN_SHADOW_SCALE_SHIFT);
+		kasan_remove_zero_shadow(start, size);
 	return ret;
 }
_

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 09/19] kasan: fix HW_TAGS boot parameters
  2021-01-24  5:00 incoming Andrew Morton
                   ` (7 preceding siblings ...)
  2021-01-24  5:01 ` [patch 08/19] kasan: fix incorrect arguments passing in kasan_add_zero_shadow Andrew Morton
@ 2021-01-24  5:01 ` Andrew Morton
  2021-01-24  5:01 ` [patch 10/19] kasan, mm: fix conflicts with init_on_alloc/free Andrew Morton
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2021-01-24  5:01 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, Branislav.Rankov, catalin.marinas,
	dvyukov, elver, eugenis, glider, kevin.brodsky, linux-mm,
	mm-commits, pcc, torvalds, vincenzo.frascino, will.deacon

From: Andrey Konovalov <andreyknvl@google.com>
Subject: kasan: fix HW_TAGS boot parameters

The initially proposed KASAN command line parameters are redundant.

This change drops the complex "kasan.mode=off/prod/full" parameter
and adds a simpler kill switch "kasan=off/on" instead. The new parameter
together with the already existing ones provides a cleaner way to
express the same set of features.

The full set of parameters with this change:

kasan=off/on             - whether KASAN is enabled
kasan.fault=report/panic - whether to only print a report or also panic
kasan.stacktrace=off/on  - whether to collect alloc/free stack traces

Default values:

kasan=on
kasan.fault=report
kasan.stacktrace=on  (if CONFIG_DEBUG_KERNEL=y)
kasan.stacktrace=off (otherwise)
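
For example, a boot command line combining the parameters above to keep
KASAN enabled, skip stack trace collection and panic on the first report
(an illustrative combination, not a recommendation):

	kasan=on kasan.stacktrace=off kasan.fault=panic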

Link: https://linux-review.googlesource.com/id/Ib3694ed90b1e8ccac6cf77dfd301847af4aba7b8
Link: https://lkml.kernel.org/r/4e9c4a4bdcadc168317deb2419144582a9be6e61.1610736745.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/dev-tools/kasan.rst |   27 ++-------
 mm/kasan/hw_tags.c                |   77 +++++++++++-----------------
 2 files changed, 38 insertions(+), 66 deletions(-)

--- a/Documentation/dev-tools/kasan.rst~kasan-fix-hw_tags-boot-parameters
+++ a/Documentation/dev-tools/kasan.rst
@@ -160,29 +160,14 @@ intended for use in production as a secu
 boot parameters that allow to disable KASAN competely or otherwise control
 particular KASAN features.
 
-The things that can be controlled are:
+- ``kasan=off`` or ``=on`` controls whether KASAN is enabled (default: ``on``).
 
-1. Whether KASAN is enabled at all.
-2. Whether KASAN collects and saves alloc/free stacks.
-3. Whether KASAN panics on a detected bug or not.
+- ``kasan.stacktrace=off`` or ``=on`` disables or enables alloc and free stack
+  traces collection (default: ``on`` for ``CONFIG_DEBUG_KERNEL=y``, otherwise
+  ``off``).
 
-The ``kasan.mode`` boot parameter allows to choose one of three main modes:
-
-- ``kasan.mode=off`` - KASAN is disabled, no tag checks are performed
-- ``kasan.mode=prod`` - only essential production features are enabled
-- ``kasan.mode=full`` - all KASAN features are enabled
-
-The chosen mode provides default control values for the features mentioned
-above. However it's also possible to override the default values by providing:
-
-- ``kasan.stacktrace=off`` or ``=on`` - enable alloc/free stack collection
-					(default: ``on`` for ``mode=full``,
-					 otherwise ``off``)
-- ``kasan.fault=report`` or ``=panic`` - only print KASAN report or also panic
-					 (default: ``report``)
-
-If ``kasan.mode`` parameter is not provided, it defaults to ``full`` when
-``CONFIG_DEBUG_KERNEL`` is enabled, and to ``prod`` otherwise.
+- ``kasan.fault=report`` or ``=panic`` controls whether to only print a KASAN
+  report or also panic the kernel (default: ``report``).
 
 For developers
 ~~~~~~~~~~~~~~
--- a/mm/kasan/hw_tags.c~kasan-fix-hw_tags-boot-parameters
+++ a/mm/kasan/hw_tags.c
@@ -19,11 +19,10 @@
 
 #include "kasan.h"
 
-enum kasan_arg_mode {
-	KASAN_ARG_MODE_DEFAULT,
-	KASAN_ARG_MODE_OFF,
-	KASAN_ARG_MODE_PROD,
-	KASAN_ARG_MODE_FULL,
+enum kasan_arg {
+	KASAN_ARG_DEFAULT,
+	KASAN_ARG_OFF,
+	KASAN_ARG_ON,
 };
 
 enum kasan_arg_stacktrace {
@@ -38,7 +37,7 @@ enum kasan_arg_fault {
 	KASAN_ARG_FAULT_PANIC,
 };
 
-static enum kasan_arg_mode kasan_arg_mode __ro_after_init;
+static enum kasan_arg kasan_arg __ro_after_init;
 static enum kasan_arg_stacktrace kasan_arg_stacktrace __ro_after_init;
 static enum kasan_arg_fault kasan_arg_fault __ro_after_init;
 
@@ -52,26 +51,24 @@ DEFINE_STATIC_KEY_FALSE(kasan_flag_stack
 /* Whether panic or disable tag checking on fault. */
 bool kasan_flag_panic __ro_after_init;
 
-/* kasan.mode=off/prod/full */
-static int __init early_kasan_mode(char *arg)
+/* kasan=off/on */
+static int __init early_kasan_flag(char *arg)
 {
 	if (!arg)
 		return -EINVAL;
 
 	if (!strcmp(arg, "off"))
-		kasan_arg_mode = KASAN_ARG_MODE_OFF;
-	else if (!strcmp(arg, "prod"))
-		kasan_arg_mode = KASAN_ARG_MODE_PROD;
-	else if (!strcmp(arg, "full"))
-		kasan_arg_mode = KASAN_ARG_MODE_FULL;
+		kasan_arg = KASAN_ARG_OFF;
+	else if (!strcmp(arg, "on"))
+		kasan_arg = KASAN_ARG_ON;
 	else
 		return -EINVAL;
 
 	return 0;
 }
-early_param("kasan.mode", early_kasan_mode);
+early_param("kasan", early_kasan_flag);
 
-/* kasan.stack=off/on */
+/* kasan.stacktrace=off/on */
 static int __init early_kasan_flag_stacktrace(char *arg)
 {
 	if (!arg)
@@ -113,8 +110,8 @@ void kasan_init_hw_tags_cpu(void)
 	 * as this function is only called for MTE-capable hardware.
 	 */
 
-	/* If KASAN is disabled, do nothing. */
-	if (kasan_arg_mode == KASAN_ARG_MODE_OFF)
+	/* If KASAN is disabled via command line, don't initialize it. */
+	if (kasan_arg == KASAN_ARG_OFF)
 		return;
 
 	hw_init_tags(KASAN_TAG_MAX);
@@ -124,43 +121,28 @@ void kasan_init_hw_tags_cpu(void)
 /* kasan_init_hw_tags() is called once on boot CPU. */
 void __init kasan_init_hw_tags(void)
 {
-	/* If hardware doesn't support MTE, do nothing. */
+	/* If hardware doesn't support MTE, don't initialize KASAN. */
 	if (!system_supports_mte())
 		return;
 
-	/* Choose KASAN mode if kasan boot parameter is not provided. */
-	if (kasan_arg_mode == KASAN_ARG_MODE_DEFAULT) {
-		if (IS_ENABLED(CONFIG_DEBUG_KERNEL))
-			kasan_arg_mode = KASAN_ARG_MODE_FULL;
-		else
-			kasan_arg_mode = KASAN_ARG_MODE_PROD;
-	}
-
-	/* Preset parameter values based on the mode. */
-	switch (kasan_arg_mode) {
-	case KASAN_ARG_MODE_DEFAULT:
-		/* Shouldn't happen as per the check above. */
-		WARN_ON(1);
-		return;
-	case KASAN_ARG_MODE_OFF:
-		/* If KASAN is disabled, do nothing. */
+	/* If KASAN is disabled via command line, don't initialize it. */
+	if (kasan_arg == KASAN_ARG_OFF)
 		return;
-	case KASAN_ARG_MODE_PROD:
-		static_branch_enable(&kasan_flag_enabled);
-		break;
-	case KASAN_ARG_MODE_FULL:
-		static_branch_enable(&kasan_flag_enabled);
-		static_branch_enable(&kasan_flag_stacktrace);
-		break;
-	}
 
-	/* Now, optionally override the presets. */
+	/* Enable KASAN. */
+	static_branch_enable(&kasan_flag_enabled);
 
 	switch (kasan_arg_stacktrace) {
 	case KASAN_ARG_STACKTRACE_DEFAULT:
+		/*
+		 * Default to enabling stack trace collection for
+		 * debug kernels.
+		 */
+		if (IS_ENABLED(CONFIG_DEBUG_KERNEL))
+			static_branch_enable(&kasan_flag_stacktrace);
 		break;
 	case KASAN_ARG_STACKTRACE_OFF:
-		static_branch_disable(&kasan_flag_stacktrace);
+		/* Do nothing, kasan_flag_stacktrace keeps its default value. */
 		break;
 	case KASAN_ARG_STACKTRACE_ON:
 		static_branch_enable(&kasan_flag_stacktrace);
@@ -169,11 +151,16 @@ void __init kasan_init_hw_tags(void)
 
 	switch (kasan_arg_fault) {
 	case KASAN_ARG_FAULT_DEFAULT:
+		/*
+		 * Default to no panic on report.
+		 * Do nothing, kasan_flag_panic keeps its default value.
+		 */
 		break;
 	case KASAN_ARG_FAULT_REPORT:
-		kasan_flag_panic = false;
+		/* Do nothing, kasan_flag_panic keeps its default value. */
 		break;
 	case KASAN_ARG_FAULT_PANIC:
+		/* Enable panic on report. */
 		kasan_flag_panic = true;
 		break;
 	}
_

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 10/19] kasan, mm: fix conflicts with init_on_alloc/free
  2021-01-24  5:00 incoming Andrew Morton
                   ` (8 preceding siblings ...)
  2021-01-24  5:01 ` [patch 09/19] kasan: fix HW_TAGS boot parameters Andrew Morton
@ 2021-01-24  5:01 ` Andrew Morton
  2021-01-24  5:01 ` [patch 11/19] kasan, mm: fix resetting page_alloc tags for HW_TAGS Andrew Morton
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2021-01-24  5:01 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, Branislav.Rankov, catalin.marinas,
	dvyukov, elver, eugenis, glider, kevin.brodsky, linux-mm,
	mm-commits, pcc, torvalds, vbabka, vincenzo.frascino,
	will.deacon

From: Andrey Konovalov <andreyknvl@google.com>
Subject: kasan, mm: fix conflicts with init_on_alloc/free

A few places where SLUB accesses an object's data or metadata were missed in
a previous patch.  This leads to false positives with hardware tag-based
KASAN when bulk allocations are used with init_on_alloc/free.

Fix the false-positives by resetting pointer tags during these accesses.

(The kasan_reset_tag call is removed from slab_alloc_node, as it's added
 into maybe_wipe_obj_freeptr.)

Link: https://linux-review.googlesource.com/id/I50dd32838a666e173fe06c3c5c766f2c36aae901
Link: https://lkml.kernel.org/r/093428b5d2ca8b507f4a79f92f9929b35f7fada7.1610731872.git.andreyknvl@google.com
Fixes: aa1ef4d7b3f67 ("kasan, mm: reset tags when accessing metadata")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/slub.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/mm/slub.c~kasan-mm-fix-conflicts-with-init_on_alloc-free
+++ a/mm/slub.c
@@ -2791,7 +2791,8 @@ static __always_inline void maybe_wipe_o
 						   void *obj)
 {
 	if (unlikely(slab_want_init_on_free(s)) && obj)
-		memset((void *)((char *)obj + s->offset), 0, sizeof(void *));
+		memset((void *)((char *)kasan_reset_tag(obj) + s->offset),
+			0, sizeof(void *));
 }
 
 /*
@@ -2883,7 +2884,7 @@ redo:
 		stat(s, ALLOC_FASTPATH);
 	}
 
-	maybe_wipe_obj_freeptr(s, kasan_reset_tag(object));
+	maybe_wipe_obj_freeptr(s, object);
 
 	if (unlikely(slab_want_init_on_alloc(gfpflags, s)) && object)
 		memset(kasan_reset_tag(object), 0, s->object_size);
@@ -3329,7 +3330,7 @@ int kmem_cache_alloc_bulk(struct kmem_ca
 		int j;
 
 		for (j = 0; j < i; j++)
-			memset(p[j], 0, s->object_size);
+			memset(kasan_reset_tag(p[j]), 0, s->object_size);
 	}
 
 	/* memcg and kmem_cache debug support */
_

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 11/19] kasan, mm: fix resetting page_alloc tags for HW_TAGS
  2021-01-24  5:00 incoming Andrew Morton
                   ` (9 preceding siblings ...)
  2021-01-24  5:01 ` [patch 10/19] kasan, mm: fix conflicts with init_on_alloc/free Andrew Morton
@ 2021-01-24  5:01 ` Andrew Morton
  2021-01-24  5:01 ` [patch 12/19] ubsan: disable unsigned-overflow check for i386 Andrew Morton
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2021-01-24  5:01 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, Branislav.Rankov, catalin.marinas,
	dvyukov, elver, eugenis, glider, kevin.brodsky, linux-mm,
	mm-commits, pcc, torvalds, vincenzo.frascino, will.deacon

From: Andrey Konovalov <andreyknvl@google.com>
Subject: kasan, mm: fix resetting page_alloc tags for HW_TAGS

A previous commit added resetting KASAN page tags to
kernel_init_free_pages() to avoid false-positives due to accesses to
metadata with the hardware tag-based mode.

That commit did reset page tags before the metadata access, but didn't
restore them after.  As a result, KASAN fails to detect bad accesses to
page_alloc allocations on some configurations.

Fix this by recovering the tag after the metadata access.

Link: https://lkml.kernel.org/r/02b5bcd692e912c27d484030f666b350ad7e4ae4.1611074450.git.andreyknvl@google.com
Fixes: aa1ef4d7b3f6 ("kasan, mm: reset tags when accessing metadata")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/mm/page_alloc.c~kasan-mm-fix-resetting-page_alloc-tags-for-hw_tags
+++ a/mm/page_alloc.c
@@ -1207,8 +1207,10 @@ static void kernel_init_free_pages(struc
 	/* s390's use of memset() could override KASAN redzones. */
 	kasan_disable_current();
 	for (i = 0; i < numpages; i++) {
+		u8 tag = page_kasan_tag(page + i);
 		page_kasan_tag_reset(page + i);
 		clear_highpage(page + i);
+		page_kasan_tag_set(page + i, tag);
 	}
 	kasan_enable_current();
 }
_

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 12/19] ubsan: disable unsigned-overflow check for i386
  2021-01-24  5:00 incoming Andrew Morton
                   ` (10 preceding siblings ...)
  2021-01-24  5:01 ` [patch 11/19] kasan, mm: fix resetting page_alloc tags for HW_TAGS Andrew Morton
@ 2021-01-24  5:01 ` Andrew Morton
  2021-01-24  5:01 ` [patch 13/19] mm: fix page reference leak in soft_offline_page() Andrew Morton
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2021-01-24  5:01 UTC (permalink / raw)
  To: akpm, arnd, elver, georgepope, keescook, linux-mm, mm-commits,
	natechancellor, ndesaulniers, sfr, torvalds

From: Arnd Bergmann <arnd@arndb.de>
Subject: ubsan: disable unsigned-overflow check for i386

Building ubsan kernels even for compile-testing introduced these warnings
in my randconfig environment:

crypto/blake2b_generic.c:98:13: error: stack frame size of 9636 bytes in function 'blake2b_compress' [-Werror,-Wframe-larger-than=]
static void blake2b_compress(struct blake2b_state *S,
crypto/sha512_generic.c:151:13: error: stack frame size of 1292 bytes in function 'sha512_generic_block_fn' [-Werror,-Wframe-larger-than=]
static void sha512_generic_block_fn(struct sha512_state *sst, u8 const *src,
lib/crypto/curve25519-fiat32.c:312:22: error: stack frame size of 2180 bytes in function 'fe_mul_impl' [-Werror,-Wframe-larger-than=]
static noinline void fe_mul_impl(u32 out[10], const u32 in1[10], const u32 in2[10])
lib/crypto/curve25519-fiat32.c:444:22: error: stack frame size of 1588 bytes in function 'fe_sqr_impl' [-Werror,-Wframe-larger-than=]
static noinline void fe_sqr_impl(u32 out[10], const u32 in1[10])

Further testing showed that this is caused by
-fsanitize=unsigned-integer-overflow, but it is isolated to the 32-bit x86
architecture.

The one in blake2b immediately overflows the 8KB stack area on such
architectures, so better ensure this never happens by disabling the option
for 32-bit x86.

Link: https://lkml.kernel.org/r/20210112202922.2454435-1-arnd@kernel.org
Link: https://lore.kernel.org/lkml/20201230154749.746641-1-arnd@kernel.org/
Fixes: d0a3ac549f38 ("ubsan: enable for all*config builds")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Marco Elver <elver@google.com>
Cc: George Popescu <georgepope@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/Kconfig.ubsan |    1 +
 1 file changed, 1 insertion(+)

--- a/lib/Kconfig.ubsan~ubsan-disable-unsigned-overflow-check-for-i386
+++ a/lib/Kconfig.ubsan
@@ -123,6 +123,7 @@ config UBSAN_SIGNED_OVERFLOW
 config UBSAN_UNSIGNED_OVERFLOW
 	bool "Perform checking for unsigned arithmetic overflow"
 	depends on $(cc-option,-fsanitize=unsigned-integer-overflow)
+	depends on !X86_32 # avoid excessive stack usage on x86-32/clang
 	help
 	  This option enables -fsanitize=unsigned-integer-overflow which checks
 	  for overflow of any arithmetic operations with unsigned integers. This
_

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 13/19] mm: fix page reference leak in soft_offline_page()
  2021-01-24  5:00 incoming Andrew Morton
                   ` (11 preceding siblings ...)
  2021-01-24  5:01 ` [patch 12/19] ubsan: disable unsigned-overflow check for i386 Andrew Morton
@ 2021-01-24  5:01 ` Andrew Morton
  2021-01-24  5:01 ` [patch 14/19] sparc/mm/highmem: flush cache and TLB Andrew Morton
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2021-01-24  5:01 UTC (permalink / raw)
  To: akpm, cai, dan.j.williams, david, linux-mm, mhocko, mm-commits,
	naoya.horiguchi, osalvador, stable, torvalds

From: Dan Williams <dan.j.williams@intel.com>
Subject: mm: fix page reference leak in soft_offline_page()

The conversion that moved pfn_to_online_page() internal to
soft_offline_page() missed that the get_user_pages() reference taken by
the madvise() path needs to be dropped when pfn_to_online_page() fails.
Note that the direct sysfs path to soft_offline_page() does not perform a
get_user_pages() lookup.

When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page()
pfn, the kernel hangs at dax-device shutdown due to a leaked reference.
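
For context, the madvise() entry path referred to above can be exercised
with something like the following (illustrative userspace snippet; needs
CONFIG_MEMORY_FAILURE and sufficient privileges):

	#include <stdio.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		size_t len = getpagesize();
		void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE,
				  -1, 0);

		if (addr == MAP_FAILED)
			return 1;
		/* takes a get_user_pages() reference before soft-offlining */
		if (madvise(addr, len, MADV_SOFT_OFFLINE) != 0)
			perror("madvise(MADV_SOFT_OFFLINE)");
		return 0;
	}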

Link: https://lkml.kernel.org/r/161058501210.1840162.8108917599181157327.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Qian Cai <cai@lca.pw>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory-failure.c |   20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

--- a/mm/memory-failure.c~mm-fix-page-reference-leak-in-soft_offline_page
+++ a/mm/memory-failure.c
@@ -1885,6 +1885,12 @@ static int soft_offline_free_page(struct
 	return rc;
 }
 
+static void put_ref_page(struct page *page)
+{
+	if (page)
+		put_page(page);
+}
+
 /**
  * soft_offline_page - Soft offline a page.
  * @pfn: pfn to soft-offline
@@ -1910,20 +1916,26 @@ static int soft_offline_free_page(struct
 int soft_offline_page(unsigned long pfn, int flags)
 {
 	int ret;
-	struct page *page;
 	bool try_again = true;
+	struct page *page, *ref_page = NULL;
+
+	WARN_ON_ONCE(!pfn_valid(pfn) && (flags & MF_COUNT_INCREASED));
 
 	if (!pfn_valid(pfn))
 		return -ENXIO;
+	if (flags & MF_COUNT_INCREASED)
+		ref_page = pfn_to_page(pfn);
+
 	/* Only online pages can be soft-offlined (esp., not ZONE_DEVICE). */
 	page = pfn_to_online_page(pfn);
-	if (!page)
+	if (!page) {
+		put_ref_page(ref_page);
 		return -EIO;
+	}
 
 	if (PageHWPoison(page)) {
 		pr_info("%s: %#lx page already poisoned\n", __func__, pfn);
-		if (flags & MF_COUNT_INCREASED)
-			put_page(page);
+		put_ref_page(ref_page);
 		return 0;
 	}
 
_

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 14/19] sparc/mm/highmem: flush cache and TLB
  2021-01-24  5:00 incoming Andrew Morton
                   ` (12 preceding siblings ...)
  2021-01-24  5:01 ` [patch 13/19] mm: fix page reference leak in soft_offline_page() Andrew Morton
@ 2021-01-24  5:01 ` Andrew Morton
  2021-01-24  5:02 ` [patch 15/19] mm/highmem: prepare for overriding set_pte_at() Andrew Morton
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2021-01-24  5:01 UTC (permalink / raw)
  To: akpm, andreas, davem, linux-mm, mm-commits, mpe, paul, peterz,
	tglx, torvalds, tsbogend

From: Thomas Gleixner <tglx@linutronix.de>
Subject: sparc/mm/highmem: flush cache and TLB

Patch series "mm/highmem: Fix fallout from generic kmap_local conversions".

The kmap_local conversion wrecked sparc, mips and powerpc as it missed
some of the details in the original implementation.


This patch (of 4):

The recent conversion to the generic kmap_local infrastructure failed to
assign the proper pre/post map/unmap flush operations for sparc.

Sparc requires a cache flush before map/unmap and a TLB flush afterwards.

Link: https://lkml.kernel.org/r/20210112170136.078559026@linutronix.de
Link: https://lkml.kernel.org/r/20210112170410.905976187@linutronix.de
Fixes: 3293efa97807 ("sparc/mm/highmem: Switch to generic kmap atomic")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Andreas Larsson <andreas@gaisler.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Cercueil <paul@crapouillou.net>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/sparc/include/asm/highmem.h |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

--- a/arch/sparc/include/asm/highmem.h~sparc-mm-highmem-flush-cache-and-tlb
+++ a/arch/sparc/include/asm/highmem.h
@@ -50,10 +50,11 @@ extern pte_t *pkmap_page_table;
 
 #define flush_cache_kmaps()	flush_cache_all()
 
-/* FIXME: Use __flush_tlb_one(vaddr) instead of flush_cache_all() -- Anton */
-#define arch_kmap_local_post_map(vaddr, pteval)	flush_cache_all()
-#define arch_kmap_local_post_unmap(vaddr)	flush_cache_all()
-
+/* FIXME: Use __flush_*_one(vaddr) instead of flush_*_all() -- Anton */
+#define arch_kmap_local_pre_map(vaddr, pteval)	flush_cache_all()
+#define arch_kmap_local_pre_unmap(vaddr)	flush_cache_all()
+#define arch_kmap_local_post_map(vaddr, pteval)	flush_tlb_all()
+#define arch_kmap_local_post_unmap(vaddr)	flush_tlb_all()
 
 #endif /* __KERNEL__ */
 
_

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 15/19] mm/highmem: prepare for overriding set_pte_at()
  2021-01-24  5:00 incoming Andrew Morton
                   ` (13 preceding siblings ...)
  2021-01-24  5:01 ` [patch 14/19] sparc/mm/highmem: flush cache and TLB Andrew Morton
@ 2021-01-24  5:02 ` Andrew Morton
  2021-01-24  5:02 ` [patch 16/19] mips/mm/highmem: use set_pte() for kmap_local() Andrew Morton
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2021-01-24  5:02 UTC (permalink / raw)
  To: akpm, andreas, davem, linux-mm, mm-commits, mpe, paul, peterz,
	tglx, torvalds, tsbogend

From: Thomas Gleixner <tglx@linutronix.de>
Subject: mm/highmem: prepare for overriding set_pte_at()

The generic kmap_local() map function uses set_pte_at(), but MIPS requires
set_pte() and PowerPC wants __set_pte_at().

Provide arch_kmap_local_set_pte() and default it to set_pte_at().
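
As a usage example, an architecture opts out of the default simply by
defining the macro in its asm/highmem.h, which mm/highmem.c pulls in
before testing the #ifndef; this is exactly what the MIPS patch later in
this series does:

    /* from the MIPS patch that follows in this series */
    #define arch_kmap_local_set_pte(mm, vaddr, ptep, ptev)	set_pte(ptep, ptev)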

Link: https://lkml.kernel.org/r/20210112170411.056306194@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Cercueil <paul@crapouillou.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/highmem.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/mm/highmem.c~mm-highmem-prepare-for-overriding-set_pte_at
+++ a/mm/highmem.c
@@ -473,6 +473,11 @@ static inline void *arch_kmap_local_high
 }
 #endif
 
+#ifndef arch_kmap_local_set_pte
+#define arch_kmap_local_set_pte(mm, vaddr, ptep, ptev)	\
+	set_pte_at(mm, vaddr, ptep, ptev)
+#endif
+
 /* Unmap a local mapping which was obtained by kmap_high_get() */
 static inline bool kmap_high_unmap_local(unsigned long vaddr)
 {
@@ -515,7 +520,7 @@ void *__kmap_local_pfn_prot(unsigned lon
 	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
 	BUG_ON(!pte_none(*(kmap_pte - idx)));
 	pteval = pfn_pte(pfn, prot);
-	set_pte_at(&init_mm, vaddr, kmap_pte - idx, pteval);
+	arch_kmap_local_set_pte(&init_mm, vaddr, kmap_pte - idx, pteval);
 	arch_kmap_local_post_map(vaddr, pteval);
 	current->kmap_ctrl.pteval[kmap_local_idx()] = pteval;
 	preempt_enable();
_

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 16/19] mips/mm/highmem: use set_pte() for kmap_local()
  2021-01-24  5:00 incoming Andrew Morton
                   ` (14 preceding siblings ...)
  2021-01-24  5:02 ` [patch 15/19] mm/highmem: prepare for overriding set_pte_at() Andrew Morton
@ 2021-01-24  5:02 ` Andrew Morton
  2021-01-24  5:02 ` [patch 17/19] powerpc/mm/highmem: use __set_pte_at() " Andrew Morton
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2021-01-24  5:02 UTC (permalink / raw)
  To: akpm, andreas, davem, linux-mm, mm-commits, mpe, paul, peterz,
	tglx, torvalds, tsbogend

From: Thomas Gleixner <tglx@linutronix.de>
Subject: mips/mm/highmem: use set_pte() for kmap_local()

set_pte_at() on MIPS invokes update_cache(), which might recurse into
kmap_local().  Use set_pte() like the original MIPS highmem implementation
did.
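
A rough sketch of the recursion this avoids (call chain as described
above, simplified):

    /*
     * with set_pte_at():
     *     kmap_local_page()
     *       -> set_pte_at()
     *            -> update_cache()
     *                 -> ... -> kmap_local_page()    recursion
     *
     * plain set_pte() only writes the PTE and avoids the recursion.
     */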

Link: https://lkml.kernel.org/r/20210112170411.187513575@linutronix.de
Fixes: a4c33e83bca1 ("mips/mm/highmem: Switch to generic kmap atomic")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Paul Cercueil <paul@crapouillou.net>
Reported-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/mips/include/asm/highmem.h |    1 +
 1 file changed, 1 insertion(+)

--- a/arch/mips/include/asm/highmem.h~mips-mm-highmem-use-set_pte-for-kmap_local
+++ a/arch/mips/include/asm/highmem.h
@@ -51,6 +51,7 @@ extern void kmap_flush_tlb(unsigned long
 
 #define flush_cache_kmaps()	BUG_ON(cpu_has_dc_aliases)
 
+#define arch_kmap_local_set_pte(mm, vaddr, ptep, ptev)	set_pte(ptep, ptev)
 #define arch_kmap_local_post_map(vaddr, pteval)	local_flush_tlb_one(vaddr)
 #define arch_kmap_local_post_unmap(vaddr)	local_flush_tlb_one(vaddr)
 
_

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 17/19] powerpc/mm/highmem: use __set_pte_at() for kmap_local()
  2021-01-24  5:00 incoming Andrew Morton
                   ` (15 preceding siblings ...)
  2021-01-24  5:02 ` [patch 16/19] mips/mm/highmem: use set_pte() for kmap_local() Andrew Morton
@ 2021-01-24  5:02 ` Andrew Morton
  2021-01-24  5:02 ` [patch 18/19] proc_sysctl: fix oops caused by incorrect command parameters Andrew Morton
  2021-01-24  5:02 ` [patch 19/19] MAINTAINERS: add a couple more files to the Clang/LLVM section Andrew Morton
  18 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2021-01-24  5:02 UTC (permalink / raw)
  To: akpm, andreas, davem, linux-mm, mm-commits, mpe, paul, peterz,
	tglx, torvalds, tsbogend

From: Thomas Gleixner <tglx@linutronix.de>
Subject: powerpc/mm/highmem: use __set_pte_at() for kmap_local()

The original PowerPC highmem mapping function used __set_pte_at() to
denote that the mapping is per CPU.  This got lost with the conversion to
the generic implementation.

Override the default map function.
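
For context, a hedged note on the argument list used below: on PowerPC,
__set_pte_at() takes one extra trailing argument compared to set_pte_at()
(assumed here to be the "percpu" flag), and passing 1 marks the mapping
as per-CPU, which is what the original highmem code relied on:

    /* assumed signature, shown for illustration only */
    void __set_pte_at(struct mm_struct *mm, unsigned long addr,
                      pte_t *ptep, pte_t pte, int percpu);
    /* the kmap_local override below passes percpu = 1 */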

Link: https://lkml.kernel.org/r/20210112170411.281464308@linutronix.de
Fixes: 47da42b27a56 ("powerpc/mm/highmem: Switch to generic kmap atomic")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Paul Cercueil <paul@crapouillou.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/powerpc/include/asm/highmem.h |    2 ++
 1 file changed, 2 insertions(+)

--- a/arch/powerpc/include/asm/highmem.h~powerpc-mm-highmem-use-__set_pte_at-for-kmap_local
+++ a/arch/powerpc/include/asm/highmem.h
@@ -58,6 +58,8 @@ extern pte_t *pkmap_page_table;
 
 #define flush_cache_kmaps()	flush_cache_all()
 
+#define arch_kmap_local_set_pte(mm, vaddr, ptep, ptev)	\
+	__set_pte_at(mm, vaddr, ptep, ptev, 1)
 #define arch_kmap_local_post_map(vaddr, pteval)	\
 	local_flush_tlb_page(NULL, vaddr)
 #define arch_kmap_local_post_unmap(vaddr)	\
_

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 18/19] proc_sysctl: fix oops caused by incorrect command parameters
  2021-01-24  5:00 incoming Andrew Morton
                   ` (16 preceding siblings ...)
  2021-01-24  5:02 ` [patch 17/19] powerpc/mm/highmem: use __set_pte_at() " Andrew Morton
@ 2021-01-24  5:02 ` Andrew Morton
  2021-01-24  5:02 ` [patch 19/19] MAINTAINERS: add a couple more files to the Clang/LLVM section Andrew Morton
  18 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2021-01-24  5:02 UTC (permalink / raw)
  To: adobriyan, akpm, hkallweit1, keescook, linux-mm, mcgrof,
	mhiramat, mhocko, mm-commits, nixiaoming, rdunlap, stable,
	torvalds, vbabka, yzaikin

From: Xiaoming Ni <nixiaoming@huawei.com>
Subject: proc_sysctl: fix oops caused by incorrect command parameters

process_sysctl_arg() does not check whether val is empty before invoking
strlen(val).  If a command line parameter is incorrectly configured and
its value is empty, an oops is triggered.

For example:
  If "hung_task_panic=1" is incorrectly written as "hung_task_panic", an
  oops is triggered. The call stack is as follows:
    Kernel command line: .... hung_task_panic
    ......
    Call trace:
    __pi_strlen+0x10/0x98
    parse_args+0x278/0x344
    do_sysctl_args+0x8c/0xfc
    kernel_init+0x5c/0xf4
    ret_from_fork+0x10/0x30

To fix it, check whether "val" is empty when "param" is a sysctl field.
An error code is returned in the failure branch, and error logs are
generated by parse_args().
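
To make the effect concrete, a hedged before/after sketch (behaviour as
described above; the exact log text is not reproduced here):

    /*
     *   hung_task_panic=1  -> val = "1", the value is written through
     *                         /proc/sys/kernel/hung_task_panic
     *   hung_task_panic    -> val is NULL, process_sysctl_arg() now
     *                         returns -EINVAL and parse_args() reports
     *                         the bad parameter instead of oopsing in
     *                         strlen()
     */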

Link: https://lkml.kernel.org/r/20210118133029.28580-1-nixiaoming@huawei.com
Fixes: 3db978d480e2843 ("kernel/sysctl: support setting sysctl parameters from kernel command line")
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: <stable@vger.kernel.org>	[5.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/proc/proc_sysctl.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/fs/proc/proc_sysctl.c~proc_sysctl-fix-oops-caused-by-incorrect-command-parameters
+++ a/fs/proc/proc_sysctl.c
@@ -1770,6 +1770,12 @@ static int process_sysctl_arg(char *para
 			return 0;
 	}
 
+	if (!val)
+		return -EINVAL;
+	len = strlen(val);
+	if (len == 0)
+		return -EINVAL;
+
 	/*
 	 * To set sysctl options, we use a temporary mount of proc, look up the
 	 * respective sys/ file and write to it. To avoid mounting it when no
@@ -1811,7 +1817,6 @@ static int process_sysctl_arg(char *para
 				file, param, val);
 		goto out;
 	}
-	len = strlen(val);
 	wret = kernel_write(file, val, len, &pos);
 	if (wret < 0) {
 		err = wret;
_

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 19/19] MAINTAINERS: add a couple more files to the Clang/LLVM section
  2021-01-24  5:00 incoming Andrew Morton
                   ` (17 preceding siblings ...)
  2021-01-24  5:02 ` [patch 18/19] proc_sysctl: fix oops caused by incorrect command parameters Andrew Morton
@ 2021-01-24  5:02 ` Andrew Morton
  18 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2021-01-24  5:02 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, natechancellor, ndesaulniers, torvalds

From: Nathan Chancellor <natechancellor@gmail.com>
Subject: MAINTAINERS: add a couple more files to the Clang/LLVM section

The K: entry should ensure that Nick and I always get CC'd on patches
that touch these files, but it is better to be explicit than implicit.

Link: https://lkml.kernel.org/r/20210114004059.2129921-1-natechancellor@gmail.com
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 MAINTAINERS |    2 ++
 1 file changed, 2 insertions(+)

--- a/MAINTAINERS~maintainers-add-a-couple-more-files-to-the-clang-llvm-section
+++ a/MAINTAINERS
@@ -4311,7 +4311,9 @@ W:	https://clangbuiltlinux.github.io/
 B:	https://github.com/ClangBuiltLinux/linux/issues
 C:	irc://chat.freenode.net/clangbuiltlinux
 F:	Documentation/kbuild/llvm.rst
+F:	include/linux/compiler-clang.h
 F:	scripts/clang-tools/
+F:	scripts/clang-version.sh
 F:	scripts/lld-version.sh
 K:	\b(?i:clang|llvm)\b
 
_

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch 06/19] mm: memcontrol: prevent starvation when writing memory.high
  2021-01-24  5:01 ` [patch 06/19] mm: memcontrol: prevent starvation when writing memory.high Andrew Morton
@ 2021-01-24 18:01   ` Shakeel Butt
  2021-01-24 18:35     ` Linus Torvalds
  0 siblings, 1 reply; 22+ messages in thread
From: Shakeel Butt @ 2021-01-24 18:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Roman Gushchin, Johannes Weiner, Linux MM, Michal Hocko,
	Michal Koutný,
	mm-commits, stable, Tejun Heo, Linus Torvalds

On Sat, Jan 23, 2021 at 9:01 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> From: Johannes Weiner <hannes@cmpxchg.org>
> Subject: mm: memcontrol: prevent starvation when writing memory.high
>
> When a value is written to a cgroup's memory.high control file, the
> write() context first tries to reclaim the cgroup to size before putting
> the limit in place for the workload.  Concurrent charges from the workload
> can keep such a write() looping in reclaim indefinitely.
>
> In the past, a write to memory.high would first put the limit in place for
> the workload, then do targeted reclaim until the new limit has been met -
> similar to how we do it for memory.max.  This wasn't prone to the
> described starvation issue.  However, this sequence could cause excessive
> latencies in the workload, when allocating threads could be put into long
> penalty sleeps on the sudden memory.high overage created by the write(),
> before that had a chance to work it off.
>
> Now that memory_high_write() performs reclaim before enforcing the new
> limit, reflect that the cgroup may well fail to converge due to concurrent
> workload activity.  Bail out of the loop after a few tries.
>
> Link: https://lkml.kernel.org/r/20210112163011.127833-1-hannes@cmpxchg.org
> Fixes: 536d3bf261a2 ("mm: memcontrol: avoid workload stalls when lowering memory.high")
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Reviewed-by: Shakeel Butt <shakeelb@google.com>
> Reported-by: Tejun Heo <tj@kernel.org>
> Acked-by: Roman Gushchin <guro@fb.com>
> Reviewed-by: Michal Koutný <mkoutny@suse.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: <stable@vger.kernel.org>    [5.8+]
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Johannes requested to replace this patch with
https://lore.kernel.org/linux-mm/20210122184341.292461-1-hannes@cmpxchg.org/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch 06/19] mm: memcontrol: prevent starvation when writing memory.high
  2021-01-24 18:01   ` Shakeel Butt
@ 2021-01-24 18:35     ` Linus Torvalds
  0 siblings, 0 replies; 22+ messages in thread
From: Linus Torvalds @ 2021-01-24 18:35 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrew Morton, Roman Gushchin, Johannes Weiner, Linux MM,
	Michal Hocko, Michal Koutný,
	mm-commits, stable, Tejun Heo

On Sun, Jan 24, 2021 at 10:02 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> Johannes requested to replace this patch with
> https://lore.kernel.org/linux-mm/20210122184341.292461-1-hannes@cmpxchg.org/

I've dropped it (not replaced it - will wait for Andrew to
comment/send) from my queue.

          Linus

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-01-24 18:37 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-24  5:00 incoming Andrew Morton
2021-01-24  5:00 ` [patch 01/19] x86/setup: don't remove E820_TYPE_RAM for pfn 0 Andrew Morton
2021-01-24  5:01 ` [patch 02/19] mm: fix initialization of struct page for holes in memory layout Andrew Morton
2021-01-24  5:01 ` [patch 03/19] mm: memcg/slab: optimize objcg stock draining Andrew Morton
2021-01-24  5:01 ` [patch 04/19] mm: memcg: fix memcg file_dirty numa stat Andrew Morton
2021-01-24  5:01 ` [patch 05/19] mm: fix numa stats for thp migration Andrew Morton
2021-01-24  5:01 ` [patch 06/19] mm: memcontrol: prevent starvation when writing memory.high Andrew Morton
2021-01-24 18:01   ` Shakeel Butt
2021-01-24 18:35     ` Linus Torvalds
2021-01-24  5:01 ` [patch 07/19] kasan: fix unaligned address is unhandled in kasan_remove_zero_shadow Andrew Morton
2021-01-24  5:01 ` [patch 08/19] kasan: fix incorrect arguments passing in kasan_add_zero_shadow Andrew Morton
2021-01-24  5:01 ` [patch 09/19] kasan: fix HW_TAGS boot parameters Andrew Morton
2021-01-24  5:01 ` [patch 10/19] kasan, mm: fix conflicts with init_on_alloc/free Andrew Morton
2021-01-24  5:01 ` [patch 11/19] kasan, mm: fix resetting page_alloc tags for HW_TAGS Andrew Morton
2021-01-24  5:01 ` [patch 12/19] ubsan: disable unsigned-overflow check for i386 Andrew Morton
2021-01-24  5:01 ` [patch 13/19] mm: fix page reference leak in soft_offline_page() Andrew Morton
2021-01-24  5:01 ` [patch 14/19] sparc/mm/highmem: flush cache and TLB Andrew Morton
2021-01-24  5:02 ` [patch 15/19] mm/highmem: prepare for overriding set_pte_at() Andrew Morton
2021-01-24  5:02 ` [patch 16/19] mips/mm/highmem: use set_pte() for kmap_local() Andrew Morton
2021-01-24  5:02 ` [patch 17/19] powerpc/mm/highmem: use __set_pte_at() " Andrew Morton
2021-01-24  5:02 ` [patch 18/19] proc_sysctl: fix oops caused by incorrect command parameters Andrew Morton
2021-01-24  5:02 ` [patch 19/19] MAINTAINERS: add a couple more files to the Clang/LLVM section Andrew Morton
