mm-commits.vger.kernel.org archive mirror
* incoming
@ 2020-09-04 23:34 Andrew Morton
  2020-09-04 23:35 ` [patch 01/19] memcg: fix use-after-free in uncharge_batch Andrew Morton
                   ` (18 more replies)
  0 siblings, 19 replies; 20+ messages in thread
From: Andrew Morton @ 2020-09-04 23:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mm-commits, linux-mm

19 patches, based on 59126901f200f5fc907153468b03c64e0081b6e6.

Subsystems affected by this patch series:

  mm/memcg
  mm/slub
  MAINTAINERS
  mm/pagemap
  ipc
  fork
  checkpatch
  mm/madvise
  mm/migration
  mm/hugetlb
  lib

Subsystem: mm/memcg

    Michal Hocko <mhocko@suse.com>:
      memcg: fix use-after-free in uncharge_batch

    Xunlei Pang <xlpang@linux.alibaba.com>:
      mm: memcg: fix memcg reclaim soft lockup

Subsystem: mm/slub

    Eugeniu Rosca <erosca@de.adit-jv.com>:
      mm: slub: fix conversion of freelist_corrupted()

Subsystem: MAINTAINERS

    Robert Richter <rric@kernel.org>:
      MAINTAINERS: update Cavium/Marvell entries

    Nick Desaulniers <ndesaulniers@google.com>:
      MAINTAINERS: add LLVM maintainers

    Randy Dunlap <rdunlap@infradead.org>:
      MAINTAINERS: IA64: mark Status as Odd Fixes only

Subsystem: mm/pagemap

    Joerg Roedel <jroedel@suse.de>:
      mm: track page table modifications in __apply_to_page_range()

Subsystem: ipc

    Tobias Klauser <tklauser@distanz.ch>:
      ipc: adjust proc_ipc_sem_dointvec definition to match prototype

Subsystem: fork

    Tobias Klauser <tklauser@distanz.ch>:
      fork: adjust sysctl_max_threads definition to match prototype

Subsystem: checkpatch

    Mrinal Pandey <mrinalmni@gmail.com>:
      checkpatch: fix the usage of capture group ( ... )

Subsystem: mm/madvise

    Yang Shi <shy828301@gmail.com>:
      mm: madvise: fix vma user-after-free

Subsystem: mm/migration

    Alistair Popple <alistair@popple.id.au>:
      mm/migrate: fixup setting UFFD_WP flag
      mm/rmap: fixup copying of soft dirty and uffd ptes

    Ralph Campbell <rcampbell@nvidia.com>:
    Patch series "mm/migrate: preserve soft dirty in remove_migration_pte()":
      mm/migrate: remove unnecessary is_zone_device_page() check
      mm/migrate: preserve soft dirty in remove_migration_pte()

Subsystem: mm/hugetlb

    Li Xinhai <lixinhai.lxh@gmail.com>:
      mm/hugetlb: try preferred node first when alloc gigantic page from cma

    Muchun Song <songmuchun@bytedance.com>:
      mm/hugetlb: fix a race between hugetlb sysctl handlers

    David Howells <dhowells@redhat.com>:
      mm/khugepaged.c: fix khugepaged's request size in collapse_file

Subsystem: lib

    Jason Gunthorpe <jgg@nvidia.com>:
      include/linux/log2.h: add missing () around n in roundup_pow_of_two()

 MAINTAINERS           |   32 ++++++++++++++++----------------
 include/linux/log2.h  |    2 +-
 ipc/ipc_sysctl.c      |    2 +-
 kernel/fork.c         |    2 +-
 mm/hugetlb.c          |   49 +++++++++++++++++++++++++++++++++++++------------
 mm/khugepaged.c       |    2 +-
 mm/madvise.c          |    2 +-
 mm/memcontrol.c       |    6 ++++++
 mm/memory.c           |   37 ++++++++++++++++++++++++-------------
 mm/migrate.c          |   31 +++++++++++++++++++------------
 mm/rmap.c             |    9 +++++++--
 mm/slub.c             |   12 ++++++------
 mm/vmscan.c           |    8 ++++++++
 scripts/checkpatch.pl |    4 ++--
 14 files changed, 130 insertions(+), 68 deletions(-)



* [patch 01/19] memcg: fix use-after-free in uncharge_batch
  2020-09-04 23:34 incoming Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
  2020-09-04 23:35 ` [patch 02/19] mm: memcg: fix memcg reclaim soft lockup Andrew Morton
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
  To: akpm, guro, hannes, hughd, linux-mm, mhocko, mm-commits,
	shakeelb, torvalds

From: Michal Hocko <mhocko@suse.com>
Subject: memcg: fix use-after-free in uncharge_batch

syzbot has reported a use-after-free in the uncharge_batch path:

BUG: KASAN: use-after-free in instrument_atomic_write include/linux/instrumented.h:71 [inline]
BUG: KASAN: use-after-free in atomic64_sub_return include/asm-generic/atomic-instrumented.h:970 [inline]
BUG: KASAN: use-after-free in atomic_long_sub_return include/asm-generic/atomic-long.h:113 [inline]
BUG: KASAN: use-after-free in page_counter_cancel mm/page_counter.c:54 [inline]
BUG: KASAN: use-after-free in page_counter_uncharge+0x3d/0xc0 mm/page_counter.c:155
Write of size 8 at addr ffff8880371c0148 by task syz-executor.0/9304

CPU: 0 PID: 9304 Comm: syz-executor.0 Not tainted 5.8.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1f0/0x31e lib/dump_stack.c:118
 print_address_description+0x66/0x620 mm/kasan/report.c:383
 __kasan_report mm/kasan/report.c:513 [inline]
 kasan_report+0x132/0x1d0 mm/kasan/report.c:530
 check_memory_region_inline mm/kasan/generic.c:183 [inline]
 check_memory_region+0x2b5/0x2f0 mm/kasan/generic.c:192
 instrument_atomic_write include/linux/instrumented.h:71 [inline]
 atomic64_sub_return include/asm-generic/atomic-instrumented.h:970 [inline]
 atomic_long_sub_return include/asm-generic/atomic-long.h:113 [inline]
 page_counter_cancel mm/page_counter.c:54 [inline]
 page_counter_uncharge+0x3d/0xc0 mm/page_counter.c:155
 uncharge_batch+0x6c/0x350 mm/memcontrol.c:6764
 uncharge_page+0x115/0x430 mm/memcontrol.c:6796
 uncharge_list mm/memcontrol.c:6835 [inline]
 mem_cgroup_uncharge_list+0x70/0xe0 mm/memcontrol.c:6877
 release_pages+0x13a2/0x1550 mm/swap.c:911
 tlb_batch_pages_flush mm/mmu_gather.c:49 [inline]
 tlb_flush_mmu_free mm/mmu_gather.c:242 [inline]
 tlb_flush_mmu+0x780/0x910 mm/mmu_gather.c:249
 tlb_finish_mmu+0xcb/0x200 mm/mmu_gather.c:328
 exit_mmap+0x296/0x550 mm/mmap.c:3185
 __mmput+0x113/0x370 kernel/fork.c:1076
 exit_mm+0x4cd/0x550 kernel/exit.c:483
 do_exit+0x576/0x1f20 kernel/exit.c:793
 do_group_exit+0x161/0x2d0 kernel/exit.c:903
 get_signal+0x139b/0x1d30 kernel/signal.c:2743
 arch_do_signal+0x33/0x610 arch/x86/kernel/signal.c:811
 exit_to_user_mode_loop kernel/entry/common.c:135 [inline]
 exit_to_user_mode_prepare+0x8d/0x1b0 kernel/entry/common.c:166
 syscall_exit_to_user_mode+0x5e/0x1a0 kernel/entry/common.c:241
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Commit 1a3e1f40962c ("mm: memcontrol: decouple reference counting from page
accounting") reworked the memcg lifetime to be bound to the struct
page rather than to charges.  It also removed the css_put_many from
uncharge_batch, and that is causing the above splat.  uncharge_batch is
supposed to uncharge accumulated charges for all pages freed from the same
memcg.  The queuing is done by uncharge_page, which however drops the memcg
reference after it adds charges to the batch.  If the current page happens
to be the last one holding a reference to its memcg, then the memcg is
free to go away, and the next page to be freed will trigger a batched
uncharge that needs to access the memcg, which is already gone.

Fix the issue by taking a reference for the memcg in the current batch.
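
The lifetime rule can be illustrated outside the kernel with a toy
model.  This is only a sketch: struct group, struct batch and the
helper names below are hypothetical stand-ins, not the memcg API.  The
batch takes its own reference when it starts gathering for a group and
drops it only when flushed, so the group survives the put of the last
page:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted memcg; not the kernel's struct. */
struct group {
	int refs;     /* outstanding references */
	long charged; /* pages still charged to the group */
	bool freed;   /* set once the last reference is dropped */
};

/* Hypothetical stand-in for struct uncharge_gather. */
struct batch {
	struct group *grp;
	long nr_pages;
};

static void group_get(struct group *g)
{
	g->refs++;
}

static void group_put(struct group *g)
{
	if (--g->refs == 0)
		g->freed = true; /* a real implementation would free here */
}

/* Flush the batch: this touches the group, so it must still be alive. */
static void batch_flush(struct batch *b)
{
	assert(!b->grp->freed);
	b->grp->charged -= b->nr_pages;
	group_put(b->grp); /* drop the reference taken in batch_add() */
	b->grp = NULL;
	b->nr_pages = 0;
}

/* Queue one page, then drop the reference that the page itself held. */
static void batch_add(struct batch *b, struct group *g, long nr_pages)
{
	if (b->grp != g) {
		if (b->grp)
			batch_flush(b);
		b->grp = g;
		group_get(g); /* pairs with group_put() in batch_flush() */
	}
	b->nr_pages += nr_pages;
	group_put(g); /* the page no longer pins the group */
}
```

Without the group_get() in batch_add(), the final group_put() there
would mark the group freed while the batch still points at it, which is
the toy equivalent of the splat above.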

Link: https://lkml.kernel.org/r/20200820090341.GC5033@dhcp22.suse.cz
Fixes: 1a3e1f40962c ("mm: memcontrol: decouple reference counting from page accounting")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: syzbot+b305848212deec86eabe@syzkaller.appspotmail.com
Reported-by: syzbot+b5ea6fb6f139c8b9482b@syzkaller.appspotmail.com
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/mm/memcontrol.c~memcg-fix-use-after-free-in-uncharge_batch
+++ a/mm/memcontrol.c
@@ -6774,6 +6774,9 @@ static void uncharge_batch(const struct
 	__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_pages);
 	memcg_check_events(ug->memcg, ug->dummy_page);
 	local_irq_restore(flags);
+
+	/* drop reference from uncharge_page */
+	css_put(&ug->memcg->css);
 }
 
 static void uncharge_page(struct page *page, struct uncharge_gather *ug)
@@ -6797,6 +6800,9 @@ static void uncharge_page(struct page *p
 			uncharge_gather_clear(ug);
 		}
 		ug->memcg = page->mem_cgroup;
+
+		/* pairs with css_put in uncharge_batch */
+		css_get(&ug->memcg->css);
 	}
 
 	nr_pages = compound_nr(page);
_


* [patch 02/19] mm: memcg: fix memcg reclaim soft lockup
  2020-09-04 23:34 incoming Andrew Morton
  2020-09-04 23:35 ` [patch 01/19] memcg: fix use-after-free in uncharge_batch Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
  2020-09-04 23:35 ` [patch 03/19] mm: slub: fix conversion of freelist_corrupted() Andrew Morton
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
  To: akpm, chris, hannes, linux-mm, mhocko, mm-commits, torvalds, xlpang

From: Xunlei Pang <xlpang@linux.alibaba.com>
Subject: mm: memcg: fix memcg reclaim soft lockup

We've hit a soft lockup with "CONFIG_PREEMPT_NONE=y" when the target memcg
doesn't have any reclaimable memory.

It can be easily reproduced, as shown below:
 watchdog: BUG: soft lockup - CPU#0 stuck for 111s![memcg_test:2204]
 CPU: 0 PID: 2204 Comm: memcg_test Not tainted 5.9.0-rc2+ #12
 Call Trace:
  shrink_lruvec+0x49f/0x640
  shrink_node+0x2a6/0x6f0
  do_try_to_free_pages+0xe9/0x3e0
  try_to_free_mem_cgroup_pages+0xef/0x1f0
  try_charge+0x2c1/0x750
  mem_cgroup_charge+0xd7/0x240
  __add_to_page_cache_locked+0x2fd/0x370
  add_to_page_cache_lru+0x4a/0xc0
  pagecache_get_page+0x10b/0x2f0
  filemap_fault+0x661/0xad0
  ext4_filemap_fault+0x2c/0x40
  __do_fault+0x4d/0xf9
  handle_mm_fault+0x1080/0x1790

It only happens on our 1-vcpu instances, because there's no chance for oom
reaper to run to reclaim the to-be-killed process.

Add a cond_resched() in the upper-level shrink_node_memcgs() loop to solve
this issue.  This gives us a scheduling point for each memcg in the
reclaimed hierarchy without any dependency on the reclaimable memory in
that memcg, thus making it more predictable.
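
A rough user-space analogue of that placement, as a sketch only:
struct group_stub and scan_groups() are hypothetical names, and POSIX
sched_yield() stands in for cond_resched() (it always yields, whereas
cond_resched() only reschedules when needed).  The point is that the
scheduling point comes before the "nothing to reclaim" shortcut, so it
is reached once per group regardless of the group's contents:

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg: just a count of reclaimable pages. */
struct group_stub {
	unsigned long reclaimable;
};

/*
 * Walk all groups and "reclaim" whatever they hold.  *yields counts how
 * often the scheduling point was reached.
 */
static unsigned long scan_groups(struct group_stub *groups, size_t n,
				 unsigned long *yields)
{
	unsigned long reclaimed = 0;
	size_t i;

	assert(yields != NULL);
	for (i = 0; i < n; i++) {
		/* Scheduling point first, so empty groups cannot skip it. */
		sched_yield();
		(*yields)++;

		if (groups[i].reclaimable == 0)
			continue; /* nothing to do for this group */

		reclaimed += groups[i].reclaimable;
		groups[i].reclaimable = 0;
	}
	return reclaimed;
}
```

Had the yield been placed after the continue, a hierarchy made up only
of empty groups would spin through the loop without ever giving up the
CPU.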

Link: http://lkml.kernel.org/r/1598495549-67324-1-git-send-email-xlpang@linux.alibaba.com
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmscan.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/mm/vmscan.c~mm-memcg-fix-memcg-reclaim-soft-lockup
+++ a/mm/vmscan.c
@@ -2615,6 +2615,14 @@ static void shrink_node_memcgs(pg_data_t
 		unsigned long reclaimed;
 		unsigned long scanned;
 
+		/*
+		 * This loop can become CPU-bound when target memcgs
+		 * aren't eligible for reclaim - either because they
+		 * don't have any reclaimable pages, or because their
+		 * memory is explicitly protected. Avoid soft lockups.
+		 */
+		cond_resched();
+
 		mem_cgroup_calculate_protection(target_memcg, memcg);
 
 		if (mem_cgroup_below_min(memcg)) {
_


* [patch 03/19] mm: slub: fix conversion of freelist_corrupted()
  2020-09-04 23:34 incoming Andrew Morton
  2020-09-04 23:35 ` [patch 01/19] memcg: fix use-after-free in uncharge_batch Andrew Morton
  2020-09-04 23:35 ` [patch 02/19] mm: memcg: fix memcg reclaim soft lockup Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
  2020-09-04 23:35 ` [patch 04/19] MAINTAINERS: update Cavium/Marvell entries Andrew Morton
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
  To: akpm, cl, dongli.zhang, erosca, iamjoonsoo.kim, joe.jin,
	linux-mm, mm-commits, penberg, rientjes, stable, torvalds

From: Eugeniu Rosca <erosca@de.adit-jv.com>
Subject: mm: slub: fix conversion of freelist_corrupted()

Commit 52f23478081ae0 ("mm/slub.c: fix corrupted freechain in
deactivate_slab()") was modified when it was picked up from LKML [1].

Specifically, relocating 'freelist = NULL' into 'freelist_corrupted()'
created a no-op statement.  Fix it by sticking to the behavior intended in
the original patch [1].  In addition, make freelist_corrupted() immune to
passing NULL instead of &freelist.
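
The no-op can be shown in isolation.  The following is a minimal
sketch with hypothetical helper names (isolate_by_value(),
isolate_by_reference() and freelist_demo() are not kernel functions):
assigning to a pointer parameter only changes the callee's copy, while
assigning through a pointer-to-pointer changes the caller's variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Pre-fix shape: 'freelist' is a copy, so the assignment is lost. */
static bool isolate_by_value(void *freelist, bool corrupted)
{
	if (corrupted) {
		freelist = NULL; /* only changes the local copy */
		(void)freelist;
		return true;
	}
	return false;
}

/* Post-fix shape: the caller's pointer is cleared through 'freelist'. */
static bool isolate_by_reference(void **freelist, bool corrupted)
{
	if (corrupted && freelist) { /* tolerate a NULL argument */
		*freelist = NULL;
		return true;
	}
	return false;
}

/* Exercises both shapes against the same caller-side variable. */
static void freelist_demo(void)
{
	int object = 0;
	void *freelist = &object;

	isolate_by_value(freelist, true);
	assert(freelist == &object); /* caller's pointer is untouched */

	isolate_by_reference(&freelist, true);
	assert(freelist == NULL);    /* caller's pointer is now cleared */
}
```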

The issue has been spotted via static analysis and code review.

[1] https://lore.kernel.org/linux-mm/20200331031450.12182-1-dongli.zhang@oracle.com/

Link: https://lkml.kernel.org/r/20200824130643.10291-1-erosca@de.adit-jv.com
Fixes: 52f23478081ae0 ("mm/slub.c: fix corrupted freechain in deactivate_slab()")
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: Joe Jin <joe.jin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/slub.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/mm/slub.c~mm-slub-fix-conversion-of-freelist_corrupted
+++ a/mm/slub.c
@@ -672,12 +672,12 @@ static void slab_fix(struct kmem_cache *
 }
 
 static bool freelist_corrupted(struct kmem_cache *s, struct page *page,
-			       void *freelist, void *nextfree)
+			       void **freelist, void *nextfree)
 {
 	if ((s->flags & SLAB_CONSISTENCY_CHECKS) &&
-	    !check_valid_pointer(s, page, nextfree)) {
-		object_err(s, page, freelist, "Freechain corrupt");
-		freelist = NULL;
+	    !check_valid_pointer(s, page, nextfree) && freelist) {
+		object_err(s, page, *freelist, "Freechain corrupt");
+		*freelist = NULL;
 		slab_fix(s, "Isolate corrupted freechain");
 		return true;
 	}
@@ -1494,7 +1494,7 @@ static inline void dec_slabs_node(struct
 							int objects) {}
 
 static bool freelist_corrupted(struct kmem_cache *s, struct page *page,
-			       void *freelist, void *nextfree)
+			       void **freelist, void *nextfree)
 {
 	return false;
 }
@@ -2184,7 +2184,7 @@ static void deactivate_slab(struct kmem_
 		 * 'freelist' is already corrupted.  So isolate all objects
 		 * starting at 'freelist'.
 		 */
-		if (freelist_corrupted(s, page, freelist, nextfree))
+		if (freelist_corrupted(s, page, &freelist, nextfree))
 			break;
 
 		do {
_


* [patch 04/19] MAINTAINERS: update Cavium/Marvell entries
  2020-09-04 23:34 incoming Andrew Morton
                   ` (2 preceding siblings ...)
  2020-09-04 23:35 ` [patch 03/19] mm: slub: fix conversion of freelist_corrupted() Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
  2020-09-04 23:35 ` [patch 05/19] MAINTAINERS: add LLVM maintainers Andrew Morton
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
  To: akpm, arnd, bp, gkulkarni, linux-mm, maz, mm-commits, rric,
	sgoutham, torvalds, wsa

From: Robert Richter <rric@kernel.org>
Subject: MAINTAINERS: update Cavium/Marvell entries

I am leaving Marvell and already do not have access to my @marvell.com
email address.  So I am switching over to my korg mail address, or removing
my address where another maintainer is already listed.  For the entries
where no other maintainer is listed, I will keep looking into patches for
Cavium systems for a while until someone from Marvell takes over.  Since I
might have limited access to hardware and also limited time, I changed
the status to 'Odd Fixes' for those entries.

Link: https://lkml.kernel.org/r/20200824122050.31164-1-rric@kernel.org
Signed-off-by: Robert Richter <rric@kernel.org>
Cc: Ganapatrao Kulkarni <gkulkarni@marvell.com>
Cc: Sunil Goutham <sgoutham@marvell.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 MAINTAINERS |   28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

--- a/MAINTAINERS~maintainers-update-cavium-marvell-entries
+++ a/MAINTAINERS
@@ -1694,7 +1694,6 @@ F:	arch/arm/mach-cns3xxx/
 
 ARM/CAVIUM THUNDER NETWORK DRIVER
 M:	Sunil Goutham <sgoutham@marvell.com>
-M:	Robert Richter <rrichter@marvell.com>
 L:	linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
 S:	Supported
 F:	drivers/net/ethernet/cavium/thunder/
@@ -3948,8 +3947,8 @@ W:	https://wireless.wiki.kernel.org/en/u
 F:	drivers/net/wireless/ath/carl9170/
 
 CAVIUM I2C DRIVER
-M:	Robert Richter <rrichter@marvell.com>
-S:	Supported
+M:	Robert Richter <rric@kernel.org>
+S:	Odd Fixes
 W:	http://www.marvell.com
 F:	drivers/i2c/busses/i2c-octeon*
 F:	drivers/i2c/busses/i2c-thunderx*
@@ -3964,8 +3963,8 @@ W:	http://www.marvell.com
 F:	drivers/net/ethernet/cavium/liquidio/
 
 CAVIUM MMC DRIVER
-M:	Robert Richter <rrichter@marvell.com>
-S:	Supported
+M:	Robert Richter <rric@kernel.org>
+S:	Odd Fixes
 W:	http://www.marvell.com
 F:	drivers/mmc/host/cavium*
 
@@ -3977,9 +3976,9 @@ W:	http://www.marvell.com
 F:	drivers/crypto/cavium/cpt/
 
 CAVIUM THUNDERX2 ARM64 SOC
-M:	Robert Richter <rrichter@marvell.com>
+M:	Robert Richter <rric@kernel.org>
 L:	linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
-S:	Maintained
+S:	Odd Fixes
 F:	Documentation/devicetree/bindings/arm/cavium-thunder2.txt
 F:	arch/arm64/boot/dts/cavium/thunder2-99xx*
 
@@ -6191,16 +6190,15 @@ F:	drivers/edac/highbank*
 
 EDAC-CAVIUM OCTEON
 M:	Ralf Baechle <ralf@linux-mips.org>
-M:	Robert Richter <rrichter@marvell.com>
 L:	linux-edac@vger.kernel.org
 L:	linux-mips@vger.kernel.org
 S:	Supported
 F:	drivers/edac/octeon_edac*
 
 EDAC-CAVIUM THUNDERX
-M:	Robert Richter <rrichter@marvell.com>
+M:	Robert Richter <rric@kernel.org>
 L:	linux-edac@vger.kernel.org
-S:	Supported
+S:	Odd Fixes
 F:	drivers/edac/thunderx_edac*
 
 EDAC-CORE
@@ -6208,7 +6206,7 @@ M:	Borislav Petkov <bp@alien8.de>
 M:	Mauro Carvalho Chehab <mchehab@kernel.org>
 M:	Tony Luck <tony.luck@intel.com>
 R:	James Morse <james.morse@arm.com>
-R:	Robert Richter <rrichter@marvell.com>
+R:	Robert Richter <rric@kernel.org>
 L:	linux-edac@vger.kernel.org
 S:	Supported
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git edac-for-next
@@ -13446,10 +13444,10 @@ F:	Documentation/devicetree/bindings/pci
 F:	drivers/pci/controller/dwc/*artpec*
 
 PCIE DRIVER FOR CAVIUM THUNDERX
-M:	Robert Richter <rrichter@marvell.com>
+M:	Robert Richter <rric@kernel.org>
 L:	linux-pci@vger.kernel.org
 L:	linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
-S:	Supported
+S:	Odd Fixes
 F:	drivers/pci/controller/pci-thunder-*
 
 PCIE DRIVER FOR HISILICON
@@ -17237,8 +17235,8 @@ S:	Maintained
 F:	drivers/net/thunderbolt.c
 
 THUNDERX GPIO DRIVER
-M:	Robert Richter <rrichter@marvell.com>
-S:	Maintained
+M:	Robert Richter <rric@kernel.org>
+S:	Odd Fixes
 F:	drivers/gpio/gpio-thunderx.c
 
 TI AM437X VPFE DRIVER
_


* [patch 05/19] MAINTAINERS: add LLVM maintainers
  2020-09-04 23:34 incoming Andrew Morton
                   ` (3 preceding siblings ...)
  2020-09-04 23:35 ` [patch 04/19] MAINTAINERS: update Cavium/Marvell entries Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
  2020-09-04 23:35 ` [patch 06/19] MAINTAINERS: IA64: mark Status as Odd Fixes only Andrew Morton
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
  To: akpm, linux-mm, lukas.bulwahn, masahiroy, miguel.ojeda.sandonis,
	mm-commits, natechancellor, ndesaulniers, sedat.dilek, torvalds

From: Nick Desaulniers <ndesaulniers@google.com>
Subject: MAINTAINERS: add LLVM maintainers

Nominate Nathan and myself to be point of contact for clang/LLVM related
support, after a poll at the LLVM BoF at Linux Plumbers Conf 2020.

While corporate sponsorship is beneficial, it's important to not entrust
the keys to the nukes with any one entity.  Should Nathan and I find
ourselves at the same employer, I would gladly step down.

Link: https://lkml.kernel.org/r/20200825143540.2948637-1-ndesaulniers@google.com
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Sedat Dilek <sedat.dilek@gmail.com>
Acked-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 MAINTAINERS |    2 ++
 1 file changed, 2 insertions(+)

--- a/MAINTAINERS~maintainers-add-llvm-maintainers
+++ a/MAINTAINERS
@@ -4257,6 +4257,8 @@ S:	Maintained
 F:	.clang-format
 
 CLANG/LLVM BUILD SUPPORT
+M:	Nathan Chancellor <natechancellor@gmail.com>
+M:	Nick Desaulniers <ndesaulniers@google.com>
 L:	clang-built-linux@googlegroups.com
 S:	Supported
 W:	https://clangbuiltlinux.github.io/
_


* [patch 06/19] MAINTAINERS: IA64: mark Status as Odd Fixes only
  2020-09-04 23:34 incoming Andrew Morton
                   ` (4 preceding siblings ...)
  2020-09-04 23:35 ` [patch 05/19] MAINTAINERS: add LLVM maintainers Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
  2020-09-04 23:35 ` [patch 07/19] mm: track page table modifications in __apply_to_page_range() Andrew Morton
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
  To: akpm, fenghua.yu, linux-mm, mm-commits, rdunlap, tony.luck, torvalds

From: Randy Dunlap <rdunlap@infradead.org>
Subject: MAINTAINERS: IA64: mark Status as Odd Fixes only

IA64 isn't really being maintained, so mark it as Odd Fixes only.

Link: http://lkml.kernel.org/r/7e719139-450f-52c2-59a2-7964a34eda1f@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 MAINTAINERS |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/MAINTAINERS~maintainers-ia64-mark-status-as-odd-fixes-only
+++ a/MAINTAINERS
@@ -8272,7 +8272,7 @@ IA64 (Itanium) PLATFORM
 M:	Tony Luck <tony.luck@intel.com>
 M:	Fenghua Yu <fenghua.yu@intel.com>
 L:	linux-ia64@vger.kernel.org
-S:	Maintained
+S:	Odd Fixes
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git
 F:	Documentation/ia64/
 F:	arch/ia64/
_


* [patch 07/19] mm: track page table modifications in __apply_to_page_range()
  2020-09-04 23:34 incoming Andrew Morton
                   ` (5 preceding siblings ...)
  2020-09-04 23:35 ` [patch 06/19] MAINTAINERS: IA64: mark Status as Odd Fixes only Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
  2020-09-04 23:35 ` [patch 08/19] ipc: adjust proc_ipc_sem_dointvec definition to match prototype Andrew Morton
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
  To: akpm, chris, jroedel, linux-mm, mm-commits, pavel, sfr, stable, torvalds

From: Joerg Roedel <jroedel@suse.de>
Subject: mm: track page table modifications in __apply_to_page_range()

__apply_to_page_range() is also used to change and/or allocate page-table
pages in the vmalloc area of the address space.  Make sure these changes
get synchronized to other page-tables in the system by calling
arch_sync_kernel_mappings() when necessary.
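
The tracking scheme can be sketched in user space as follows.  This is
a toy model under stated assumptions: mod_mask, MOD_PTE, MOD_PMD,
SYNC_MASK and the walk_*() helpers are hypothetical names that only
mirror the shape of pgtbl_mod_mask and ARCH_PAGE_TABLE_SYNC_MASK, and
the choice of "PMD changes need syncing" is an assumption made for the
example.  Each level ORs a bit into a mask passed down the walk, and
the top level syncs once if any relevant bit was set:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical level bits, modelled on the idea of pgtbl_mod_mask. */
typedef unsigned int mod_mask;
#define MOD_PTE (1u << 0)
#define MOD_PMD (1u << 1)

/* Assumption for this sketch: only PMD-level changes need syncing. */
#define SYNC_MASK MOD_PMD

static int sync_calls; /* counts calls to the stand-in sync hook */

/* Stand-in for the architecture hook; it only records the call. */
static void sync_mappings(unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
	sync_calls++;
}

/* Lowest level: always records a PTE modification. */
static void walk_pte(mod_mask *mask)
{
	*mask |= MOD_PTE;
}

/* Middle level: records a PMD change only when a table is allocated. */
static void walk_pmd(bool need_alloc, mod_mask *mask)
{
	if (need_alloc)
		*mask |= MOD_PMD;
	walk_pte(mask);
}

/* Top level: accumulate, then sync once if a relevant level changed. */
static mod_mask apply_range(unsigned long start, unsigned long size,
			    bool need_alloc)
{
	mod_mask mask = 0;

	assert(size > 0);
	walk_pmd(need_alloc, &mask);
	if (mask & SYNC_MASK)
		sync_mappings(start, start + size);
	return mask;
}
```

Passing the mask by pointer through every level keeps the sync decision
in one place at the top of the walk, so the hook runs at most once per
call rather than once per modified entry.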

The impact appears limited to x86-32, where apply_to_page_range may miss
updating the PMD.  That leads to explosions in drivers like

[   24.227844] BUG: unable to handle page fault for address: fe036000
[   24.228076] #PF: supervisor write access in kernel mode
[   24.228294] #PF: error_code(0x0002) - not-present page
[   24.228494] *pde = 00000000
[   24.228640] Oops: 0002 [#1] SMP
[   24.228788] CPU: 3 PID: 1300 Comm: gem_concurrent_ Not tainted 5.9.0-rc1+ #16
[   24.228957] Hardware name:  /NUC6i3SYB, BIOS SYSKLi35.86A.0024.2015.1027.2142 10/27/2015
[   24.229297] EIP: __execlists_context_alloc+0x132/0x2d0 [i915]
[   24.229462] Code: 31 d2 89 f0 e8 2f 55 02 00 89 45 e8 3d 00 f0 ff ff 0f 87 11 01 00 00 8b 4d e8 03 4b 30 b8 5a 5a 5a 5a ba 01 00 00 00 8d 79 04 <c7> 01 5a 5a 5a 5a c7 81 fc 0f 00 00 5a 5a 5a 5a 83 e7 fc 29 f9 81
[   24.229759] EAX: 5a5a5a5a EBX: f60ca000 ECX: fe036000 EDX: 00000001
[   24.229915] ESI: f43b7340 EDI: fe036004 EBP: f6389cb8 ESP: f6389c9c
[   24.230072] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010286
[   24.230229] CR0: 80050033 CR2: fe036000 CR3: 2d361000 CR4: 001506d0
[   24.230385] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[   24.230539] DR6: fffe0ff0 DR7: 00000400
[   24.230675] Call Trace:
[   24.230957]  execlists_context_alloc+0x10/0x20 [i915]
[   24.231266]  intel_context_alloc_state+0x3f/0x70 [i915]
[   24.231547]  __intel_context_do_pin+0x117/0x170 [i915]
[   24.231850]  i915_gem_do_execbuffer+0xcc7/0x2500 [i915]
[   24.232024]  ? __kmalloc_track_caller+0x54/0x230
[   24.232181]  ? ktime_get+0x3e/0x120
[   24.232333]  ? dma_fence_signal+0x34/0x50
[   24.232617]  i915_gem_execbuffer2_ioctl+0xcd/0x1f0 [i915]
[   24.232912]  ? i915_gem_execbuffer_ioctl+0x2e0/0x2e0 [i915]
[   24.233084]  drm_ioctl_kernel+0x8f/0xd0
[   24.233236]  drm_ioctl+0x223/0x3d0
[   24.233505]  ? i915_gem_execbuffer_ioctl+0x2e0/0x2e0 [i915]
[   24.233684]  ? pick_next_task_fair+0x1b5/0x3d0
[   24.233873]  ? __switch_to_asm+0x36/0x50
[   24.234021]  ? drm_ioctl_kernel+0xd0/0xd0
[   24.234167]  __ia32_sys_ioctl+0x1ab/0x760
[   24.234313]  ? exit_to_user_mode_prepare+0xe5/0x110
[   24.234453]  ? syscall_exit_to_user_mode+0x23/0x130
[   24.234601]  __do_fast_syscall_32+0x3f/0x70
[   24.234744]  do_fast_syscall_32+0x29/0x60
[   24.234885]  do_SYSENTER_32+0x15/0x20
[   24.235021]  entry_SYSENTER_32+0x9f/0xf2
[   24.235157] EIP: 0xb7f28559
[   24.235288] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
[   24.235576] EAX: ffffffda EBX: 00000005 ECX: c0406469 EDX: bf95556c
[   24.235722] ESI: b7e68000 EDI: c0406469 EBP: 00000005 ESP: bf9554d8
[   24.235869] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000296
[   24.236018] Modules linked in: i915 x86_pkg_temp_thermal intel_powerclamp crc32_pclmul crc32c_intel intel_cstate intel_uncore intel_gtt drm_kms_helper intel_pch_thermal video button autofs4 i2c_i801 i2c_smbus fan
[   24.236336] CR2: 00000000fe036000

It looks like kasan, xen and i915 are vulnerable.

Actual impact is "on thinkpad X60 in 5.9-rc1, screen starts blinking after
30-or-so minutes, and machine is unusable"

[sfr@canb.auug.org.au: ARCH_PAGE_TABLE_SYNC_MASK needs vmalloc.h]
  Link: https://lkml.kernel.org/r/20200825172508.16800a4f@canb.auug.org.au
[chris@chris-wilson.co.uk: changelog addition]
[pavel@ucw.cz: changelog addition]
Link: https://lkml.kernel.org/r/20200821123746.16904-1-joro@8bytes.org
Fixes: 2ba3e6947aed ("mm/vmalloc: track which page-table levels were modified")
Fixes: 86cf69f1d893 ("x86/mm/32: implement arch_sync_kernel_mappings()")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>	[x86-32]
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: <stable@vger.kernel.org>	[5.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory.c |   37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

--- a/mm/memory.c~mm-track-page-table-modifications-in-__apply_to_page_range
+++ a/mm/memory.c
@@ -73,6 +73,7 @@
 #include <linux/numa.h>
 #include <linux/perf_event.h>
 #include <linux/ptrace.h>
+#include <linux/vmalloc.h>
 
 #include <trace/events/kmem.h>
 
@@ -83,6 +84,7 @@
 #include <asm/tlb.h>
 #include <asm/tlbflush.h>
 
+#include "pgalloc-track.h"
 #include "internal.h"
 
 #if defined(LAST_CPUPID_NOT_IN_PAGE_FLAGS) && !defined(CONFIG_COMPILE_TEST)
@@ -2206,7 +2208,8 @@ EXPORT_SYMBOL(vm_iomap_memory);
 
 static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
 				     unsigned long addr, unsigned long end,
-				     pte_fn_t fn, void *data, bool create)
+				     pte_fn_t fn, void *data, bool create,
+				     pgtbl_mod_mask *mask)
 {
 	pte_t *pte;
 	int err = 0;
@@ -2214,7 +2217,7 @@ static int apply_to_pte_range(struct mm_
 
 	if (create) {
 		pte = (mm == &init_mm) ?
-			pte_alloc_kernel(pmd, addr) :
+			pte_alloc_kernel_track(pmd, addr, mask) :
 			pte_alloc_map_lock(mm, pmd, addr, &ptl);
 		if (!pte)
 			return -ENOMEM;
@@ -2235,6 +2238,7 @@ static int apply_to_pte_range(struct mm_
 				break;
 		}
 	} while (addr += PAGE_SIZE, addr != end);
+	*mask |= PGTBL_PTE_MODIFIED;
 
 	arch_leave_lazy_mmu_mode();
 
@@ -2245,7 +2249,8 @@ static int apply_to_pte_range(struct mm_
 
 static int apply_to_pmd_range(struct mm_struct *mm, pud_t *pud,
 				     unsigned long addr, unsigned long end,
-				     pte_fn_t fn, void *data, bool create)
+				     pte_fn_t fn, void *data, bool create,
+				     pgtbl_mod_mask *mask)
 {
 	pmd_t *pmd;
 	unsigned long next;
@@ -2254,7 +2259,7 @@ static int apply_to_pmd_range(struct mm_
 	BUG_ON(pud_huge(*pud));
 
 	if (create) {
-		pmd = pmd_alloc(mm, pud, addr);
+		pmd = pmd_alloc_track(mm, pud, addr, mask);
 		if (!pmd)
 			return -ENOMEM;
 	} else {
@@ -2264,7 +2269,7 @@ static int apply_to_pmd_range(struct mm_
 		next = pmd_addr_end(addr, end);
 		if (create || !pmd_none_or_clear_bad(pmd)) {
 			err = apply_to_pte_range(mm, pmd, addr, next, fn, data,
-						 create);
+						 create, mask);
 			if (err)
 				break;
 		}
@@ -2274,14 +2279,15 @@ static int apply_to_pmd_range(struct mm_
 
 static int apply_to_pud_range(struct mm_struct *mm, p4d_t *p4d,
 				     unsigned long addr, unsigned long end,
-				     pte_fn_t fn, void *data, bool create)
+				     pte_fn_t fn, void *data, bool create,
+				     pgtbl_mod_mask *mask)
 {
 	pud_t *pud;
 	unsigned long next;
 	int err = 0;
 
 	if (create) {
-		pud = pud_alloc(mm, p4d, addr);
+		pud = pud_alloc_track(mm, p4d, addr, mask);
 		if (!pud)
 			return -ENOMEM;
 	} else {
@@ -2291,7 +2297,7 @@ static int apply_to_pud_range(struct mm_
 		next = pud_addr_end(addr, end);
 		if (create || !pud_none_or_clear_bad(pud)) {
 			err = apply_to_pmd_range(mm, pud, addr, next, fn, data,
-						 create);
+						 create, mask);
 			if (err)
 				break;
 		}
@@ -2301,14 +2307,15 @@ static int apply_to_pud_range(struct mm_
 
 static int apply_to_p4d_range(struct mm_struct *mm, pgd_t *pgd,
 				     unsigned long addr, unsigned long end,
-				     pte_fn_t fn, void *data, bool create)
+				     pte_fn_t fn, void *data, bool create,
+				     pgtbl_mod_mask *mask)
 {
 	p4d_t *p4d;
 	unsigned long next;
 	int err = 0;
 
 	if (create) {
-		p4d = p4d_alloc(mm, pgd, addr);
+		p4d = p4d_alloc_track(mm, pgd, addr, mask);
 		if (!p4d)
 			return -ENOMEM;
 	} else {
@@ -2318,7 +2325,7 @@ static int apply_to_p4d_range(struct mm_
 		next = p4d_addr_end(addr, end);
 		if (create || !p4d_none_or_clear_bad(p4d)) {
 			err = apply_to_pud_range(mm, p4d, addr, next, fn, data,
-						 create);
+						 create, mask);
 			if (err)
 				break;
 		}
@@ -2331,8 +2338,9 @@ static int __apply_to_page_range(struct
 				 void *data, bool create)
 {
 	pgd_t *pgd;
-	unsigned long next;
+	unsigned long start = addr, next;
 	unsigned long end = addr + size;
+	pgtbl_mod_mask mask = 0;
 	int err = 0;
 
 	if (WARN_ON(addr >= end))
@@ -2343,11 +2351,14 @@ static int __apply_to_page_range(struct
 		next = pgd_addr_end(addr, end);
 		if (!create && pgd_none_or_clear_bad(pgd))
 			continue;
-		err = apply_to_p4d_range(mm, pgd, addr, next, fn, data, create);
+		err = apply_to_p4d_range(mm, pgd, addr, next, fn, data, create, &mask);
 		if (err)
 			break;
 	} while (pgd++, addr = next, addr != end);
 
+	if (mask & ARCH_PAGE_TABLE_SYNC_MASK)
+		arch_sync_kernel_mappings(start, start + size);
+
 	return err;
 }
 
_

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [patch 08/19] ipc: adjust proc_ipc_sem_dointvec definition to match prototype
  2020-09-04 23:34 incoming Andrew Morton
                   ` (6 preceding siblings ...)
  2020-09-04 23:35 ` [patch 07/19] mm: track page table modifications in __apply_to_page_range() Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
  2020-09-04 23:35 ` [patch 09/19] fork: adjust sysctl_max_threads " Andrew Morton
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
  To: akpm, hch, linux-mm, mm-commits, tklauser, torvalds, viro

From: Tobias Klauser <tklauser@distanz.ch>
Subject: ipc: adjust proc_ipc_sem_dointvec definition to match prototype

Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer.  Adjust the
signature of proc_ipc_sem_dointvec to match ctl_table.proc_handler which
fixes the following sparse error/warning:

ipc/ipc_sysctl.c:94:47: warning: incorrect type in argument 3 (different address spaces)
ipc/ipc_sysctl.c:94:47:    expected void *buffer
ipc/ipc_sysctl.c:94:47:    got void [noderef] __user *buffer
ipc/ipc_sysctl.c:194:35: warning: incorrect type in initializer (incompatible argument 3 (different address spaces))
ipc/ipc_sysctl.c:194:35:    expected int ( [usertype] *proc_handler )( ... )
ipc/ipc_sysctl.c:194:35:    got int ( * )( ... )

Link: https://lkml.kernel.org/r/20200825105846.5193-1-tklauser@distanz.ch
Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 ipc/ipc_sysctl.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/ipc/ipc_sysctl.c~ipc-adjust-proc_ipc_sem_dointvec-definition-to-match-prototype
+++ a/ipc/ipc_sysctl.c
@@ -85,7 +85,7 @@ static int proc_ipc_auto_msgmni(struct c
 }
 
 static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
-	void __user *buffer, size_t *lenp, loff_t *ppos)
+	void *buffer, size_t *lenp, loff_t *ppos)
 {
 	int ret, semmni;
 	struct ipc_namespace *ns = current->nsproxy->ipc_ns;
_

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [patch 09/19] fork: adjust sysctl_max_threads definition to match prototype
  2020-09-04 23:34 incoming Andrew Morton
                   ` (7 preceding siblings ...)
  2020-09-04 23:35 ` [patch 08/19] ipc: adjust proc_ipc_sem_dointvec definition to match prototype Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
  2020-09-04 23:35 ` [patch 10/19] checkpatch: fix the usage of capture group ( ... ) Andrew Morton
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
  To: akpm, hch, linux-mm, mm-commits, tklauser, torvalds, viro

From: Tobias Klauser <tklauser@distanz.ch>
Subject: fork: adjust sysctl_max_threads definition to match prototype

Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer.  Adjust the
definition of sysctl_max_threads to match its prototype in linux/sysctl.h
which fixes the following sparse error/warning:

kernel/fork.c:3050:47: warning: incorrect type in argument 3 (different address spaces)
kernel/fork.c:3050:47:    expected void *
kernel/fork.c:3050:47:    got void [noderef] __user *buffer
kernel/fork.c:3036:5: error: symbol 'sysctl_max_threads' redeclared with different type (incompatible argument 3 (different address spaces)):
kernel/fork.c:3036:5:    int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )
kernel/fork.c: note: in included file (through include/linux/key.h, include/linux/cred.h, include/linux/sched/signal.h, include/linux/sched/cputime.h):
./include/linux/sysctl.h:242:5: note: previously declared as:
./include/linux/sysctl.h:242:5:    int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )

Link: https://lkml.kernel.org/r/20200825093647.24263-1-tklauser@distanz.ch
Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/fork.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/fork.c~fork-adjust-sysctl_max_threads-definition-to-match-prototype
+++ a/kernel/fork.c
@@ -3014,7 +3014,7 @@ int unshare_files(struct files_struct **
 }
 
 int sysctl_max_threads(struct ctl_table *table, int write,
-		       void __user *buffer, size_t *lenp, loff_t *ppos)
+		       void *buffer, size_t *lenp, loff_t *ppos)
 {
 	struct ctl_table t;
 	int ret;
_

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [patch 10/19] checkpatch: fix the usage of capture group ( ... )
  2020-09-04 23:34 incoming Andrew Morton
                   ` (8 preceding siblings ...)
  2020-09-04 23:35 ` [patch 09/19] fork: adjust sysctl_max_threads " Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
  2020-09-04 23:35 ` [patch 11/19] mm: madvise: fix vma user-after-free Andrew Morton
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
  To: akpm, joe, linux-mm, lukas.bulwahn, mm-commits, mrinalmni, torvalds

From: Mrinal Pandey <mrinalmni@gmail.com>
Subject: checkpatch: fix the usage of capture group ( ... )

The usage of "capture group (...)" in the immediate condition after `&&`
results in `$1` being uninitialized.  This issues a warning "Use of
uninitialized value $1 in regexp compilation at ./scripts/checkpatch.pl
line 2638".

I noticed this bug while running checkpatch on the set of commits from
v5.7 to v5.8-rc1 of the kernel on the commits with a diff content in their
commit message.

This bug was introduced in the script by commit e518e9a59ec3 ("checkpatch:
emit an error when there's a diff in a changelog").  It has been in the
script since then.

The author intended to store the match made by the capture group in `$1`,
which should have contained the file name matched by `[\w/]+`.  This could
not work because the capture group and `$1` were used in the same regular
expression: `$1` is interpolated into the pattern before the match runs, so
it is still unset at that point.

Fix this by placing the capture group in the condition before `&&`.  `$1`
is then set to the text the capture group matched by the time the second
regular expression uses it.

Link: https://lkml.kernel.org/r/20200714032352.f476hanaj2dlmiot@mrinalpandey
Fixes: e518e9a59ec3 ("checkpatch: emit an error when there's a diff in a changelog")
Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com>
Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Tested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/checkpatch.pl |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/scripts/checkpatch.pl~checkpatch-fix-the-usage-of-capture-group
+++ a/scripts/checkpatch.pl
@@ -2639,8 +2639,8 @@ sub process {
 
 # Check if the commit log has what seems like a diff which can confuse patch
 		if ($in_commit_log && !$commit_log_has_diff &&
-		    (($line =~ m@^\s+diff\b.*a/[\w/]+@ &&
-		      $line =~ m@^\s+diff\b.*a/([\w/]+)\s+b/$1\b@) ||
+		    (($line =~ m@^\s+diff\b.*a/([\w/]+)@ &&
+		      $line =~ m@^\s+diff\b.*a/[\w/]+\s+b/$1\b@) ||
 		     $line =~ m@^\s*(?:\-\-\-\s+a/|\+\+\+\s+b/)@ ||
 		     $line =~ m/^\s*\@\@ \-\d+,\d+ \+\d+,\d+ \@\@/)) {
 			ERROR("DIFF_IN_COMMIT_MSG",
_

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [patch 11/19] mm: madvise: fix vma user-after-free
  2020-09-04 23:34 incoming Andrew Morton
                   ` (9 preceding siblings ...)
  2020-09-04 23:35 ` [patch 10/19] checkpatch: fix the usage of capture group ( ... ) Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
  2020-09-04 23:35 ` [patch 12/19] mm/migrate: fixup setting UFFD_WP flag Andrew Morton
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
  To: akpm, jack, linux-mm, mm-commits, shy828301, stable, torvalds

From: Yang Shi <shy828301@gmail.com>
Subject: mm: madvise: fix vma user-after-free

syzbot reported the following use-after-free:

BUG: KASAN: use-after-free in madvise_willneed mm/madvise.c:293 [inline]
BUG: KASAN: use-after-free in madvise_vma mm/madvise.c:942 [inline]
BUG: KASAN: use-after-free in do_madvise.part.0+0x1c8b/0x1cf0 mm/madvise.c:1145
Read of size 8 at addr ffff8880a6163eb0 by task syz-executor.0/9996

CPU: 0 PID: 9996 Comm: syz-executor.0 Not tainted 5.9.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x18f/0x20d lib/dump_stack.c:118
 print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
 __kasan_report mm/kasan/report.c:513 [inline]
 kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
 madvise_willneed mm/madvise.c:293 [inline]
 madvise_vma mm/madvise.c:942 [inline]
 do_madvise.part.0+0x1c8b/0x1cf0 mm/madvise.c:1145
 do_madvise mm/madvise.c:1169 [inline]
 __do_sys_madvise mm/madvise.c:1171 [inline]
 __se_sys_madvise mm/madvise.c:1169 [inline]
 __x64_sys_madvise+0xd9/0x110 mm/madvise.c:1169
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45d4d9
Code: 5d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f04f7464c78 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 0000000000020800 RCX: 000000000045d4d9
RDX: 0000000000000003 RSI: 0000000000600003 RDI: 0000000020000000
RBP: 000000000118d020 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118cfec
R13: 00007ffc579cce7f R14: 00007f04f74659c0 R15: 000000000118cfec

Allocated by task 9992:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
 kasan_set_track mm/kasan/common.c:56 [inline]
 __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461
 slab_post_alloc_hook mm/slab.h:518 [inline]
 slab_alloc mm/slab.c:3312 [inline]
 kmem_cache_alloc+0x138/0x3a0 mm/slab.c:3482
 vm_area_alloc+0x1c/0x110 kernel/fork.c:347
 mmap_region+0x8e5/0x1780 mm/mmap.c:1743
 do_mmap+0xcf9/0x11d0 mm/mmap.c:1545
 vm_mmap_pgoff+0x195/0x200 mm/util.c:506
 ksys_mmap_pgoff+0x43a/0x560 mm/mmap.c:1596
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 9992:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
 kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
 kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
 __kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422
 __cache_free mm/slab.c:3418 [inline]
 kmem_cache_free.part.0+0x67/0x1f0 mm/slab.c:3693
 remove_vma+0x132/0x170 mm/mmap.c:184
 remove_vma_list mm/mmap.c:2613 [inline]
 __do_munmap+0x743/0x1170 mm/mmap.c:2869
 do_munmap mm/mmap.c:2877 [inline]
 mmap_region+0x257/0x1780 mm/mmap.c:1716
 do_mmap+0xcf9/0x11d0 mm/mmap.c:1545
 vm_mmap_pgoff+0x195/0x200 mm/util.c:506
 ksys_mmap_pgoff+0x43a/0x560 mm/mmap.c:1596
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

This happens because vma is accessed after mmap_lock has been released:
another task can acquire mmap_lock in the meantime and free the vma.

Releasing mmap_lock only after the last access to vma fixes the problem.
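
The ordering rule can be sketched in userspace C.  This is only an
illustration, not kernel code: struct toy_region, the toy_* helpers and the
reader counter standing in for mmap_lock are all hypothetical.  The point
is that every field of the object is read before the lock is dropped.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12	/* assumed page shift, for illustration only */

/* Hypothetical stand-in for a vma; not the kernel's struct. */
struct toy_region {
	unsigned long start;	/* first address covered by the region */
	unsigned long pgoff;	/* offset into the backing file, in pages */
};

/* Toy reader count standing in for mmap_lock; no real locking is done. */
static int toy_lock_readers;

static void toy_read_lock(void)   { toy_lock_readers++; }
static void toy_read_unlock(void) { toy_lock_readers--; }

/*
 * Called with the toy lock held; returns with it dropped.  Everything
 * needed from *r is copied out first, because in the real code the
 * object may be freed as soon as the lock is released.
 */
static long long toy_offset_then_unlock(const struct toy_region *r,
					unsigned long addr)
{
	long long offset;

	assert(toy_lock_readers > 0);	/* r is only valid under the lock */
	offset = (long long)(addr - r->start) +
		 ((long long)r->pgoff << TOY_PAGE_SHIFT);
	toy_read_unlock();
	/* r must not be dereferenced from here on */
	return offset;
}
```

Swapping the last two statements before the return reproduces the shape of
the bug: the fields of r would be read after the unlock.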

Link: https://lkml.kernel.org/r/20200816141204.162624-1-shy828301@gmail.com
Fixes: 692fe62433d4c ("mm: Handle MADV_WILLNEED through vfs_fadvise()")
Reported-by: syzbot+b90df26038d1d5d85c97@syzkaller.appspotmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>	[5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/madvise.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/madvise.c~mm-madvise-fix-vma-user-after-free
+++ a/mm/madvise.c
@@ -289,9 +289,9 @@ static long madvise_willneed(struct vm_a
 	 */
 	*prev = NULL;	/* tell sys_madvise we drop mmap_lock */
 	get_file(file);
-	mmap_read_unlock(current->mm);
 	offset = (loff_t)(start - vma->vm_start)
 			+ ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
+	mmap_read_unlock(current->mm);
 	vfs_fadvise(file, offset, end - start, POSIX_FADV_WILLNEED);
 	fput(file);
 	mmap_read_lock(current->mm);
_

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [patch 12/19] mm/migrate: fixup setting UFFD_WP flag
  2020-09-04 23:34 incoming Andrew Morton
                   ` (10 preceding siblings ...)
  2020-09-04 23:35 ` [patch 11/19] mm: madvise: fix vma user-after-free Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
  2020-09-04 23:36 ` [patch 13/19] mm/rmap: fixup copying of soft dirty and uffd ptes Andrew Morton
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
  To: akpm, alistair, jglisse, jhubbard, linux-mm, mm-commits, peterx,
	rcampbell, torvalds

From: Alistair Popple <alistair@popple.id.au>
Subject: mm/migrate: fixup setting UFFD_WP flag

Commit f45ec5ff16a75 ("userfaultfd: wp: support swap and page migration")
introduced support for tracking the uffd wp bit during page migration.
However, the non-swap PTE variant was used to set the flag for zone device
private pages, which are a type of swap page.

This leads to corruption of the swap offset if the original PTE has the
uffd_wp flag set.

Link: https://lkml.kernel.org/r/20200825064232.10023-1-alistair@popple.id.au
Fixes: f45ec5ff16a75 ("userfaultfd: wp: support swap and page migration")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/migrate.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/migrate.c~mm-migrate-fixup-setting-uffd_wp-flag
+++ a/mm/migrate.c
@@ -251,7 +251,7 @@ static bool remove_migration_pte(struct
 				entry = make_device_private_entry(new, pte_write(pte));
 				pte = swp_entry_to_pte(entry);
 				if (pte_swp_uffd_wp(*pvmw.pte))
-					pte = pte_mkuffd_wp(pte);
+					pte = pte_swp_mkuffd_wp(pte);
 			}
 		}
 
_

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [patch 13/19] mm/rmap: fixup copying of soft dirty and uffd ptes
  2020-09-04 23:34 incoming Andrew Morton
                   ` (11 preceding siblings ...)
  2020-09-04 23:35 ` [patch 12/19] mm/migrate: fixup setting UFFD_WP flag Andrew Morton
@ 2020-09-04 23:36 ` Andrew Morton
  2020-09-04 23:36 ` [patch 14/19] mm/migrate: remove unnecessary is_zone_device_page() check Andrew Morton
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2020-09-04 23:36 UTC (permalink / raw)
  To: akpm, alistair, jglisse, jhubbard, linux-mm, mm-commits, peterx,
	rcampbell, stable, torvalds

From: Alistair Popple <alistair@popple.id.au>
Subject: mm/rmap: fixup copying of soft dirty and uffd ptes

During memory migration a pte is temporarily replaced with a migration
swap pte.  Some pte bits from the existing mapping such as the soft-dirty
and uffd write-protect bits are preserved by copying these to the
temporary migration swap pte.

However these bits are not stored at the same location for swap and
non-swap ptes.  Therefore testing these bits requires using the
appropriate helper function for the given pte type.

Unfortunately, several code locations were found where the wrong helper
function is used to test the soft_dirty and uffd_wp bits, which leads to
them being incorrectly set or cleared during page migration.

Fix these by using the correct tests based on pte type.
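
Why the helper has to match the pte type can be seen in a small userspace
sketch.  The bit positions below are hypothetical and chosen only for
illustration; real layouts are architecture specific, and none of the
toy_* names exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_pte_t;

/* Hypothetical layout, for illustration only. */
#define TOY_PRESENT		(1ULL << 0)
#define TOY_SWP_SOFT_DIRTY	(1ULL << 1)	/* meaningful when not present */
#define TOY_SWP_OFFSET_SHIFT	2		/* swap offset starts here */
#define TOY_SOFT_DIRTY		(1ULL << 11)	/* meaningful when present */

static bool toy_pte_present(toy_pte_t pte)
{
	return (pte & TOY_PRESENT) != 0;
}

static bool toy_pte_soft_dirty(toy_pte_t pte)
{
	return (pte & TOY_SOFT_DIRTY) != 0;
}

static bool toy_pte_swp_soft_dirty(toy_pte_t pte)
{
	return (pte & TOY_SWP_SOFT_DIRTY) != 0;
}

/* Build a non-present (swap style) pte from an offset and a flag. */
static toy_pte_t toy_make_swp_pte(uint64_t offset, bool soft_dirty)
{
	assert(offset < (1ULL << 52));
	return (offset << TOY_SWP_OFFSET_SHIFT) |
	       (soft_dirty ? TOY_SWP_SOFT_DIRTY : 0);
}

/* Pick the helper that matches the pte type before testing the bit. */
static bool toy_soft_dirty(toy_pte_t pte)
{
	return toy_pte_present(pte) ? toy_pte_soft_dirty(pte)
				    : toy_pte_swp_soft_dirty(pte);
}
```

In this toy layout a swap pte with offset 0x200 has bit 11 set as part of
the offset, so the present-pte helper reports soft dirty even though the
swap soft-dirty bit is clear.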

Link: https://lkml.kernel.org/r/20200825064232.10023-2-alistair@popple.id.au
Fixes: a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Alistair Popple <alistair@popple.id.au>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/migrate.c |   15 +++++++++++----
 mm/rmap.c    |    9 +++++++--
 2 files changed, 18 insertions(+), 6 deletions(-)

--- a/mm/migrate.c~mm-rmap-fixup-copying-of-soft-dirty-and-uffd-ptes
+++ a/mm/migrate.c
@@ -2427,10 +2427,17 @@ again:
 			entry = make_migration_entry(page, mpfn &
 						     MIGRATE_PFN_WRITE);
 			swp_pte = swp_entry_to_pte(entry);
-			if (pte_soft_dirty(pte))
-				swp_pte = pte_swp_mksoft_dirty(swp_pte);
-			if (pte_uffd_wp(pte))
-				swp_pte = pte_swp_mkuffd_wp(swp_pte);
+			if (pte_present(pte)) {
+				if (pte_soft_dirty(pte))
+					swp_pte = pte_swp_mksoft_dirty(swp_pte);
+				if (pte_uffd_wp(pte))
+					swp_pte = pte_swp_mkuffd_wp(swp_pte);
+			} else {
+				if (pte_swp_soft_dirty(pte))
+					swp_pte = pte_swp_mksoft_dirty(swp_pte);
+				if (pte_swp_uffd_wp(pte))
+					swp_pte = pte_swp_mkuffd_wp(swp_pte);
+			}
 			set_pte_at(mm, addr, ptep, swp_pte);
 
 			/*
--- a/mm/rmap.c~mm-rmap-fixup-copying-of-soft-dirty-and-uffd-ptes
+++ a/mm/rmap.c
@@ -1511,9 +1511,14 @@ static bool try_to_unmap_one(struct page
 			 */
 			entry = make_migration_entry(page, 0);
 			swp_pte = swp_entry_to_pte(entry);
-			if (pte_soft_dirty(pteval))
+
+			/*
+			 * pteval maps a zone device page and is therefore
+			 * a swap pte.
+			 */
+			if (pte_swp_soft_dirty(pteval))
 				swp_pte = pte_swp_mksoft_dirty(swp_pte);
-			if (pte_uffd_wp(pteval))
+			if (pte_swp_uffd_wp(pteval))
 				swp_pte = pte_swp_mkuffd_wp(swp_pte);
 			set_pte_at(mm, pvmw.address, pvmw.pte, swp_pte);
 			/*
_

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [patch 14/19] mm/migrate: remove unnecessary is_zone_device_page() check
  2020-09-04 23:34 incoming Andrew Morton
                   ` (12 preceding siblings ...)
  2020-09-04 23:36 ` [patch 13/19] mm/rmap: fixup copying of soft dirty and uffd ptes Andrew Morton
@ 2020-09-04 23:36 ` Andrew Morton
  2020-09-04 23:36 ` [patch 15/19] mm/migrate: preserve soft dirty in remove_migration_pte() Andrew Morton
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2020-09-04 23:36 UTC (permalink / raw)
  To: akpm, apopple, bharata, hch, jgg, jglisse, linux-mm, mm-commits,
	rcampbell, torvalds

From: Ralph Campbell <rcampbell@nvidia.com>
Subject: mm/migrate: remove unnecessary is_zone_device_page() check

Patch series "mm/migrate: preserve soft dirty in remove_migration_pte()".

I happened to notice this from code inspection after seeing Alistair
Popple's patch ("mm/rmap: Fixup copying of soft dirty and uffd ptes").


This patch (of 2):

Checking both is_zone_device_page() and is_device_private_page() is
unnecessary since the latter is sufficient to determine if the page is a
device private page.  Simplify the code for easier reading.

Link: https://lkml.kernel.org/r/20200831212222.22409-1-rcampbell@nvidia.com
Link: https://lkml.kernel.org/r/20200831212222.22409-2-rcampbell@nvidia.com
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/migrate.c |   12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

--- a/mm/migrate.c~mm-migrate-remove-unnecessary-is_zone_device_page-check
+++ a/mm/migrate.c
@@ -246,13 +246,11 @@ static bool remove_migration_pte(struct
 		else if (pte_swp_uffd_wp(*pvmw.pte))
 			pte = pte_mkuffd_wp(pte);
 
-		if (unlikely(is_zone_device_page(new))) {
-			if (is_device_private_page(new)) {
-				entry = make_device_private_entry(new, pte_write(pte));
-				pte = swp_entry_to_pte(entry);
-				if (pte_swp_uffd_wp(*pvmw.pte))
-					pte = pte_swp_mkuffd_wp(pte);
-			}
+		if (unlikely(is_device_private_page(new))) {
+			entry = make_device_private_entry(new, pte_write(pte));
+			pte = swp_entry_to_pte(entry);
+			if (pte_swp_uffd_wp(*pvmw.pte))
+				pte = pte_swp_mkuffd_wp(pte);
 		}
 
 #ifdef CONFIG_HUGETLB_PAGE
_

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [patch 15/19] mm/migrate: preserve soft dirty in remove_migration_pte()
  2020-09-04 23:34 incoming Andrew Morton
                   ` (13 preceding siblings ...)
  2020-09-04 23:36 ` [patch 14/19] mm/migrate: remove unnecessary is_zone_device_page() check Andrew Morton
@ 2020-09-04 23:36 ` Andrew Morton
  2020-09-04 23:36 ` [patch 16/19] mm/hugetlb: try preferred node first when alloc gigantic page from cma Andrew Morton
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2020-09-04 23:36 UTC (permalink / raw)
  To: akpm, apopple, bharata, hch, jgg, jglisse, linux-mm, mm-commits,
	rcampbell, torvalds

From: Ralph Campbell <rcampbell@nvidia.com>
Subject: mm/migrate: preserve soft dirty in remove_migration_pte()

The code to remove a migration PTE and replace it with a device private
PTE was not copying the soft dirty bit from the migration entry.  This
could lead to page contents not being marked dirty when faulting the page
back from device private memory.

Link: https://lkml.kernel.org/r/20200831212222.22409-3-rcampbell@nvidia.com
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/migrate.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/mm/migrate.c~mm-migrate-preserve-soft-dirty-in-remove_migration_pte
+++ a/mm/migrate.c
@@ -249,6 +249,8 @@ static bool remove_migration_pte(struct
 		if (unlikely(is_device_private_page(new))) {
 			entry = make_device_private_entry(new, pte_write(pte));
 			pte = swp_entry_to_pte(entry);
+			if (pte_swp_soft_dirty(*pvmw.pte))
+				pte = pte_swp_mksoft_dirty(pte);
 			if (pte_swp_uffd_wp(*pvmw.pte))
 				pte = pte_swp_mkuffd_wp(pte);
 		}
_

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [patch 16/19] mm/hugetlb: try preferred node first when alloc gigantic page from cma
  2020-09-04 23:34 incoming Andrew Morton
                   ` (14 preceding siblings ...)
  2020-09-04 23:36 ` [patch 15/19] mm/migrate: preserve soft dirty in remove_migration_pte() Andrew Morton
@ 2020-09-04 23:36 ` Andrew Morton
  2020-09-04 23:36 ` [patch 17/19] mm/hugetlb: fix a race between hugetlb sysctl handlers Andrew Morton
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2020-09-04 23:36 UTC (permalink / raw)
  To: akpm, guro, linux-mm, lixinhai.lxh, mhocko, mike.kravetz,
	mm-commits, torvalds

From: Li Xinhai <lixinhai.lxh@gmail.com>
Subject: mm/hugetlb: try preferred node first when alloc gigantic page from cma

Since commit cf11e85fc08cc6a4 ("mm: hugetlb: optionally allocate gigantic
hugepages using cma"), a gigantic page may be allocated from a node other
than the preferred node even though pages are available on the preferred
node.  The reason is that the nid parameter is ignored in
alloc_gigantic_page().

In addition, __GFP_THISNODE needs to be checked in case the user requires
allocation only from the preferred node.

After this patch, the preferred node is tried first, before other allowed
nodes, and no other node is tried if __GFP_THISNODE is specified.  If the
user does not specify a preferred node, the current node is used as the
preferred node, which keeps the behavior of allocating gigantic and
non-gigantic hugetlb pages consistent.
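
The resulting node order can be sketched in plain C.  This is a userspace
illustration under stated assumptions: toy_cma_try_order(), TOY_MAX_NODES
and the boolean arrays standing in for the nodemask and hugetlb_cma[] are
hypothetical, and the function records the order in which nodes would be
tried rather than allocating anything.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_NODES 4		/* assumed node count, for illustration */

/*
 * Fill order[] with the nodes in the order they would be tried and
 * return how many there are.  The preferred node comes first if it has
 * a CMA area; other allowed nodes follow unless this_node_only is set
 * (the analogue of __GFP_THISNODE).
 */
static int toy_cma_try_order(int nid, const bool allowed[TOY_MAX_NODES],
			     const bool has_cma[TOY_MAX_NODES],
			     bool this_node_only, int order[TOY_MAX_NODES])
{
	int n = 0;

	assert(nid >= 0 && nid < TOY_MAX_NODES);

	if (has_cma[nid])
		order[n++] = nid;

	if (!this_node_only) {
		for (int node = 0; node < TOY_MAX_NODES; node++) {
			if (!allowed[node] || node == nid || !has_cma[node])
				continue;
			order[n++] = node;
		}
	}

	return n;
}
```

With four allowed nodes, CMA areas on nodes 0, 2 and 3, and node 2
preferred, the order is 2, 0, 3; with this_node_only set, only node 2 is
tried.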

Link: https://lkml.kernel.org/r/20200902025016.697260-1-lixinhai.lxh@gmail.com
Fixes: cf11e85fc08cc6a4 ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/hugetlb.c |   23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

--- a/mm/hugetlb.c~mm-hugetlb-try-preferred-node-first-when-alloc-gigantic-page-from-cma
+++ a/mm/hugetlb.c
@@ -1250,21 +1250,32 @@ static struct page *alloc_gigantic_page(
 		int nid, nodemask_t *nodemask)
 {
 	unsigned long nr_pages = 1UL << huge_page_order(h);
+	if (nid == NUMA_NO_NODE)
+		nid = numa_mem_id();
 
 #ifdef CONFIG_CMA
 	{
 		struct page *page;
 		int node;
 
-		for_each_node_mask(node, *nodemask) {
-			if (!hugetlb_cma[node])
-				continue;
-
-			page = cma_alloc(hugetlb_cma[node], nr_pages,
-					 huge_page_order(h), true);
+		if (hugetlb_cma[nid]) {
+			page = cma_alloc(hugetlb_cma[nid], nr_pages,
+					huge_page_order(h), true);
 			if (page)
 				return page;
 		}
+
+		if (!(gfp_mask & __GFP_THISNODE)) {
+			for_each_node_mask(node, *nodemask) {
+				if (node == nid || !hugetlb_cma[node])
+					continue;
+
+				page = cma_alloc(hugetlb_cma[node], nr_pages,
+						huge_page_order(h), true);
+				if (page)
+					return page;
+			}
+		}
 	}
 #endif
 
_

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [patch 17/19] mm/hugetlb: fix a race between hugetlb sysctl handlers
  2020-09-04 23:34 incoming Andrew Morton
                   ` (15 preceding siblings ...)
  2020-09-04 23:36 ` [patch 16/19] mm/hugetlb: try preferred node first when alloc gigantic page from cma Andrew Morton
@ 2020-09-04 23:36 ` Andrew Morton
  2020-09-04 23:36 ` [patch 18/19] mm/khugepaged.c: fix khugepaged's request size in collapse_file Andrew Morton
  2020-09-04 23:36 ` [patch 19/19] include/linux/log2.h: add missing () around n in roundup_pow_of_two() Andrew Morton
  18 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2020-09-04 23:36 UTC (permalink / raw)
  To: ak, akpm, linux-mm, mike.kravetz, mm-commits, songmuchun, torvalds

From: Muchun Song <songmuchun@bytedance.com>
Subject: mm/hugetlb: fix a race between hugetlb sysctl handlers

There is a race between the assignment of `table->data` on one thread and
the write through the `table->data` pointer in
__do_proc_doulongvec_minmax() on another thread.

CPU0:                                 CPU1:
                                      proc_sys_write
hugetlb_sysctl_handler                  proc_sys_call_handler
hugetlb_sysctl_handler_common             hugetlb_sysctl_handler
  table->data = &tmp;                       hugetlb_sysctl_handler_common
                                              table->data = &tmp;
    proc_doulongvec_minmax
      do_proc_doulongvec_minmax           sysctl_head_finish
        __do_proc_doulongvec_minmax         unuse_table
          i = table->data;
          *i = val;  // corrupt CPU1's stack

Fix this by duplicating `table` and updating only the duplicate.  Also
introduce a helper, proc_hugetlb_doulongvec_minmax(), to simplify the
code.
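
The copy-then-modify idea can be sketched outside the kernel as follows.
struct toy_table is a hypothetical, trimmed-down stand-in for struct
ctl_table, and toy_store() only mimics the final write done through
table->data; neither is a kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct ctl_table. */
struct toy_table {
	const char *name;
	void *data;
	size_t maxlen;
};

/* Mimics the final step of the parser: write val through table->data. */
static void toy_store(struct toy_table *table, unsigned long val)
{
	*(unsigned long *)table->data = val;
}

/*
 * The shared table is only read.  The per-call output pointer is put
 * into a private copy on this caller's stack, so concurrent callers
 * never see each other's data pointer.
 */
static void toy_handler(const struct toy_table *shared, unsigned long val,
			unsigned long *out)
{
	struct toy_table dup;

	assert(out != NULL);
	dup = *shared;
	dup.data = out;
	toy_store(&dup, val);
}
```

Because the shared structure is never written, two callers running at the
same time each store into their own local variable.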

The following oops was seen:

    BUG: kernel NULL pointer dereference, address: 0000000000000000
    #PF: supervisor instruction fetch in kernel mode
    #PF: error_code(0x0010) - not-present page
    Code: Bad RIP value.
    ...
    Call Trace:
     ? set_max_huge_pages+0x3da/0x4f0
     ? alloc_pool_huge_page+0x150/0x150
     ? proc_doulongvec_minmax+0x46/0x60
     ? hugetlb_sysctl_handler_common+0x1c7/0x200
     ? nr_hugepages_store+0x20/0x20
     ? copy_fd_bitmaps+0x170/0x170
     ? hugetlb_sysctl_handler+0x1e/0x20
     ? proc_sys_call_handler+0x2f1/0x300
     ? unregister_sysctl_table+0xb0/0xb0
     ? __fd_install+0x78/0x100
     ? proc_sys_write+0x14/0x20
     ? __vfs_write+0x4d/0x90
     ? vfs_write+0xef/0x240
     ? ksys_write+0xc0/0x160
     ? __ia32_sys_read+0x50/0x50
     ? __close_fd+0x129/0x150
     ? __x64_sys_write+0x43/0x50
     ? do_syscall_64+0x6c/0x200
     ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

Link: http://lkml.kernel.org/r/20200828031146.43035-1-songmuchun@bytedance.com
Fixes: e5ff215941d5 ("hugetlb: multiple hstates for multiple page sizes")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/hugetlb.c |   26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

--- a/mm/hugetlb.c~mm-hugetlb-fix-a-race-between-hugetlb-sysctl-handlers
+++ a/mm/hugetlb.c
@@ -3465,6 +3465,22 @@ static unsigned int allowed_mems_nr(stru
 }
 
 #ifdef CONFIG_SYSCTL
+static int proc_hugetlb_doulongvec_minmax(struct ctl_table *table, int write,
+					  void *buffer, size_t *length,
+					  loff_t *ppos, unsigned long *out)
+{
+	struct ctl_table dup_table;
+
+	/*
+	 * In order to avoid races with __do_proc_doulongvec_minmax(), we
+	 * can duplicate the @table and alter the duplicate of it.
+	 */
+	dup_table = *table;
+	dup_table.data = out;
+
+	return proc_doulongvec_minmax(&dup_table, write, buffer, length, ppos);
+}
+
 static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
 			 struct ctl_table *table, int write,
 			 void *buffer, size_t *length, loff_t *ppos)
@@ -3476,9 +3492,8 @@ static int hugetlb_sysctl_handler_common
 	if (!hugepages_supported())
 		return -EOPNOTSUPP;
 
-	table->data = &tmp;
-	table->maxlen = sizeof(unsigned long);
-	ret = proc_doulongvec_minmax(table, write, buffer, length, ppos);
+	ret = proc_hugetlb_doulongvec_minmax(table, write, buffer, length, ppos,
+					     &tmp);
 	if (ret)
 		goto out;
 
@@ -3521,9 +3536,8 @@ int hugetlb_overcommit_handler(struct ct
 	if (write && hstate_is_gigantic(h))
 		return -EINVAL;
 
-	table->data = &tmp;
-	table->maxlen = sizeof(unsigned long);
-	ret = proc_doulongvec_minmax(table, write, buffer, length, ppos);
+	ret = proc_hugetlb_doulongvec_minmax(table, write, buffer, length, ppos,
+					     &tmp);
 	if (ret)
 		goto out;
 
_

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [patch 18/19] mm/khugepaged.c: fix khugepaged's request size in collapse_file
From: Andrew Morton @ 2020-09-04 23:36 UTC (permalink / raw)
  To: akpm, dhowells, ebiggers, linux-mm, mm-commits,
	pankaj.gupta.linux, shy828301, songliubraving, torvalds, willy

From: David Howells <dhowells@redhat.com>
Subject: mm/khugepaged.c: fix khugepaged's request size in collapse_file

collapse_file() in khugepaged passes PAGE_SIZE as the number of pages to
be read to page_cache_sync_readahead().  The intent was probably to read a
single page.  Fix it to use the number of pages to the end of the window
instead.

Link: https://lkml.kernel.org/r/20200903140844.14194-2-willy@infradead.org
Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Yang Shi <shy828301@gmail.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Eric Biggers <ebiggers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/khugepaged.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/khugepaged.c~fix-khugepageds-request-size-in-collapse_file
+++ a/mm/khugepaged.c
@@ -1709,7 +1709,7 @@ static void collapse_file(struct mm_stru
 				xas_unlock_irq(&xas);
 				page_cache_sync_readahead(mapping, &file->f_ra,
 							  file, index,
-							  PAGE_SIZE);
+							  end - index);
 				/* drain pagevecs to help isolate_lru_page() */
 				lru_add_drain();
 				page = find_lock_page(mapping, index);
_
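
The unit mix-up fixed above can be illustrated with a little arithmetic. This is a rough sketch under stated assumptions: DEMO_PAGE_SIZE (4 KiB pages) and DEMO_WINDOW_NR (a 512-page window) are illustrative values, and both helper functions are hypothetical rather than kernel API. It shows how large a request becomes when a byte count is passed where a page count is expected, compared with a request for the pages remaining up to the end of the window.

```c
#define DEMO_PAGE_SIZE	4096UL	/* assumed page size in bytes */
#define DEMO_WINDOW_NR	512UL	/* assumed number of pages in the window */

/* Bytes covered by a readahead request of nr_pages pages. */
static unsigned long demo_request_bytes(unsigned long nr_pages)
{
	return nr_pages * DEMO_PAGE_SIZE;
}

/* Pages from index up to, but not including, end: the "end - index" form. */
static unsigned long demo_pages_to_end(unsigned long index, unsigned long end)
{
	return end - index;
}
```

With these assumed values, passing DEMO_PAGE_SIZE as the page count asks for 4096 pages (16 MiB), whereas demo_pages_to_end(0, DEMO_WINDOW_NR) asks for 512 pages (2 MiB) and shrinks as index advances.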


* [patch 19/19] include/linux/log2.h: add missing () around n in roundup_pow_of_two()
From: Andrew Morton @ 2020-09-04 23:36 UTC (permalink / raw)
  To: akpm, jgg, linux-mm, mm-commits, torvalds

From: Jason Gunthorpe <jgg@nvidia.com>
Subject: include/linux/log2.h: add missing () around n in roundup_pow_of_two()

Otherwise gcc generates warnings if the expression is complicated.

Link: https://lkml.kernel.org/r/0-v1-8a2697e3c003+41165-log_brackets_jgg@nvidia.com
Fixes: 312a0c170945 ("[PATCH] LOG2: Alter roundup_pow_of_two() so that it can use a ilog2() on a constant")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/log2.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/log2.h~log2-add-missing-around-n-in-roundup_pow_of_two
+++ a/include/linux/log2.h
@@ -173,7 +173,7 @@ unsigned long __rounddown_pow_of_two(uns
 #define roundup_pow_of_two(n)			\
 (						\
 	__builtin_constant_p(n) ? (		\
-		(n == 1) ? 1 :			\
+		((n) == 1) ? 1 :		\
 		(1UL << (ilog2((n) - 1) + 1))	\
 				   ) :		\
 	__roundup_pow_of_two(n)			\
_
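
The effect of the missing parentheses can be reproduced with a pair of small macros. This is a minimal sketch: DEMO_IS_ONE_UNSAFE and DEMO_IS_ONE_SAFE are hypothetical macros modelled on the "(n == 1)" test in the hunk above, not code from log2.h. With an argument such as "x | 1", the unparenthesized form expands to "x | 1 == 1", which gcc's -Wparentheses flags and which parses as "x | (1 == 1)". The pragmas are there only so the sketch builds quietly.

```c
/* Hypothetical macros modelled on the comparison changed by the patch. */
#define DEMO_IS_ONE_UNSAFE(n)	(n == 1)	/* argument not parenthesized */
#define DEMO_IS_ONE_SAFE(n)	((n) == 1)	/* argument parenthesized */

/* Silence -Wparentheses for the deliberately unsafe expansion only. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wparentheses"
static int demo_unsafe(int x)
{
	/* Expands to (x | 1 == 1), parsed as x | (1 == 1), i.e. x | 1. */
	return DEMO_IS_ONE_UNSAFE(x | 1);
}
#pragma GCC diagnostic pop

static int demo_safe(int x)
{
	/* Expands to ((x | 1) == 1): nonzero only when x | 1 equals 1. */
	return DEMO_IS_ONE_SAFE(x | 1);
}
```

For x = 2 the unsafe form yields 3, which is nonzero and so reads as "true", while the safe form yields 0. Parenthesizing the argument keeps the comparison intact whatever expression the caller passes.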

