* incoming
@ 2020-09-04 23:34 Andrew Morton
2020-09-04 23:35 ` [patch 01/19] memcg: fix use-after-free in uncharge_batch Andrew Morton
` (18 more replies)
0 siblings, 19 replies; 21+ messages in thread
From: Andrew Morton @ 2020-09-04 23:34 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
19 patches, based on 59126901f200f5fc907153468b03c64e0081b6e6.
Subsystems affected by this patch series:
mm/memcg
mm/slub
MAINTAINERS
mm/pagemap
ipc
fork
checkpatch
mm/madvise
mm/migration
mm/hugetlb
lib
Subsystem: mm/memcg
Michal Hocko <mhocko@suse.com>:
memcg: fix use-after-free in uncharge_batch
Xunlei Pang <xlpang@linux.alibaba.com>:
mm: memcg: fix memcg reclaim soft lockup
Subsystem: mm/slub
Eugeniu Rosca <erosca@de.adit-jv.com>:
mm: slub: fix conversion of freelist_corrupted()
Subsystem: MAINTAINERS
Robert Richter <rric@kernel.org>:
MAINTAINERS: update Cavium/Marvell entries
Nick Desaulniers <ndesaulniers@google.com>:
MAINTAINERS: add LLVM maintainers
Randy Dunlap <rdunlap@infradead.org>:
MAINTAINERS: IA64: mark Status as Odd Fixes only
Subsystem: mm/pagemap
Joerg Roedel <jroedel@suse.de>:
mm: track page table modifications in __apply_to_page_range()
Subsystem: ipc
Tobias Klauser <tklauser@distanz.ch>:
ipc: adjust proc_ipc_sem_dointvec definition to match prototype
Subsystem: fork
Tobias Klauser <tklauser@distanz.ch>:
fork: adjust sysctl_max_threads definition to match prototype
Subsystem: checkpatch
Mrinal Pandey <mrinalmni@gmail.com>:
checkpatch: fix the usage of capture group ( ... )
Subsystem: mm/madvise
Yang Shi <shy828301@gmail.com>:
mm: madvise: fix vma user-after-free
Subsystem: mm/migration
Alistair Popple <alistair@popple.id.au>:
mm/migrate: fixup setting UFFD_WP flag
mm/rmap: fixup copying of soft dirty and uffd ptes
Ralph Campbell <rcampbell@nvidia.com>:
Patch series "mm/migrate: preserve soft dirty in remove_migration_pte()":
mm/migrate: remove unnecessary is_zone_device_page() check
mm/migrate: preserve soft dirty in remove_migration_pte()
Subsystem: mm/hugetlb
Li Xinhai <lixinhai.lxh@gmail.com>:
mm/hugetlb: try preferred node first when alloc gigantic page from cma
Muchun Song <songmuchun@bytedance.com>:
mm/hugetlb: fix a race between hugetlb sysctl handlers
David Howells <dhowells@redhat.com>:
mm/khugepaged.c: fix khugepaged's request size in collapse_file
Subsystem: lib
Jason Gunthorpe <jgg@nvidia.com>:
include/linux/log2.h: add missing () around n in roundup_pow_of_two()
MAINTAINERS | 32 ++++++++++++++++----------------
include/linux/log2.h | 2 +-
ipc/ipc_sysctl.c | 2 +-
kernel/fork.c | 2 +-
mm/hugetlb.c | 49 +++++++++++++++++++++++++++++++++++++------------
mm/khugepaged.c | 2 +-
mm/madvise.c | 2 +-
mm/memcontrol.c | 6 ++++++
mm/memory.c | 37 ++++++++++++++++++++++++-------------
mm/migrate.c | 31 +++++++++++++++++++------------
mm/rmap.c | 9 +++++++--
mm/slub.c | 12 ++++++------
mm/vmscan.c | 8 ++++++++
scripts/checkpatch.pl | 4 ++--
14 files changed, 130 insertions(+), 68 deletions(-)
* [patch 01/19] memcg: fix use-after-free in uncharge_batch
2020-09-04 23:34 incoming Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
2020-09-04 23:35 ` [patch 02/19] mm: memcg: fix memcg reclaim soft lockup Andrew Morton
` (17 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
To: akpm, guro, hannes, hughd, linux-mm, mhocko, mm-commits,
shakeelb, torvalds
From: Michal Hocko <mhocko@suse.com>
Subject: memcg: fix use-after-free in uncharge_batch
syzbot has reported a use-after-free in the uncharge_batch path:
BUG: KASAN: use-after-free in instrument_atomic_write include/linux/instrumented.h:71 [inline]
BUG: KASAN: use-after-free in atomic64_sub_return include/asm-generic/atomic-instrumented.h:970 [inline]
BUG: KASAN: use-after-free in atomic_long_sub_return include/asm-generic/atomic-long.h:113 [inline]
BUG: KASAN: use-after-free in page_counter_cancel mm/page_counter.c:54 [inline]
BUG: KASAN: use-after-free in page_counter_uncharge+0x3d/0xc0 mm/page_counter.c:155
Write of size 8 at addr ffff8880371c0148 by task syz-executor.0/9304
CPU: 0 PID: 9304 Comm: syz-executor.0 Not tainted 5.8.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1f0/0x31e lib/dump_stack.c:118
print_address_description+0x66/0x620 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report+0x132/0x1d0 mm/kasan/report.c:530
check_memory_region_inline mm/kasan/generic.c:183 [inline]
check_memory_region+0x2b5/0x2f0 mm/kasan/generic.c:192
instrument_atomic_write include/linux/instrumented.h:71 [inline]
atomic64_sub_return include/asm-generic/atomic-instrumented.h:970 [inline]
atomic_long_sub_return include/asm-generic/atomic-long.h:113 [inline]
page_counter_cancel mm/page_counter.c:54 [inline]
page_counter_uncharge+0x3d/0xc0 mm/page_counter.c:155
uncharge_batch+0x6c/0x350 mm/memcontrol.c:6764
uncharge_page+0x115/0x430 mm/memcontrol.c:6796
uncharge_list mm/memcontrol.c:6835 [inline]
mem_cgroup_uncharge_list+0x70/0xe0 mm/memcontrol.c:6877
release_pages+0x13a2/0x1550 mm/swap.c:911
tlb_batch_pages_flush mm/mmu_gather.c:49 [inline]
tlb_flush_mmu_free mm/mmu_gather.c:242 [inline]
tlb_flush_mmu+0x780/0x910 mm/mmu_gather.c:249
tlb_finish_mmu+0xcb/0x200 mm/mmu_gather.c:328
exit_mmap+0x296/0x550 mm/mmap.c:3185
__mmput+0x113/0x370 kernel/fork.c:1076
exit_mm+0x4cd/0x550 kernel/exit.c:483
do_exit+0x576/0x1f20 kernel/exit.c:793
do_group_exit+0x161/0x2d0 kernel/exit.c:903
get_signal+0x139b/0x1d30 kernel/signal.c:2743
arch_do_signal+0x33/0x610 arch/x86/kernel/signal.c:811
exit_to_user_mode_loop kernel/entry/common.c:135 [inline]
exit_to_user_mode_prepare+0x8d/0x1b0 kernel/entry/common.c:166
syscall_exit_to_user_mode+0x5e/0x1a0 kernel/entry/common.c:241
entry_SYSCALL_64_after_hwframe+0x44/0xa9
1a3e1f40962c ("mm: memcontrol: decouple reference counting from page
accounting") has reworked the memcg lifetime to be bound to the struct
page rather than charges. It has also removed the css_put_many from
uncharge_batch and that is causing the above splat. uncharge_batch is
supposed to uncharge accumulated charges for all pages freed from the same
memcg. The queuing is done by uncharge_page which however drops the memcg
reference after it adds charges to the batch. If the current page happens
to be the last one holding the reference for its memcg then the memcg is
OK to go and the next page to be freed will trigger batched uncharge which
needs to access the memcg which is gone already.
Fix the issue by taking a reference for the memcg in the current batch.
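The reference pairing described above can be sketched in plain userspace C.
This is only an illustration under stated assumptions: struct group, struct
batch, group_get(), group_put(), batch_add_page() and batch_flush() are
hypothetical stand-ins for the memcg, uncharge_gather, css_get(), css_put(),
uncharge_page() and uncharge_batch(), and nothing here is kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg and its css reference count. */
struct group {
	int refs;	/* outstanding references */
	int charges;	/* pages still charged to the group */
	int released;	/* set once the last reference is dropped */
};

/* Hypothetical stand-in for struct uncharge_gather. */
struct batch {
	struct group *grp;	/* group the batch accumulates for */
	int nr_pages;
};

static void group_get(struct group *g)
{
	g->refs++;
}

static void group_put(struct group *g)
{
	if (--g->refs == 0)
		g->released = 1;	/* a real implementation would free here */
}

/* Uncharge everything queued, then drop the batch's own reference. */
static void batch_flush(struct batch *b)
{
	if (!b->grp)
		return;
	assert(!b->grp->released);	/* would be a use-after-free */
	b->grp->charges -= b->nr_pages;
	group_put(b->grp);		/* pairs with group_get() below */
	b->grp = NULL;
	b->nr_pages = 0;
}

/* Queue one page; the page's own reference on g is dropped here. */
static void batch_add_page(struct batch *b, struct group *g)
{
	if (b->grp != g) {
		batch_flush(b);
		b->grp = g;
		group_get(g);		/* keep g alive until batch_flush() */
	}
	b->nr_pages++;
	group_put(g);			/* the page no longer pins the group */
}
```

In this sketch, dropping the group_get() call would let the count reach zero
when the last page is queued, so the later batch_flush() would touch a
released group, which is the pattern the splat above reports.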
Link: https://lkml.kernel.org/r/20200820090341.GC5033@dhcp22.suse.cz
Fixes: 1a3e1f40962c ("mm: memcontrol: decouple reference counting from page accounting")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: syzbot+b305848212deec86eabe@syzkaller.appspotmail.com
Reported-by: syzbot+b5ea6fb6f139c8b9482b@syzkaller.appspotmail.com
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/mm/memcontrol.c~memcg-fix-use-after-free-in-uncharge_batch
+++ a/mm/memcontrol.c
@@ -6774,6 +6774,9 @@ static void uncharge_batch(const struct
__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_pages);
memcg_check_events(ug->memcg, ug->dummy_page);
local_irq_restore(flags);
+
+ /* drop reference from uncharge_page */
+ css_put(&ug->memcg->css);
}
static void uncharge_page(struct page *page, struct uncharge_gather *ug)
@@ -6797,6 +6800,9 @@ static void uncharge_page(struct page *p
uncharge_gather_clear(ug);
}
ug->memcg = page->mem_cgroup;
+
+ /* pairs with css_put in uncharge_batch */
+ css_get(&ug->memcg->css);
}
nr_pages = compound_nr(page);
_
* [patch 02/19] mm: memcg: fix memcg reclaim soft lockup
2020-09-04 23:34 incoming Andrew Morton
2020-09-04 23:35 ` [patch 01/19] memcg: fix use-after-free in uncharge_batch Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
2020-09-04 23:35 ` [patch 03/19] mm: slub: fix conversion of freelist_corrupted() Andrew Morton
` (16 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
To: akpm, chris, hannes, linux-mm, mhocko, mm-commits, torvalds, xlpang
From: Xunlei Pang <xlpang@linux.alibaba.com>
Subject: mm: memcg: fix memcg reclaim soft lockup
We've met softlockup with "CONFIG_PREEMPT_NONE=y", when the target memcg
doesn't have any reclaimable memory.
It can be easily reproduced as below:
watchdog: BUG: soft lockup - CPU#0 stuck for 111s![memcg_test:2204]
CPU: 0 PID: 2204 Comm: memcg_test Not tainted 5.9.0-rc2+ #12
Call Trace:
shrink_lruvec+0x49f/0x640
shrink_node+0x2a6/0x6f0
do_try_to_free_pages+0xe9/0x3e0
try_to_free_mem_cgroup_pages+0xef/0x1f0
try_charge+0x2c1/0x750
mem_cgroup_charge+0xd7/0x240
__add_to_page_cache_locked+0x2fd/0x370
add_to_page_cache_lru+0x4a/0xc0
pagecache_get_page+0x10b/0x2f0
filemap_fault+0x661/0xad0
ext4_filemap_fault+0x2c/0x40
__do_fault+0x4d/0xf9
handle_mm_fault+0x1080/0x1790
It only happens on our 1-vcpu instances, because there's no chance for oom
reaper to run to reclaim the to-be-killed process.
Add a cond_resched() at the top of the shrink_node_memcgs() loop to solve
this issue. This gives us a scheduling point for each memcg in the
reclaimed hierarchy, without any dependency on the reclaimable memory in
that memcg, which makes it more predictable.
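A rough userspace analogue of the idea, assuming POSIX sched_yield() as a
stand-in for cond_resched(); struct toy_group, reclaim_pass() and the yields
counter are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg with some reclaimable memory. */
struct toy_group {
	unsigned long reclaimable;
};

static unsigned long yields;	/* counts scheduling points taken */

/* Walk every group and reclaim what is available. */
static unsigned long reclaim_pass(struct toy_group *groups, size_t n)
{
	unsigned long reclaimed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/*
		 * Unconditional scheduling point: taken once per group,
		 * even when the group has nothing to reclaim.
		 */
		sched_yield();
		yields++;

		if (!groups[i].reclaimable)
			continue;

		reclaimed += groups[i].reclaimable;
		groups[i].reclaimable = 0;
	}
	return reclaimed;
}
```

Placing the yield before the "nothing to reclaim" check is the point of the
sketch: a pass over groups that are all empty still gives other tasks a
chance to run.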
Link: http://lkml.kernel.org/r/1598495549-67324-1-git-send-email-xlpang@linux.alibaba.com
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/vmscan.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/mm/vmscan.c~mm-memcg-fix-memcg-reclaim-soft-lockup
+++ a/mm/vmscan.c
@@ -2615,6 +2615,14 @@ static void shrink_node_memcgs(pg_data_t
unsigned long reclaimed;
unsigned long scanned;
+ /*
+ * This loop can become CPU-bound when target memcgs
+ * aren't eligible for reclaim - either because they
+ * don't have any reclaimable pages, or because their
+ * memory is explicitly protected. Avoid soft lockups.
+ */
+ cond_resched();
+
mem_cgroup_calculate_protection(target_memcg, memcg);
if (mem_cgroup_below_min(memcg)) {
_
* [patch 03/19] mm: slub: fix conversion of freelist_corrupted()
2020-09-04 23:34 incoming Andrew Morton
2020-09-04 23:35 ` [patch 01/19] memcg: fix use-after-free in uncharge_batch Andrew Morton
2020-09-04 23:35 ` [patch 02/19] mm: memcg: fix memcg reclaim soft lockup Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
2020-09-04 23:35 ` [patch 04/19] MAINTAINERS: update Cavium/Marvell entries Andrew Morton
` (15 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
To: akpm, cl, dongli.zhang, erosca, iamjoonsoo.kim, joe.jin,
linux-mm, mm-commits, penberg, rientjes, stable, torvalds
From: Eugeniu Rosca <erosca@de.adit-jv.com>
Subject: mm: slub: fix conversion of freelist_corrupted()
Commit 52f23478081ae0 ("mm/slub.c: fix corrupted freechain in
deactivate_slab()") was altered when it was picked up from LKML [1].
Specifically, relocating 'freelist = NULL' into 'freelist_corrupted()'
created a no-op statement. Fix it by sticking to the behavior intended in
the original patch [1]. In addition, make freelist_corrupted() immune to
passing NULL instead of &freelist.
The issue has been spotted via static analysis and code review.
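The no-op can be reproduced in isolation with a minimal C sketch.
isolate_by_value() and isolate_by_reference() are hypothetical names, and the
"corrupted" flag stands in for the real consistency check:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Buggy shape: the pointer is passed by value, so the assignment only
 * changes the callee's local copy.
 */
static int isolate_by_value(void *freelist, int corrupted)
{
	if (corrupted) {
		freelist = NULL;	/* no effect on the caller */
		(void)freelist;
		return 1;
	}
	return 0;
}

/*
 * Fixed shape: the caller passes the address of its pointer, and a NULL
 * argument is tolerated.
 */
static int isolate_by_reference(void **freelist, int corrupted)
{
	if (corrupted && freelist) {
		*freelist = NULL;	/* clears the caller's pointer */
		return 1;
	}
	return 0;
}
```

With the by-value variant the caller's pointer still refers to the corrupted
chain after the call; only the by-reference variant actually isolates it.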
[1] https://lore.kernel.org/linux-mm/20200331031450.12182-1-dongli.zhang@oracle.com/
Link: https://lkml.kernel.org/r/20200824130643.10291-1-erosca@de.adit-jv.com
Fixes: 52f23478081ae0 ("mm/slub.c: fix corrupted freechain in deactivate_slab()")
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: Joe Jin <joe.jin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/slub.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--- a/mm/slub.c~mm-slub-fix-conversion-of-freelist_corrupted
+++ a/mm/slub.c
@@ -672,12 +672,12 @@ static void slab_fix(struct kmem_cache *
}
static bool freelist_corrupted(struct kmem_cache *s, struct page *page,
- void *freelist, void *nextfree)
+ void **freelist, void *nextfree)
{
if ((s->flags & SLAB_CONSISTENCY_CHECKS) &&
- !check_valid_pointer(s, page, nextfree)) {
- object_err(s, page, freelist, "Freechain corrupt");
- freelist = NULL;
+ !check_valid_pointer(s, page, nextfree) && freelist) {
+ object_err(s, page, *freelist, "Freechain corrupt");
+ *freelist = NULL;
slab_fix(s, "Isolate corrupted freechain");
return true;
}
@@ -1494,7 +1494,7 @@ static inline void dec_slabs_node(struct
int objects) {}
static bool freelist_corrupted(struct kmem_cache *s, struct page *page,
- void *freelist, void *nextfree)
+ void **freelist, void *nextfree)
{
return false;
}
@@ -2184,7 +2184,7 @@ static void deactivate_slab(struct kmem_
* 'freelist' is already corrupted. So isolate all objects
* starting at 'freelist'.
*/
- if (freelist_corrupted(s, page, freelist, nextfree))
+ if (freelist_corrupted(s, page, &freelist, nextfree))
break;
do {
_
* [patch 04/19] MAINTAINERS: update Cavium/Marvell entries
2020-09-04 23:34 incoming Andrew Morton
` (2 preceding siblings ...)
2020-09-04 23:35 ` [patch 03/19] mm: slub: fix conversion of freelist_corrupted() Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
2020-09-04 23:35 ` [patch 05/19] MAINTAINERS: add LLVM maintainers Andrew Morton
` (14 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
To: akpm, arnd, bp, gkulkarni, linux-mm, maz, mm-commits, rric,
sgoutham, torvalds, wsa
From: Robert Richter <rric@kernel.org>
Subject: MAINTAINERS: update Cavium/Marvell entries
I am leaving Marvell and already do not have access to my @marvell.com
email address. So I am switching over to my korg mail address, or removing
my address where another maintainer is already listed. For the entries
where no other maintainer is listed, I will keep looking into patches for
Cavium systems for a while until someone from Marvell takes over. Since I
might have limited access to hardware and also limited time, I changed the
status to 'Odd Fixes' for those entries.
Link: https://lkml.kernel.org/r/20200824122050.31164-1-rric@kernel.org
Signed-off-by: Robert Richter <rric@kernel.org>
Cc: Ganapatrao Kulkarni <gkulkarni@marvell.com>
Cc: Sunil Goutham <sgoutham@marvell.com>
CC: Borislav Petkov <bp@alien8.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
MAINTAINERS | 28 +++++++++++++---------------
1 file changed, 13 insertions(+), 15 deletions(-)
--- a/MAINTAINERS~maintainers-update-cavium-marvell-entries
+++ a/MAINTAINERS
@@ -1694,7 +1694,6 @@ F: arch/arm/mach-cns3xxx/
ARM/CAVIUM THUNDER NETWORK DRIVER
M: Sunil Goutham <sgoutham@marvell.com>
-M: Robert Richter <rrichter@marvell.com>
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
S: Supported
F: drivers/net/ethernet/cavium/thunder/
@@ -3948,8 +3947,8 @@ W: https://wireless.wiki.kernel.org/en/u
F: drivers/net/wireless/ath/carl9170/
CAVIUM I2C DRIVER
-M: Robert Richter <rrichter@marvell.com>
-S: Supported
+M: Robert Richter <rric@kernel.org>
+S: Odd Fixes
W: http://www.marvell.com
F: drivers/i2c/busses/i2c-octeon*
F: drivers/i2c/busses/i2c-thunderx*
@@ -3964,8 +3963,8 @@ W: http://www.marvell.com
F: drivers/net/ethernet/cavium/liquidio/
CAVIUM MMC DRIVER
-M: Robert Richter <rrichter@marvell.com>
-S: Supported
+M: Robert Richter <rric@kernel.org>
+S: Odd Fixes
W: http://www.marvell.com
F: drivers/mmc/host/cavium*
@@ -3977,9 +3976,9 @@ W: http://www.marvell.com
F: drivers/crypto/cavium/cpt/
CAVIUM THUNDERX2 ARM64 SOC
-M: Robert Richter <rrichter@marvell.com>
+M: Robert Richter <rric@kernel.org>
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
-S: Maintained
+S: Odd Fixes
F: Documentation/devicetree/bindings/arm/cavium-thunder2.txt
F: arch/arm64/boot/dts/cavium/thunder2-99xx*
@@ -6191,16 +6190,15 @@ F: drivers/edac/highbank*
EDAC-CAVIUM OCTEON
M: Ralf Baechle <ralf@linux-mips.org>
-M: Robert Richter <rrichter@marvell.com>
L: linux-edac@vger.kernel.org
L: linux-mips@vger.kernel.org
S: Supported
F: drivers/edac/octeon_edac*
EDAC-CAVIUM THUNDERX
-M: Robert Richter <rrichter@marvell.com>
+M: Robert Richter <rric@kernel.org>
L: linux-edac@vger.kernel.org
-S: Supported
+S: Odd Fixes
F: drivers/edac/thunderx_edac*
EDAC-CORE
@@ -6208,7 +6206,7 @@ M: Borislav Petkov <bp@alien8.de>
M: Mauro Carvalho Chehab <mchehab@kernel.org>
M: Tony Luck <tony.luck@intel.com>
R: James Morse <james.morse@arm.com>
-R: Robert Richter <rrichter@marvell.com>
+R: Robert Richter <rric@kernel.org>
L: linux-edac@vger.kernel.org
S: Supported
T: git git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git edac-for-next
@@ -13446,10 +13444,10 @@ F: Documentation/devicetree/bindings/pci
F: drivers/pci/controller/dwc/*artpec*
PCIE DRIVER FOR CAVIUM THUNDERX
-M: Robert Richter <rrichter@marvell.com>
+M: Robert Richter <rric@kernel.org>
L: linux-pci@vger.kernel.org
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
-S: Supported
+S: Odd Fixes
F: drivers/pci/controller/pci-thunder-*
PCIE DRIVER FOR HISILICON
@@ -17237,8 +17235,8 @@ S: Maintained
F: drivers/net/thunderbolt.c
THUNDERX GPIO DRIVER
-M: Robert Richter <rrichter@marvell.com>
-S: Maintained
+M: Robert Richter <rric@kernel.org>
+S: Odd Fixes
F: drivers/gpio/gpio-thunderx.c
TI AM437X VPFE DRIVER
_
* [patch 05/19] MAINTAINERS: add LLVM maintainers
2020-09-04 23:34 incoming Andrew Morton
` (3 preceding siblings ...)
2020-09-04 23:35 ` [patch 04/19] MAINTAINERS: update Cavium/Marvell entries Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
2020-09-05 17:25 ` Masahiro Yamada
2020-09-04 23:35 ` [patch 06/19] MAINTAINERS: IA64: mark Status as Odd Fixes only Andrew Morton
` (13 subsequent siblings)
18 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
To: akpm, linux-mm, lukas.bulwahn, masahiroy, miguel.ojeda.sandonis,
mm-commits, natechancellor, ndesaulniers, sedat.dilek, torvalds
From: Nick Desaulniers <ndesaulniers@google.com>
Subject: MAINTAINERS: add LLVM maintainers
Nominate Nathan and myself to be the point of contact for clang/LLVM related
support, after a poll at the LLVM BoF at Linux Plumbers Conf 2020.
While corporate sponsorship is beneficial, it's important to not entrust
the keys to the nukes with any one entity. Should Nathan and I find
ourselves at the same employer, I would gladly step down.
Link: https://lkml.kernel.org/r/20200825143540.2948637-1-ndesaulniers@google.com
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Sedat Dilek <sedat.dilek@gmail.com>
Acked-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
MAINTAINERS | 2 ++
1 file changed, 2 insertions(+)
--- a/MAINTAINERS~maintainers-add-llvm-maintainers
+++ a/MAINTAINERS
@@ -4257,6 +4257,8 @@ S: Maintained
F: .clang-format
CLANG/LLVM BUILD SUPPORT
+M: Nathan Chancellor <natechancellor@gmail.com>
+M: Nick Desaulniers <ndesaulniers@google.com>
L: clang-built-linux@googlegroups.com
S: Supported
W: https://clangbuiltlinux.github.io/
_
* [patch 06/19] MAINTAINERS: IA64: mark Status as Odd Fixes only
2020-09-04 23:34 incoming Andrew Morton
` (4 preceding siblings ...)
2020-09-04 23:35 ` [patch 05/19] MAINTAINERS: add LLVM maintainers Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
2020-09-04 23:35 ` [patch 07/19] mm: track page table modifications in __apply_to_page_range() Andrew Morton
` (12 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
To: akpm, fenghua.yu, linux-mm, mm-commits, rdunlap, tony.luck, torvalds
From: Randy Dunlap <rdunlap@infradead.org>
Subject: MAINTAINERS: IA64: mark Status as Odd Fixes only
IA64 isn't really being maintained, so mark it as Odd Fixes only.
Link: http://lkml.kernel.org/r/7e719139-450f-52c2-59a2-7964a34eda1f@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
MAINTAINERS | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/MAINTAINERS~maintainers-ia64-mark-status-as-odd-fixes-only
+++ a/MAINTAINERS
@@ -8272,7 +8272,7 @@ IA64 (Itanium) PLATFORM
M: Tony Luck <tony.luck@intel.com>
M: Fenghua Yu <fenghua.yu@intel.com>
L: linux-ia64@vger.kernel.org
-S: Maintained
+S: Odd Fixes
T: git git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git
F: Documentation/ia64/
F: arch/ia64/
_
* [patch 07/19] mm: track page table modifications in __apply_to_page_range()
2020-09-04 23:34 incoming Andrew Morton
` (5 preceding siblings ...)
2020-09-04 23:35 ` [patch 06/19] MAINTAINERS: IA64: mark Status as Odd Fixes only Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
2020-09-04 23:35 ` [patch 08/19] ipc: adjust proc_ipc_sem_dointvec definition to match prototype Andrew Morton
` (11 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
To: akpm, chris, jroedel, linux-mm, mm-commits, pavel, sfr, stable, torvalds
From: Joerg Roedel <jroedel@suse.de>
Subject: mm: track page table modifications in __apply_to_page_range()
__apply_to_page_range() is also used to change and/or allocate page-table
pages in the vmalloc area of the address space. Make sure these changes
get synchronized to other page-tables in the system by calling
arch_sync_kernel_mappings() when necessary.
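The tracking scheme can be sketched as a toy model in userspace C. All names
here (mod_mask_t, MOD_PTE, MOD_PMD, SYNC_MASK, struct toy_pmd, walk_lower(),
apply_range(), sync_mappings()) are hypothetical stand-ins for
pgtbl_mod_mask, the PGTBL_*_MODIFIED bits, ARCH_PAGE_TABLE_SYNC_MASK and
arch_sync_kernel_mappings(); the real walk has more levels.

```c
#include <assert.h>

typedef unsigned int mod_mask_t;	/* stand-in for pgtbl_mod_mask */

#define MOD_PTE		(1u << 0)	/* a leaf entry was written */
#define MOD_PMD		(1u << 1)	/* a PMD-level entry was populated */
#define SYNC_MASK	MOD_PMD		/* levels this toy arch must sync */

/* Toy PMD entry: either points at a PTE page or is still empty. */
struct toy_pmd {
	int populated;
};

static int sync_calls;	/* counts calls to sync_mappings() */

static void sync_mappings(unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
	sync_calls++;
}

/* Lower-level walker: records every level it had to modify. */
static void walk_lower(struct toy_pmd *pmd, mod_mask_t *mask)
{
	if (!pmd->populated) {
		pmd->populated = 1;	/* allocate the missing level */
		*mask |= MOD_PMD;
	}
	*mask |= MOD_PTE;		/* leaf entries are always touched */
}

/* Top-level walker: sync once, and only if a tracked level changed. */
static void apply_range(struct toy_pmd *pmd, unsigned long start,
			unsigned long size)
{
	mod_mask_t mask = 0;

	walk_lower(pmd, &mask);

	if (mask & SYNC_MASK)
		sync_mappings(start, start + size);
}
```

The first call over an empty range populates the PMD and triggers a sync; a
second call over the same range only touches leaf entries and does not.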
The impact appears limited to x86-32, where apply_to_page_range may miss
updating the PMD. That leads to explosions in drivers like
[ 24.227844] BUG: unable to handle page fault for address: fe036000
[ 24.228076] #PF: supervisor write access in kernel mode
[ 24.228294] #PF: error_code(0x0002) - not-present page
[ 24.228494] *pde = 00000000
[ 24.228640] Oops: 0002 [#1] SMP
[ 24.228788] CPU: 3 PID: 1300 Comm: gem_concurrent_ Not tainted 5.9.0-rc1+ #16
[ 24.228957] Hardware name: /NUC6i3SYB, BIOS SYSKLi35.86A.0024.2015.1027.2142 10/27/2015
[ 24.229297] EIP: __execlists_context_alloc+0x132/0x2d0 [i915]
[ 24.229462] Code: 31 d2 89 f0 e8 2f 55 02 00 89 45 e8 3d 00 f0 ff ff 0f 87 11 01 00 00 8b 4d e8 03 4b 30 b8 5a 5a 5a 5a ba 01 00 00 00 8d 79 04 <c7> 01 5a 5a 5a 5a c7 81 fc 0f 00 00 5a 5a 5a 5a 83 e7 fc 29 f9 81
[ 24.229759] EAX: 5a5a5a5a EBX: f60ca000 ECX: fe036000 EDX: 00000001
[ 24.229915] ESI: f43b7340 EDI: fe036004 EBP: f6389cb8 ESP: f6389c9c
[ 24.230072] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010286
[ 24.230229] CR0: 80050033 CR2: fe036000 CR3: 2d361000 CR4: 001506d0
[ 24.230385] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 24.230539] DR6: fffe0ff0 DR7: 00000400
[ 24.230675] Call Trace:
[ 24.230957] execlists_context_alloc+0x10/0x20 [i915]
[ 24.231266] intel_context_alloc_state+0x3f/0x70 [i915]
[ 24.231547] __intel_context_do_pin+0x117/0x170 [i915]
[ 24.231850] i915_gem_do_execbuffer+0xcc7/0x2500 [i915]
[ 24.232024] ? __kmalloc_track_caller+0x54/0x230
[ 24.232181] ? ktime_get+0x3e/0x120
[ 24.232333] ? dma_fence_signal+0x34/0x50
[ 24.232617] i915_gem_execbuffer2_ioctl+0xcd/0x1f0 [i915]
[ 24.232912] ? i915_gem_execbuffer_ioctl+0x2e0/0x2e0 [i915]
[ 24.233084] drm_ioctl_kernel+0x8f/0xd0
[ 24.233236] drm_ioctl+0x223/0x3d0
[ 24.233505] ? i915_gem_execbuffer_ioctl+0x2e0/0x2e0 [i915]
[ 24.233684] ? pick_next_task_fair+0x1b5/0x3d0
[ 24.233873] ? __switch_to_asm+0x36/0x50
[ 24.234021] ? drm_ioctl_kernel+0xd0/0xd0
[ 24.234167] __ia32_sys_ioctl+0x1ab/0x760
[ 24.234313] ? exit_to_user_mode_prepare+0xe5/0x110
[ 24.234453] ? syscall_exit_to_user_mode+0x23/0x130
[ 24.234601] __do_fast_syscall_32+0x3f/0x70
[ 24.234744] do_fast_syscall_32+0x29/0x60
[ 24.234885] do_SYSENTER_32+0x15/0x20
[ 24.235021] entry_SYSENTER_32+0x9f/0xf2
[ 24.235157] EIP: 0xb7f28559
[ 24.235288] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
[ 24.235576] EAX: ffffffda EBX: 00000005 ECX: c0406469 EDX: bf95556c
[ 24.235722] ESI: b7e68000 EDI: c0406469 EBP: 00000005 ESP: bf9554d8
[ 24.235869] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000296
[ 24.236018] Modules linked in: i915 x86_pkg_temp_thermal intel_powerclamp crc32_pclmul crc32c_intel intel_cstate intel_uncore intel_gtt drm_kms_helper intel_pch_thermal video button autofs4 i2c_i801 i2c_smbus fan
[ 24.236336] CR2: 00000000fe036000
It looks like kasan, xen and i915 are vulnerable.
Actual impact is "on thinkpad X60 in 5.9-rc1, screen starts blinking after
30-or-so minutes, and machine is unusable"
[sfr@canb.auug.org.au: ARCH_PAGE_TABLE_SYNC_MASK needs vmalloc.h]
Link: https://lkml.kernel.org/r/20200825172508.16800a4f@canb.auug.org.au
[chris@chris-wilson.co.uk: changelog addition]
[pavel@ucw.cz: changelog addition]
Link: https://lkml.kernel.org/r/20200821123746.16904-1-joro@8bytes.org
Fixes: 2ba3e6947aed ("mm/vmalloc: track which page-table levels were modified")
Fixes: 86cf69f1d893 ("x86/mm/32: implement arch_sync_kernel_mappings()")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk> [x86-32]
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: <stable@vger.kernel.org> [5.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory.c | 37 ++++++++++++++++++++++++-------------
1 file changed, 24 insertions(+), 13 deletions(-)
--- a/mm/memory.c~mm-track-page-table-modifications-in-__apply_to_page_range
+++ a/mm/memory.c
@@ -73,6 +73,7 @@
#include <linux/numa.h>
#include <linux/perf_event.h>
#include <linux/ptrace.h>
+#include <linux/vmalloc.h>
#include <trace/events/kmem.h>
@@ -83,6 +84,7 @@
#include <asm/tlb.h>
#include <asm/tlbflush.h>
+#include "pgalloc-track.h"
#include "internal.h"
#if defined(LAST_CPUPID_NOT_IN_PAGE_FLAGS) && !defined(CONFIG_COMPILE_TEST)
@@ -2206,7 +2208,8 @@ EXPORT_SYMBOL(vm_iomap_memory);
static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
unsigned long addr, unsigned long end,
- pte_fn_t fn, void *data, bool create)
+ pte_fn_t fn, void *data, bool create,
+ pgtbl_mod_mask *mask)
{
pte_t *pte;
int err = 0;
@@ -2214,7 +2217,7 @@ static int apply_to_pte_range(struct mm_
if (create) {
pte = (mm == &init_mm) ?
- pte_alloc_kernel(pmd, addr) :
+ pte_alloc_kernel_track(pmd, addr, mask) :
pte_alloc_map_lock(mm, pmd, addr, &ptl);
if (!pte)
return -ENOMEM;
@@ -2235,6 +2238,7 @@ static int apply_to_pte_range(struct mm_
break;
}
} while (addr += PAGE_SIZE, addr != end);
+ *mask |= PGTBL_PTE_MODIFIED;
arch_leave_lazy_mmu_mode();
@@ -2245,7 +2249,8 @@ static int apply_to_pte_range(struct mm_
static int apply_to_pmd_range(struct mm_struct *mm, pud_t *pud,
unsigned long addr, unsigned long end,
- pte_fn_t fn, void *data, bool create)
+ pte_fn_t fn, void *data, bool create,
+ pgtbl_mod_mask *mask)
{
pmd_t *pmd;
unsigned long next;
@@ -2254,7 +2259,7 @@ static int apply_to_pmd_range(struct mm_
BUG_ON(pud_huge(*pud));
if (create) {
- pmd = pmd_alloc(mm, pud, addr);
+ pmd = pmd_alloc_track(mm, pud, addr, mask);
if (!pmd)
return -ENOMEM;
} else {
@@ -2264,7 +2269,7 @@ static int apply_to_pmd_range(struct mm_
next = pmd_addr_end(addr, end);
if (create || !pmd_none_or_clear_bad(pmd)) {
err = apply_to_pte_range(mm, pmd, addr, next, fn, data,
- create);
+ create, mask);
if (err)
break;
}
@@ -2274,14 +2279,15 @@ static int apply_to_pmd_range(struct mm_
static int apply_to_pud_range(struct mm_struct *mm, p4d_t *p4d,
unsigned long addr, unsigned long end,
- pte_fn_t fn, void *data, bool create)
+ pte_fn_t fn, void *data, bool create,
+ pgtbl_mod_mask *mask)
{
pud_t *pud;
unsigned long next;
int err = 0;
if (create) {
- pud = pud_alloc(mm, p4d, addr);
+ pud = pud_alloc_track(mm, p4d, addr, mask);
if (!pud)
return -ENOMEM;
} else {
@@ -2291,7 +2297,7 @@ static int apply_to_pud_range(struct mm_
next = pud_addr_end(addr, end);
if (create || !pud_none_or_clear_bad(pud)) {
err = apply_to_pmd_range(mm, pud, addr, next, fn, data,
- create);
+ create, mask);
if (err)
break;
}
@@ -2301,14 +2307,15 @@ static int apply_to_pud_range(struct mm_
static int apply_to_p4d_range(struct mm_struct *mm, pgd_t *pgd,
unsigned long addr, unsigned long end,
- pte_fn_t fn, void *data, bool create)
+ pte_fn_t fn, void *data, bool create,
+ pgtbl_mod_mask *mask)
{
p4d_t *p4d;
unsigned long next;
int err = 0;
if (create) {
- p4d = p4d_alloc(mm, pgd, addr);
+ p4d = p4d_alloc_track(mm, pgd, addr, mask);
if (!p4d)
return -ENOMEM;
} else {
@@ -2318,7 +2325,7 @@ static int apply_to_p4d_range(struct mm_
next = p4d_addr_end(addr, end);
if (create || !p4d_none_or_clear_bad(p4d)) {
err = apply_to_pud_range(mm, p4d, addr, next, fn, data,
- create);
+ create, mask);
if (err)
break;
}
@@ -2331,8 +2338,9 @@ static int __apply_to_page_range(struct
void *data, bool create)
{
pgd_t *pgd;
- unsigned long next;
+ unsigned long start = addr, next;
unsigned long end = addr + size;
+ pgtbl_mod_mask mask = 0;
int err = 0;
if (WARN_ON(addr >= end))
@@ -2343,11 +2351,14 @@ static int __apply_to_page_range(struct
next = pgd_addr_end(addr, end);
if (!create && pgd_none_or_clear_bad(pgd))
continue;
- err = apply_to_p4d_range(mm, pgd, addr, next, fn, data, create);
+ err = apply_to_p4d_range(mm, pgd, addr, next, fn, data, create, &mask);
if (err)
break;
} while (pgd++, addr = next, addr != end);
+ if (mask & ARCH_PAGE_TABLE_SYNC_MASK)
+ arch_sync_kernel_mappings(start, start + size);
+
return err;
}
_
* [patch 08/19] ipc: adjust proc_ipc_sem_dointvec definition to match prototype
2020-09-04 23:34 incoming Andrew Morton
` (6 preceding siblings ...)
2020-09-04 23:35 ` [patch 07/19] mm: track page table modifications in __apply_to_page_range() Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
2020-09-04 23:35 ` [patch 09/19] fork: adjust sysctl_max_threads " Andrew Morton
` (10 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
To: akpm, hch, linux-mm, mm-commits, tklauser, torvalds, viro
From: Tobias Klauser <tklauser@distanz.ch>
Subject: ipc: adjust proc_ipc_sem_dointvec definition to match prototype
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
signature of proc_ipc_sem_dointvec to match ctl_table.proc_handler which
fixes the following sparse error/warning:
ipc/ipc_sysctl.c:94:47: warning: incorrect type in argument 3 (different address spaces)
ipc/ipc_sysctl.c:94:47: expected void *buffer
ipc/ipc_sysctl.c:94:47: got void [noderef] __user *buffer
ipc/ipc_sysctl.c:194:35: warning: incorrect type in initializer (incompatible argument 3 (different address spaces))
ipc/ipc_sysctl.c:194:35: expected int ( [usertype] *proc_handler )( ... )
ipc/ipc_sysctl.c:194:35: got int ( * )( ... )
Link: https://lkml.kernel.org/r/20200825105846.5193-1-tklauser@distanz.ch
Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
ipc/ipc_sysctl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/ipc/ipc_sysctl.c~ipc-adjust-proc_ipc_sem_dointvec-definition-to-match-prototype
+++ a/ipc/ipc_sysctl.c
@@ -85,7 +85,7 @@ static int proc_ipc_auto_msgmni(struct c
}
static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
- void __user *buffer, size_t *lenp, loff_t *ppos)
+ void *buffer, size_t *lenp, loff_t *ppos)
{
int ret, semmni;
struct ipc_namespace *ns = current->nsproxy->ipc_ns;
_
* [patch 09/19] fork: adjust sysctl_max_threads definition to match prototype
2020-09-04 23:34 incoming Andrew Morton
` (7 preceding siblings ...)
2020-09-04 23:35 ` [patch 08/19] ipc: adjust proc_ipc_sem_dointvec definition to match prototype Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
2020-09-04 23:35 ` [patch 10/19] checkpatch: fix the usage of capture group ( ... ) Andrew Morton
` (9 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
To: akpm, hch, linux-mm, mm-commits, tklauser, torvalds, viro
From: Tobias Klauser <tklauser@distanz.ch>
Subject: fork: adjust sysctl_max_threads definition to match prototype
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
definition of sysctl_max_threads to match its prototype in linux/sysctl.h
which fixes the following sparse error/warning:
kernel/fork.c:3050:47: warning: incorrect type in argument 3 (different address spaces)
kernel/fork.c:3050:47: expected void *
kernel/fork.c:3050:47: got void [noderef] __user *buffer
kernel/fork.c:3036:5: error: symbol 'sysctl_max_threads' redeclared with different type (incompatible argument 3 (different address spaces)):
kernel/fork.c:3036:5: int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )
kernel/fork.c: note: in included file (through include/linux/key.h, include/linux/cred.h, include/linux/sched/signal.h, include/linux/sched/cputime.h):
./include/linux/sysctl.h:242:5: note: previously declared as:
./include/linux/sysctl.h:242:5: int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )
Link: https://lkml.kernel.org/r/20200825093647.24263-1-tklauser@distanz.ch
Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
kernel/fork.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/fork.c~fork-adjust-sysctl_max_threads-definition-to-match-prototype
+++ a/kernel/fork.c
@@ -3014,7 +3014,7 @@ int unshare_files(struct files_struct **
}
int sysctl_max_threads(struct ctl_table *table, int write,
- void __user *buffer, size_t *lenp, loff_t *ppos)
+ void *buffer, size_t *lenp, loff_t *ppos)
{
struct ctl_table t;
int ret;
_
* [patch 10/19] checkpatch: fix the usage of capture group ( ... )
2020-09-04 23:34 incoming Andrew Morton
` (8 preceding siblings ...)
2020-09-04 23:35 ` [patch 09/19] fork: adjust sysctl_max_threads " Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
2020-09-04 23:35 ` [patch 11/19] mm: madvise: fix vma user-after-free Andrew Morton
` (8 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
To: akpm, joe, linux-mm, lukas.bulwahn, mm-commits, mrinalmni, torvalds
From: Mrinal Pandey <mrinalmni@gmail.com>
Subject: checkpatch: fix the usage of capture group ( ... )
The use of a capture group `( ... )` in the condition immediately after
`&&` leaves `$1` uninitialized when the pattern that references it is
compiled. This triggers the warning "Use of uninitialized value $1 in
regexp compilation at ./scripts/checkpatch.pl line 2638".
I noticed this bug while running checkpatch on the commits from v5.7 to
v5.8-rc1 whose commit messages contain diff content.
This bug was introduced by commit e518e9a59ec3 ("checkpatch: emit an error
when there's a diff in a changelog") and has been in the script since
then.
The author intended the capture group to store the matched file name
(`[\w/]+`) in `$1`. This cannot work when the capture group and `$1` are
used in the same regular expression, because `$1` is interpolated when the
pattern is compiled, before that match can set it.
Fix this by moving the capture group into the condition before `&&`, so
that `$1` already holds the captured file name when the second pattern
uses it.
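The ordering can be sketched outside Perl as well. The following is only
an illustration in C with POSIX regex.h, not the checkpatch code:
diff_paths_match() is a hypothetical helper, the character classes are
POSIX approximations of `\s` and `[\w/]`, and the literal space after
"diff" stands in for `\b`. The point is that the capture happens in a
first match, and only then is the second pattern built from it.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if `line` looks like
 * "  diff ... a/PATH b/PATH" with the same PATH on both sides, else 0.
 */
int diff_paths_match(const char *line)
{
	regex_t first, second;
	regmatch_t m[2];
	char path[256], pattern[512];
	size_t len;
	int ok;

	assert(line != NULL);

	/* Step 1: the capture group lives in the first pattern. */
	if (regcomp(&first, "^[[:space:]]+diff .*a/([[:alnum:]_/]+)",
		    REG_EXTENDED))
		return 0;
	ok = regexec(&first, line, 2, m, 0) == 0;
	regfree(&first);
	if (!ok)
		return 0;

	len = (size_t)(m[1].rm_eo - m[1].rm_so);
	if (len >= sizeof(path))
		return 0;
	memcpy(path, line + m[1].rm_so, len);
	path[len] = '\0';

	/* Step 2: only now build the pattern that refers to the capture. */
	snprintf(pattern, sizeof(pattern),
		 "^[[:space:]]+diff .*a/[[:alnum:]_/]+[[:space:]]+b/%s"
		 "([^[:alnum:]_]|$)", path);
	if (regcomp(&second, pattern, REG_EXTENDED | REG_NOSUB))
		return 0;
	ok = regexec(&second, line, 0, NULL, 0) == 0;
	regfree(&second);
	return ok;
}
```

Building the second pattern before the first match has run would have
nothing to substitute for the path, which is the C analogue of the
uninitialized `$1`.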
Link: https://lkml.kernel.org/r/20200714032352.f476hanaj2dlmiot@mrinalpandey
Fixes: e518e9a59ec3 ("checkpatch: emit an error when there's a diff in a changelog")
Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com>
Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Tested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
scripts/checkpatch.pl | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/scripts/checkpatch.pl~checkpatch-fix-the-usage-of-capture-group
+++ a/scripts/checkpatch.pl
@@ -2639,8 +2639,8 @@ sub process {
# Check if the commit log has what seems like a diff which can confuse patch
if ($in_commit_log && !$commit_log_has_diff &&
- (($line =~ m@^\s+diff\b.*a/[\w/]+@ &&
- $line =~ m@^\s+diff\b.*a/([\w/]+)\s+b/$1\b@) ||
+ (($line =~ m@^\s+diff\b.*a/([\w/]+)@ &&
+ $line =~ m@^\s+diff\b.*a/[\w/]+\s+b/$1\b@) ||
$line =~ m@^\s*(?:\-\-\-\s+a/|\+\+\+\s+b/)@ ||
$line =~ m/^\s*\@\@ \-\d+,\d+ \+\d+,\d+ \@\@/)) {
ERROR("DIFF_IN_COMMIT_MSG",
_
* [patch 11/19] mm: madvise: fix vma user-after-free
2020-09-04 23:34 incoming Andrew Morton
` (9 preceding siblings ...)
2020-09-04 23:35 ` [patch 10/19] checkpatch: fix the usage of capture group ( ... ) Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
2020-09-04 23:35 ` [patch 12/19] mm/migrate: fixup setting UFFD_WP flag Andrew Morton
` (7 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
To: akpm, jack, linux-mm, mm-commits, shy828301, stable, torvalds
From: Yang Shi <shy828301@gmail.com>
Subject: mm: madvise: fix vma user-after-free
syzbot reported the following use-after-free:
BUG: KASAN: use-after-free in madvise_willneed mm/madvise.c:293 [inline]
BUG: KASAN: use-after-free in madvise_vma mm/madvise.c:942 [inline]
BUG: KASAN: use-after-free in do_madvise.part.0+0x1c8b/0x1cf0 mm/madvise.c:1145
Read of size 8 at addr ffff8880a6163eb0 by task syz-executor.0/9996
CPU: 0 PID: 9996 Comm: syz-executor.0 Not tainted 5.9.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x18f/0x20d lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
madvise_willneed mm/madvise.c:293 [inline]
madvise_vma mm/madvise.c:942 [inline]
do_madvise.part.0+0x1c8b/0x1cf0 mm/madvise.c:1145
do_madvise mm/madvise.c:1169 [inline]
__do_sys_madvise mm/madvise.c:1171 [inline]
__se_sys_madvise mm/madvise.c:1169 [inline]
__x64_sys_madvise+0xd9/0x110 mm/madvise.c:1169
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45d4d9
Code: 5d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f04f7464c78 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 0000000000020800 RCX: 000000000045d4d9
RDX: 0000000000000003 RSI: 0000000000600003 RDI: 0000000020000000
RBP: 000000000118d020 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118cfec
R13: 00007ffc579cce7f R14: 00007f04f74659c0 R15: 000000000118cfec
Allocated by task 9992:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
kasan_set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461
slab_post_alloc_hook mm/slab.h:518 [inline]
slab_alloc mm/slab.c:3312 [inline]
kmem_cache_alloc+0x138/0x3a0 mm/slab.c:3482
vm_area_alloc+0x1c/0x110 kernel/fork.c:347
mmap_region+0x8e5/0x1780 mm/mmap.c:1743
do_mmap+0xcf9/0x11d0 mm/mmap.c:1545
vm_mmap_pgoff+0x195/0x200 mm/util.c:506
ksys_mmap_pgoff+0x43a/0x560 mm/mmap.c:1596
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 9992:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
__kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422
__cache_free mm/slab.c:3418 [inline]
kmem_cache_free.part.0+0x67/0x1f0 mm/slab.c:3693
remove_vma+0x132/0x170 mm/mmap.c:184
remove_vma_list mm/mmap.c:2613 [inline]
__do_munmap+0x743/0x1170 mm/mmap.c:2869
do_munmap mm/mmap.c:2877 [inline]
mmap_region+0x257/0x1780 mm/mmap.c:1716
do_mmap+0xcf9/0x11d0 mm/mmap.c:1545
vm_mmap_pgoff+0x195/0x200 mm/util.c:506
ksys_mmap_pgoff+0x43a/0x560 mm/mmap.c:1596
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This happens because the vma is accessed after mmap_lock has been
released. In the meantime another task can acquire mmap_lock and free the
vma.
Releasing mmap_lock only after the last access to the vma fixes the
problem.
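The ordering rule can be sketched in plain C. This is a minimal
illustration under stated assumptions, not kernel code: struct mock_mm,
struct mock_vma, the mock_read_lock()/mock_read_unlock() helpers and
MOCK_PAGE_SHIFT are hypothetical stand-ins, and the "lock" is only a flag
so that the assert can document when the vma may be read.

```c
#include <assert.h>

#define MOCK_PAGE_SHIFT 12	/* assumed page size of 4 KiB */

/* Hypothetical stand-ins for mm_struct/mmap_lock and vm_area_struct. */
struct mock_mm { int read_locked; };
struct mock_vma {
	struct mock_mm *mm;
	unsigned long vm_start;
	unsigned long vm_pgoff;
};

static void mock_read_lock(struct mock_mm *mm)   { mm->read_locked = 1; }
static void mock_read_unlock(struct mock_mm *mm) { mm->read_locked = 0; }

/* Caller holds the read lock on entry, as madvise_willneed() does. */
long long willneed_offset(struct mock_vma *vma, unsigned long start)
{
	struct mock_mm *mm = vma->mm;
	long long offset;

	/* All reads of vma fields happen while the lock is still held. */
	assert(mm->read_locked);
	offset = (long long)(start - vma->vm_start)
		 + ((long long)vma->vm_pgoff << MOCK_PAGE_SHIFT);

	mock_read_unlock(mm);
	/* The vma must not be touched here: it may already be freed. */
	mock_read_lock(mm);

	return offset;
}
```

Everything derived from the vma is copied into a local before the unlock,
so the unlocked region only uses values that no other task can free.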
Link: https://lkml.kernel.org/r/20200816141204.162624-1-shy828301@gmail.com
Fixes: 692fe62433d4c ("mm: Handle MADV_WILLNEED through vfs_fadvise()")
Reported-by: syzbot+b90df26038d1d5d85c97@syzkaller.appspotmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org> [5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/madvise.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/madvise.c~mm-madvise-fix-vma-user-after-free
+++ a/mm/madvise.c
@@ -289,9 +289,9 @@ static long madvise_willneed(struct vm_a
*/
*prev = NULL; /* tell sys_madvise we drop mmap_lock */
get_file(file);
- mmap_read_unlock(current->mm);
offset = (loff_t)(start - vma->vm_start)
+ ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
+ mmap_read_unlock(current->mm);
vfs_fadvise(file, offset, end - start, POSIX_FADV_WILLNEED);
fput(file);
mmap_read_lock(current->mm);
_
* [patch 12/19] mm/migrate: fixup setting UFFD_WP flag
2020-09-04 23:34 incoming Andrew Morton
` (10 preceding siblings ...)
2020-09-04 23:35 ` [patch 11/19] mm: madvise: fix vma user-after-free Andrew Morton
@ 2020-09-04 23:35 ` Andrew Morton
2020-09-04 23:36 ` [patch 13/19] mm/rmap: fixup copying of soft dirty and uffd ptes Andrew Morton
` (6 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2020-09-04 23:35 UTC (permalink / raw)
To: akpm, alistair, jglisse, jhubbard, linux-mm, mm-commits, peterx,
rcampbell, torvalds
From: Alistair Popple <alistair@popple.id.au>
Subject: mm/migrate: fixup setting UFFD_WP flag
Commit f45ec5ff16a75 ("userfaultfd: wp: support swap and page migration")
introduced support for tracking the uffd wp bit during page migration.
However the non-swap PTE variant was used to set the flag for zone device
private pages which are a type of swap page.
This leads to corruption of the swap offset if the original PTE has the
uffd_wp flag set.
Link: https://lkml.kernel.org/r/20200825064232.10023-1-alistair@popple.id.au
Fixes: f45ec5ff16a75 ("userfaultfd: wp: support swap and page migration")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/migrate.c~mm-migrate-fixup-setting-uffd_wp-flag
+++ a/mm/migrate.c
@@ -251,7 +251,7 @@ static bool remove_migration_pte(struct
entry = make_device_private_entry(new, pte_write(pte));
pte = swp_entry_to_pte(entry);
if (pte_swp_uffd_wp(*pvmw.pte))
- pte = pte_mkuffd_wp(pte);
+ pte = pte_swp_mkuffd_wp(pte);
}
}
_
* [patch 13/19] mm/rmap: fixup copying of soft dirty and uffd ptes
2020-09-04 23:34 incoming Andrew Morton
` (11 preceding siblings ...)
2020-09-04 23:35 ` [patch 12/19] mm/migrate: fixup setting UFFD_WP flag Andrew Morton
@ 2020-09-04 23:36 ` Andrew Morton
2020-09-04 23:36 ` [patch 14/19] mm/migrate: remove unnecessary is_zone_device_page() check Andrew Morton
` (5 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2020-09-04 23:36 UTC (permalink / raw)
To: akpm, alistair, jglisse, jhubbard, linux-mm, mm-commits, peterx,
rcampbell, stable, torvalds
From: Alistair Popple <alistair@popple.id.au>
Subject: mm/rmap: fixup copying of soft dirty and uffd ptes
During memory migration a pte is temporarily replaced with a migration
swap pte. Some pte bits from the existing mapping such as the soft-dirty
and uffd write-protect bits are preserved by copying these to the
temporary migration swap pte.
However these bits are not stored at the same location for swap and
non-swap ptes. Therefore testing these bits requires using the
appropriate helper function for the given pte type.
Unfortunately several code locations were found where the wrong helper
function is being used to test soft_dirty and uffd_wp bits which leads to
them getting incorrectly set or cleared during page-migration.
Fix these by using the correct tests based on pte type.
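Why the helper has to match the pte type can be shown with a small
sketch. The bit positions below are invented for the example and do not
correspond to any real architecture; mock_pte_t and all mock_* names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t mock_pte_t;

/* Hypothetical layout: the soft-dirty bit moves between the two forms. */
#define MOCK_PRESENT		(1ULL << 0)
#define MOCK_SOFT_DIRTY		(1ULL << 11)	/* meaningful only if present */
#define MOCK_SWP_SOFT_DIRTY	(1ULL << 1)	/* meaningful only if not present */

int mock_pte_present(mock_pte_t pte)
{
	return (pte & MOCK_PRESENT) != 0;
}

int mock_pte_soft_dirty(mock_pte_t pte)
{
	assert(mock_pte_present(pte));
	return (pte & MOCK_SOFT_DIRTY) != 0;
}

int mock_pte_swp_soft_dirty(mock_pte_t pte)
{
	assert(!mock_pte_present(pte));
	return (pte & MOCK_SWP_SOFT_DIRTY) != 0;
}

/* Pick the test that matches the pte type, as the patch does. */
int mock_soft_dirty_any(mock_pte_t pte)
{
	return mock_pte_present(pte) ? mock_pte_soft_dirty(pte)
				     : mock_pte_swp_soft_dirty(pte);
}
```

In this layout bit 11 of a swap pte belongs to the swap offset, so
testing it with the non-swap mask would report soft-dirty for an entry
that never had the flag set.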
Link: https://lkml.kernel.org/r/20200825064232.10023-2-alistair@popple.id.au
Fixes: a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Alistair Popple <alistair@popple.id.au>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/migrate.c | 15 +++++++++++----
mm/rmap.c | 9 +++++++--
2 files changed, 18 insertions(+), 6 deletions(-)
--- a/mm/migrate.c~mm-rmap-fixup-copying-of-soft-dirty-and-uffd-ptes
+++ a/mm/migrate.c
@@ -2427,10 +2427,17 @@ again:
entry = make_migration_entry(page, mpfn &
MIGRATE_PFN_WRITE);
swp_pte = swp_entry_to_pte(entry);
- if (pte_soft_dirty(pte))
- swp_pte = pte_swp_mksoft_dirty(swp_pte);
- if (pte_uffd_wp(pte))
- swp_pte = pte_swp_mkuffd_wp(swp_pte);
+ if (pte_present(pte)) {
+ if (pte_soft_dirty(pte))
+ swp_pte = pte_swp_mksoft_dirty(swp_pte);
+ if (pte_uffd_wp(pte))
+ swp_pte = pte_swp_mkuffd_wp(swp_pte);
+ } else {
+ if (pte_swp_soft_dirty(pte))
+ swp_pte = pte_swp_mksoft_dirty(swp_pte);
+ if (pte_swp_uffd_wp(pte))
+ swp_pte = pte_swp_mkuffd_wp(swp_pte);
+ }
set_pte_at(mm, addr, ptep, swp_pte);
/*
--- a/mm/rmap.c~mm-rmap-fixup-copying-of-soft-dirty-and-uffd-ptes
+++ a/mm/rmap.c
@@ -1511,9 +1511,14 @@ static bool try_to_unmap_one(struct page
*/
entry = make_migration_entry(page, 0);
swp_pte = swp_entry_to_pte(entry);
- if (pte_soft_dirty(pteval))
+
+ /*
+ * pteval maps a zone device page and is therefore
+ * a swap pte.
+ */
+ if (pte_swp_soft_dirty(pteval))
swp_pte = pte_swp_mksoft_dirty(swp_pte);
- if (pte_uffd_wp(pteval))
+ if (pte_swp_uffd_wp(pteval))
swp_pte = pte_swp_mkuffd_wp(swp_pte);
set_pte_at(mm, pvmw.address, pvmw.pte, swp_pte);
/*
_
* [patch 14/19] mm/migrate: remove unnecessary is_zone_device_page() check
2020-09-04 23:34 incoming Andrew Morton
` (12 preceding siblings ...)
2020-09-04 23:36 ` [patch 13/19] mm/rmap: fixup copying of soft dirty and uffd ptes Andrew Morton
@ 2020-09-04 23:36 ` Andrew Morton
2020-09-04 23:36 ` [patch 15/19] mm/migrate: preserve soft dirty in remove_migration_pte() Andrew Morton
` (4 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2020-09-04 23:36 UTC (permalink / raw)
To: akpm, apopple, bharata, hch, jgg, jglisse, linux-mm, mm-commits,
rcampbell, torvalds
From: Ralph Campbell <rcampbell@nvidia.com>
Subject: mm/migrate: remove unnecessary is_zone_device_page() check
Patch series "mm/migrate: preserve soft dirty in remove_migration_pte()".
I happened to notice this from code inspection after seeing Alistair
Popple's patch ("mm/rmap: Fixup copying of soft dirty and uffd ptes").
This patch (of 2):
The check for is_zone_device_page() and is_device_private_page() is
unnecessary since the latter is sufficient to determine if the page is a
device private page. Simplify the code for easier reading.
Link: https://lkml.kernel.org/r/20200831212222.22409-1-rcampbell@nvidia.com
Link: https://lkml.kernel.org/r/20200831212222.22409-2-rcampbell@nvidia.com
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/migrate.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
--- a/mm/migrate.c~mm-migrate-remove-unnecessary-is_zone_device_page-check
+++ a/mm/migrate.c
@@ -246,13 +246,11 @@ static bool remove_migration_pte(struct
else if (pte_swp_uffd_wp(*pvmw.pte))
pte = pte_mkuffd_wp(pte);
- if (unlikely(is_zone_device_page(new))) {
- if (is_device_private_page(new)) {
- entry = make_device_private_entry(new, pte_write(pte));
- pte = swp_entry_to_pte(entry);
- if (pte_swp_uffd_wp(*pvmw.pte))
- pte = pte_swp_mkuffd_wp(pte);
- }
+ if (unlikely(is_device_private_page(new))) {
+ entry = make_device_private_entry(new, pte_write(pte));
+ pte = swp_entry_to_pte(entry);
+ if (pte_swp_uffd_wp(*pvmw.pte))
+ pte = pte_swp_mkuffd_wp(pte);
}
#ifdef CONFIG_HUGETLB_PAGE
_
* [patch 15/19] mm/migrate: preserve soft dirty in remove_migration_pte()
2020-09-04 23:34 incoming Andrew Morton
` (13 preceding siblings ...)
2020-09-04 23:36 ` [patch 14/19] mm/migrate: remove unnecessary is_zone_device_page() check Andrew Morton
@ 2020-09-04 23:36 ` Andrew Morton
2020-09-04 23:36 ` [patch 16/19] mm/hugetlb: try preferred node first when alloc gigantic page from cma Andrew Morton
` (3 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2020-09-04 23:36 UTC (permalink / raw)
To: akpm, apopple, bharata, hch, jgg, jglisse, linux-mm, mm-commits,
rcampbell, torvalds
From: Ralph Campbell <rcampbell@nvidia.com>
Subject: mm/migrate: preserve soft dirty in remove_migration_pte()
The code to remove a migration PTE and replace it with a device private
PTE was not copying the soft dirty bit from the migration entry. This
could lead to page contents not being marked dirty when faulting the page
back from device private memory.
Link: https://lkml.kernel.org/r/20200831212222.22409-3-rcampbell@nvidia.com
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/migrate.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/migrate.c~mm-migrate-preserve-soft-dirty-in-remove_migration_pte
+++ a/mm/migrate.c
@@ -249,6 +249,8 @@ static bool remove_migration_pte(struct
if (unlikely(is_device_private_page(new))) {
entry = make_device_private_entry(new, pte_write(pte));
pte = swp_entry_to_pte(entry);
+ if (pte_swp_soft_dirty(*pvmw.pte))
+ pte = pte_swp_mksoft_dirty(pte);
if (pte_swp_uffd_wp(*pvmw.pte))
pte = pte_swp_mkuffd_wp(pte);
}
_
* [patch 16/19] mm/hugetlb: try preferred node first when alloc gigantic page from cma
2020-09-04 23:34 incoming Andrew Morton
` (14 preceding siblings ...)
2020-09-04 23:36 ` [patch 15/19] mm/migrate: preserve soft dirty in remove_migration_pte() Andrew Morton
@ 2020-09-04 23:36 ` Andrew Morton
2020-09-04 23:36 ` [patch 17/19] mm/hugetlb: fix a race between hugetlb sysctl handlers Andrew Morton
` (2 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2020-09-04 23:36 UTC (permalink / raw)
To: akpm, guro, linux-mm, lixinhai.lxh, mhocko, mike.kravetz,
mm-commits, torvalds
From: Li Xinhai <lixinhai.lxh@gmail.com>
Subject: mm/hugetlb: try preferred node first when alloc gigantic page from cma
Since commit cf11e85fc08cc6a4 ("mm: hugetlb: optionally allocate gigantic
hugepages using cma"), a gigantic page may be allocated from a node other
than the preferred node even though pages are available on the preferred
node. The reason is that the nid parameter is ignored in
alloc_gigantic_page().
In addition, __GFP_THISNODE must be checked in case the user requires
allocation only from the preferred node.
After this patch, the preferred node is tried first, before the other
allowed nodes, and other nodes are not tried if __GFP_THISNODE is
specified. If the user does not specify a preferred node, the current
node is used as the preferred node, which keeps the allocation behavior
of gigantic and non-gigantic hugetlb pages consistent.
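The resulting search order can be sketched as follows. This models only
the node selection for the CMA attempt, under stated assumptions:
pick_node(), MOCK_MAX_NODES, the `allowed` bitmask (standing in for the
nodemask) and the `has_pages` array (standing in for "cma_alloc() on this
node would succeed") are all hypothetical, and the NUMA_NO_NODE
substitution is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_MAX_NODES 4

/* Returns the node chosen for the allocation, or -1 if none qualifies. */
int pick_node(int preferred, unsigned int allowed,
	      const bool has_pages[MOCK_MAX_NODES], bool this_node_only)
{
	int node;

	assert(preferred >= 0 && preferred < MOCK_MAX_NODES);

	/* The preferred node is always tried first. */
	if (has_pages[preferred])
		return preferred;

	/* The __GFP_THISNODE case: never fall back to another node. */
	if (this_node_only)
		return -1;

	/* Otherwise walk the remaining allowed nodes. */
	for (node = 0; node < MOCK_MAX_NODES; node++) {
		if (node == preferred || !(allowed & (1u << node)) ||
		    !has_pages[node])
			continue;
		return node;
	}
	return -1;
}
```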
Link: https://lkml.kernel.org/r/20200902025016.697260-1-lixinhai.lxh@gmail.com
Fixes: cf11e85fc08cc6a4 ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/hugetlb.c | 23 +++++++++++++++++------
1 file changed, 17 insertions(+), 6 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-try-preferred-node-first-when-alloc-gigantic-page-from-cma
+++ a/mm/hugetlb.c
@@ -1250,21 +1250,32 @@ static struct page *alloc_gigantic_page(
int nid, nodemask_t *nodemask)
{
unsigned long nr_pages = 1UL << huge_page_order(h);
+ if (nid == NUMA_NO_NODE)
+ nid = numa_mem_id();
#ifdef CONFIG_CMA
{
struct page *page;
int node;
- for_each_node_mask(node, *nodemask) {
- if (!hugetlb_cma[node])
- continue;
-
- page = cma_alloc(hugetlb_cma[node], nr_pages,
- huge_page_order(h), true);
+ if (hugetlb_cma[nid]) {
+ page = cma_alloc(hugetlb_cma[nid], nr_pages,
+ huge_page_order(h), true);
if (page)
return page;
}
+
+ if (!(gfp_mask & __GFP_THISNODE)) {
+ for_each_node_mask(node, *nodemask) {
+ if (node == nid || !hugetlb_cma[node])
+ continue;
+
+ page = cma_alloc(hugetlb_cma[node], nr_pages,
+ huge_page_order(h), true);
+ if (page)
+ return page;
+ }
+ }
}
#endif
_
* [patch 17/19] mm/hugetlb: fix a race between hugetlb sysctl handlers
2020-09-04 23:34 incoming Andrew Morton
` (15 preceding siblings ...)
2020-09-04 23:36 ` [patch 16/19] mm/hugetlb: try preferred node first when alloc gigantic page from cma Andrew Morton
@ 2020-09-04 23:36 ` Andrew Morton
2020-09-04 23:36 ` [patch 18/19] mm/khugepaged.c: fix khugepaged's request size in collapse_file Andrew Morton
2020-09-04 23:36 ` [patch 19/19] include/linux/log2.h: add missing () around n in roundup_pow_of_two() Andrew Morton
18 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2020-09-04 23:36 UTC (permalink / raw)
To: ak, akpm, linux-mm, mike.kravetz, mm-commits, songmuchun, torvalds
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm/hugetlb: fix a race between hugetlb sysctl handlers
There is a race between the assignment of `table->data` in one thread and
the write through the `table->data` pointer in
__do_proc_doulongvec_minmax() in another thread.
CPU0:                                  CPU1:
                                       proc_sys_write
hugetlb_sysctl_handler                   proc_sys_call_handler
hugetlb_sysctl_handler_common              hugetlb_sysctl_handler
  table->data = &tmp;                        hugetlb_sysctl_handler_common
                                               table->data = &tmp;
    proc_doulongvec_minmax
      do_proc_doulongvec_minmax              sysctl_head_finish
        __do_proc_doulongvec_minmax            unuse_table
          i = table->data;
          *i = val;  // corrupt CPU1's stack
Fix this by duplicating `table` and updating only the duplicate. Also
introduce a helper, proc_hugetlb_doulongvec_minmax(), to simplify the
code.
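The idea of the fix can be sketched in plain C. This is an illustration
only: struct mock_ctl_table, mock_store() and mock_handler() are
hypothetical stand-ins for the ctl_table, proc_doulongvec_minmax() and
the hugetlb handlers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared struct ctl_table. */
struct mock_ctl_table {
	const char *procname;
	void *data;
};

/* Stand-in for proc_doulongvec_minmax(): writes through table->data. */
static void mock_store(struct mock_ctl_table *table, unsigned long val)
{
	*(unsigned long *)table->data = val;
}

unsigned long mock_handler(struct mock_ctl_table *shared, unsigned long val)
{
	unsigned long tmp = 0;
	struct mock_ctl_table dup;

	assert(shared != NULL);

	/* Copy the shared table and redirect only the private copy. */
	dup = *shared;
	dup.data = &tmp;

	mock_store(&dup, val);
	return tmp;
}
```

Because the shared table is never written, a concurrent caller can no
longer pick up a pointer into another thread's stack.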
The following oops was seen:
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
Code: Bad RIP value.
...
Call Trace:
? set_max_huge_pages+0x3da/0x4f0
? alloc_pool_huge_page+0x150/0x150
? proc_doulongvec_minmax+0x46/0x60
? hugetlb_sysctl_handler_common+0x1c7/0x200
? nr_hugepages_store+0x20/0x20
? copy_fd_bitmaps+0x170/0x170
? hugetlb_sysctl_handler+0x1e/0x20
? proc_sys_call_handler+0x2f1/0x300
? unregister_sysctl_table+0xb0/0xb0
? __fd_install+0x78/0x100
? proc_sys_write+0x14/0x20
? __vfs_write+0x4d/0x90
? vfs_write+0xef/0x240
? ksys_write+0xc0/0x160
? __ia32_sys_read+0x50/0x50
? __close_fd+0x129/0x150
? __x64_sys_write+0x43/0x50
? do_syscall_64+0x6c/0x200
? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Link: http://lkml.kernel.org/r/20200828031146.43035-1-songmuchun@bytedance.com
Fixes: e5ff215941d5 ("hugetlb: multiple hstates for multiple page sizes")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/hugetlb.c | 26 ++++++++++++++++++++------
1 file changed, 20 insertions(+), 6 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-a-race-between-hugetlb-sysctl-handlers
+++ a/mm/hugetlb.c
@@ -3465,6 +3465,22 @@ static unsigned int allowed_mems_nr(stru
}
#ifdef CONFIG_SYSCTL
+static int proc_hugetlb_doulongvec_minmax(struct ctl_table *table, int write,
+ void *buffer, size_t *length,
+ loff_t *ppos, unsigned long *out)
+{
+ struct ctl_table dup_table;
+
+ /*
+ * In order to avoid races with __do_proc_doulongvec_minmax(), we
+ * can duplicate the @table and alter the duplicate of it.
+ */
+ dup_table = *table;
+ dup_table.data = out;
+
+ return proc_doulongvec_minmax(&dup_table, write, buffer, length, ppos);
+}
+
static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
struct ctl_table *table, int write,
void *buffer, size_t *length, loff_t *ppos)
@@ -3476,9 +3492,8 @@ static int hugetlb_sysctl_handler_common
if (!hugepages_supported())
return -EOPNOTSUPP;
- table->data = &tmp;
- table->maxlen = sizeof(unsigned long);
- ret = proc_doulongvec_minmax(table, write, buffer, length, ppos);
+ ret = proc_hugetlb_doulongvec_minmax(table, write, buffer, length, ppos,
+ &tmp);
if (ret)
goto out;
@@ -3521,9 +3536,8 @@ int hugetlb_overcommit_handler(struct ct
if (write && hstate_is_gigantic(h))
return -EINVAL;
- table->data = &tmp;
- table->maxlen = sizeof(unsigned long);
- ret = proc_doulongvec_minmax(table, write, buffer, length, ppos);
+ ret = proc_hugetlb_doulongvec_minmax(table, write, buffer, length, ppos,
+ &tmp);
if (ret)
goto out;
_
* [patch 18/19] mm/khugepaged.c: fix khugepaged's request size in collapse_file
2020-09-04 23:34 incoming Andrew Morton
` (16 preceding siblings ...)
2020-09-04 23:36 ` [patch 17/19] mm/hugetlb: fix a race between hugetlb sysctl handlers Andrew Morton
@ 2020-09-04 23:36 ` Andrew Morton
2020-09-04 23:36 ` [patch 19/19] include/linux/log2.h: add missing () around n in roundup_pow_of_two() Andrew Morton
18 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2020-09-04 23:36 UTC (permalink / raw)
To: akpm, dhowells, ebiggers, linux-mm, mm-commits,
pankaj.gupta.linux, shy828301, songliubraving, torvalds, willy
From: David Howells <dhowells@redhat.com>
Subject: mm/khugepaged.c: fix khugepaged's request size in collapse_file
collapse_file() in khugepaged passes PAGE_SIZE as the number of pages to
be read to page_cache_sync_readahead(). The intent was probably to read a
single page. Fix it to use the number of pages to the end of the window
instead.
Link: https://lkml.kernel.org/r/20200903140844.14194-2-willy@infradead.org
Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Yang Shi <shy828301@gmail.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Eric Biggers <ebiggers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/khugepaged.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/khugepaged.c~fix-khugepageds-request-size-in-collapse_file
+++ a/mm/khugepaged.c
@@ -1709,7 +1709,7 @@ static void collapse_file(struct mm_stru
xas_unlock_irq(&xas);
page_cache_sync_readahead(mapping, &file->f_ra,
file, index,
- PAGE_SIZE);
+ end - index);
/* drain pagevecs to help isolate_lru_page() */
lru_add_drain();
page = find_lock_page(mapping, index);
_
* [patch 19/19] include/linux/log2.h: add missing () around n in roundup_pow_of_two()
2020-09-04 23:34 incoming Andrew Morton
` (17 preceding siblings ...)
2020-09-04 23:36 ` [patch 18/19] mm/khugepaged.c: fix khugepaged's request size in collapse_file Andrew Morton
@ 2020-09-04 23:36 ` Andrew Morton
18 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2020-09-04 23:36 UTC (permalink / raw)
To: akpm, jgg, linux-mm, mm-commits, torvalds
From: Jason Gunthorpe <jgg@nvidia.com>
Subject: include/linux/log2.h: add missing () around n in roundup_pow_of_two()
Without the parentheses, gcc generates warnings when the argument is a
complicated expression.
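The precedence hazard behind such warnings can be shown with a reduced
example. The macros and functions below are hypothetical and only mirror
the `n == 1` test; they are not the log2.h code. The example shows a
wrong result rather than a compiler warning.

```c
#include <assert.h>

/* Argument pasted unparenthesized, as in the old `(n == 1)` test. */
#define IS_ONE_UNSAFE(n)	(n == 1)
/* Argument parenthesized, as in the fixed `((n) == 1)` test. */
#define IS_ONE_SAFE(n)		((n) == 1)

int unsafe_result(int c)
{
	assert(c == 0 || c == 1);
	/* Expands to (c ? 2 : 1 == 1), parsed as c ? 2 : (1 == 1). */
	return IS_ONE_UNSAFE(c ? 2 : 1);
}

int safe_result(int c)
{
	assert(c == 0 || c == 1);
	/* Expands to ((c ? 2 : 1) == 1), the intended comparison. */
	return IS_ONE_SAFE(c ? 2 : 1);
}
```

With c == 1 the unsafe form yields 2 instead of the comparison result 0,
because `==` binds tighter than `?:`.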
Link: https://lkml.kernel.org/r/0-v1-8a2697e3c003+41165-log_brackets_jgg@nvidia.com
Fixes: 312a0c170945 ("[PATCH] LOG2: Alter roundup_pow_of_two() so that it can use a ilog2() on a constant")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/log2.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/log2.h~log2-add-missing-around-n-in-roundup_pow_of_two
+++ a/include/linux/log2.h
@@ -173,7 +173,7 @@ unsigned long __rounddown_pow_of_two(uns
#define roundup_pow_of_two(n) \
( \
__builtin_constant_p(n) ? ( \
- (n == 1) ? 1 : \
+ ((n) == 1) ? 1 : \
(1UL << (ilog2((n) - 1) + 1)) \
) : \
__roundup_pow_of_two(n) \
_
* Re: [patch 05/19] MAINTAINERS: add LLVM maintainers
2020-09-04 23:35 ` [patch 05/19] MAINTAINERS: add LLVM maintainers Andrew Morton
@ 2020-09-05 17:25 ` Masahiro Yamada
0 siblings, 0 replies; 21+ messages in thread
From: Masahiro Yamada @ 2020-09-05 17:25 UTC (permalink / raw)
To: Andrew Morton, Linus Torvalds
Cc: linux-mm, Lukas Bulwahn, Miguel Ojeda, mm-commits,
Nathan Chancellor, Nick Desaulniers, Sedat Dilek
Hi Linus,
On Sat, Sep 5, 2020 at 8:35 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> From: Nick Desaulniers <ndesaulniers@google.com>
> Subject: MAINTAINERS: add LLVM maintainers
>
> Nominate Nathan and myself to be point of contact for clang/LLVM related
> support, after a poll at the LLVM BoF at Linux Plumbers Conf 2020.
>
> While corporate sponsorship is beneficial, it's important to not entrust
> the keys to the nukes with any one entity. Should Nathan and I find
> ourselves at the same employer, I would gladly step down.
>
> Link: https://lkml.kernel.org/r/20200825143540.2948637-1-ndesaulniers@google.com
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> Acked-by: Nathan Chancellor <natechancellor@gmail.com>
> Reviewed-by: Sedat Dilek <sedat.dilek@gmail.com>
> Acked-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
> Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
> Cc: Masahiro Yamada <masahiroy@kernel.org
The closing '>' is missing in this line.
Please feel free to replace Cc: with Acked-by:
Acked-by: Masahiro Yamada <masahiroy@kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
> MAINTAINERS | 2 ++
> 1 file changed, 2 insertions(+)
>
> --- a/MAINTAINERS~maintainers-add-llvm-maintainers
> +++ a/MAINTAINERS
> @@ -4257,6 +4257,8 @@ S: Maintained
> F: .clang-format
>
> CLANG/LLVM BUILD SUPPORT
> +M: Nathan Chancellor <natechancellor@gmail.com>
> +M: Nick Desaulniers <ndesaulniers@google.com>
> L: clang-built-linux@googlegroups.com
> S: Supported
> W: https://clangbuiltlinux.github.io/
> _
--
Best Regards
Masahiro Yamada
Thread overview: 21+ messages
2020-09-04 23:34 incoming Andrew Morton
2020-09-04 23:35 ` [patch 01/19] memcg: fix use-after-free in uncharge_batch Andrew Morton
2020-09-04 23:35 ` [patch 02/19] mm: memcg: fix memcg reclaim soft lockup Andrew Morton
2020-09-04 23:35 ` [patch 03/19] mm: slub: fix conversion of freelist_corrupted() Andrew Morton
2020-09-04 23:35 ` [patch 04/19] MAINTAINERS: update Cavium/Marvell entries Andrew Morton
2020-09-04 23:35 ` [patch 05/19] MAINTAINERS: add LLVM maintainers Andrew Morton
2020-09-05 17:25 ` Masahiro Yamada
2020-09-04 23:35 ` [patch 06/19] MAINTAINERS: IA64: mark Status as Odd Fixes only Andrew Morton
2020-09-04 23:35 ` [patch 07/19] mm: track page table modifications in __apply_to_page_range() Andrew Morton
2020-09-04 23:35 ` [patch 08/19] ipc: adjust proc_ipc_sem_dointvec definition to match prototype Andrew Morton
2020-09-04 23:35 ` [patch 09/19] fork: adjust sysctl_max_threads " Andrew Morton
2020-09-04 23:35 ` [patch 10/19] checkpatch: fix the usage of capture group ( ... ) Andrew Morton
2020-09-04 23:35 ` [patch 11/19] mm: madvise: fix vma user-after-free Andrew Morton
2020-09-04 23:35 ` [patch 12/19] mm/migrate: fixup setting UFFD_WP flag Andrew Morton
2020-09-04 23:36 ` [patch 13/19] mm/rmap: fixup copying of soft dirty and uffd ptes Andrew Morton
2020-09-04 23:36 ` [patch 14/19] mm/migrate: remove unnecessary is_zone_device_page() check Andrew Morton
2020-09-04 23:36 ` [patch 15/19] mm/migrate: preserve soft dirty in remove_migration_pte() Andrew Morton
2020-09-04 23:36 ` [patch 16/19] mm/hugetlb: try preferred node first when alloc gigantic page from cma Andrew Morton
2020-09-04 23:36 ` [patch 17/19] mm/hugetlb: fix a race between hugetlb sysctl handlers Andrew Morton
2020-09-04 23:36 ` [patch 18/19] mm/khugepaged.c: fix khugepaged's request size in collapse_file Andrew Morton
2020-09-04 23:36 ` [patch 19/19] include/linux/log2.h: add missing () around n in roundup_pow_of_two() Andrew Morton