* + mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch added to -mm tree
@ 2020-08-20 21:21 akpm
0 siblings, 0 replies; 2+ messages in thread
From: akpm @ 2020-08-20 21:21 UTC (permalink / raw)
To: aquini, cmaiolino, esandeen, hsiangkao, mm-commits, shy828301,
stable, willy, ying.huang
The patch titled
Subject: mm, THP, swap: fix allocating cluster for swapfile by mistake
has been added to the -mm tree. Its filename is
mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Gao Xiang <hsiangkao@redhat.com>
Subject: mm, THP, swap: fix allocating cluster for swapfile by mistake
SWP_FS is used to make swap_{read,write}page() go through the filesystem,
and it's only used for swap files over NFS. So, !SWP_FS means non NFS for
now, it could be either file backed or device backed. Something similar
goes with legacy SWP_FILE.
So in order to achieve the goal of the original patch, SWP_BLKDEV should
be used instead.
FS corruption can be observed with SSD device + XFS + fragmented swapfile
due to CONFIG_THP_SWAP=y.
I reproduced the issue with the following details:
Environment:
QEMU + upstream kernel + buildroot + NVMe (2 GB)
Kernel config:
CONFIG_BLK_DEV_NVME=y
CONFIG_THP_SWAP=y
Some reproducable steps:
mkfs.xfs -f /dev/nvme0n1
mkdir /tmp/mnt
mount /dev/nvme0n1 /tmp/mnt
bs="32k"
sz="1024m" # doesn't matter too much, I also tried 16m
xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
xfs_io -f -c "pwrite -F -S 0 -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fsync" /tmp/mnt/sw
mkswap /tmp/mnt/sw
swapon /tmp/mnt/sw
stress --vm 2 --vm-bytes 600M # doesn't matter too much as well
Symptoms:
- FS corruption (e.g. checksum failure)
- memory corruption at: 0xd2808010
- segfault
Link: https://lkml.kernel.org/r/20200820045323.7809-1-hsiangkao@redhat.com
Fixes: f0eea189e8e9 ("mm, THP, swap: Don't allocate huge cluster for file backed swap device")
Fixes: 38d8b4e6bdc8 ("mm, THP, swap: delay splitting THP during swap out")
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Eric Sandeen <esandeen@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/swapfile.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/swapfile.c~mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake
+++ a/mm/swapfile.c
@@ -1078,7 +1078,7 @@ start_over:
goto nextsi;
}
if (size == SWAPFILE_CLUSTER) {
- if (!(si->flags & SWP_FS))
+ if (si->flags & SWP_BLKDEV)
n_ret = swap_alloc_cluster(si, swp_entries);
} else
n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
_
Patches currently in -mm which might be from hsiangkao@redhat.com are
mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch
^ permalink raw reply [flat|nested] 2+ messages in thread
* incoming
@ 2020-08-15 0:29 Andrew Morton
2020-08-19 20:08 ` + mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch added to -mm tree Andrew Morton
0 siblings, 1 reply; 2+ messages in thread
From: Andrew Morton @ 2020-08-15 0:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
39 patches, based on b923f1247b72fc100b87792fd2129d026bb10e66.
Subsystems affected by this patch series:
mm/hotfixes
lz4
exec
mailmap
mm/thp
autofs
mm/madvise
sysctl
mm/kmemleak
mm/misc
lib
Subsystem: mm/hotfixes
Mike Rapoport <rppt@linux.ibm.com>:
asm-generic: pgalloc.h: use correct #ifdef to enable pud_alloc_one()
Baoquan He <bhe@redhat.com>:
Revert "mm/vmstat.c: do not show lowmem reserve protection information of empty zone"
Subsystem: lz4
Nick Terrell <terrelln@fb.com>:
lz4: fix kernel decompression speed
Subsystem: exec
Kees Cook <keescook@chromium.org>:
Patch series "Fix S_ISDIR execve() errno":
exec: restore EACCES of S_ISDIR execve()
selftests/exec: add file type errno tests
Subsystem: mailmap
Greg Kurz <groug@kaod.org>:
mailmap: add entry for Greg Kurz
Subsystem: mm/thp
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
Patch series "THP prep patches":
mm: store compound_nr as well as compound_order
mm: move page-flags include to top of file
mm: add thp_order
mm: add thp_size
mm: replace hpage_nr_pages with thp_nr_pages
mm: add thp_head
mm: introduce offset_in_thp
Subsystem: autofs
Randy Dunlap <rdunlap@infradead.org>:
fs: autofs: delete repeated words in comments
Subsystem: mm/madvise
Minchan Kim <minchan@kernel.org>:
Patch series "introduce memory hinting API for external process", v8:
mm/madvise: pass task and mm to do_madvise
pid: move pidfd_get_pid() to pid.c
mm/madvise: introduce process_madvise() syscall: an external memory hinting API
mm/madvise: check fatal signal pending of target process
Subsystem: sysctl
Xiaoming Ni <nixiaoming@huawei.com>:
all arch: remove system call sys_sysctl
Subsystem: mm/kmemleak
Qian Cai <cai@lca.pw>:
mm/kmemleak: silence KCSAN splats in checksum
Subsystem: mm/misc
Qian Cai <cai@lca.pw>:
mm/frontswap: mark various intentional data races
mm/page_io: mark various intentional data races
mm/swap_state: mark various intentional data races
Kirill A. Shutemov <kirill@shutemov.name>:
mm/filemap.c: fix a data race in filemap_fault()
Qian Cai <cai@lca.pw>:
mm/swapfile: fix and annotate various data races
mm/page_counter: fix various data races at memsw
mm/memcontrol: fix a data race in scan count
mm/list_lru: fix a data race in list_lru_count_one
mm/mempool: fix a data race in mempool_free()
mm/rmap: annotate a data race at tlb_flush_batched
mm/swap.c: annotate data races for lru_rotate_pvecs
mm: annotate a data race in page_zonenum()
Romain Naour <romain.naour@gmail.com>:
include/asm-generic/vmlinux.lds.h: align ro_after_init
Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>:
sh: clkfwk: remove r8/r16/r32
sh: use generic strncpy()
Subsystem: lib
Krzysztof Kozlowski <krzk@kernel.org>:
Patch series "iomap: Constify ioreadX() iomem argument", v3:
iomap: constify ioreadX() iomem argument (as in generic implementation)
rtl818x: constify ioreadX() iomem argument (as in generic implementation)
ntb: intel: constify ioreadX() iomem argument (as in generic implementation)
virtio: pci: constify ioreadX() iomem argument (as in generic implementation)
.mailmap | 1
arch/alpha/include/asm/core_apecs.h | 6
arch/alpha/include/asm/core_cia.h | 6
arch/alpha/include/asm/core_lca.h | 6
arch/alpha/include/asm/core_marvel.h | 4
arch/alpha/include/asm/core_mcpcia.h | 6
arch/alpha/include/asm/core_t2.h | 2
arch/alpha/include/asm/io.h | 12 -
arch/alpha/include/asm/io_trivial.h | 16 -
arch/alpha/include/asm/jensen.h | 2
arch/alpha/include/asm/machvec.h | 6
arch/alpha/kernel/core_marvel.c | 2
arch/alpha/kernel/io.c | 12 -
arch/alpha/kernel/syscalls/syscall.tbl | 3
arch/arm/configs/am200epdkit_defconfig | 1
arch/arm/tools/syscall.tbl | 3
arch/arm64/include/asm/unistd.h | 2
arch/arm64/include/asm/unistd32.h | 6
arch/ia64/kernel/syscalls/syscall.tbl | 3
arch/m68k/kernel/syscalls/syscall.tbl | 3
arch/microblaze/kernel/syscalls/syscall.tbl | 3
arch/mips/configs/cu1000-neo_defconfig | 1
arch/mips/kernel/syscalls/syscall_n32.tbl | 3
arch/mips/kernel/syscalls/syscall_n64.tbl | 3
arch/mips/kernel/syscalls/syscall_o32.tbl | 3
arch/parisc/include/asm/io.h | 4
arch/parisc/kernel/syscalls/syscall.tbl | 3
arch/parisc/lib/iomap.c | 72 +++---
arch/powerpc/kernel/iomap.c | 28 +-
arch/powerpc/kernel/syscalls/syscall.tbl | 3
arch/s390/kernel/syscalls/syscall.tbl | 3
arch/sh/configs/dreamcast_defconfig | 1
arch/sh/configs/espt_defconfig | 1
arch/sh/configs/hp6xx_defconfig | 1
arch/sh/configs/landisk_defconfig | 1
arch/sh/configs/lboxre2_defconfig | 1
arch/sh/configs/microdev_defconfig | 1
arch/sh/configs/migor_defconfig | 1
arch/sh/configs/r7780mp_defconfig | 1
arch/sh/configs/r7785rp_defconfig | 1
arch/sh/configs/rts7751r2d1_defconfig | 1
arch/sh/configs/rts7751r2dplus_defconfig | 1
arch/sh/configs/se7206_defconfig | 1
arch/sh/configs/se7343_defconfig | 1
arch/sh/configs/se7619_defconfig | 1
arch/sh/configs/se7705_defconfig | 1
arch/sh/configs/se7750_defconfig | 1
arch/sh/configs/se7751_defconfig | 1
arch/sh/configs/secureedge5410_defconfig | 1
arch/sh/configs/sh03_defconfig | 1
arch/sh/configs/sh7710voipgw_defconfig | 1
arch/sh/configs/sh7757lcr_defconfig | 1
arch/sh/configs/sh7763rdp_defconfig | 1
arch/sh/configs/shmin_defconfig | 1
arch/sh/configs/titan_defconfig | 1
arch/sh/include/asm/string_32.h | 26 --
arch/sh/kernel/iomap.c | 22 -
arch/sh/kernel/syscalls/syscall.tbl | 3
arch/sparc/kernel/syscalls/syscall.tbl | 3
arch/x86/entry/syscalls/syscall_32.tbl | 3
arch/x86/entry/syscalls/syscall_64.tbl | 4
arch/xtensa/kernel/syscalls/syscall.tbl | 3
drivers/mailbox/bcm-pdc-mailbox.c | 2
drivers/net/wireless/realtek/rtl818x/rtl8180/rtl8180.h | 6
drivers/ntb/hw/intel/ntb_hw_gen1.c | 2
drivers/ntb/hw/intel/ntb_hw_gen3.h | 2
drivers/ntb/hw/intel/ntb_hw_intel.h | 2
drivers/nvdimm/btt.c | 4
drivers/nvdimm/pmem.c | 6
drivers/sh/clk/cpg.c | 25 --
drivers/virtio/virtio_pci_modern.c | 6
fs/autofs/dev-ioctl.c | 4
fs/io_uring.c | 2
fs/namei.c | 4
include/asm-generic/iomap.h | 28 +-
include/asm-generic/pgalloc.h | 2
include/asm-generic/vmlinux.lds.h | 1
include/linux/compat.h | 5
include/linux/huge_mm.h | 58 ++++-
include/linux/io-64-nonatomic-hi-lo.h | 4
include/linux/io-64-nonatomic-lo-hi.h | 4
include/linux/memcontrol.h | 2
include/linux/mm.h | 16 -
include/linux/mm_inline.h | 6
include/linux/mm_types.h | 1
include/linux/pagemap.h | 6
include/linux/pid.h | 1
include/linux/syscalls.h | 4
include/linux/sysctl.h | 6
include/uapi/asm-generic/unistd.h | 4
kernel/Makefile | 2
kernel/exit.c | 17 -
kernel/pid.c | 17 +
kernel/sys_ni.c | 3
kernel/sysctl_binary.c | 171 --------------
lib/iomap.c | 30 +-
lib/lz4/lz4_compress.c | 4
lib/lz4/lz4_decompress.c | 18 -
lib/lz4/lz4defs.h | 10
lib/lz4/lz4hc_compress.c | 2
mm/compaction.c | 2
mm/filemap.c | 22 +
mm/frontswap.c | 8
mm/gup.c | 2
mm/internal.h | 4
mm/kmemleak.c | 2
mm/list_lru.c | 2
mm/madvise.c | 190 ++++++++++++++--
mm/memcontrol.c | 10
mm/memory.c | 4
mm/memory_hotplug.c | 7
mm/mempolicy.c | 2
mm/mempool.c | 2
mm/migrate.c | 18 -
mm/mlock.c | 9
mm/page_alloc.c | 5
mm/page_counter.c | 13 -
mm/page_io.c | 12 -
mm/page_vma_mapped.c | 6
mm/rmap.c | 10
mm/swap.c | 21 -
mm/swap_state.c | 10
mm/swapfile.c | 33 +-
mm/vmscan.c | 6
mm/vmstat.c | 12 -
mm/workingset.c | 6
tools/perf/arch/powerpc/entry/syscalls/syscall.tbl | 2
tools/perf/arch/s390/entry/syscalls/syscall.tbl | 2
tools/perf/arch/x86/entry/syscalls/syscall_64.tbl | 2
tools/testing/selftests/exec/.gitignore | 1
tools/testing/selftests/exec/Makefile | 5
tools/testing/selftests/exec/non-regular.c | 196 +++++++++++++++++
132 files changed, 815 insertions(+), 614 deletions(-)
^ permalink raw reply [flat|nested] 2+ messages in thread
* + mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch added to -mm tree
2020-08-15 0:29 incoming Andrew Morton
@ 2020-08-19 20:08 ` Andrew Morton
0 siblings, 0 replies; 2+ messages in thread
From: Andrew Morton @ 2020-08-19 20:08 UTC (permalink / raw)
To: hsiangkao, mm-commits, stable, ying.huang
The patch titled
Subject: mm, THP, swap: fix allocating cluster for swapfile by mistake
has been added to the -mm tree. Its filename is
mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Gao Xiang <hsiangkao@redhat.com>
Subject: mm, THP, swap: fix allocating cluster for swapfile by mistake
SWP_FS doesn't mean the device is file-backed swap device, which just
means each writeback request should go through fs by DIO. Or it'll just
use extents added by .swap_activate(), but it also works as file-backed
swap device.
So in order to achieve the goal of the original patch, SWP_BLKDEV should
be used instead.
FS corruption can be observed with SSD device + XFS + fragmented swapfile
due to CONFIG_THP_SWAP=y.
I reproduced the issue with the following details:
Environment:
QEMU + upstream kernel + buildroot + NVMe (2 GB)
Kernel config:
CONFIG_BLK_DEV_NVME=y
CONFIG_THP_SWAP=y
Some reproducable steps:
mkfs.xfs -f /dev/nvme0n1
mkdir /tmp/mnt
mount /dev/nvme0n1 /tmp/mnt
bs="32k"
sz="1024m" # doesn't matter too much, I also tried 16m
xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
xfs_io -f -c "pwrite -F -S 0 -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fsync" /tmp/mnt/sw
mkswap /tmp/mnt/sw
swapon /tmp/mnt/sw
stress --vm 2 --vm-bytes 600M # doesn't matter too much as well
Symptoms:
- FS corruption (e.g. checksum failure)
- memory corruption at: 0xd2808010
- segfault
Link: https://lkml.kernel.org/r/20200819195613.24269-1-hsiangkao@redhat.com
Fixes: f0eea189e8e9 ("mm, THP, swap: Don't allocate huge cluster for file backed swap device")
Fixes: 38d8b4e6bdc8 ("mm, THP, swap: delay splitting THP during swap out")
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/swapfile.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/swapfile.c~mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake
+++ a/mm/swapfile.c
@@ -1078,7 +1078,7 @@ start_over:
goto nextsi;
}
if (size == SWAPFILE_CLUSTER) {
- if (!(si->flags & SWP_FS))
+ if (si->flags & SWP_BLKDEV)
n_ret = swap_alloc_cluster(si, swp_entries);
} else
n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
_
Patches currently in -mm which might be from hsiangkao@redhat.com are
mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2020-08-20 21:21 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-20 21:21 + mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch added to -mm tree akpm
-- strict thread matches above, loose matches on Subject: below --
2020-08-15 0:29 incoming Andrew Morton
2020-08-19 20:08 ` + mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch added to -mm tree Andrew Morton
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.