* + mm-thp-narrow-lru-locking.patch added to -mm tree
@ 2020-03-02 22:14 akpm
0 siblings, 0 replies; 3+ messages in thread
From: akpm @ 2020-03-02 22:14 UTC (permalink / raw)
To: aarcange, alex.shi, daniel.m.jordan, hannes, hughd, khlebnikov,
kirill, kravetz, mhocko, mm-commits, tj, vdavydov.dev, willy,
yang.shi
The patch titled
Subject: mm/thp: narrow lru locking
has been added to the -mm tree. Its filename is
mm-thp-narrow-lru-locking.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-thp-narrow-lru-locking.patch
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-narrow-lru-locking.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Alex Shi <alex.shi@linux.alibaba.com>
Subject: mm/thp: narrow lru locking
Lru locking just guards the lru list and subpage's Mlocked. Including
other things can't give help just delay the locking release. So narrow
the locking for early lock release and better code meaning.
Link: http://lkml.kernel.org/r/1583146830-169516-7-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <kravetz@us.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/huge_memory.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)
--- a/mm/huge_memory.c~mm-thp-narrow-lru-locking
+++ a/mm/huge_memory.c
@@ -2559,13 +2559,14 @@ static void __split_huge_page_tail(struc
}
static void __split_huge_page(struct page *page, struct list_head *list,
- pgoff_t end, unsigned long flags)
+ pgoff_t end)
{
struct page *head = compound_head(page);
pg_data_t *pgdat = page_pgdat(head);
struct lruvec *lruvec;
struct address_space *swap_cache = NULL;
unsigned long offset = 0;
+ unsigned long flags;
int i;
lruvec = mem_cgroup_page_lruvec(head, pgdat);
@@ -2581,6 +2582,9 @@ static void __split_huge_page(struct pag
xa_lock(&swap_cache->i_pages);
}
+ /* Lru list would be changed, don't care head's LRU bit. */
+ spin_lock_irqsave(&pgdat->lru_lock, flags);
+
for (i = HPAGE_PMD_NR - 1; i >= 1; i--) {
__split_huge_page_tail(head, i, lruvec, list);
/* Some pages can be beyond i_size: drop them from page cache */
@@ -2598,6 +2602,7 @@ static void __split_huge_page(struct pag
head + i, 0);
}
}
+ spin_unlock_irqrestore(&pgdat->lru_lock, flags);
ClearPageCompound(head);
@@ -2618,8 +2623,6 @@ static void __split_huge_page(struct pag
xa_unlock(&head->mapping->i_pages);
}
- spin_unlock_irqrestore(&pgdat->lru_lock, flags);
-
remap_page(head);
for (i = 0; i < HPAGE_PMD_NR; i++) {
@@ -2757,13 +2760,11 @@ bool can_split_huge_page(struct page *pa
int split_huge_page_to_list(struct page *page, struct list_head *list)
{
struct page *head = compound_head(page);
- struct pglist_data *pgdata = NODE_DATA(page_to_nid(head));
struct deferred_split *ds_queue = get_deferred_split_queue(head);
struct anon_vma *anon_vma = NULL;
struct address_space *mapping = NULL;
int count, mapcount, extra_pins, ret;
bool mlocked;
- unsigned long flags;
pgoff_t end;
VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
@@ -2829,9 +2830,6 @@ int split_huge_page_to_list(struct page
if (mlocked)
lru_add_drain();
- /* prevent PageLRU to go away from under us, and freeze lru stats */
- spin_lock_irqsave(&pgdata->lru_lock, flags);
-
if (mapping) {
XA_STATE(xas, &mapping->i_pages, page_index(head));
@@ -2861,7 +2859,7 @@ int split_huge_page_to_list(struct page
__dec_node_page_state(head, NR_FILE_THPS);
}
- __split_huge_page(page, list, end, flags);
+ __split_huge_page(page, list, end);
if (PageSwapCache(head)) {
swp_entry_t entry = { .val = page_private(head) };
@@ -2880,7 +2878,6 @@ int split_huge_page_to_list(struct page
spin_unlock(&ds_queue->split_queue_lock);
fail: if (mapping)
xa_unlock(&mapping->i_pages);
- spin_unlock_irqrestore(&pgdata->lru_lock, flags);
remap_page(head);
ret = -EBUSY;
}
_
Patches currently in -mm which might be from alex.shi@linux.alibaba.com are
ocfs2-remove-fs_ocfs2_nm.patch
ocfs2-remove-unused-macros.patch
ocfs2-use-ocfs2_sec_bits-in-macro.patch
ocfs2-remove-dlm_lock_is_remote.patch
ocfs2-remove-useless-err.patch
mm-vmscan-remove-unnecessary-lruvec-adding.patch
mm-memcg-fold-lock_page_lru-into-commit_charge.patch
mm-page_idle-no-unlikely-double-check-for-idle-page-counting.patch
mm-thp-move-lru_add_page_tail-func-to-huge_memoryc.patch
mm-thp-clean-up-lru_add_page_tail.patch
mm-thp-narrow-lru-locking.patch
^ permalink raw reply [flat|nested] 3+ messages in thread
* incoming
@ 2020-08-15 0:29 Andrew Morton
2020-08-19 3:50 ` + mm-thp-narrow-lru-locking.patch added to -mm tree Andrew Morton
0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2020-08-15 0:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
39 patches, based on b923f1247b72fc100b87792fd2129d026bb10e66.
Subsystems affected by this patch series:
mm/hotfixes
lz4
exec
mailmap
mm/thp
autofs
mm/madvise
sysctl
mm/kmemleak
mm/misc
lib
Subsystem: mm/hotfixes
Mike Rapoport <rppt@linux.ibm.com>:
asm-generic: pgalloc.h: use correct #ifdef to enable pud_alloc_one()
Baoquan He <bhe@redhat.com>:
Revert "mm/vmstat.c: do not show lowmem reserve protection information of empty zone"
Subsystem: lz4
Nick Terrell <terrelln@fb.com>:
lz4: fix kernel decompression speed
Subsystem: exec
Kees Cook <keescook@chromium.org>:
Patch series "Fix S_ISDIR execve() errno":
exec: restore EACCES of S_ISDIR execve()
selftests/exec: add file type errno tests
Subsystem: mailmap
Greg Kurz <groug@kaod.org>:
mailmap: add entry for Greg Kurz
Subsystem: mm/thp
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
Patch series "THP prep patches":
mm: store compound_nr as well as compound_order
mm: move page-flags include to top of file
mm: add thp_order
mm: add thp_size
mm: replace hpage_nr_pages with thp_nr_pages
mm: add thp_head
mm: introduce offset_in_thp
Subsystem: autofs
Randy Dunlap <rdunlap@infradead.org>:
fs: autofs: delete repeated words in comments
Subsystem: mm/madvise
Minchan Kim <minchan@kernel.org>:
Patch series "introduce memory hinting API for external process", v8:
mm/madvise: pass task and mm to do_madvise
pid: move pidfd_get_pid() to pid.c
mm/madvise: introduce process_madvise() syscall: an external memory hinting API
mm/madvise: check fatal signal pending of target process
Subsystem: sysctl
Xiaoming Ni <nixiaoming@huawei.com>:
all arch: remove system call sys_sysctl
Subsystem: mm/kmemleak
Qian Cai <cai@lca.pw>:
mm/kmemleak: silence KCSAN splats in checksum
Subsystem: mm/misc
Qian Cai <cai@lca.pw>:
mm/frontswap: mark various intentional data races
mm/page_io: mark various intentional data races
mm/swap_state: mark various intentional data races
Kirill A. Shutemov <kirill@shutemov.name>:
mm/filemap.c: fix a data race in filemap_fault()
Qian Cai <cai@lca.pw>:
mm/swapfile: fix and annotate various data races
mm/page_counter: fix various data races at memsw
mm/memcontrol: fix a data race in scan count
mm/list_lru: fix a data race in list_lru_count_one
mm/mempool: fix a data race in mempool_free()
mm/rmap: annotate a data race at tlb_flush_batched
mm/swap.c: annotate data races for lru_rotate_pvecs
mm: annotate a data race in page_zonenum()
Romain Naour <romain.naour@gmail.com>:
include/asm-generic/vmlinux.lds.h: align ro_after_init
Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>:
sh: clkfwk: remove r8/r16/r32
sh: use generic strncpy()
Subsystem: lib
Krzysztof Kozlowski <krzk@kernel.org>:
Patch series "iomap: Constify ioreadX() iomem argument", v3:
iomap: constify ioreadX() iomem argument (as in generic implementation)
rtl818x: constify ioreadX() iomem argument (as in generic implementation)
ntb: intel: constify ioreadX() iomem argument (as in generic implementation)
virtio: pci: constify ioreadX() iomem argument (as in generic implementation)
.mailmap | 1
arch/alpha/include/asm/core_apecs.h | 6
arch/alpha/include/asm/core_cia.h | 6
arch/alpha/include/asm/core_lca.h | 6
arch/alpha/include/asm/core_marvel.h | 4
arch/alpha/include/asm/core_mcpcia.h | 6
arch/alpha/include/asm/core_t2.h | 2
arch/alpha/include/asm/io.h | 12 -
arch/alpha/include/asm/io_trivial.h | 16 -
arch/alpha/include/asm/jensen.h | 2
arch/alpha/include/asm/machvec.h | 6
arch/alpha/kernel/core_marvel.c | 2
arch/alpha/kernel/io.c | 12 -
arch/alpha/kernel/syscalls/syscall.tbl | 3
arch/arm/configs/am200epdkit_defconfig | 1
arch/arm/tools/syscall.tbl | 3
arch/arm64/include/asm/unistd.h | 2
arch/arm64/include/asm/unistd32.h | 6
arch/ia64/kernel/syscalls/syscall.tbl | 3
arch/m68k/kernel/syscalls/syscall.tbl | 3
arch/microblaze/kernel/syscalls/syscall.tbl | 3
arch/mips/configs/cu1000-neo_defconfig | 1
arch/mips/kernel/syscalls/syscall_n32.tbl | 3
arch/mips/kernel/syscalls/syscall_n64.tbl | 3
arch/mips/kernel/syscalls/syscall_o32.tbl | 3
arch/parisc/include/asm/io.h | 4
arch/parisc/kernel/syscalls/syscall.tbl | 3
arch/parisc/lib/iomap.c | 72 +++---
arch/powerpc/kernel/iomap.c | 28 +-
arch/powerpc/kernel/syscalls/syscall.tbl | 3
arch/s390/kernel/syscalls/syscall.tbl | 3
arch/sh/configs/dreamcast_defconfig | 1
arch/sh/configs/espt_defconfig | 1
arch/sh/configs/hp6xx_defconfig | 1
arch/sh/configs/landisk_defconfig | 1
arch/sh/configs/lboxre2_defconfig | 1
arch/sh/configs/microdev_defconfig | 1
arch/sh/configs/migor_defconfig | 1
arch/sh/configs/r7780mp_defconfig | 1
arch/sh/configs/r7785rp_defconfig | 1
arch/sh/configs/rts7751r2d1_defconfig | 1
arch/sh/configs/rts7751r2dplus_defconfig | 1
arch/sh/configs/se7206_defconfig | 1
arch/sh/configs/se7343_defconfig | 1
arch/sh/configs/se7619_defconfig | 1
arch/sh/configs/se7705_defconfig | 1
arch/sh/configs/se7750_defconfig | 1
arch/sh/configs/se7751_defconfig | 1
arch/sh/configs/secureedge5410_defconfig | 1
arch/sh/configs/sh03_defconfig | 1
arch/sh/configs/sh7710voipgw_defconfig | 1
arch/sh/configs/sh7757lcr_defconfig | 1
arch/sh/configs/sh7763rdp_defconfig | 1
arch/sh/configs/shmin_defconfig | 1
arch/sh/configs/titan_defconfig | 1
arch/sh/include/asm/string_32.h | 26 --
arch/sh/kernel/iomap.c | 22 -
arch/sh/kernel/syscalls/syscall.tbl | 3
arch/sparc/kernel/syscalls/syscall.tbl | 3
arch/x86/entry/syscalls/syscall_32.tbl | 3
arch/x86/entry/syscalls/syscall_64.tbl | 4
arch/xtensa/kernel/syscalls/syscall.tbl | 3
drivers/mailbox/bcm-pdc-mailbox.c | 2
drivers/net/wireless/realtek/rtl818x/rtl8180/rtl8180.h | 6
drivers/ntb/hw/intel/ntb_hw_gen1.c | 2
drivers/ntb/hw/intel/ntb_hw_gen3.h | 2
drivers/ntb/hw/intel/ntb_hw_intel.h | 2
drivers/nvdimm/btt.c | 4
drivers/nvdimm/pmem.c | 6
drivers/sh/clk/cpg.c | 25 --
drivers/virtio/virtio_pci_modern.c | 6
fs/autofs/dev-ioctl.c | 4
fs/io_uring.c | 2
fs/namei.c | 4
include/asm-generic/iomap.h | 28 +-
include/asm-generic/pgalloc.h | 2
include/asm-generic/vmlinux.lds.h | 1
include/linux/compat.h | 5
include/linux/huge_mm.h | 58 ++++-
include/linux/io-64-nonatomic-hi-lo.h | 4
include/linux/io-64-nonatomic-lo-hi.h | 4
include/linux/memcontrol.h | 2
include/linux/mm.h | 16 -
include/linux/mm_inline.h | 6
include/linux/mm_types.h | 1
include/linux/pagemap.h | 6
include/linux/pid.h | 1
include/linux/syscalls.h | 4
include/linux/sysctl.h | 6
include/uapi/asm-generic/unistd.h | 4
kernel/Makefile | 2
kernel/exit.c | 17 -
kernel/pid.c | 17 +
kernel/sys_ni.c | 3
kernel/sysctl_binary.c | 171 --------------
lib/iomap.c | 30 +-
lib/lz4/lz4_compress.c | 4
lib/lz4/lz4_decompress.c | 18 -
lib/lz4/lz4defs.h | 10
lib/lz4/lz4hc_compress.c | 2
mm/compaction.c | 2
mm/filemap.c | 22 +
mm/frontswap.c | 8
mm/gup.c | 2
mm/internal.h | 4
mm/kmemleak.c | 2
mm/list_lru.c | 2
mm/madvise.c | 190 ++++++++++++++--
mm/memcontrol.c | 10
mm/memory.c | 4
mm/memory_hotplug.c | 7
mm/mempolicy.c | 2
mm/mempool.c | 2
mm/migrate.c | 18 -
mm/mlock.c | 9
mm/page_alloc.c | 5
mm/page_counter.c | 13 -
mm/page_io.c | 12 -
mm/page_vma_mapped.c | 6
mm/rmap.c | 10
mm/swap.c | 21 -
mm/swap_state.c | 10
mm/swapfile.c | 33 +-
mm/vmscan.c | 6
mm/vmstat.c | 12 -
mm/workingset.c | 6
tools/perf/arch/powerpc/entry/syscalls/syscall.tbl | 2
tools/perf/arch/s390/entry/syscalls/syscall.tbl | 2
tools/perf/arch/x86/entry/syscalls/syscall_64.tbl | 2
tools/testing/selftests/exec/.gitignore | 1
tools/testing/selftests/exec/Makefile | 5
tools/testing/selftests/exec/non-regular.c | 196 +++++++++++++++++
132 files changed, 815 insertions(+), 614 deletions(-)
^ permalink raw reply [flat|nested] 3+ messages in thread
* + mm-thp-narrow-lru-locking.patch added to -mm tree
2020-08-15 0:29 incoming Andrew Morton
@ 2020-08-19 3:50 ` Andrew Morton
0 siblings, 0 replies; 3+ messages in thread
From: Andrew Morton @ 2020-08-19 3:50 UTC (permalink / raw)
To: aarcange, alex.shi, guro, hannes, hughd, kirill.shutemov, mhocko,
mhocko, mm-commits, richard.weiyang, vdavydov.dev, willy
The patch titled
Subject: mm/thp: narrow lru locking
has been added to the -mm tree. Its filename is
mm-thp-narrow-lru-locking.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-thp-narrow-lru-locking.patch
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-narrow-lru-locking.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Alex Shi <alex.shi@linux.alibaba.com>
Subject: mm/thp: narrow lru locking
lru_lock and page cache xa_lock have no reason with current sequence, put
them together isn't necessary. let's narrow the lru locking, but left the
local_irq_disable to block interrupt re-entry and statistic update.
Hugh Dickins point: split_huge_page_to_list() was already silly,to be
using the _irqsave variant: it's just been taking sleeping locks, so would
already be broken if entered with interrupts enabled. so we can save
passing flags argument down to __split_huge_page().
Link: http://lkml.kernel.org/r/1597144232-11370-6-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/huge_memory.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)
--- a/mm/huge_memory.c~mm-thp-narrow-lru-locking
+++ a/mm/huge_memory.c
@@ -2397,7 +2397,7 @@ static void __split_huge_page_tail(struc
}
static void __split_huge_page(struct page *page, struct list_head *list,
- pgoff_t end, unsigned long flags)
+ pgoff_t end)
{
struct page *head = compound_head(page);
pg_data_t *pgdat = page_pgdat(head);
@@ -2406,8 +2406,6 @@ static void __split_huge_page(struct pag
unsigned long offset = 0;
int i;
- lruvec = mem_cgroup_page_lruvec(head, pgdat);
-
/* complete memcg works before add pages to LRU */
mem_cgroup_split_huge_fixup(head);
@@ -2419,6 +2417,11 @@ static void __split_huge_page(struct pag
xa_lock(&swap_cache->i_pages);
}
+ /* prevent PageLRU to go away from under us, and freeze lru stats */
+ spin_lock(&pgdat->lru_lock);
+
+ lruvec = mem_cgroup_page_lruvec(head, pgdat);
+
for (i = HPAGE_PMD_NR - 1; i >= 1; i--) {
__split_huge_page_tail(head, i, lruvec, list);
/* Some pages can be beyond i_size: drop them from page cache */
@@ -2438,6 +2441,8 @@ static void __split_huge_page(struct pag
}
ClearPageCompound(head);
+ spin_unlock(&pgdat->lru_lock);
+ /* Caller disabled irqs, so they are still disabled here */
split_page_owner(head, HPAGE_PMD_ORDER);
@@ -2455,8 +2460,7 @@ static void __split_huge_page(struct pag
page_ref_add(head, 2);
xa_unlock(&head->mapping->i_pages);
}
-
- spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+ local_irq_enable();
remap_page(head);
@@ -2595,12 +2599,10 @@ bool can_split_huge_page(struct page *pa
int split_huge_page_to_list(struct page *page, struct list_head *list)
{
struct page *head = compound_head(page);
- struct pglist_data *pgdata = NODE_DATA(page_to_nid(head));
struct deferred_split *ds_queue = get_deferred_split_queue(head);
struct anon_vma *anon_vma = NULL;
struct address_space *mapping = NULL;
int count, mapcount, extra_pins, ret;
- unsigned long flags;
pgoff_t end;
VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
@@ -2661,9 +2663,8 @@ int split_huge_page_to_list(struct page
unmap_page(head);
VM_BUG_ON_PAGE(compound_mapcount(head), head);
- /* prevent PageLRU to go away from under us, and freeze lru stats */
- spin_lock_irqsave(&pgdata->lru_lock, flags);
-
+ /* block interrupt reentry in xa_lock and spinlock */
+ local_irq_disable();
if (mapping) {
XA_STATE(xas, &mapping->i_pages, page_index(head));
@@ -2693,7 +2694,7 @@ int split_huge_page_to_list(struct page
__dec_node_page_state(head, NR_FILE_THPS);
}
- __split_huge_page(page, list, end, flags);
+ __split_huge_page(page, list, end);
if (PageSwapCache(head)) {
swp_entry_t entry = { .val = page_private(head) };
@@ -2712,7 +2713,7 @@ int split_huge_page_to_list(struct page
spin_unlock(&ds_queue->split_queue_lock);
fail: if (mapping)
xa_unlock(&mapping->i_pages);
- spin_unlock_irqrestore(&pgdata->lru_lock, flags);
+ local_irq_enable();
remap_page(head);
ret = -EBUSY;
}
_
Patches currently in -mm which might be from alex.shi@linux.alibaba.com are
mm-memcg-warning-on-memcg-after-readahead-page-charged.patch
mm-memcg-remove-useless-check-on-page-mem_cgroup.patch
mm-thp-move-lru_add_page_tail-func-to-huge_memoryc.patch
mm-thp-clean-up-lru_add_page_tail.patch
mm-thp-remove-code-path-which-never-got-into.patch
mm-thp-narrow-lru-locking.patch
^ permalink raw reply [flat|nested] 3+ messages in thread
* + mm-thp-narrow-lru-locking.patch added to -mm tree
@ 2020-11-16 21:56 akpm
0 siblings, 0 replies; 3+ messages in thread
From: akpm @ 2020-11-16 21:56 UTC (permalink / raw)
To: aarcange, alex.shi, alexander.duyck, aryabinin, daniel.m.jordan,
hannes, hughd, iamjoonsoo.kim, jannh, khlebnikov,
kirill.shutemov, kirill, mgorman, mhocko, mhocko, mika.penttila,
minchan, mm-commits, richard.weiyang, rong.a.chen, shakeelb,
tglx, tj, vbabka, vdavydov.dev, willy, yang.shi, ying.huang
The patch titled
Subject: mm/thp: narrow lru locking
has been added to the -mm tree. Its filename is
mm-thp-narrow-lru-locking.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-thp-narrow-lru-locking.patch
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-narrow-lru-locking.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Alex Shi <alex.shi@linux.alibaba.com>
Subject: mm/thp: narrow lru locking
lru_lock and page cache xa_lock have no obvious reason to be taken one way
round or the other: until now, lru_lock has been taken before page cache
xa_lock, when splitting a THP; but nothing else takes them together.
Reverse that ordering: let's narrow the lru locking - but leave
local_irq_disable to block interrupts throughout, like before.
Hugh Dickins point: split_huge_page_to_list() was already silly, to be
using the _irqsave variant: it's just been taking sleeping locks, so would
already be broken if entered with interrupts enabled. So we can save
passing flags argument down to __split_huge_page().
Why change the lock ordering here? That was hard to decide. One reason:
when this series reaches per-memcg lru locking, it relies on the THP's
memcg to be stable when taking the lru_lock: that is now done after the
THP's refcount has been frozen, which ensures page memcg cannot change.
Another reason: previously, lock_page_memcg()'s move_lock was presumed to
nest inside lru_lock; but now lru_lock must nest inside (page cache lock
inside) move_lock, so it becomes possible to use lock_page_memcg() to
stabilize page memcg before taking its lru_lock. That is not the
mechanism used in this series, but it is an option we want to keep open.
[hughd@google.com: rewrite commit log]
Link: https://lkml.kernel.org/r/1604566549-62481-5-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: "Chen, Rong A" <rong.a.chen@intel.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/huge_memory.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)
--- a/mm/huge_memory.c~mm-thp-narrow-lru-locking
+++ a/mm/huge_memory.c
@@ -2432,7 +2432,7 @@ static void __split_huge_page_tail(struc
}
static void __split_huge_page(struct page *page, struct list_head *list,
- pgoff_t end, unsigned long flags)
+ pgoff_t end)
{
struct page *head = compound_head(page);
pg_data_t *pgdat = page_pgdat(head);
@@ -2442,8 +2442,6 @@ static void __split_huge_page(struct pag
unsigned int nr = thp_nr_pages(head);
int i;
- lruvec = mem_cgroup_page_lruvec(head, pgdat);
-
/* complete memcg works before add pages to LRU */
mem_cgroup_split_huge_fixup(head);
@@ -2455,6 +2453,11 @@ static void __split_huge_page(struct pag
xa_lock(&swap_cache->i_pages);
}
+ /* prevent PageLRU to go away from under us, and freeze lru stats */
+ spin_lock(&pgdat->lru_lock);
+
+ lruvec = mem_cgroup_page_lruvec(head, pgdat);
+
for (i = nr - 1; i >= 1; i--) {
__split_huge_page_tail(head, i, lruvec, list);
/* Some pages can be beyond i_size: drop them from page cache */
@@ -2474,6 +2477,8 @@ static void __split_huge_page(struct pag
}
ClearPageCompound(head);
+ spin_unlock(&pgdat->lru_lock);
+ /* Caller disabled irqs, so they are still disabled here */
split_page_owner(head, nr);
@@ -2491,8 +2496,7 @@ static void __split_huge_page(struct pag
page_ref_add(head, 2);
xa_unlock(&head->mapping->i_pages);
}
-
- spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+ local_irq_enable();
remap_page(head, nr);
@@ -2638,12 +2642,10 @@ bool can_split_huge_page(struct page *pa
int split_huge_page_to_list(struct page *page, struct list_head *list)
{
struct page *head = compound_head(page);
- struct pglist_data *pgdata = NODE_DATA(page_to_nid(head));
struct deferred_split *ds_queue = get_deferred_split_queue(head);
struct anon_vma *anon_vma = NULL;
struct address_space *mapping = NULL;
int count, mapcount, extra_pins, ret;
- unsigned long flags;
pgoff_t end;
VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
@@ -2704,9 +2706,8 @@ int split_huge_page_to_list(struct page
unmap_page(head);
VM_BUG_ON_PAGE(compound_mapcount(head), head);
- /* prevent PageLRU to go away from under us, and freeze lru stats */
- spin_lock_irqsave(&pgdata->lru_lock, flags);
-
+ /* block interrupt reentry in xa_lock and spinlock */
+ local_irq_disable();
if (mapping) {
XA_STATE(xas, &mapping->i_pages, page_index(head));
@@ -2736,7 +2737,7 @@ int split_huge_page_to_list(struct page
__dec_lruvec_page_state(head, NR_FILE_THPS);
}
- __split_huge_page(page, list, end, flags);
+ __split_huge_page(page, list, end);
ret = 0;
} else {
if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {
@@ -2750,7 +2751,7 @@ int split_huge_page_to_list(struct page
spin_unlock(&ds_queue->split_queue_lock);
fail: if (mapping)
xa_unlock(&mapping->i_pages);
- spin_unlock_irqrestore(&pgdata->lru_lock, flags);
+ local_irq_enable();
remap_page(head, thp_nr_pages(head));
ret = -EBUSY;
}
_
Patches currently in -mm which might be from alex.shi@linux.alibaba.com are
mm-filemap-add-static-for-function-__add_to_page_cache_locked.patch
fs-ntfs-remove-unused-varibles.patch
fs-ntfs-remove-unused-varible-attr_len.patch
mm-memcg-update-page-struct-member-in-comments.patch
mm-thp-move-lru_add_page_tail-func-to-huge_memoryc.patch
mm-thp-use-head-for-head-page-in-lru_add_page_tail.patch
mm-thp-simplify-lru_add_page_tail.patch
mm-thp-narrow-lru-locking.patch
mm-vmscan-remove-unnecessary-lruvec-adding.patch
mm-rmap-stop-store-reordering-issue-on-page-mapping.patch
mm-rmap-stop-store-reordering-issue-on-page-mapping-fix.patch
mm-memcg-add-debug-checking-in-lock_page_memcg.patch
mm-swapc-fold-vm-event-pgrotated-into-pagevec_move_tail_fn.patch
mm-lru-move-lock-into-lru_note_cost.patch
mm-vmscan-remove-lruvec-reget-in-move_pages_to_lru.patch
mm-mlock-remove-lru_lock-on-testclearpagemlocked.patch
mm-mlock-remove-__munlock_isolate_lru_page.patch
mm-lru-introduce-testclearpagelru.patch
mm-compaction-do-page-isolation-first-in-compaction.patch
mm-swapc-serialize-memcg-changes-in-pagevec_lru_move_fn.patch
mm-lru-replace-pgdat-lru_lock-with-lruvec-lock.patch
mm-lru-replace-pgdat-lru_lock-with-lruvec-lock-fix.patch
mm-lru-replace-pgdat-lru_lock-with-lruvec-lock-fix-2.patch
mm-lru-introduce-the-relock_page_lruvec-function-fix.patch
docs-vm-remove-unused-3-items-explanation-for-proc-vmstat.patch
mm-memcg-bail-early-from-swap-accounting-if-memcg-disabled.patch
mm-memcg-warning-on-memcg-after-readahead-page-charged.patch
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2020-11-16 21:56 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-02 22:14 + mm-thp-narrow-lru-locking.patch added to -mm tree akpm
2020-08-15 0:29 incoming Andrew Morton
2020-08-19 3:50 ` + mm-thp-narrow-lru-locking.patch added to -mm tree Andrew Morton
2020-11-16 21:56 akpm
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.