From: Yang Shi <yang.shi@linux.alibaba.com>
To: mhocko@kernel.org, willy@infradead.org, ldufour@linux.vnet.ibm.com, akpm@linux-foundation.org, peterz@infradead.org, mingo@redhat.com, acme@kernel.org, alexander.shishkin@linux.intel.com, jolsa@redhat.com, namhyung@kernel.org, tglx@linutronix.de, hpa@zytor.com
Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org, x86@kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC v3 PATCH 0/5] mm: zap pages with read mmap_sem in munmap for large mapping
Date: Sat, 30 Jun 2018 06:39:40 +0800
Message-ID: <1530311985-31251-1-git-send-email-yang.shi@linux.alibaba.com>

Background:
Recently, when we ran some vm scalability tests on machines with large memory, we ran into a couple of mmap_sem scalability issues when unmapping a large memory space; please refer to https://lkml.org/lkml/2017/12/14/733 and https://lkml.org/lkml/2018/2/20/576.

History:
akpm suggested unmapping large mappings section by section, dropping mmap_sem between sections, to mitigate the problem (see https://lkml.org/lkml/2018/3/6/784). The v1 patch series was submitted to the mailing list per Andrew's suggestion (see https://lkml.org/lkml/2018/3/20/786), and I received a lot of great feedback and suggestions. The topic was then discussed at the LSF/MM summit 2018, where Michal Hocko suggested (as he also did in the v1 review) trying a "two phases" approach: zap pages with read mmap_sem, then do the cleanup with write mmap_sem (for discussion details, see https://lwn.net/Articles/753269/).

Changelog:
v2 -> v3:
* Refactored the do_munmap code to extract the common part per Peter's suggestion
* Introduced the VM_DEAD flag per Michal's suggestion. VM_DEAD is handled only in x86's page fault handler for the time being; other architectures will be covered once the patch series is reviewed
* Now look up the vma (find and split) and set the VM_DEAD flag with write mmap_sem, then zap the mapping with read mmap_sem, then clean up pgtables and vmas with write mmap_sem, per Peter's suggestion

v1 -> v2:
* Re-implemented the code per the discussion at the LSF/MM summit

Regression and performance data:
The tests were run on a machine with 32 cores of E5-2680 @ 2.70GHz and 384GB memory.

Regression testing used the full LTP suite and trinity (munmap), with the threshold set to 4K in the code (for regression testing only) so that the new code is covered better and trinity's 4K munmap operations exercise it. No regression was reported, and the system survived the trinity (munmap) test for 4 hours until I aborted it.

Throughput of page faults (#/s) with the below stress-ng test:
stress-ng --mmap 0 --mmap-bytes 80G --mmap-file --metrics --perf --timeout 600s

  pristine     patched      delta
  89.41K/sec   97.29K/sec   +8.8%

The result is not very stable and depends on timing, so it is just for reference.

Yang Shi (5):
  uprobes: make vma_has_uprobes non-static
  mm: introduce VM_DEAD flag
  mm: refactor do_munmap() to extract the common part
  mm: mmap: zap pages with read mmap_sem for large mapping
  x86: check VM_DEAD flag in page fault

 arch/x86/mm/fault.c     |   4 ++
 include/linux/mm.h      |   6 +++
 include/linux/uprobes.h |   7 +++
 kernel/events/uprobes.c |   2 +-
 mm/mmap.c               | 243 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------
 5 files changed, 224 insertions(+), 38 deletions(-)
Thread overview: 51+ messages

2018-06-29 22:39 Yang Shi [this message]
2018-06-29 22:39 ` [RFC v3 PATCH 0/5] mm: zap pages with read mmap_sem in munmap for large mapping Yang Shi
2018-06-29 22:39 ` [RFC v3 PATCH 1/5] uprobes: make vma_has_uprobes non-static Yang Shi
2018-06-29 22:39 ` [RFC v3 PATCH 2/5] mm: introduce VM_DEAD flag Yang Shi
2018-07-02 13:40 ` Michal Hocko
2018-06-29 22:39 ` [RFC v3 PATCH 3/5] mm: refactor do_munmap() to extract the common part Yang Shi
2018-07-02 13:42 ` Michal Hocko
2018-07-02 16:59 ` Yang Shi
2018-07-02 17:58 ` Michal Hocko
2018-07-02 18:02 ` Yang Shi
2018-06-29 22:39 ` [RFC v3 PATCH 4/5] mm: mmap: zap pages with read mmap_sem for large mapping Yang Shi
2018-06-30  1:28 ` Andrew Morton
2018-06-30  2:10 ` Yang Shi
2018-06-30  1:35 ` Andrew Morton
2018-06-30  2:28 ` Yang Shi
2018-06-30  3:15 ` Andrew Morton
2018-06-30  4:26 ` Yang Shi
2018-07-03  0:01 ` Yang Shi
2018-07-03  0:01 ` Yang Shi
2018-07-02 14:05 ` Michal Hocko
2018-07-02 20:48 ` Andrew Morton
2018-07-03  6:09 ` Michal Hocko
2018-07-03 16:53 ` Yang Shi
2018-07-03 18:22 ` Yang Shi
2018-07-04  8:13 ` Michal Hocko
2018-07-02 12:33 ` Kirill A. Shutemov
2018-07-02 12:49 ` Michal Hocko
2018-07-03  8:12 ` Kirill A. Shutemov
2018-07-03  8:27 ` Michal Hocko
2018-07-03  9:19 ` Kirill A. Shutemov
2018-07-03 11:34 ` Michal Hocko
2018-07-03 12:14 ` Kirill A. Shutemov
2018-07-03 17:00 ` Yang Shi
2018-07-02 17:19 ` Yang Shi
2018-07-03  8:07 ` Kirill A. Shutemov
2018-07-02 13:53 ` Michal Hocko
2018-07-02 17:07 ` Yang Shi
2018-06-29 22:39 ` [RFC v3 PATCH 5/5] x86: check VM_DEAD flag in page fault Yang Shi
2018-07-02  8:45 ` Laurent Dufour
2018-07-02 12:15 ` Michal Hocko
2018-07-02 12:26 ` Laurent Dufour
2018-07-02 12:45 ` Michal Hocko
2018-07-02 13:33 ` Laurent Dufour
2018-07-02 13:37 ` Michal Hocko
2018-07-02 17:24 ` Yang Shi
2018-07-02 17:57 ` Michal Hocko
2018-07-02 18:10 ` Yang Shi
2018-07-03  6:17 ` Michal Hocko
2018-07-03 16:50 ` Yang Shi
2018-07-02 13:39 ` [RFC v3 PATCH 0/5] mm: zap pages with read mmap_sem in munmap for large mapping Michal Hocko
2018-07-02 13:39 ` Michal Hocko