From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Yang Shi <yang.shi@linux.alibaba.com>
Cc: mhocko@kernel.org, willy@infradead.org,
	ldufour@linux.vnet.ibm.com, akpm@linux-foundation.org,
	peterz@infradead.org, mingo@redhat.com, acme@kernel.org,
	alexander.shishkin@linux.intel.com, jolsa@redhat.com,
	namhyung@kernel.org, tglx@linutronix.de, hpa@zytor.com,
	linux-mm@kvack.org, x86@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC v3 PATCH 4/5] mm: mmap: zap pages with read mmap_sem for large mapping
Date: Tue, 3 Jul 2018 11:07:57 +0300
Message-ID: <20180703080757.jryyxefaehil3yt3@kshutemo-mobl1>
In-Reply-To: <17c04c38-9569-9b02-2db2-7913a7debb46@linux.alibaba.com>

On Mon, Jul 02, 2018 at 10:19:32AM -0700, Yang Shi wrote:
> 
> 
> On 7/2/18 5:33 AM, Kirill A. Shutemov wrote:
> > On Sat, Jun 30, 2018 at 06:39:44AM +0800, Yang Shi wrote:
> > > > When running some mmap/munmap scalability tests with large memory (i.e.
> > > > more than 300GB), the hung task issue below may happen occasionally.
> > > INFO: task ps:14018 blocked for more than 120 seconds.
> > >         Tainted: G            E 4.9.79-009.ali3000.alios7.x86_64 #1
> > >   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> > > message.
> > >   ps              D    0 14018      1 0x00000004
> > >    ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0
> > >    ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040
> > >    00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000
> > >   Call Trace:
> > >    [<ffffffff817154d0>] ? __schedule+0x250/0x730
> > >    [<ffffffff817159e6>] schedule+0x36/0x80
> > >    [<ffffffff81718560>] rwsem_down_read_failed+0xf0/0x150
> > >    [<ffffffff81390a28>] call_rwsem_down_read_failed+0x18/0x30
> > >    [<ffffffff81717db0>] down_read+0x20/0x40
> > >    [<ffffffff812b9439>] proc_pid_cmdline_read+0xd9/0x4e0
> > >    [<ffffffff81253c95>] ? do_filp_open+0xa5/0x100
> > >    [<ffffffff81241d87>] __vfs_read+0x37/0x150
> > >    [<ffffffff812f824b>] ? security_file_permission+0x9b/0xc0
> > >    [<ffffffff81242266>] vfs_read+0x96/0x130
> > >    [<ffffffff812437b5>] SyS_read+0x55/0xc0
> > >    [<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5
> > > 
> > > > This happens because munmap holds mmap_sem from start to finish and
> > > > never releases it in between. Unmapping a large mapping can take a
> > > > long time (~18 seconds to unmap a 320GB mapping with every single
> > > > page mapped, on an idle machine).
> > > 
> > > > Zapping pages is the most time-consuming part. Following the
> > > > suggestion from Michal Hocko [1], pages can be zapped while holding
> > > > mmap_sem for read, as MADV_DONTNEED does, and mmap_sem is then
> > > > re-acquired for write to clean up the vmas. All zapped vmas have the
> > > > VM_DEAD flag set, so a page fault on a VM_DEAD vma triggers SIGSEGV.
> > > 
> > > > Define the large mapping threshold as PUD size, or 1GB otherwise, and
> > > > zap pages with read mmap_sem only for mappings >= that threshold.
> > > 
> > > > If the vma has VM_LOCKED | VM_HUGETLB | VM_PFNMAP or uprobes, just
> > > > fall back to the regular path, since unmapping those mappings needs
> > > > write mmap_sem.
> > > 
> > > > For the time being, do this only in the munmap syscall path. Other
> > > > vm_munmap() or do_munmap() call sites remain unchanged for stability
> > > > reasons.
> > > 
> > > The below is some regression and performance data collected on a machine
> > > with 32 cores of E5-2680 @ 2.70GHz and 384GB memory.
> > > 
> > > > With the patched kernel, the write mmap_sem hold time drops from
> > > > seconds to the microsecond level.
> > > 
> > > [1] https://lwn.net/Articles/753269/
> > > 
> > > Cc: Michal Hocko <mhocko@kernel.org>
> > > Cc: Matthew Wilcox <willy@infradead.org>
> > > Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> > > ---
> > >   mm/mmap.c | 136 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> > >   1 file changed, 134 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index 87dcf83..d61e08b 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -2763,6 +2763,128 @@ static int munmap_lookup_vma(struct mm_struct *mm, struct vm_area_struct **vma,
> > >   	return 1;
> > >   }
> > > +/* Consider PUD size or 1GB mapping as large mapping */
> > > +#ifdef HPAGE_PUD_SIZE
> > > +#define LARGE_MAP_THRESH	HPAGE_PUD_SIZE
> > > +#else
> > > +#define LARGE_MAP_THRESH	(1 * 1024 * 1024 * 1024)
> > > +#endif
> > PUD_SIZE is defined everywhere.
> 
> If THP is defined, otherwise it is:
> 
> #define HPAGE_PUD_SIZE ({ BUILD_BUG(); 0; })

I'm talking about PUD_SIZE, not HPAGE_PUD_SIZE.
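
To illustrate the difference, here is a minimal userspace sketch (not part
of the patch). The SKETCH_* names are hypothetical, and the shift of 30
assumes x86-64 with 4KB pages; in the kernel the threshold would presumably
just be PUD_SIZE, with no #ifdef fallback to a hard-coded 1GB:

```c
#include <stdbool.h>

/*
 * Assumption: x86-64 with 4KB pages, where one PUD entry covers 1GB
 * (a shift of 30). The real kernel takes this from PUD_SHIFT.
 */
#define SKETCH_PUD_SHIFT	30
#define SKETCH_PUD_SIZE		(1ULL << SKETCH_PUD_SHIFT)

/*
 * Unlike HPAGE_PUD_SIZE, which turns into BUILD_BUG() without THP,
 * a PUD_SIZE-style constant is always usable, so no #else branch.
 */
#define SKETCH_LARGE_MAP_THRESH	SKETCH_PUD_SIZE

/* True if a mapping of len bytes is at or above the large-mapping threshold. */
static bool sketch_is_large_mapping(unsigned long long len)
{
	return len >= SKETCH_LARGE_MAP_THRESH;
}
```

Under that assumption the threshold comes out to the same 1GB as the
fallback in the patch, but it follows the architecture's page table layout
rather than a fixed number.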

-- 
 Kirill A. Shutemov
