linux-kernel.vger.kernel.org archive mirror
From: Michal Hocko <mhocko@kernel.org>
To: Nadav Amit <nadav.amit@gmail.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>,
	Matthew Wilcox <willy@infradead.org>,
	ldufour@linux.vnet.ibm.com,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	acme@kernel.org, alexander.shishkin@linux.intel.com,
	jolsa@redhat.com, namhyung@kernel.org,
	"open list:MEMORY MANAGEMENT" <linux-mm@kvack.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC v2 PATCH 2/2] mm: mmap: zap pages with read mmap_sem for large mapping
Date: Wed, 20 Jun 2018 09:18:17 +0200
Message-ID: <20180620071817.GJ13685@dhcp22.suse.cz>
In-Reply-To: <BFD6A249-B1D7-43D5-8D7C-9FAED4A168A1@gmail.com>

On Tue 19-06-18 17:31:27, Nadav Amit wrote:
> at 4:08 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
> 
> > 
> > 
> > On 6/19/18 3:17 PM, Nadav Amit wrote:
> >> at 4:34 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
> >> 
> >> 
> >>> When running some mmap/munmap scalability tests with large memory (i.e.
> >>> > 300GB), the below hung task issue may happen occasionally.
> >>> 
> >>> INFO: task ps:14018 blocked for more than 120 seconds.
> >>>       Tainted: G            E 4.9.79-009.ali3000.alios7.x86_64 #1
> >>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> >>> message.
> >>> ps              D    0 14018      1 0x00000004
> >>> 
> >>> 
> >> (snip)
> >> 
> >> 
> >>> Zapping pages is the most time-consuming part. Following the
> >>> suggestion from Michal Hocko [1], zapping pages can be done while holding
> >>> read mmap_sem, like what MADV_DONTNEED does. Then re-acquire write
> >>> mmap_sem to manipulate vmas.
> >>> 
> >> Does munmap() == MADV_DONTNEED + munmap() ?
> > 
> > Not exactly the same. So, I basically copied the page zapping used by munmap instead of calling MADV_DONTNEED.
> > 
> >> 
> >> For example, what happens with userfaultfd in this case? Can you get an
> >> extra #PF, which would be visible to userspace, before the munmap is
> >> finished?
> >> 
> > 
> > userfaultfd is handled by regular munmap path. So, no change to userfaultfd part.
> 
> Right. I see it now.
> 
> > 
> >> 
> >> In addition, would it be ok for the user to potentially get a zeroed page in
> >> the time window after the MADV_DONTNEED finished removing a PTE and before
> >> the munmap() is done?
> >> 
> > 
> > This should be undefined behavior according to Michal. This has been discussed in https://lwn.net/Articles/753269/.
> 
> Thanks for the reference.
> 
> Reading the man page I see: "All pages containing a part of the indicated
> range are unmapped, and subsequent references to these pages will generate
> SIGSEGV."

Yes, this is true, but I guess what Yang Shi meant was that a userspace
access racing with munmap is not well defined. You never know whether
you get your data, a #PF or a SEGV because it depends on timing. The
user-visible change might be that you lose content and get a zero page
instead if you hit the race window while we are unmapping, which was not
possible before. But wouldn't such an access pattern be buggy anyway? You
need some form of external synchronization AFAICS.
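
To make the "lose content and get zero page" case concrete, here is a
minimal userspace sketch. It is not taken from the patch. It uses
madvise(MADV_DONTNEED) on a private anonymous mapping as a stand-in for the
window in which the PTEs are already zapped but the vma is still there. The
function name byte_after_dontneed() is made up for the illustration, and the
zero-fill result is the Linux behavior for private anonymous memory.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill one private anonymous page with `fill`,
 * drop it with MADV_DONTNEED, then read the first byte back.
 * Returns the byte that was read, or -1 on error.
 */
int byte_after_dontneed(unsigned char fill)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    unsigned char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, fill, (size_t)page);

    /* The page is zapped here, but the mapping itself stays valid. */
    if (madvise(p, (size_t)page, MADV_DONTNEED) != 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    /* This access refaults and sees a fresh zero-filled page, not `fill`. */
    volatile unsigned char *vp = p;
    int seen = vp[0];

    munmap(p, (size_t)page);
    return seen;
}
```

A reader that races with the unmap could observe the same thing: no SEGV,
just zeroes where the data used to be. With external synchronization (for
example joining the reader thread before calling munmap) the window is
never hit.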

But maybe some userspace depends on "getting the right data or getting a
SEGV" semantics. If we have to preserve that, then we can come up with a
VM_DEAD flag that is set before we tear the mapping down and forces the SEGV
on the #PF path. We already do something similar for MMF_UNSTABLE.
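
Roughly, the VM_DEAD idea could be modeled like this. This is a plain
userspace model under stated assumptions, not kernel code: struct model_vma,
MODEL_VM_DEAD (including its bit value), model_begin_unmap() and
model_handle_fault() are all hypothetical names invented for the sketch, and
the real fault path has much more to check.

```c
/* Hypothetical flag bit; the value is made up for this model. */
#define MODEL_VM_DEAD 0x1UL

enum fault_result { FAULT_MAP_PAGE, FAULT_SIGSEGV };

/* Simplified stand-in for a vma: an address range plus flags. */
struct model_vma {
    unsigned long start;   /* inclusive */
    unsigned long end;     /* exclusive */
    unsigned long flags;
};

/* First step of the teardown: mark the area dead before zapping pages. */
void model_begin_unmap(struct model_vma *vma)
{
    vma->flags |= MODEL_VM_DEAD;
}

/* Fault path: never hand out a page for an area that is going away. */
enum fault_result model_handle_fault(const struct model_vma *vma,
                                     unsigned long addr)
{
    if (addr < vma->start || addr >= vma->end)
        return FAULT_SIGSEGV;      /* no mapping at this address */
    if (vma->flags & MODEL_VM_DEAD)
        return FAULT_SIGSEGV;      /* teardown started: force SEGV */
    return FAULT_MAP_PAGE;         /* normal case: fault the page in */
}
```

With such a check a racing access either completes before the flag is set
or gets a SEGV, so the zero page never becomes visible.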
-- 
Michal Hocko
SUSE Labs

Thread overview: 26+ messages
2018-06-18 23:34 [RFC v2 0/2] mm: zap pages with read mmap_sem in munmap for large mapping Yang Shi
2018-06-18 23:34 ` [RFC v2 PATCH 1/2] uprobes: make vma_has_uprobes non-static Yang Shi
2018-06-18 23:34 ` [RFC v2 PATCH 2/2] mm: mmap: zap pages with read mmap_sem for large mapping Yang Shi
2018-06-19 10:02   ` Peter Zijlstra
2018-06-19 21:13     ` Yang Shi
2018-06-20  7:17       ` Michal Hocko
2018-06-20 16:23         ` Yang Shi
2018-06-19 22:17   ` Nadav Amit
     [not found]     ` <158a4e4c-d290-77c4-a595-71332ede392b@linux.alibaba.com>
2018-06-20  0:31       ` Nadav Amit
2018-06-20  7:18         ` Michal Hocko [this message]
2018-06-20 17:12           ` Nadav Amit
2018-06-20 18:42           ` Yang Shi
2018-06-23  1:01             ` Yang Shi
2018-06-25  9:14               ` Michal Hocko
2018-06-26  0:06           ` Yang Shi
2018-06-26  7:43             ` Peter Zijlstra
2018-06-27  1:03               ` Yang Shi
2018-06-27  7:24                 ` Michal Hocko
2018-06-27 17:23                   ` Yang Shi
2018-06-28 11:51                     ` Michal Hocko
2018-06-28 19:10                       ` Yang Shi
2018-06-29  0:59                         ` Yang Shi
2018-06-29 11:39                           ` Michal Hocko
2018-06-29 16:50                             ` Yang Shi
2018-06-29 11:34                         ` Michal Hocko
2018-06-29 16:45                           ` Yang Shi
