From: Yang Shi <yang.shi@linux.alibaba.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>, Nadav Amit <nadav.amit@gmail.com>,
    Matthew Wilcox <willy@infradead.org>, ldufour@linux.vnet.ibm.com,
    Andrew Morton <akpm@linux-foundation.org>, Ingo Molnar <mingo@redhat.com>,
    acme@kernel.org, alexander.shishkin@linux.intel.com, jolsa@redhat.com,
    namhyung@kernel.org, "open list:MEMORY MANAGEMENT" <linux-mm@kvack.org>,
    linux-kernel@vger.kernel.org
Subject: Re: [RFC v2 PATCH 2/2] mm: mmap: zap pages with read mmap_sem for large mapping
Date: Thu, 28 Jun 2018 12:10:10 -0700
Message-ID: <2ecdb667-f4de-673d-6a5f-ee50df505d0c@linux.alibaba.com>
In-Reply-To: <20180628115101.GE32348@dhcp22.suse.cz>

On 6/28/18 4:51 AM, Michal Hocko wrote:
> On Wed 27-06-18 10:23:39, Yang Shi wrote:
>>
>> On 6/27/18 12:24 AM, Michal Hocko wrote:
>>> On Tue 26-06-18 18:03:34, Yang Shi wrote:
>>>> On 6/26/18 12:43 AM, Peter Zijlstra wrote:
>>>>> On Mon, Jun 25, 2018 at 05:06:23PM -0700, Yang Shi wrote:
>>>>>> Looking at this more closely, we may not be able to cover the whole
>>>>>> unmapping range with VM_DEAD, for example, if the start addr is in the
>>>>>> middle of a vma. We can't set VM_DEAD on that vma since that would
>>>>>> trigger SIGSEGV for the still mapped area.
>>>>>>
>>>>>> Splitting can't be done with read mmap_sem held, so maybe just set VM_DEAD
>>>>>> on the non-overlapped vmas. Access to the overlapped vmas (first and last)
>>>>>> will still have undefined behavior.
>>>>> Acquire mmap_sem for writing, split, mark VM_DEAD, drop mmap_sem. Acquire
>>>>> mmap_sem for reading, madv_free, drop mmap_sem. Acquire mmap_sem for
>>>>> writing, free everything left, drop mmap_sem.
>>>>>
>>>>> ?
>>>>>
>>>>> Sure, you acquire the lock 3 times, but both write instances should be
>>>>> 'short', and I suppose you can do a demote between 1 and 2 if you care.
>>>> Thanks, Peter. Yes, after looking at the code and trying two different
>>>> approaches, this one looks like the most straightforward.
>>> Yes, you just have to be careful about the max vma count limit.
>> Yes, we should just need to copy what do_munmap does, as below:
>>
>> if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
>>         return -ENOMEM;
>>
>> If the max map count limit has been reached, it will return failure before
>> zapping mappings.
> Yeah, but as soon as you drop the lock and retake it, somebody might
> have changed the address space and we might get inconsistency.
>
> So I am wondering whether we really need upgrade_read (to promote read
> to write lock) and do the
>         down_write
>         split & set up VM_DEAD
>         downgrade_write
>         unmap
>         upgrade_read
>         zap ptes
>         up_write

I suppose the address space can only be changed by mmap, mremap, and
mprotect. If so, we may utilize the new VM_DEAD flag: if VM_DEAD is set
on the vma, just return failure since it is being unmapped.

Does that sound reasonable?

Thanks,
Yang

>
> looks terrible, no question about that, but we won't drop the mmap sem
> at any time.
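
To make the VM_DEAD idea in the reply above concrete, here is a rough userspace sketch in C. It is not kernel code and not taken from the patch: `struct toy_vma`, `TOY_VM_DEAD`, `toy_mark_dead()` and `toy_check_range()` are all hypothetical names invented for illustration, and the choice of -EBUSY is an assumption (the mail only says "return failure"). It assumes the vmas have already been split at the range boundaries, so only vmas lying fully inside the range are marked.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the proposed VM_DEAD flag (not the kernel's). */
#define TOY_VM_DEAD 0x1UL

/* Toy model of a vma: covers [start, end) and carries a flags word. */
struct toy_vma {
    unsigned long start;
    unsigned long end;
    unsigned long flags;
};

/*
 * Models the "split & set up VM_DEAD" step, assuming the split has
 * already happened: mark every vma that lies entirely inside
 * [start, end) as dead. Returns the number of vmas marked.
 */
int toy_mark_dead(struct toy_vma *vmas, size_t n,
                  unsigned long start, unsigned long end)
{
    int marked = 0;

    assert(start < end);
    for (size_t i = 0; i < n; i++) {
        if (vmas[i].start >= start && vmas[i].end <= end) {
            vmas[i].flags |= TOY_VM_DEAD;
            marked++;
        }
    }
    return marked;
}

/*
 * Models the check an mmap/mremap/mprotect-like path could perform:
 * refuse the operation if [start, end) overlaps a vma that is being
 * unmapped. -EBUSY is an arbitrary choice for this sketch.
 */
int toy_check_range(const struct toy_vma *vmas, size_t n,
                    unsigned long start, unsigned long end)
{
    for (size_t i = 0; i < n; i++) {
        int overlaps = vmas[i].start < end && start < vmas[i].end;

        if (overlaps && (vmas[i].flags & TOY_VM_DEAD))
            return -EBUSY;
    }
    return 0;
}
```

With three adjacent toy vmas and the middle one marked dead, a request that touches the middle one is refused while requests confined to its neighbours still succeed. Whether such a check would actually close the race once the lock is dropped is exactly the question under discussion in this thread.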