linux-mm.kvack.org archive mirror
From: Kirill Tkhai <ktkhai@virtuozzo.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: akpm@linux-foundation.org, dan.j.williams@intel.com,
	mhocko@suse.com, keith.busch@intel.com,
	kirill.shutemov@linux.intel.com,
	alexander.h.duyck@linux.intel.com, ira.weiny@intel.com,
	andreyknvl@google.com, arunks@codeaurora.org, vbabka@suse.cz,
	cl@linux.com, riel@surriel.com, keescook@chromium.org,
	hannes@cmpxchg.org, npiggin@gmail.com,
	mathieu.desnoyers@efficios.com, shakeelb@google.com, guro@fb.com,
	aarcange@redhat.com, hughd@google.com, jglisse@redhat.com,
	mgorman@techsingularity.net, daniel.m.jordan@oracle.com,
	jannh@google.com, kilobyte@angband.pl, linux-api@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 0/7] mm: process_vm_mmap() -- syscall for duplication a process mapping
Date: Wed, 29 May 2019 17:33:02 +0300
Message-ID: <4b9a6b90-4d82-9d4d-466d-653f9024849f@virtuozzo.com>
In-Reply-To: <20190528161524.tn5sqzhmhgyuwrmy@box>

On 28.05.2019 19:15, Kirill A. Shutemov wrote:
> On Tue, May 28, 2019 at 12:15:16PM +0300, Kirill Tkhai wrote:
>> On 28.05.2019 02:30, Kirill A. Shutemov wrote:
>>> On Fri, May 24, 2019 at 05:00:32PM +0300, Kirill Tkhai wrote:
>>>> On 24.05.2019 14:52, Kirill A. Shutemov wrote:
>>>>> On Fri, May 24, 2019 at 01:45:50PM +0300, Kirill Tkhai wrote:
>>>>>> On 22.05.2019 18:22, Kirill A. Shutemov wrote:
>>>>>>> On Mon, May 20, 2019 at 05:00:01PM +0300, Kirill Tkhai wrote:
>>>>>>>> This patchset adds a new syscall, which makes it possible
>>>>>>>> to clone a VMA from a process to the current process.
>>>>>>>> The syscall supplements the functionality provided
>>>>>>>> by the process_vm_writev() and process_vm_readv() syscalls,
>>>>>>>> and it may be useful in many situations.
>>>>>>>
>>>>>>> Kirill, could you explain how the change affects rmap and how it is safe.
>>>>>>>
>>>>>>> My concern is that the patchset allows mapping the same page multiple times
>>>>>>> within one process or even mapping a page allocated by a child to the parent.
>>>>>>>
>>>>>>> It was not allowed before.
>>>>>>>
>>>>>>> In the best case it makes reasoning about rmap substantially more difficult.
>>>>>>>
>>>>>>> But I worry it will introduce hard-to-debug bugs, like those described in
>>>>>>> https://lwn.net/Articles/383162/.
>>>>>>
>>>>>> Andy suggested unmapping the PTEs from the source page table, which ensures a single
>>>>>> page is never mapped twice in the same process. This is OK for my use case,
>>>>>> and here we will just take a small step, "allow a child process to inherit a VMA",
>>>>>> which we didn't have before. If someone still needs to continue the work
>>>>>> to allow the same page to be mapped twice in a single process in the future, that
>>>>>> person will have the supported basis we build in this small step. I believe something
>>>>>> like a debugger may want this to take a fast snapshot of a process's private
>>>>>> memory (when the task is stopped for a short time to get its memory). But for
>>>>>> me remapping is enough at the moment.
>>>>>>
>>>>>> What do you think about this?
>>>>>
>>>>> I don't think that unmapping alone will do. Consider the following
>>>>> scenario:
>>>>>
>>>>> 1. Task A creates and populates the mapping.
>>>>> 2. Task A forks. We have now Task B mapping the same pages, but
>>>>> write-protected.
>>>>> 3. Task B calls process_vm_mmap() and passes the mapping to the parent.
>>>>>
>>>>> After this Task A will have the same anon pages mapped twice.
>>>>
>>>> Ah, sure.
>>>>
>>>>> One possible way out would be to force CoW on all pages in the mapping,
>>>>> before passing the mapping to the new process.
>>>>
>>>> This will pop all swapped pages up, which is the thing the patchset aims
>>>> to prevent.
>>>>
>>>> Hm, what about allowing remapping only of a VMA whose anon_vma::rb_root contains
>>>> only one chain and whose vma->anon_vma_chain contains a single entry? This is
>>>> a VMA which was faulted, but whose mm was never duplicated (or whose
>>>> forks have already died).
>>>
>>> The requirement for the VMA to be faulted (have any pages mapped) looks
>>> excessive to me, but the general idea may work.
>>>
>>> One issue I see is that userspace may not have full control to create such
>>> a VMA. vma_merge() can merge the VMA with the next one without any consent
>>> from userspace and you'll get the anon_vma inherited from the VMA you've
>>> just merged with.
>>>
>>> I don't have any valid idea on how to get around this.
>>
>> Technically it is possible by creating boundary 1-page VMAs with a different protection:
>> one above and one below the desired region, then mapping the desired region. But this
>> is not convenient.
>>
>> I don't think it's difficult to find a natural limitation that prevents mapping
>> a single page twice, if we want to avoid this at least at the start. Another suggestion:
>>
>> prohibit mapping a remote process's VMA only if its vm_area_struct::anon_vma::root
>> is the same as the root of one of the local process's VMAs.
>>
>> What about this?
> 
> I don't see anything immediately wrong with this, but it's still going to
> produce puzzling errors for a user. How would you document such a limitation
> in a way that makes sense to a userspace developer?

It's difficult, since the limitation is artificial.

I can only suggest a stricter limitation.

Something like: "A VMA may be remapped only as a whole region,
and only if there has been no fork() since the VMA
appeared in the process (by mmap or by remapping from another
remote process). If the VMA was merged with a neighbouring
VMA, the same rules apply to the neighbours."
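
Since merging would extend the rule to neighbouring VMAs, userspace that
wants a predictable, stand-alone VMA could isolate it with the boundary
mappings mentioned earlier in the thread. The following is only a minimal
userspace sketch of that idea, not part of the patchset: map_isolated() is
a hypothetical helper name, and it assumes that neighbours with a different
protection (PROT_NONE here) are not merged with the region between them.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: reserve npages + 2 pages as PROT_NONE, then map
 * the usable region over the middle with MAP_FIXED. The first and last
 * pages stay PROT_NONE and act as 1-page boundary VMAs on both sides.
 * Returns the start of the usable region, or MAP_FAILED on error.
 */
static void *map_isolated(size_t npages)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t total = (npages + 2) * page;
	char *base, *mid;

	/* Reserve the whole range, including the two boundary pages. */
	base = mmap(NULL, total, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return MAP_FAILED;

	/* Replace the middle of the reservation with the real mapping. */
	mid = mmap(base + page, npages * page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (mid == MAP_FAILED) {
		munmap(base, total);
		return MAP_FAILED;
	}
	return mid;
}
```

A caller would use the returned pointer like any anonymous mapping; the
cost is two extra VMAs per region, which is why this was called
inconvenient above.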

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0e8834ac32b7..0bcd6f598e73 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -287,13 +287,17 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_BIT_2	34	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_3	35	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_4	36	/* bit only usable on 64-bit architectures */
+#define VM_HIGH_ARCH_BIT_5	37	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_0	BIT(VM_HIGH_ARCH_BIT_0)
 #define VM_HIGH_ARCH_1	BIT(VM_HIGH_ARCH_BIT_1)
 #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
 #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
 #define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
+#define VM_HIGH_ARCH_5	BIT(VM_HIGH_ARCH_BIT_5)
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
+#define VM_MAY_REMOTE_REMAP	VM_HIGH_ARCH_5
+
 #ifdef CONFIG_ARCH_HAS_PKEYS
 # define VM_PKEY_SHIFT	VM_HIGH_ARCH_BIT_0
 # define VM_PKEY_BIT0	VM_HIGH_ARCH_0	/* A protection key is a 4-bit value */
diff --git a/kernel/fork.c b/kernel/fork.c
index ff4efd16fd82..a3c758c8cd54 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -584,8 +584,10 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		rb_parent = &tmp->vm_rb;
 
 		mm->map_count++;
-		if (!(tmp->vm_flags & VM_WIPEONFORK))
+		if (!(tmp->vm_flags & VM_WIPEONFORK)) {
 			retval = copy_page_range(mm, oldmm, mpnt);
+			mpnt->vm_flags &= ~VM_MAY_REMOTE_REMAP;
+		}
 
 		if (tmp->vm_ops && tmp->vm_ops->open)
 			tmp->vm_ops->open(tmp);
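
To make the intended lifecycle of the flag easier to reason about, here is
a toy userspace model of the rule, offered as a sketch only. All toy_*
names are hypothetical stand-ins and not kernel APIs. It assumes the flag
is set when a VMA is created (the hunks above do not show where it is set),
and it also clears the flag in the child's copy, which the dup_mmap() hunk
above does not show either.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel flags; values are arbitrary. */
#define TOY_VM_WIPEONFORK       (1UL << 0)
#define TOY_VM_MAY_REMOTE_REMAP (1UL << 1)

struct toy_vma {
	unsigned long start, end;
	unsigned long flags;
};

/* Assumption: a freshly created VMA (mmap or remote remap) is eligible. */
static struct toy_vma toy_mmap(unsigned long start, unsigned long end)
{
	struct toy_vma vma = { start, end, TOY_VM_MAY_REMOTE_REMAP };

	return vma;
}

/*
 * Models the dup_mmap() hunk: once pages are copied to a child, the
 * parent's VMA loses eligibility. Clearing the child's copy as well is
 * an assumption of this model.
 */
static struct toy_vma toy_fork(struct toy_vma *parent)
{
	struct toy_vma child = *parent;

	if (!(parent->flags & TOY_VM_WIPEONFORK)) {
		parent->flags &= ~TOY_VM_MAY_REMOTE_REMAP;
		child.flags &= ~TOY_VM_MAY_REMOTE_REMAP;
	}
	return child;
}

/* The proposed rule: whole region only, and no fork() since creation. */
static bool toy_may_remote_remap(const struct toy_vma *vma,
				 unsigned long start, unsigned long end)
{
	return start == vma->start && end == vma->end &&
	       (vma->flags & TOY_VM_MAY_REMOTE_REMAP) != 0;
}
```

In this model a remap request for a sub-range is refused even before any
fork(), and after toy_fork() the request is refused for the whole region
too, unless the VMA was marked wipe-on-fork and so shared no pages.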


