From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752401AbbJTNog (ORCPT ); Tue, 20 Oct 2015 09:44:36 -0400
Received: from mail-yk0-f196.google.com ([209.85.160.196]:35901 "EHLO mail-yk0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751651AbbJTNoe (ORCPT ); Tue, 20 Oct 2015 09:44:34 -0400
MIME-Version: 1.0
In-Reply-To: <20151019214216.GU19147@redhat.com>
References: <1434388931-24487-1-git-send-email-aarcange@redhat.com> <20151019214216.GU19147@redhat.com>
Date: Tue, 20 Oct 2015 09:44:33 -0400
Message-ID:
Subject: Re: [PATCH 0/7] userfault21 update
From: Patrick Donnelly
To: Andrea Arcangeli
Cc: Andrew Morton, open list, linux-mm@kvack.org, qemu-devel@nongnu.org, kvm@vger.kernel.org, Pavel Emelyanov, Sanidhya Kashyap, zhang.zhanghailiang@huawei.com, Linus Torvalds, "Kirill A. Shutemov", Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini, Rik van Riel, Mel Gorman, Andy Lutomirski, Hugh Dickins, Peter Feiner, "Dr. David Alan Gilbert", Johannes Weiner, "Huangpeng (Peter)"
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Oct 19, 2015 at 5:42 PM, Andrea Arcangeli wrote:
> Hello Patrick,
>
> On Mon, Oct 12, 2015 at 11:04:11AM -0400, Patrick Donnelly wrote:
>> Hello Andrea,
>>
>> On Mon, Jun 15, 2015 at 1:22 PM, Andrea Arcangeli wrote:
>> > This is an incremental update to the userfaultfd code in -mm.
>>
>> Sorry I'm late to this party. I'm curious how a ptrace monitor might
>> use a userfaultfd to handle faults in all of its tracees. Is this
>> possible without having each (newly forked) tracee "cooperate" by
>> creating a userfaultfd and passing that to the tracer?
>
> To make the non cooperative usage work, userfaultfd also needs more
> features to track fork() and mremap() syscalls and such, as the
> monitor needs to be aware about modifications to the address space of
> each "mm" it is managing and of new forked "mm" as well. So fork() won't
> need to call userfaultfd once we add those features, but it still
> doesn't need to know about the "pid". The uffd_msg already has padding
> to add the features you need for that.
>
> Pavel invented and developed those features for the non cooperative
> usage to implement postcopy live migration of containers. He posted
> some patchset on the lists too, but it probably needs to be rebased on
> upstream.
>
> The ptrace monitor thread can also fault into the userfault area if it
> wants to (but only if it's not the userfault manager thread as well).
> I didn't expect the ptrace monitor to want to be a userfault manager
> too though.
> [...]

Okay, it's definitely tricky to make this work for a tree of
non-cooperative processes. Brainstorming some ideas:

o If we are using ptrace, then we can add a ptrace event for receiving
  the userfaultfd associated with the tracee, via waitpid (!). The
  ptrace monitor can deduplicate userfaultfds by looking at the inode.
  It can also associate a userfaultfd with a group of threads sharing
  a mm. [For my possible use-case with Parrot[1], we already track the
  shared address spaces of tracees in order to implement an mmap
  hook.]

o The userfaultfd can have a flag for tracking a tree of processes
  (which can be sent via unix sockets to the userfault handler) and
  use an opaque tag (the mm pointer?) to disambiguate the faults,
  instead of a pid. There would need to be some kind of message to
  notify about newly cloned threads and the mm associated with them?
  Yes, you wouldn't be able to know which pid (or kernel/ptrace
  thread) generated a fault but at least you would know which pids the
  mm belongs to.
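
To make the first idea a bit more concrete, here is a minimal sketch of
inode-based deduplication, written in Python rather than as tracer or
kernel code. It compares the (st_dev, st_ino) pair that fstat() reports.
Temporary files stand in for the descriptors, since the ptrace event that
would deliver a userfaultfd is only a proposal. The names fd_identity and
deduplicate are hypothetical, and whether distinct userfaultfds would
report distinct inode numbers is an assumption of the idea, not something
this sketch shows.

```python
import os
import tempfile


def fd_identity(fd):
    """Return the (device, inode) pair behind an open descriptor."""
    st = os.fstat(fd)
    return (st.st_dev, st.st_ino)


def deduplicate(fds):
    """Group descriptors that refer to the same underlying inode.

    Returns a dict mapping (st_dev, st_ino) to the list of descriptors
    that share it, in the order they were seen.
    """
    groups = {}
    for fd in fds:
        groups.setdefault(fd_identity(fd), []).append(fd)
    return groups


# Stand-in descriptors: two distinct files plus a duplicate of the first.
with tempfile.TemporaryFile() as a, tempfile.TemporaryFile() as b:
    dup_of_a = os.dup(a.fileno())
    try:
        groups = deduplicate([a.fileno(), b.fileno(), dup_of_a])
    finally:
        os.close(dup_of_a)

# One group per distinct inode; the duplicate lands in the first group.
print(len(groups))
```

A monitor would keep one handler per group and drop the extra
descriptors, so a fault is never serviced twice for the same object.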
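
The bookkeeping behind the second idea could look roughly like the
sketch below. Everything in it is hypothetical: no such flag, tag, or
clone notification exists in the userfaultfd interface discussed in this
thread, and MmTracker, on_clone, on_exit, and candidates are names made
up for illustration. The tag is treated as an opaque integer. The sketch
only shows that a handler can map a tagged fault back to the set of pids
sharing that address space, not to the single pid that faulted.

```python
from collections import defaultdict


class MmTracker:
    """Maps an opaque address-space tag to the pids that share it."""

    def __init__(self):
        self.pids_by_tag = defaultdict(set)
        self.tag_by_pid = {}

    def on_clone(self, tag, pid):
        """Hypothetical notification: `pid` now runs in address space `tag`."""
        self.pids_by_tag[tag].add(pid)
        self.tag_by_pid[pid] = tag

    def on_exit(self, pid):
        """Forget a pid; drop the tag once no pid uses it any more."""
        tag = self.tag_by_pid.pop(pid, None)
        if tag is None:
            return
        self.pids_by_tag[tag].discard(pid)
        if not self.pids_by_tag[tag]:
            del self.pids_by_tag[tag]

    def candidates(self, tag):
        """Pids that could have raised a fault carrying `tag`."""
        return sorted(self.pids_by_tag.get(tag, ()))


tracker = MmTracker()
tracker.on_clone(0xA, 100)  # initial process
tracker.on_clone(0xA, 101)  # new thread sharing the same mm
tracker.on_clone(0xB, 200)  # forked child with its own mm

# A fault tagged 0xA narrows down to the threads sharing that mm.
print(tracker.candidates(0xA))
```

Keeping both directions of the mapping makes exit handling cheap, at the
cost of having to update two tables on every notification.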

I didn't see the patchset Pavel posted in a quick search of the
archives. Only this [2].

[1] http://ccl.cse.nd.edu/software/parrot/
[2] https://lkml.org/lkml/2015/1/15/103

-- 
Patrick Donnelly