From: Andy Lutomirski
Date: Wed, 15 Mar 2017 09:51:31 -0700
Subject: Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
In-Reply-To: <20170314161229.tl6hsmian2gdep47@arch-dev>
References: <20170314161229.tl6hsmian2gdep47@arch-dev>
Message-ID: <20170315165131.eFr_KaPqmu-bD2cNI5aQdClHupBQvjUbgjKHo-AtylY@z>
To: Andy Lutomirski, Andy Lutomirski, Till Smejkal, Richard Henderson,
    Ivan Kokshaysky, Matt Turner, Vineet Gupta, Russell King,
    Catalin Marinas, Will Deacon, Steven Miao, Richard Kuo, Tony Luck,
    Fenghua Yu, James Hogan, Ralf Baechle, "James E.J. Bottomley",
    Helge Deller, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
    Martin Schwidefsky, Heiko Carstens, Yoshinori Sato, Rich Felker,
    "David S. Miller", Chris Metcalf, Thomas Gleixner, Ingo Molnar,
    "H. Peter Anvin", X86 ML, Chris Zankel, Max Filippov, Arnd Bergmann,
    Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
    Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
    Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
    Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
    Nadia Yvette Chambers, Jeff Layton, "J. Bruce Fields", Peter Zijlstra,
    Hugh Dickins, Arnaldo Carvalho de Melo, Alexander Shishkin,
    Jaroslav Kysela, Takashi Iwai, "linux-kernel@vger.kernel.org",
    linux-alpha@vger.kernel.org, arcml,
    "linux-arm-kernel@lists.infradead.org",
    adi-buildroot-devel@lists.sourceforge.net,
    linux-hexagon@vger.kernel.org, "linux-ia64@vger.kernel.org",
    linux-metag@vger.kernel.org, Linux MIPS Mailing List,
    linux-parisc@vger.kernel.org, linuxppc-dev,
    "linux-s390@vger.kernel.org", "linux-sh@vger.kernel.org",
    sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org,
    Linux Media Mailing List, linux-mtd@lists.infradead.org, USB list,
    Linux FS Devel, linux-aio@kvack.org, "linux-mm@kvack.org", Linux API,
    linux-arch, ALSA development

On Tue, Mar 14, 2017 at 9:12 AM, Till Smejkal wrote:
> On Mon, 13 Mar 2017, Andy Lutomirski wrote:
>> On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal wrote:
>> > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
>> >> This sounds rather complicated. Getting TLB flushing right seems
>> >> tricky. Why not just map the same thing into multiple mms?
>> >
>> > This is exactly what happens at the end. The memory region that is described by the
>> > VAS segment will be mapped in the ASes that use the segment.
>>
>> So why is this kernel feature better than just doing MAP_SHARED
>> manually in userspace?
>
> One advantage of VAS segments is that they can be globally queried by user programs
> which means that VAS segments can be shared by applications that not necessarily have
> to be related. If I am not mistaken, MAP_SHARED of pure in memory data will only work
> if the tasks that share the memory region are related (aka. have a common parent that
> initialized the shared mapping). Otherwise, the shared mapping have to be backed by a
> file.

What's wrong with memfd_create()?

> VAS segments on the other side allow sharing of pure in memory data by
> arbitrary related tasks without the need of a file. This becomes especially
> interesting if one combines VAS segments with non-volatile memory since one can keep
> data structures in the NVM and still be able to share them between multiple tasks.

What's wrong with regular mmap?

>
>> >> Ick. Please don't do this. Can we please keep an mm as just an mm
>> >> and not make it look magically different depending on which process
>> >> maps it? If you need a trampoline (which you do, of course), just
>> >> write a trampoline in regular user code and map it manually.
>> >
>> > Did I understand you correctly that you are proposing that the switching thread
>> > should make sure by itself that its code, stack, … memory regions are properly setup
>> > in the new AS before/after switching into it? I think, this would make using first
>> > class virtual address spaces much more difficult for user applications to the extend
>> > that I am not even sure if they can be used at all. At the moment, switching into a
>> > VAS is a very simple operation for an application because the kernel will just simply
>> > do the right thing.
>>
>> Yes. I think that having the same mm_struct look different from
>> different tasks is problematic. Getting it right in the arch code is
>> going to be nasty. The heuristics of what to share are also tough --
>> why would text + data + stack or whatever you're doing be adequate?
>> What if you're in a thread? What if two tasks have their stacks in
>> the same place?
>
> The different ASes that a task now can have when it uses first class virtual address
> spaces are not realized in the kernel by using only one mm_struct per task that just
> looks differently but by using multiple mm_structs - one for each AS that the task
> can execute in.
> When a task attaches a first class virtual address space to itself to
> be able to use another AS, the kernel adds a temporary mm_struct to this task that
> contains the mappings of the first class virtual address space and the one shared
> with the task's original AS. If a thread now wants to switch into this attached first
> class virtual address space the kernel only changes the 'mm' and 'active_mm' pointers
> in the task_struct of the thread to the temporary mm_struct and performs the
> corresponding mm_switch operation. The original mm_struct of the thread will not be
> changed.
>
> Accordingly, I do not magically make mm_structs look differently depending on the
> task that uses it, but create temporary mm_structs that only contain mappings to the
> same memory regions.

This sounds complicated and fragile. What happens if a heuristically
shared region coincides with a region in the "first class address
space" being selected?

I think the right solution is "you're a user program playing virtual
address games -- make sure you do it right".

--Andy