* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
       [not found] <CALCETrXfGgxaLivhci0VL=wUaWAnBiUXC47P7TUaEuOYV_-X_g@mail.gmail.com>
@ 2017-03-15 22:02 ` Till Smejkal
  2017-03-16  8:21   ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Till Smejkal @ 2017-03-15 22:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Till Smejkal, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Vineet Gupta, Russell King,
	Catalin Marinas, Will Deacon, Steven Miao, Richard Kuo,
	Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato

On Wed, 15 Mar 2017, Andy Lutomirski wrote:
> On Wed, Mar 15, 2017 at 12:44 PM, Till Smejkal
> <till.smejkal@googlemail.com> wrote:
> > On Wed, 15 Mar 2017, Andy Lutomirski wrote:
> >> > One advantage of VAS segments is that they can be globally queried by user programs,
> >> > which means that VAS segments can be shared by applications that are not necessarily
> >> > related. If I am not mistaken, MAP_SHARED of pure in-memory data will only work
> >> > if the tasks that share the memory region are related (aka. have a common parent that
> >> > initialized the shared mapping). Otherwise, the shared mapping has to be backed by a
> >> > file.
> >>
> >> What's wrong with memfd_create()?
> >>
> >> > VAS segments on the other hand allow sharing of pure in-memory data by
> >> > arbitrary, not necessarily related, tasks without the need of a file. This becomes
> >> > especially interesting if one combines VAS segments with non-volatile memory since one
> >> > can keep data structures in the NVM and still be able to share them between multiple tasks.
> >>
> >> What's wrong with regular mmap?
> >
> > I never meant to say that there is something wrong with regular mmap. We just
> > figured that with VAS segments you could remove the need to mmap your shared data and
> > instead keep everything purely in memory.
> 
> memfd does that.

Yes, that's right. Thanks for giving me the pointer to this. I should have researched
more carefully before starting to work on VAS segments.
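
For reference, a minimal sketch of the memfd route (error handling trimmed; the
fd hand-off to an unrelated process, e.g. via SCM_RIGHTS or /proc/<pid>/fd, is
only hinted at in the comment):

    #include <string.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
            /* Anonymous in-memory file; no filesystem needed. */
            int fd = syscall(SYS_memfd_create, "shared-data", 0);
            if (fd < 0 || ftruncate(fd, 4096) < 0)
                    return 1;

            char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
            if (p == MAP_FAILED)
                    return 1;
            strcpy(p, "hello");

            /*
             * Any process that obtains the fd (SCM_RIGHTS over a unix
             * socket, or opening /proc/<pid>/fd/<n>) can mmap the same
             * pages; the memory stays alive while a reference exists.
             */
            return 0;
    }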

> > VAS segments on the other hand would provide a way to
> > achieve the same without the need of any mounted filesystem. However, I agree that
> > this is just a small advantage compared to what can already be achieved with the
> > existing functionality provided by the Linux kernel.
> 
> I see this "small advantage" as "resource leak and security problem".

I don't agree here. VAS segments are basically in-memory files that are handled by
the kernel directly without using a file system. Hence, if an application uses a VAS
segment to store data, the same rules apply as if it used a file. Everything that it
saves in the VAS segment might be accessible by other applications. An application
using VAS segments should be aware of this fact. In addition, the resources that are
represented by a VAS segment are not leaked. As I said, VAS segments are much like
files. Hence, if you don't want to use them anymore, delete them. But as with files,
the kernel will not delete them for you (although something like this can be added).

> >> This sounds complicated and fragile.  What happens if a heuristically
> >> shared region coincides with a region in the "first class address
> >> space" being selected?
> >
> > If such a conflict happens, the task cannot use the first class address space and the
> > corresponding system call will return an error. However, with the currently available
> > virtual address space size that programs can use, such conflicts are probably rare.
> 
> A bug that hits 1% of the time is often worse than one that hits 100%
> of the time because debugging it is miserable.

I don't agree that this is a bug at all. If there is a conflict in the memory layout
of the ASes, the application simply cannot use this first class virtual address space.
Every application that wants to use first class virtual address spaces should check
for error return values and handle them.

This situation is similar to mapping a file at some special address in memory because
the file contains pointer-based data structures and the application wants to use
them, but the kernel cannot map the file at this particular position in the
application's AS because there is already a different conflicting mapping. If an
application wants to do such things, it should also handle all the errors that can
occur.

Till


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-15 22:02 ` [RFC PATCH 00/13] Introduce first class virtual address spaces Till Smejkal
@ 2017-03-16  8:21   ` Thomas Gleixner
  2017-03-16 17:29     ` Till Smejkal
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2017-03-16  8:21 UTC (permalink / raw)
  To: Till Smejkal
  Cc: Andy Lutomirski, Andy Lutomirski, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Vineet Gupta, Russell King,
	Catalin Marinas, Will Deacon, Steven Miao, Richard Kuo,
	Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens

On Wed, 15 Mar 2017, Till Smejkal wrote:
> On Wed, 15 Mar 2017, Andy Lutomirski wrote:

> > > VAS segments on the other hand would provide a way to
> > > achieve the same without the need of any mounted filesystem. However,
> > > I agree that this is just a small advantage compared to what can
> > > already be achieved with the existing functionality provided by the
> > > Linux kernel.
> > 
> > I see this "small advantage" as "resource leak and security problem".
> 
> I don't agree here. VAS segments are basically in-memory files that are
> handled by the kernel directly without using a file system. Hence, if an

Why do we need yet another mechanism to represent something which looks
like a file instead of simply using existing mechanisms and extending them?

Thanks,

	tglx


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-16  8:21   ` Thomas Gleixner
@ 2017-03-16 17:29     ` Till Smejkal
  2017-03-16 17:42       ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Till Smejkal @ 2017-03-16 17:29 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Till Smejkal, Andy Lutomirski, Andy Lutomirski,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens

On Thu, 16 Mar 2017, Thomas Gleixner wrote:
> Why do we need yet another mechanism to represent something which looks
> like a file instead of simply using existing mechanisms and extending them?

You are right. I also recognized during the discussion with Andy, Chris, Matthew,
Luck, Rich and the others that there are already other techniques in the Linux kernel
that can achieve the same functionality when combined. As I also said to the others,
I will drop the VAS segments in future versions. The first class virtual address
space feature was the more interesting part of the patchset anyway.

Thanks
Till


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-16 17:29     ` Till Smejkal
@ 2017-03-16 17:42       ` Thomas Gleixner
  2017-03-16 17:50         ` Till Smejkal
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2017-03-16 17:42 UTC (permalink / raw)
  To: Till Smejkal
  Cc: Andy Lutomirski, Andy Lutomirski, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Vineet Gupta, Russell King,
	Catalin Marinas, Will Deacon, Steven Miao, Richard Kuo,
	Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens

On Thu, 16 Mar 2017, Till Smejkal wrote:
> On Thu, 16 Mar 2017, Thomas Gleixner wrote:
> > Why do we need yet another mechanism to represent something which looks
> > like a file instead of simply using existing mechanisms and extending them?
> 
> You are right. I also recognized during the discussion with Andy, Chris,
> Matthew, Luck, Rich and the others that there are already other
> techniques in the Linux kernel that can achieve the same functionality
> when combined. As I also said to the others, I will drop the VAS segments
> in future versions. The first class virtual address space feature was
> the more interesting part of the patchset anyway.

While you are at it, could you please drop this 'first class' marketing as
well? It has zero technical value, really.

Thanks,

	tglx


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-16 17:42       ` Thomas Gleixner
@ 2017-03-16 17:50         ` Till Smejkal
  0 siblings, 0 replies; 17+ messages in thread
From: Till Smejkal @ 2017-03-16 17:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Till Smejkal, Andy Lutomirski, Andy Lutomirski,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens

On Thu, 16 Mar 2017, Thomas Gleixner wrote:
> On Thu, 16 Mar 2017, Till Smejkal wrote:
> > On Thu, 16 Mar 2017, Thomas Gleixner wrote:
> > > Why do we need yet another mechanism to represent something which looks
> > > like a file instead of simply using existing mechanisms and extending them?
> > 
> > You are right. I also recognized during the discussion with Andy, Chris,
> > Matthew, Luck, Rich and the others that there are already other
> > techniques in the Linux kernel that can achieve the same functionality
> > when combined. As I also said to the others, I will drop the VAS segments
> > in future versions. The first class virtual address space feature was
> > the more interesting part of the patchset anyway.
> 
> While you are at it, could you please drop this 'first class' marketing as
> well? It has zero technical value, really.

Yes, of course. I am sorry for the trouble that I have already caused.

Thanks
Till


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
       [not found] <20170315220952.GA1435@intel.com>
@ 2017-03-15 23:18 ` Till Smejkal
  0 siblings, 0 replies; 17+ messages in thread
From: Till Smejkal @ 2017-03-15 23:18 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Andy Lutomirski, Andy Lutomirski, Till Smejkal,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato

On Wed, 15 Mar 2017, Luck, Tony wrote:
> On Wed, Mar 15, 2017 at 03:02:34PM -0700, Till Smejkal wrote:
> > I don't agree here. VAS segments are basically in-memory files that are handled by
> > the kernel directly without using a file system. Hence, if an application uses a VAS
> > segment to store data, the same rules apply as if it used a file. Everything that it
> > saves in the VAS segment might be accessible by other applications. An application
> > using VAS segments should be aware of this fact. In addition, the resources that are
> > represented by a VAS segment are not leaked. As I said, VAS segments are much like
> > files. Hence, if you don't want to use them anymore, delete them. But as with files,
> > the kernel will not delete them for you (although something like this can be added).
> 
> So how do they differ from shmget(2), shmat(2), shmdt(2), shmctl(2)?
> 
> Apart from VAS having better names, instead of silly "key_t key" ones.

Unfortunately, I have to admit that the VAS segments don't differ much from shm*.
The implementation is different, but the functionality that you can achieve with it
is very similar. I am sorry. We should have looked more closely at the whole
functionality that is provided by the shmem subsystem before working on VAS segments.
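
For comparison, the sysvipc calls already give kernel-managed, filesystem-less
memory that is named independently of any process and persists until explicitly
removed (a minimal sketch; the key value is arbitrary):

    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
            /* Named by a key, not by a file; persists until IPC_RMID. */
            int id = shmget((key_t)0x1234, 1 << 20, IPC_CREAT | 0600);
            if (id < 0)
                    return 1;

            /* NULL lets the kernel pick the address; a fixed address can
               be requested instead, much like a VAS segment's fixed
               position. */
            void *p = shmat(id, NULL, 0);
            if (p == (void *)-1)
                    return 1;

            /* ... use p; any process that knows the key can attach ... */

            shmdt(p);
            /* Without this, the segment outlives all its users: */
            shmctl(id, IPC_RMID, NULL);
            return 0;
    }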

However, VAS segments are not the key part of this patch set. The more interesting
functionality in our opinion is the introduction of first class virtual address
spaces and what they can be used for. VAS segments were just another logical step for
us (from first class virtual address spaces to first class virtual address space
segments) but since their functionality can be achieved with various other already
existing features of the Linux kernel, I will probably drop them in future versions
of the patchset.

Thanks
Till


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
       [not found] <20170315194723.GJ1693@brightrain.aerifal.cx>
@ 2017-03-15 21:30 ` Till Smejkal
  0 siblings, 0 replies; 17+ messages in thread
From: Till Smejkal @ 2017-03-15 21:30 UTC (permalink / raw)
  To: Rich Felker
  Cc: Andy Lutomirski, Andy Lutomirski, Till Smejkal,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens

On Wed, 15 Mar 2017, Rich Felker wrote:
> On Wed, Mar 15, 2017 at 12:44:47PM -0700, Till Smejkal wrote:
> > On Wed, 15 Mar 2017, Andy Lutomirski wrote:
> > > > One advantage of VAS segments is that they can be globally queried by user programs,
> > > > which means that VAS segments can be shared by applications that are not necessarily
> > > > related. If I am not mistaken, MAP_SHARED of pure in-memory data will only work
> > > > if the tasks that share the memory region are related (aka. have a common parent that
> > > > initialized the shared mapping). Otherwise, the shared mapping has to be backed by a
> > > > file.
> > > 
> > > What's wrong with memfd_create()?
> > > 
> > > > VAS segments on the other hand allow sharing of pure in-memory data by
> > > > arbitrary, not necessarily related, tasks without the need of a file. This becomes
> > > > especially interesting if one combines VAS segments with non-volatile memory since one
> > > > can keep data structures in the NVM and still be able to share them between multiple tasks.
> > > 
> > > What's wrong with regular mmap?
> > 
> > I never meant to say that there is something wrong with regular mmap. We just
> > figured that with VAS segments you could remove the need to mmap your shared data and
> > instead keep everything purely in memory.
> > 
> > Unfortunately, I am not at full speed with memfds. Is my understanding correct that
> > if the last user of such a file descriptor closes it, the corresponding memory is
> > freed? Accordingly, memfd cannot be used to keep data in memory while no program is
> > currently using it, can it? To be able to do this you again need some representation
> 
> I have a name for application-allocated kernel resources that persist
> without a process holding a reference to them or a node in the
> filesystem: a bug. See: sysvipc.

VAS segments are first class citizens of the OS similar to processes. Accordingly, I
would not see this behavior as a bug. VAS segments are a kernel handle to
"persistent" memory (in the sense that they are independent of the lifetime of the
application that created them). That means the memory that is described by VAS
segments can be reused by other applications even if the VAS segment was not used by
any application in between. It is very much like a pure in-memory file. An
application creates a VAS segment, fills it with content and, if it does not delete
it, can reopen and reuse it later. This also means that if you know that you never
want to use this memory again, you have to remove it explicitly, just as you have to
remove a file that you don't want to use anymore.
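
To illustrate that lifecycle in code, here is a sketch in terms of hypothetical
wrappers around the segment calls this patchset proposes (the real names and
signatures in the patches may differ):

    #include <sys/types.h>

    /* Hypothetical wrappers; see the patches for the actual interface. */
    int vas_seg_create(const char *name, unsigned long start,
                       unsigned long end, mode_t mode);
    int vas_seg_find(const char *name);
    int vas_seg_delete(int sid);

    void producer(void)
    {
            /* Outlives this process, like a pure in-memory file. */
            int sid = vas_seg_create("dataset", 0x200000000UL,
                                     0x240000000UL, 0600);

            /* ... attach sid, fill the segment, exit without deleting ... */
            (void)sid;
    }

    void cleanup(void)
    {
            /* Some process must eventually delete it explicitly, exactly
               as one would unlink a no-longer-needed file. */
            int sid = vas_seg_find("dataset");
            if (sid >= 0)
                    vas_seg_delete(sid);
    }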

I think it really might be better to implement VAS segments (if I should keep this
feature at all) with a special purpose filesystem. The way I've designed it seems to
be very misleading.

Till


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
       [not found]   ` <CALCETrX5gv+zdhOYro4-u3wGWjVCab28DFHPSm5=BVG_hKxy3A@mail.gmail.com>
  2017-03-15 16:57     ` Matthew Wilcox
@ 2017-03-15 19:44     ` Till Smejkal
  1 sibling, 0 replies; 17+ messages in thread
From: Till Smejkal @ 2017-03-15 19:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Till Smejkal, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Vineet Gupta, Russell King,
	Catalin Marinas, Will Deacon, Steven Miao, Richard Kuo,
	Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato

On Wed, 15 Mar 2017, Andy Lutomirski wrote:
> > One advantage of VAS segments is that they can be globally queried by user programs,
> > which means that VAS segments can be shared by applications that are not necessarily
> > related. If I am not mistaken, MAP_SHARED of pure in-memory data will only work
> > if the tasks that share the memory region are related (aka. have a common parent that
> > initialized the shared mapping). Otherwise, the shared mapping has to be backed by a
> > file.
> 
> What's wrong with memfd_create()?
> 
> > VAS segments on the other hand allow sharing of pure in-memory data by
> > arbitrary, not necessarily related, tasks without the need of a file. This becomes
> > especially interesting if one combines VAS segments with non-volatile memory since one
> > can keep data structures in the NVM and still be able to share them between multiple tasks.
> 
> What's wrong with regular mmap?

I never meant to say that there is something wrong with regular mmap. We just
figured that with VAS segments you could remove the need to mmap your shared data and
instead keep everything purely in memory.

Unfortunately, I am not at full speed with memfds. Is my understanding correct that
if the last user of such a file descriptor closes it, the corresponding memory is
freed? Accordingly, memfd cannot be used to keep data in memory while no program is
currently using it, can it? To be able to do this, you again need some representation
of the data in a file. Yes, you can use a tmpfs to keep the file content in memory as
well, or some DAX filesystem to keep the file content in NVM, but this always
requires that such filesystems are mounted on the system that the application is
currently running on. VAS segments on the other hand would provide a way to
achieve the same without the need of any mounted filesystem. However, I agree that
this is just a small advantage compared to what can already be achieved with the
existing functionality provided by the Linux kernel. I probably need to revisit the
whole idea of first class virtual address space segments before continuing with this
patchset. Thank you very much for the great feedback.

> >> >> Ick.  Please don't do this.  Can we please keep an mm as just an mm
> >> >> and not make it look magically different depending on which process
> >> >> maps it?  If you need a trampoline (which you do, of course), just
> >> >> write a trampoline in regular user code and map it manually.
> >> >
> >> > Did I understand you correctly that you are proposing that the switching thread
> >> > should make sure by itself that its code, stack, … memory regions are properly set up
> >> > in the new AS before/after switching into it? I think this would make using first
> >> > class virtual address spaces much more difficult for user applications, to the extent
> >> > that I am not even sure if they can be used at all. At the moment, switching into a
> >> > VAS is a very simple operation for an application because the kernel will just simply
> >> > do the right thing.
> >>
> >> Yes.  I think that having the same mm_struct look different from
> >> different tasks is problematic.  Getting it right in the arch code is
> >> going to be nasty.  The heuristics of what to share are also tough --
> >> why would text + data + stack or whatever you're doing be adequate?
> >> What if you're in a thread?  What if two tasks have their stacks in
> >> the same place?
> >
> > The different ASes that a task now can have when it uses first class virtual address
> > spaces are not realized in the kernel by using only one mm_struct per task that just
> > looks different but by using multiple mm_structs - one for each AS that the task
> > can execute in. When a task attaches a first class virtual address space to itself to
> > be able to use another AS, the kernel adds a temporary mm_struct to this task that
> > contains the mappings of the first class virtual address space and the ones shared
> > with the task's original AS. If a thread now wants to switch into this attached first
> > class virtual address space, the kernel only changes the 'mm' and 'active_mm' pointers
> > in the task_struct of the thread to the temporary mm_struct and performs the
> > corresponding mm_switch operation. The original mm_struct of the thread will not be
> > changed.
> >
> > Accordingly, I do not magically make mm_structs look different depending on the
> > task that uses them, but create temporary mm_structs that only contain mappings to the
> > same memory regions.
> 
> This sounds complicated and fragile.  What happens if a heuristically
> shared region coincides with a region in the "first class address
> space" being selected?

If such a conflict happens, the task cannot use the first class address space and the
corresponding system call will return an error. However, with the currently available
virtual address space size that programs can use, such conflicts are probably rare. I
could also imagine some additional functionality that allows a user to mark parts of
its AS as to-be-shared or not-to-be-shared when switching into a VAS. With this
functionality in place, there would be no need for a heuristic in the kernel; the
user decides what to share. The kernel would by default only share code, data, and
stack, and the application/libraries would have to mark all the other memory regions
as shared if they also need to be available in the VAS.

> I think the right solution is "you're a user program playing virtual
> address games -- make sure you do it right".

Hm, in general I agree that the easier and more robust solution from the kernel
perspective is to let the user do the AS setup and only provide the functionality to
create new empty ASes. Though, I think that such an interface would be much more
difficult to use than my current design. Letting the user program set up the AS also
has another implication that I currently avoid. Since I share the code and
stack regions between all ASes that are available to a process, I don't need to
save/restore stack pointers or instruction pointers when threads switch between ASes.
However, when the user sets up the AS, the kernel cannot be sure that the code and
stack will be mapped at the same virtual address and hence has to save and restore
these registers (and also potentially others since we can now basically jump between
different execution contexts).

When we first designed first class virtual address spaces, we had one special
use-case in mind, namely that one application wants to use different data sets that
it cannot or does not want to keep in the same AS. Hence, sharing code and stack
between the different ASes that the application uses was a logical step for us
because the code memory region, for example, has to be available in all ASes anyway
since all of them execute the same application. Sharing the stack memory region
enabled the application to keep volatile information that might be needed in the new
AS on the stack, which allows easy information flow between the different ASes.

For this patch, I extended the initial sharing of stack and code memory regions to
all memory regions that are available in the task's original AS to also allow
dynamically linked applications and multi-threaded applications to flawlessly use
first class virtual address spaces.

To put it in a nutshell, we envisioned first class virtual address spaces to be
rather used as shareable/reusable data containers which made sharing various memory
regions that are crucial for the execution of the application a feasible
implementation decision.

Thank you all very much for the feedback. I really appreciate it.

Till


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
       [not found]   ` <CALCETrX5gv+zdhOYro4-u3wGWjVCab28DFHPSm5=BVG_hKxy3A@mail.gmail.com>
@ 2017-03-15 16:57     ` Matthew Wilcox
  2017-03-15 19:44     ` Till Smejkal
  1 sibling, 0 replies; 17+ messages in thread
From: Matthew Wilcox @ 2017-03-15 16:57 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux MIPS Mailing List, linux-s390, Rich Felker, linux-ia64,
	linux-sh, Peter Zijlstra, Benjamin Herrenschmidt, Heiko Carstens,
	ALSA development, USB list, James E.J. Bottomley,
	J. Bruce Fields, Max Filippov, Paul Mackerras, Laurent Pinchart,
	H. Peter Anvin, sparclinux, linux-hexagon, Cyrille Pitchen,
	Linux API, Jeff

On Wed, Mar 15, 2017 at 09:51:31AM -0700, Andy Lutomirski wrote:
> > VAS segments on the other hand allow sharing of pure in-memory data by
> > arbitrary, not necessarily related, tasks without the need of a file. This becomes
> > especially interesting if one combines VAS segments with non-volatile memory since one
> > can keep data structures in the NVM and still be able to share them between multiple tasks.
> 
> What's wrong with regular mmap?

I think it's the usual misunderstandings about how to use mmap.
From the paper:

   Memory-centric computing demands careful organization of the
   virtual address space, but interfaces such as mmap only give limited
   control. Some systems do not support creation of address regions at
   specific offsets. In Linux, for example, mmap does not safely abort if
   a request is made to open a region of memory over an existing region;
   it simply writes over it.

The correct answer of course, is "Don't specify MAP_FIXED".  Specify the
'hint' address, and if you don't get it, either fix up your data structure
pointers, or just abort and complain noisily.
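
In code, that pattern looks roughly like this (a minimal sketch, not taken from
any particular project):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    static void *map_at_hint(void *hint, size_t len, int fd)
    {
            /* No MAP_FIXED: an existing mapping is never clobbered. */
            void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
            if (p == MAP_FAILED) {
                    perror("mmap");
                    exit(1);
            }
            if (p != hint) {
                    /* Didn't get the requested address: fix up the
                       pointer-based data structures or bail out loudly. */
                    fprintf(stderr, "wanted %p, got %p\n", hint, p);
                    exit(1);
            }
            return p;
    }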


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
       [not found] <8d9333d6-2f81-a9de-484e-e1d655e1d3c3@mellanox.com>
@ 2017-03-14 21:14 ` Till Smejkal
  0 siblings, 0 replies; 17+ messages in thread
From: Till Smejkal @ 2017-03-14 21:14 UTC (permalink / raw)
  To: Chris Metcalf
  Cc: Andy Lutomirski, Andy Lutomirski, Till Smejkal,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens

On Tue, 14 Mar 2017, Chris Metcalf wrote:
> On 3/14/2017 12:12 PM, Till Smejkal wrote:
> > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> > > On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal
> > > <till.smejkal@googlemail.com> wrote:
> > > > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> > > > > This sounds rather complicated.  Getting TLB flushing right seems
> > > > > tricky.  Why not just map the same thing into multiple mms?
> > > > This is exactly what happens in the end. The memory region that is described by the
> > > > VAS segment will be mapped into the ASes that use the segment.
> > > So why is this kernel feature better than just doing MAP_SHARED
> > > manually in userspace?
> > One advantage of VAS segments is that they can be globally queried by user programs,
> > which means that VAS segments can be shared by applications that are not necessarily
> > related. If I am not mistaken, MAP_SHARED of pure in-memory data will only work
> > if the tasks that share the memory region are related (aka. have a common parent that
> > initialized the shared mapping). Otherwise, the shared mapping has to be backed by a
> > file.
> 
> True, but why is this bad?  The shared mapping will be memory resident
> regardless, even if backed by a file (unless swapped out under heavy
> memory pressure, but arguably that's a feature anyway).  More importantly,
> having a file name is a simple and consistent way of identifying such
> shared memory segments.
> 
> With a little work, you can also arrange to map such files into memory
> at a fixed address in all participating processes, thus making internal
> pointers work correctly.

I don't want to say that the interface provided by MAP_SHARED is bad. I am only
arguing that VAS segments and the interface that they provide have an advantage over
the existing ones in my opinion. However, Matthew Wilcox also suggested in an
earlier mail that VAS segments could be exported to user space via a special purpose
filesystem. This would enable users of VAS segments to also just use some special
files to set up the shared memory regions. But since the VAS segment itself already
knows where it has to be mapped in the virtual address space of the process,
establishing the shared memory region would be very easy for the user.

> > VAS segments on the other hand allow sharing of pure in-memory data by
> > arbitrary, not necessarily related, tasks without the need of a file. This becomes
> > especially interesting if one combines VAS segments with non-volatile memory since one
> > can keep data structures in the NVM and still be able to share them between multiple tasks.
> 
> I am not fully up to speed on NV/pmem stuff, but isn't that exactly what
> the DAX mode is supposed to allow you to do?  If so, isn't sharing a
> mapped file on a DAX filesystem on top of pmem equivalent to what
> you're proposing?

If I read the documentation for DAX filesystems correctly, it is indeed possible to
use them to create files that live purely in NVM. I wasn't fully aware of this
feature. Thanks for the pointer.
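
For the record, using it is just ordinary file mmap (a sketch; the mount point is
my assumption and the filesystem has to be mounted with -o dax):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
            /* Assumes an ext4/xfs filesystem on a pmem device mounted
               with -o dax at /mnt/pmem (hypothetical path). */
            int fd = open("/mnt/pmem/dataset", O_RDWR | O_CREAT, 0600);
            if (fd < 0 || ftruncate(fd, 1 << 20) < 0)
                    return 1;

            /* Loads and stores go straight to the NVM; there is no
               page-cache copy in between. */
            void *p = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
            if (p == MAP_FAILED)
                    return 1;
            /* ... build pointer-based data structures in p ... */
            return 0;
    }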

However, the main contribution of this patchset is actually the idea of first class
virtual address spaces and that they can be used to allow processes to have multiple
different views on the system's main memory. For us, VAS segments were another
logical step in the same direction (from first class virtual address spaces to first
class address space segments). However, if there is already functionality in the
Linux kernel to achieve the exact same behavior, there is no real need to add VAS
segments. I will continue thinking about them and either find a different situation
where the currently available interface is not sufficient or too complicated, or drop
VAS segments from future versions of the patch set.

Till


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
       [not found] <CALCETrXKvNWv1OtoSo_HWf5ZHSvyGS1NsuQod6Zt+tEg3MT5Sg@mail.gmail.com>
@ 2017-03-14 16:12 ` Till Smejkal
       [not found]   ` <CALCETrX5gv+zdhOYro4-u3wGWjVCab28DFHPSm5=BVG_hKxy3A@mail.gmail.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Till Smejkal @ 2017-03-14 16:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Till Smejkal, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Vineet Gupta, Russell King,
	Catalin Marinas, Will Deacon, Steven Miao, Richard Kuo,
	Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato

On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal
> <till.smejkal@googlemail.com> wrote:
> > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> >> This sounds rather complicated.  Getting TLB flushing right seems
> >> tricky.  Why not just map the same thing into multiple mms?
> >
> > This is exactly what happens at the end. The memory region that is described by the
> > VAS segment will be mapped in the ASes that use the segment.
> 
> So why is this kernel feature better than just doing MAP_SHARED
> manually in userspace?

One advantage of VAS segments is that they can be globally queried by user programs,
which means that VAS segments can be shared by applications that are not necessarily
related. If I am not mistaken, MAP_SHARED of pure in-memory data will only work
if the tasks that share the memory region are related (aka. have a common parent that
initialized the shared mapping). Otherwise, the shared mapping has to be backed by a
file. VAS segments on the other hand allow sharing of pure in-memory data by
arbitrary, not necessarily related, tasks without the need of a file. This becomes
especially interesting if one combines VAS segments with non-volatile memory since
one can keep data structures in the NVM and still be able to share them between
multiple tasks.

> >> Ick.  Please don't do this.  Can we please keep an mm as just an mm
> >> and not make it look magically different depending on which process
> >> maps it?  If you need a trampoline (which you do, of course), just
> >> write a trampoline in regular user code and map it manually.
> >
> > Did I understand you correctly that you are proposing that the switching thread
> > should make sure by itself that its code, stack, … memory regions are properly set up
> > in the new AS before/after switching into it? I think this would make using first
> > class virtual address spaces much more difficult for user applications, to the extent
> > that I am not even sure if they can be used at all. At the moment, switching into a
> > VAS is a very simple operation for an application because the kernel will just simply
> > do the right thing.
> 
> Yes.  I think that having the same mm_struct look different from
> different tasks is problematic.  Getting it right in the arch code is
> going to be nasty.  The heuristics of what to share are also tough --
> why would text + data + stack or whatever you're doing be adequate?
> What if you're in a thread?  What if two tasks have their stacks in
> the same place?

The different ASes that a task now can have when it uses first class virtual address
spaces are not realized in the kernel by using only one mm_struct per task that just
looks different but by using multiple mm_structs - one for each AS that the task
can execute in. When a task attaches a first class virtual address space to itself to
be able to use another AS, the kernel adds a temporary mm_struct to this task that
contains the mappings of the first class virtual address space and the ones shared
with the task's original AS. If a thread now wants to switch into this attached first
class virtual address space, the kernel only changes the 'mm' and 'active_mm' pointers
in the task_struct of the thread to the temporary mm_struct and performs the
corresponding mm_switch operation. The original mm_struct of the thread will not be
changed.

Accordingly, I do not magically make mm_structs look different depending on the
task that uses them, but create temporary mm_structs that only contain mappings to the
same memory regions.
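
In kernel terms, the switch described above boils down to something like the
following (a heavily simplified sketch of the idea, not the literal patch code;
the real implementation has to deal with locking details, mm refcounting and the
lazy-TLB cases):

    #include <linux/sched.h>
    #include <asm/mmu_context.h>

    /* Sketch: point the calling thread at the temporary mm_struct. */
    static void vas_switch_to(struct mm_struct *temp_mm)
    {
            struct task_struct *tsk = current;
            struct mm_struct *old_mm = tsk->mm;

            task_lock(tsk);
            tsk->mm = temp_mm;
            tsk->active_mm = temp_mm;
            task_unlock(tsk);

            /* Program the MMU with the new page tables; this is the
               only hardware cost of the switch. */
            switch_mm(old_mm, temp_mm, tsk);
    }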

I agree that finding a good heuristic of what to share is difficult. At the moment,
all memory regions that are available in the task's original AS will also be
available when a thread switches into an attached first class virtual address space
(aka. are shared). That means that VAS can mainly be used to extend the AS of a task
in the current state of the implementation. The reason why I implemented the sharing
in this way is that I didn't want to break shared libraries. If I shared only
code+heap+stack, shared libraries would not work anymore after switching into a VAS.

> I could imagine that something like a sigaltstack() mode that lets you set
> a signal up to also switch mm could be useful.

This is a very interesting idea. I will keep it in mind for future use cases of
multiple virtual address spaces per task.

Thanks
Till


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-14  0:58 ` Andy Lutomirski
@ 2017-03-14  2:07   ` Till Smejkal
  0 siblings, 0 replies; 17+ messages in thread
From: Till Smejkal @ 2017-03-14  2:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Till Smejkal, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Vineet Gupta, Russell King, Catalin Marinas, Will Deacon,
	Steven Miao, Richard Kuo, Tony Luck, Fenghua Yu, James Hogan,
	Ralf Baechle, James E.J. Bottomley, Helge Deller,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Martin Schwidefsky, Heiko Carstens, Yoshinori Sato, Rich Felker

On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> On Mon, Mar 13, 2017 at 3:14 PM, Till Smejkal
> <till.smejkal@googlemail.com> wrote:
> > This patchset extends the kernel memory management subsystem with a new
> > type of address spaces (called VAS) which can be created and destroyed
> > independently of processes by a user in the system. During its lifetime
> > such a VAS can be attached to processes by the user which allows a process
> > to have multiple address spaces and thereby multiple, potentially
> > different, views on the system's main memory. During its execution the
> > threads belonging to the process are able to switch freely between the
> > different attached VAS and the process' original AS enabling them to
> > utilize the different available views on the memory.
> 
> Sounds like the old SKAS feature for UML.

I haven't heard of this feature before, but after briefly looking at the description
on the UML website it actually has some similarities with what I am proposing. But as
far as I can see this was not merged into the mainline kernel, was it? In addition, I
think that first class virtual address spaces go even one step further by allowing
ASes to live independently of processes.

> > In addition to the concept of first class virtual address spaces, this
> > patchset introduces yet another feature called VAS segments. VAS segments
> > are memory regions which have a fixed size and position in the virtual
> > address space and can be shared between multiple first class virtual
> > address spaces. Such shareable memory regions are especially useful for
> > in-memory pointer-based data structures or other pure in-memory data.
> 
> This sounds rather complicated.  Getting TLB flushing right seems
> tricky.  Why not just map the same thing into multiple mms?

This is exactly what happens in the end. The memory region that is described by the
VAS segment will be mapped into the ASes that use the segment.

> >
> >             |     VAS     |  processes  |
> >     -------------------------------------
> >     switch  |       468ns |      1944ns |
> 
> The solution here is IMO to fix the scheduler.

IMHO it will be very difficult for the scheduler code to reach the same switching
time as the pure VAS switch because switching between VASes does not involve saving
any registers or FPU state and does not require selecting the next runnable task. A
VAS switch is basically a system call that just changes the AS of the current thread,
which makes it a very lightweight operation.

> Also, FWIW, I have patches (that need a little work) that will make
> switch_mm() waaaay faster on x86.

These patches will also improve the speed of the VAS switch operation. We also use
the switch_mm function in the background to perform the actual hardware switch
between the two ASes. The main reason why the VAS switch is faster than the task
switch is that it simply has to do fewer things.

> > At the current state of the development, first class virtual address spaces
> > have one limitation that we haven't been able to solve so far. The feature
> > allows different threads of the same process to execute in different
> > ASes at the same time. This is possible because the VAS-switch operation
> > only changes the active mm_struct for the task_struct of the calling
> > thread. However, when a thread switches into a first class virtual address
> > space, some parts of its original AS are duplicated into the new one to
> > allow the thread to continue its execution at its current state.
> 
> Ick.  Please don't do this.  Can we please keep an mm as just an mm
> and not make it look magically different depending on which process
> maps it?  If you need a trampoline (which you do, of course), just
> write a trampoline in regular user code and map it manually.

Did I understand you correctly that you are proposing that the switching thread
should make sure by itself that its code, stack, … memory regions are properly set up
in the new AS before/after switching into it? I think this would make using first
class virtual address spaces much more difficult for user applications, to the extent
that I am not even sure if they can be used at all. At the moment, switching into a
VAS is a very simple operation for an application because the kernel will just simply
do the right thing.

Till


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
       [not found] <fa60d132-0eef-4350-3c88-1558f447a48f@twiddle.net>
@ 2017-03-14  1:31 ` Till Smejkal
  0 siblings, 0 replies; 17+ messages in thread
From: Till Smejkal @ 2017-03-14  1:31 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Till Smejkal, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller

On Tue, 14 Mar 2017, Richard Henderson wrote:
> On 03/14/2017 10:39 AM, Till Smejkal wrote:
> > > Is this an indication that full virtual address spaces are useless?  It
> > > would seem like if you only use virtual address segments then you avoid all
> > > of the problems with executing code, active stacks, and brk.
> > 
> > What do you mean by *virtual address segments*? The nice part of first class
> > virtual address spaces is that one can share/reuse collections of address space
> > segments easily.
> 
> What do *I* mean?  You introduced the term, didn't you?
> Rereading your original I see you called them "VAS segments".

Oh, I am sorry. I thought that you were referring to some other feature that I don't
know about.

> Anyway, whatever they are called, it would seem that these segments do not
> require any of the syncing mechanisms that are causing you problems.

Yes, VAS segments provide a possibility to share memory regions between multiple
address spaces without the need to synchronize heap, stack, etc. Unfortunately, the
VAS segment feature by itself, without the whole concept of first class virtual
address spaces, is not as powerful. With some additional work it can probably be
replicated with the existing shmem functionality.

The first class virtual address space feature on the other hand provides a real
benefit for applications in our opinion, namely that an application can switch
between different views on its memory, which enables various interesting programming
paradigms as mentioned in the cover letter.

Till


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-13 22:14 Till Smejkal
  2017-03-14  0:18 ` Richard Henderson
@ 2017-03-14  0:58 ` Andy Lutomirski
  2017-03-14  2:07   ` Till Smejkal
  1 sibling, 1 reply; 17+ messages in thread
From: Andy Lutomirski @ 2017-03-14  0:58 UTC (permalink / raw)
  To: Till Smejkal
  Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller

On Mon, Mar 13, 2017 at 3:14 PM, Till Smejkal
<till.smejkal@googlemail.com> wrote:
> This patchset extends the kernel memory management subsystem with a new
> type of address spaces (called VAS) which can be created and destroyed
> independently of processes by a user in the system. During its lifetime
> such a VAS can be attached to processes by the user which allows a process
> to have multiple address spaces and thereby multiple, potentially
> different, views on the system's main memory. During its execution the
> threads belonging to the process are able to switch freely between the
> different attached VAS and the process' original AS enabling them to
> utilize the different available views on the memory.

Sounds like the old SKAS feature for UML.

> In addition to the concept of first class virtual address spaces, this
> patchset introduces yet another feature called VAS segments. VAS segments
> are memory regions which have a fixed size and position in the virtual
> address space and can be shared between multiple first class virtual
> address spaces. Such shareable memory regions are especially useful for
> in-memory pointer-based data structures or other pure in-memory data.

This sounds rather complicated.  Getting TLB flushing right seems
tricky.  Why not just map the same thing into multiple mms?

>
>             |     VAS     |  processes  |
>     -------------------------------------
>     switch  |       468ns |      1944ns |

The solution here is IMO to fix the scheduler.

Also, FWIW, I have patches (that need a little work) that will make
switch_mm() waaaay faster on x86.

> At the current state of the development, first class virtual address spaces
> have one limitation that we haven't been able to solve so far. The feature
> allows different threads of the same process to execute in different
> ASes at the same time. This is possible because the VAS-switch operation
> only changes the active mm_struct for the task_struct of the calling
> thread. However, when a thread switches into a first class virtual address
> space, some parts of its original AS are duplicated into the new one to
> allow the thread to continue its execution at its current state.

Ick.  Please don't do this.  Can we please keep an mm as just an mm
and not make it look magically different depending on which process
maps it?  If you need a trampoline (which you do, of course), just
write a trampoline in regular user code and map it manually.

--Andy


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-14  0:18 ` Richard Henderson
@ 2017-03-14  0:39   ` Till Smejkal
  0 siblings, 0 replies; 17+ messages in thread
From: Till Smejkal @ 2017-03-14  0:39 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Till Smejkal, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller

On Tue, 14 Mar 2017, Richard Henderson wrote:
> On 03/14/2017 08:14 AM, Till Smejkal wrote:
> > At the current state of the development, first class virtual address spaces
> > have one limitation that we haven't been able to solve so far. The feature
> > allows different threads of the same process to execute in different
> > ASes at the same time. This is possible because the VAS-switch operation
> > only changes the active mm_struct for the task_struct of the calling
> > thread. However, when a thread switches into a first class virtual address
> > space, some parts of its original AS are duplicated into the new one to
> > allow the thread to continue its execution at its current state.
> > Accordingly, parts of the process's AS (e.g. the code section, data
> > section, heap section and stack sections) exist in multiple ASes if the
> > process has a VAS attached to it. Changes to these shared memory regions
> > are synchronized between the address spaces whenever a thread switches
> > between two of them. Unfortunately, in some scenarios the kernel is not
> > able to properly synchronize all these shared memory regions because of
> > conflicting changes. One such example happens if there are two threads, one
> > executing in an attached first class virtual address space, the other in
> > the task's original address space. If both threads make changes to the heap
> > section that cause expansion of the underlying vm_area_struct, the kernel
> > cannot correctly synchronize these changes, because that would cause parts
> > of the virtual address space to be overwritten with unrelated data. In the
> > current implementation such conflicts are only detected but not resolved,
> > and they result in an error code being returned by the kernel during the VAS
> > switch operation. Unfortunately, that means that the particular thread that
> > tried to make the switch cannot do so anymore in the future and
> > accordingly has to be killed.
> 
> This sounds like a fairly fundamental problem to me.

Yes, I agree. This is a significant limitation of first class virtual address spaces.
However, conflicts like this can be mitigated by being careful in the application
that uses multiple first class virtual address spaces. If all threads make sure that
they never resize shared memory regions when executing inside a VAS, such conflicts
do not occur. Another possibility that I investigated but have not yet finished is
that such resizes of shared memory regions are synchronized more frequently than just
at every switch between VASes. If one, for example, forwards memory region resizes to
all ASes that share this particular memory region during the resize operation, one
can completely eliminate this problem. Unfortunately, this introduces a significant
cost and a difficult-to-handle race condition.

> Is this an indication that full virtual address spaces are useless?  It
> would seem that if you only use virtual address segments, then you avoid all
> of the problems with executing code, active stacks, and brk.

What do you mean by *virtual address segments*? The nice part of first class
virtual address spaces is that one can share/reuse collections of address space
segments easily.

Till


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-13 22:14 Till Smejkal
@ 2017-03-14  0:18 ` Richard Henderson
  2017-03-14  0:39   ` Till Smejkal
  2017-03-14  0:58 ` Andy Lutomirski
  1 sibling, 1 reply; 17+ messages in thread
From: Richard Henderson @ 2017-03-14  0:18 UTC (permalink / raw)
  To: Till Smejkal, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky, Heiko
  Cc: linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

On 03/14/2017 08:14 AM, Till Smejkal wrote:
> At the current state of the development, first class virtual address spaces
> have one limitation that we haven't been able to solve so far. The feature
> allows different threads of the same process to execute in different
> ASes at the same time. This is possible because the VAS-switch operation
> only changes the active mm_struct for the task_struct of the calling
> thread. However, when a thread switches into a first class virtual address
> space, some parts of its original AS are duplicated into the new one to
> allow the thread to continue its execution at its current state.
> Accordingly, parts of the process's AS (e.g. the code section, data
> section, heap section and stack sections) exist in multiple ASes if the
> process has a VAS attached to it. Changes to these shared memory regions
> are synchronized between the address spaces whenever a thread switches
> between two of them. Unfortunately, in some scenarios the kernel is not
> able to properly synchronize all these shared memory regions because of
> conflicting changes. One such example happens if there are two threads, one
> executing in an attached first class virtual address space, the other in
> the task's original address space. If both threads make changes to the heap
> section that cause expansion of the underlying vm_area_struct, the kernel
> cannot correctly synchronize these changes, because that would cause parts
> of the virtual address space to be overwritten with unrelated data. In the
> current implementation such conflicts are only detected but not resolved,
> and they result in an error code being returned by the kernel during the VAS
> switch operation. Unfortunately, that means that the particular thread that
> tried to make the switch cannot do so anymore in the future and
> accordingly has to be killed.

This sounds like a fairly fundamental problem to me.

Is this an indication that full virtual address spaces are useless?  It would 
seem like if you only use virtual address segments then you avoid all of the 
problems with executing code, active stacks, and brk.


r~

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC PATCH 00/13] Introduce first class virtual address spaces
@ 2017-03-13 22:14 Till Smejkal
  2017-03-14  0:18 ` Richard Henderson
  2017-03-14  0:58 ` Andy Lutomirski
  0 siblings, 2 replies; 17+ messages in thread
From: Till Smejkal @ 2017-03-13 22:14 UTC (permalink / raw)
  To: Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller
  Cc: linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

First class virtual address spaces (also called VAS) are a new feature of the
Linux kernel that allows address spaces to exist independently of processes.
The general idea behind this feature is described in the ASPLOS'16 paper
'SpaceJMP: Programming with Multiple Virtual Address Spaces' [1].

This patchset extends the kernel memory management subsystem with a new
type of address space (called VAS) which can be created and destroyed
independently of processes by a user in the system. During its lifetime,
such a VAS can be attached to processes by the user, which allows a
process to have multiple address spaces and thereby multiple, potentially
different, views of the system's main memory. During execution, the
threads belonging to the process are able to switch freely between the
different attached VASes and the process's original AS, enabling them to
utilize the different available views of memory. These multiple virtual
address spaces per process, and the ability to switch between them
freely, can be used in many interesting ways, as outlined in the
mentioned paper. Possible applications include compartmentalizing a
process for security reasons, improving the performance of data-centric
applications, and introducing new application models [1].
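
To give a feeling for the intended usage, the following is a rough user-space
sketch of the create/attach/switch life cycle. All wrapper names and
signatures (vas_create(), vas_attach(), vas_switch(), vas_detach(),
vas_delete()) as well as the meaning of VAS id 0 are assumptions for
illustration, not the final syscall API.

    #include <sys/types.h>
    #include <unistd.h>

    /* Hypothetical wrappers around the new system calls. */
    extern int vas_create(const char *name, mode_t mode);
    extern int vas_attach(pid_t pid, int vid, int flags);
    extern int vas_switch(int vid);
    extern int vas_detach(pid_t pid, int vid);
    extern int vas_delete(int vid);

    int main(void)
    {
        int vid = vas_create("scratch", 0600);   /* new, empty VAS */
        if (vid < 0 || vas_attach(getpid(), vid, 0) < 0)
            return 1;

        if (vas_switch(vid) == 0) {
            /* ... work with the alternate view of memory ... */
            vas_switch(0);     /* 0 assumed to mean the original AS */
        }

        vas_detach(getpid(), vid);
        vas_delete(vid);       /* a VAS outlives the process unless
                                  explicitly deleted */
        return 0;
    }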

In addition to the concept of first class virtual address spaces, this
patchset introduces yet another feature called VAS segments. VAS segments
are memory regions which have a fixed size and position in the virtual
address space and can be shared between multiple first class virtual
address spaces. Such shareable memory regions are especially useful for
in-memory pointer-based data structures or other pure in-memory data.
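
Because a segment keeps its fixed position in every address space that maps
it, raw pointers stored inside it stay valid for all sharers. A minimal
sketch of using a segment for a shared, pointer-based list; the wrapper names
(vas_segment_create(), vas_segment_attach()) and the chosen base address are
assumptions for illustration.

    #include <stddef.h>
    #include <sys/types.h>

    /* Hypothetical wrappers around the VAS-segment syscalls. */
    extern int vas_segment_create(const char *name, void *base,
                                  size_t size, mode_t mode);
    extern int vas_segment_attach(int vid, int sid, int flags);

    /* The segment occupies the same range in every AS that maps it,
     * so plain pointers inside it remain valid for all sharers. */
    struct node {
        struct node *next;
        long         value;
    };

    #define SEG_BASE ((void *)0x600000000000UL)  /* assumed unused */
    #define SEG_SIZE (16UL << 20)

    static int share_list_segment(int vid)
    {
        int sid = vas_segment_create("list", SEG_BASE, SEG_SIZE, 0600);
        if (sid < 0)
            return -1;
        /* Same region, same position, now visible in the VAS too. */
        return vas_segment_attach(vid, sid, 0);
    }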

First class virtual address spaces have a significant advantage over
forking a process and using inter-process communication mechanisms:
creating and switching between VASes is significantly faster than
creating and switching between processes. As can be seen in the
following table, measured on an Intel Xeon E5620 CPU at 2.40GHz,
creating a VAS is about 7 times faster than forking, and switching
between VASes is up to 4 times faster than switching between processes.

            |     VAS     |  processes  |
    -------------------------------------
    switch  |       468ns |      1944ns |
    create  |     20003ns |    150491ns |

Hence, first class virtual address spaces provide applications with a
fast mechanism to utilize multiple virtual address spaces in parallel,
with higher performance than splitting the application into multiple
independent processes.
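
For reference, numbers of this kind can be approximated from user space with
a simple round-trip timing loop like the sketch below; vas_switch() is again
a hypothetical wrapper, and this is not the harness that produced the table
above.

    #include <time.h>

    extern int vas_switch(int vid);   /* hypothetical wrapper */

    /* Average cost of one switch, in nanoseconds, measured over n
     * round trips between a VAS and the original AS. */
    static double avg_switch_ns(int vid, int n)
    {
        struct timespec a, b;

        clock_gettime(CLOCK_MONOTONIC, &a);
        for (int i = 0; i < n; i++) {
            vas_switch(vid);
            vas_switch(0);            /* back to the original AS */
        }
        clock_gettime(CLOCK_MONOTONIC, &b);

        return ((b.tv_sec - a.tv_sec) * 1e9 +
                (b.tv_nsec - a.tv_nsec)) / (2.0 * n);
    }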

Both VASes and VAS segments have another significant advantage when
combined with non-volatile memory. Because their life cycle is
independent of processes and other kernel data structures, they can be
used to keep special memory regions, or even whole ASes, in non-volatile
memory, making it possible to reuse them across multiple system reboots.
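
Illustratively, reattaching such a persisted segment after a reboot could
then look like the sketch below; vas_segment_find() is an assumed
lookup-by-name primitive, not necessarily part of this patchset.

    /* Purely illustrative: vas_segment_find() is an assumed
     * lookup-by-name primitive. */
    extern int vas_segment_find(const char *name);
    extern int vas_segment_attach(int vid, int sid, int flags);

    static int reattach_after_reboot(int vid)
    {
        int sid = vas_segment_find("list");   /* survived in NVM */

        return sid < 0 ? -1 : vas_segment_attach(vid, sid, 0);
    }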

At the current state of development, first class virtual address spaces
have one limitation that we haven't been able to solve so far. The
feature allows different threads of the same process to execute in
different ASes at the same time. This is possible because the VAS-switch
operation only changes the active mm_struct for the task_struct of the
calling thread. However, when a thread switches into a first class
virtual address space, some parts of its original AS are duplicated into
the new one to allow the thread to continue its execution at its current
state. Accordingly, parts of the process's AS (e.g. the code, data, heap
and stack sections) exist in multiple ASes if the process has a VAS
attached to it. Changes to these shared memory regions are synchronized
between the address spaces whenever a thread switches between two of
them. Unfortunately, in some scenarios the kernel is not able to
properly synchronize all these shared memory regions because of
conflicting changes. One such example happens if there are two threads,
one executing in an attached first class virtual address space, the
other in the task's original address space. If both threads make changes
to the heap section that cause expansion of the underlying
vm_area_struct, the kernel cannot correctly synchronize these changes,
because that would cause parts of the virtual address space to be
overwritten with unrelated data. In the current implementation, such
conflicts are only detected, not resolved, and result in an error code
being returned by the kernel during the VAS-switch operation.
Unfortunately, that means that the thread that tried to make the switch
cannot do so again in the future and accordingly has to be killed.
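
A caller therefore has to treat the switch as fallible. A minimal sketch of
the resulting defensive pattern, with vas_switch() again as an assumed
wrapper:

    #include <stdio.h>

    extern int vas_switch(int vid);   /* hypothetical wrapper */

    /* A thread that loses the synchronization race gets an error
     * back and, per the limitation above, must not simply retry:
     * its view can no longer be reconciled, so the safe options
     * are to finish its work in the current AS or to exit. */
    static int enter_vas_or_give_up(int vid)
    {
        if (vas_switch(vid) < 0) {
            perror("vas_switch");
            return -1;    /* stay in the current AS; do not retry */
        }
        return 0;
    }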

This code was developed during an internship at Hewlett Packard Enterprise.

[1] http://impact.crhc.illinois.edu/shared/Papers/ASPLOS16-SpaceJMP.pdf

Till Smejkal (13):
  mm: Add mm_struct argument to 'mmap_region'
  mm: Add mm_struct argument to 'do_mmap' and 'do_mmap_pgoff'
  mm: Rename 'unmap_region' and add mm_struct argument
  mm: Add mm_struct argument to 'get_unmapped_area' and
    'vm_unmapped_area'
  mm: Add mm_struct argument to 'mm_populate' and '__mm_populate'
  mm/mmap: Export 'vma_link' and 'find_vma_links' to mm subsystem
  kernel/fork: Split and export 'mm_alloc' and 'mm_init'
  kernel/fork: Define explicitly which mm_struct to duplicate during
    fork
  mm/memory: Add function to one-to-one duplicate page ranges
  mm: Introduce first class virtual address spaces
  mm/vas: Introduce VAS segments - shareable address space regions
  mm/vas: Add lazy-attach support for first class virtual address spaces
  fs/proc: Add procfs support for first class virtual address spaces

 MAINTAINERS                                  |   10 +
 arch/alpha/kernel/osf_sys.c                  |   19 +-
 arch/arc/mm/mmap.c                           |    8 +-
 arch/arm/kernel/process.c                    |    2 +-
 arch/arm/mach-rpc/ecard.c                    |    2 +-
 arch/arm/mm/mmap.c                           |   19 +-
 arch/arm64/kernel/vdso.c                     |    2 +-
 arch/blackfin/include/asm/pgtable.h          |    3 +-
 arch/blackfin/kernel/sys_bfin.c              |    5 +-
 arch/frv/mm/elf-fdpic.c                      |   11 +-
 arch/hexagon/kernel/vdso.c                   |    2 +-
 arch/ia64/kernel/perfmon.c                   |    3 +-
 arch/ia64/kernel/sys_ia64.c                  |    6 +-
 arch/ia64/mm/hugetlbpage.c                   |    7 +-
 arch/metag/mm/hugetlbpage.c                  |   11 +-
 arch/mips/kernel/vdso.c                      |    4 +-
 arch/mips/mm/mmap.c                          |   27 +-
 arch/parisc/kernel/sys_parisc.c              |   19 +-
 arch/parisc/mm/hugetlbpage.c                 |    7 +-
 arch/powerpc/include/asm/book3s/64/hugetlb.h |    6 +-
 arch/powerpc/include/asm/page_64.h           |    3 +-
 arch/powerpc/kernel/vdso.c                   |    2 +-
 arch/powerpc/mm/hugetlbpage-radix.c          |    9 +-
 arch/powerpc/mm/hugetlbpage.c                |    9 +-
 arch/powerpc/mm/mmap.c                       |   17 +-
 arch/powerpc/mm/slice.c                      |   25 +-
 arch/s390/kernel/vdso.c                      |    3 +-
 arch/s390/mm/mmap.c                          |   42 +-
 arch/sh/kernel/vsyscall/vsyscall.c           |    2 +-
 arch/sh/mm/mmap.c                            |   19 +-
 arch/sparc/include/asm/pgtable_64.h          |    4 +-
 arch/sparc/kernel/sys_sparc_32.c             |    6 +-
 arch/sparc/kernel/sys_sparc_64.c             |   31 +-
 arch/sparc/mm/hugetlbpage.c                  |   26 +-
 arch/tile/kernel/vdso.c                      |    2 +-
 arch/tile/mm/elf.c                           |    2 +-
 arch/tile/mm/hugetlbpage.c                   |   26 +-
 arch/x86/entry/syscalls/syscall_32.tbl       |   16 +
 arch/x86/entry/syscalls/syscall_64.tbl       |   16 +
 arch/x86/entry/vdso/vma.c                    |    2 +-
 arch/x86/kernel/sys_x86_64.c                 |   19 +-
 arch/x86/mm/hugetlbpage.c                    |   26 +-
 arch/x86/mm/mpx.c                            |    6 +-
 arch/xtensa/kernel/syscall.c                 |    7 +-
 drivers/char/mem.c                           |   15 +-
 drivers/dax/dax.c                            |   10 +-
 drivers/media/usb/uvc/uvc_v4l2.c             |    6 +-
 drivers/media/v4l2-core/v4l2-dev.c           |    8 +-
 drivers/media/v4l2-core/videobuf2-v4l2.c     |    5 +-
 drivers/mtd/mtdchar.c                        |    3 +-
 drivers/usb/gadget/function/uvc_v4l2.c       |    3 +-
 fs/aio.c                                     |    4 +-
 fs/exec.c                                    |    5 +-
 fs/hugetlbfs/inode.c                         |    8 +-
 fs/proc/base.c                               |  528 ++++
 fs/proc/inode.c                              |   11 +-
 fs/proc/internal.h                           |    1 +
 fs/ramfs/file-mmu.c                          |    5 +-
 fs/ramfs/file-nommu.c                        |   10 +-
 fs/romfs/mmap-nommu.c                        |    3 +-
 include/linux/fs.h                           |    2 +-
 include/linux/huge_mm.h                      |   12 +-
 include/linux/hugetlb.h                      |   10 +-
 include/linux/mm.h                           |   53 +-
 include/linux/mm_types.h                     |   16 +-
 include/linux/sched.h                        |   34 +-
 include/linux/shmem_fs.h                     |    5 +-
 include/linux/syscalls.h                     |   21 +
 include/linux/vas.h                          |  322 +++
 include/linux/vas_types.h                    |  173 ++
 include/media/v4l2-dev.h                     |    3 +-
 include/media/videobuf2-v4l2.h               |    5 +-
 include/uapi/asm-generic/unistd.h            |   34 +-
 include/uapi/linux/Kbuild                    |    1 +
 include/uapi/linux/vas.h                     |   28 +
 init/main.c                                  |    2 +
 ipc/shm.c                                    |   22 +-
 kernel/events/uprobes.c                      |    2 +-
 kernel/exit.c                                |    2 +
 kernel/fork.c                                |   99 +-
 kernel/sys_ni.c                              |   18 +
 mm/Kconfig                                   |   47 +
 mm/Makefile                                  |    1 +
 mm/gup.c                                     |    4 +-
 mm/huge_memory.c                             |   83 +-
 mm/hugetlb.c                                 |  205 +-
 mm/internal.h                                |   19 +
 mm/memory.c                                  |  469 +++-
 mm/mlock.c                                   |   21 +-
 mm/mmap.c                                    |  124 +-
 mm/mremap.c                                  |   13 +-
 mm/nommu.c                                   |   17 +-
 mm/shmem.c                                   |   14 +-
 mm/util.c                                    |    4 +-
 mm/vas.c                                     | 3466 ++++++++++++++++++++++++++
 sound/core/pcm_native.c                      |    3 +-
 96 files changed, 5927 insertions(+), 545 deletions(-)
 create mode 100644 include/linux/vas.h
 create mode 100644 include/linux/vas_types.h
 create mode 100644 include/uapi/linux/vas.h
 create mode 100644 mm/vas.c

-- 
2.12.0

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: aart@kvack.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-03-16 17:50 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CALCETrXfGgxaLivhci0VL=wUaWAnBiUXC47P7TUaEuOYV_-X_g@mail.gmail.com>
2017-03-15 22:02 ` [RFC PATCH 00/13] Introduce first class virtual address spaces Till Smejkal
2017-03-16  8:21   ` Thomas Gleixner
2017-03-16 17:29     ` Till Smejkal
2017-03-16 17:42       ` Thomas Gleixner
2017-03-16 17:50         ` Till Smejkal
     [not found] <20170315220952.GA1435@intel.com>
2017-03-15 23:18 ` Till Smejkal
     [not found] <20170315194723.GJ1693@brightrain.aerifal.cx>
2017-03-15 21:30 ` Till Smejkal
     [not found] <8d9333d6-2f81-a9de-484e-e1d655e1d3c3@mellanox.com>
2017-03-14 21:14 ` Till Smejkal
     [not found] <CALCETrXKvNWv1OtoSo_HWf5ZHSvyGS1NsuQod6Zt+tEg3MT5Sg@mail.gmail.com>
2017-03-14 16:12 ` Till Smejkal
     [not found]   ` <CALCETrX5gv+zdhOYro4-u3wGWjVCab28DFHPSm5=BVG_hKxy3A@mail.gmail.com>
2017-03-15 16:57     ` Matthew Wilcox
2017-03-15 19:44     ` Till Smejkal
     [not found] <fa60d132-0eef-4350-3c88-1558f447a48f@twiddle.net>
2017-03-14  1:31 ` Till Smejkal
2017-03-13 22:14 Till Smejkal
2017-03-14  0:18 ` Richard Henderson
2017-03-14  0:39   ` Till Smejkal
2017-03-14  0:58 ` Andy Lutomirski
2017-03-14  2:07   ` Till Smejkal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).