linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 00/13] Introduce first class virtual address spaces
@ 2017-03-13 22:14 Till Smejkal
  2017-03-13 22:14 ` [RFC PATCH 01/13] mm: Add mm_struct argument to 'mmap_region' Till Smejkal
                   ` (14 more replies)
  0 siblings, 15 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-13 22:14 UTC (permalink / raw)
  To: Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai
  Cc: linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

First class virtual address spaces (also called VAS) are a new functionality of
the Linux kernel allowing address spaces to exist independently of processes.
The general idea behind this feature is described in a paper at ASPLOS16 with
the title 'SpaceJMP: Programming with Multiple Virtual Address Spaces' [1].

This patchset extends the kernel memory management subsystem with a new
type of address spaces (called VAS) which can be created and destroyed
independently of processes by a user in the system. During its lifetime
such a VAS can be attached to processes by the user which allows a process
to have multiple address spaces and thereby multiple, potentially
different, views on the system's main memory. During its execution the
threads belonging to the process are able to switch freely between the
different attached VAS and the process' original AS enabling them to
utilize the different available views on the memory. These multiple virtual
address spaces per process and the possibility to switch between them
freely can be used in multiple interesting ways as also outlined in the
mentioned paper. Some of the many possible applications are for example to
compartmentalize a process for security reasons, to improve the performance
of data-centric applications and to introduce new application models [1].

In addition to the concept of first class virtual address spaces, this
patchset introduces yet another feature called VAS segments. VAS segments
are memory regions which have a fixed size and position in the virtual
address space and can be shared between multiple first class virtual
address spaces. Such shareable memory regions are especially useful for
in-memory pointer-based data structures or other pure in-memory data.

First class virtual address spaces have a significant advantage compared to
forking a process and using inter process communication mechanism, namely
that creating and switching between VAS is significant faster than creating
and switching between processes. As it can be seen in the following table,
measured on an Intel Xeon E5620 CPU with 2.40GHz, creating a VAS is about 7
times faster than forking and switching between VAS is up to 4 times faster
than switching between processes.

            |     VAS     |  processes  |
    -------------------------------------
    switch  |       468ns |      1944ns |
    create  |     20003ns |    150491ns |

Hence, first class virtual address spaces provide a fast mechanism for
applications to utilize multiple virtual address spaces in parallel with a
higher performance than splitting up the application into multiple
independent processes.

Both VAS and VAS segments have another significant advantage when combined
with non-volatile memory. Because of their independent life cycle from
processes and other kernel data structures, they can be used to save
special memory regions or even whole AS into non-volatile memory making it
possible to reuse them across multiple system reboots.

At the current state of the development, first class virtual address spaces
have one limitation, that we haven't been able to solve so far. The feature
allows, that different threads of the same process can execute in different
AS at the same time. This is possible, because the VAS-switch operation
only changes the active mm_struct for the task_struct of the calling
thread. However, when a thread switches into a first class virtual address
space, some parts of its original AS are duplicated into the new one to
allow the thread to continue its execution at its current state.
Accordingly, parts of the processes AS (e.g. the code section, data
section, heap section and stack sections) exist in multiple AS if the
process has a VAS attached to it. Changes to these shared memory regions
are synchronized between the address spaces whenever a thread switches
between two of them. Unfortunately, in some scenarios the kernel is not
able to properly synchronize all these shared memory regions because of
conflicting changes. One such example happens if there are two threads, one
executing in an attached first class virtual address space, the other in
the tasks original address space. If both threads make changes to the heap
section that cause expansion of the underlying vm_area_struct, the kernel
cannot correctly synchronize these changes, because that would cause parts
of the virtual address space to be overwritten with unrelated data. In the
current implementation such conflicts are only detected but not resolved
and result in an error code being returned by the kernel during the VAS
switch operation. Unfortunately, that means for the particular thread that
tried to make the switch, that it cannot do this anymore in the future and
accordingly has to be killed.

This code was developed during an internship at Hewlett Packard Enterprise.

[1] http://impact.crhc.illinois.edu/shared/Papers/ASPLOS16-SpaceJMP.pdf

Till Smejkal (13):
  mm: Add mm_struct argument to 'mmap_region'
  mm: Add mm_struct argument to 'do_mmap' and 'do_mmap_pgoff'
  mm: Rename 'unmap_region' and add mm_struct argument
  mm: Add mm_struct argument to 'get_unmapped_area' and
    'vm_unmapped_area'
  mm: Add mm_struct argument to 'mm_populate' and '__mm_populate'
  mm/mmap: Export 'vma_link' and 'find_vma_links' to mm subsystem
  kernel/fork: Split and export 'mm_alloc' and 'mm_init'
  kernel/fork: Define explicitly which mm_struct to duplicate during
    fork
  mm/memory: Add function to one-to-one duplicate page ranges
  mm: Introduce first class virtual address spaces
  mm/vas: Introduce VAS segments - shareable address space regions
  mm/vas: Add lazy-attach support for first class virtual address spaces
  fs/proc: Add procfs support for first class virtual address spaces

 MAINTAINERS                                  |   10 +
 arch/alpha/kernel/osf_sys.c                  |   19 +-
 arch/arc/mm/mmap.c                           |    8 +-
 arch/arm/kernel/process.c                    |    2 +-
 arch/arm/mach-rpc/ecard.c                    |    2 +-
 arch/arm/mm/mmap.c                           |   19 +-
 arch/arm64/kernel/vdso.c                     |    2 +-
 arch/blackfin/include/asm/pgtable.h          |    3 +-
 arch/blackfin/kernel/sys_bfin.c              |    5 +-
 arch/frv/mm/elf-fdpic.c                      |   11 +-
 arch/hexagon/kernel/vdso.c                   |    2 +-
 arch/ia64/kernel/perfmon.c                   |    3 +-
 arch/ia64/kernel/sys_ia64.c                  |    6 +-
 arch/ia64/mm/hugetlbpage.c                   |    7 +-
 arch/metag/mm/hugetlbpage.c                  |   11 +-
 arch/mips/kernel/vdso.c                      |    4 +-
 arch/mips/mm/mmap.c                          |   27 +-
 arch/parisc/kernel/sys_parisc.c              |   19 +-
 arch/parisc/mm/hugetlbpage.c                 |    7 +-
 arch/powerpc/include/asm/book3s/64/hugetlb.h |    6 +-
 arch/powerpc/include/asm/page_64.h           |    3 +-
 arch/powerpc/kernel/vdso.c                   |    2 +-
 arch/powerpc/mm/hugetlbpage-radix.c          |    9 +-
 arch/powerpc/mm/hugetlbpage.c                |    9 +-
 arch/powerpc/mm/mmap.c                       |   17 +-
 arch/powerpc/mm/slice.c                      |   25 +-
 arch/s390/kernel/vdso.c                      |    3 +-
 arch/s390/mm/mmap.c                          |   42 +-
 arch/sh/kernel/vsyscall/vsyscall.c           |    2 +-
 arch/sh/mm/mmap.c                            |   19 +-
 arch/sparc/include/asm/pgtable_64.h          |    4 +-
 arch/sparc/kernel/sys_sparc_32.c             |    6 +-
 arch/sparc/kernel/sys_sparc_64.c             |   31 +-
 arch/sparc/mm/hugetlbpage.c                  |   26 +-
 arch/tile/kernel/vdso.c                      |    2 +-
 arch/tile/mm/elf.c                           |    2 +-
 arch/tile/mm/hugetlbpage.c                   |   26 +-
 arch/x86/entry/syscalls/syscall_32.tbl       |   16 +
 arch/x86/entry/syscalls/syscall_64.tbl       |   16 +
 arch/x86/entry/vdso/vma.c                    |    2 +-
 arch/x86/kernel/sys_x86_64.c                 |   19 +-
 arch/x86/mm/hugetlbpage.c                    |   26 +-
 arch/x86/mm/mpx.c                            |    6 +-
 arch/xtensa/kernel/syscall.c                 |    7 +-
 drivers/char/mem.c                           |   15 +-
 drivers/dax/dax.c                            |   10 +-
 drivers/media/usb/uvc/uvc_v4l2.c             |    6 +-
 drivers/media/v4l2-core/v4l2-dev.c           |    8 +-
 drivers/media/v4l2-core/videobuf2-v4l2.c     |    5 +-
 drivers/mtd/mtdchar.c                        |    3 +-
 drivers/usb/gadget/function/uvc_v4l2.c       |    3 +-
 fs/aio.c                                     |    4 +-
 fs/exec.c                                    |    5 +-
 fs/hugetlbfs/inode.c                         |    8 +-
 fs/proc/base.c                               |  528 ++++
 fs/proc/inode.c                              |   11 +-
 fs/proc/internal.h                           |    1 +
 fs/ramfs/file-mmu.c                          |    5 +-
 fs/ramfs/file-nommu.c                        |   10 +-
 fs/romfs/mmap-nommu.c                        |    3 +-
 include/linux/fs.h                           |    2 +-
 include/linux/huge_mm.h                      |   12 +-
 include/linux/hugetlb.h                      |   10 +-
 include/linux/mm.h                           |   53 +-
 include/linux/mm_types.h                     |   16 +-
 include/linux/sched.h                        |   34 +-
 include/linux/shmem_fs.h                     |    5 +-
 include/linux/syscalls.h                     |   21 +
 include/linux/vas.h                          |  322 +++
 include/linux/vas_types.h                    |  173 ++
 include/media/v4l2-dev.h                     |    3 +-
 include/media/videobuf2-v4l2.h               |    5 +-
 include/uapi/asm-generic/unistd.h            |   34 +-
 include/uapi/linux/Kbuild                    |    1 +
 include/uapi/linux/vas.h                     |   28 +
 init/main.c                                  |    2 +
 ipc/shm.c                                    |   22 +-
 kernel/events/uprobes.c                      |    2 +-
 kernel/exit.c                                |    2 +
 kernel/fork.c                                |   99 +-
 kernel/sys_ni.c                              |   18 +
 mm/Kconfig                                   |   47 +
 mm/Makefile                                  |    1 +
 mm/gup.c                                     |    4 +-
 mm/huge_memory.c                             |   83 +-
 mm/hugetlb.c                                 |  205 +-
 mm/internal.h                                |   19 +
 mm/memory.c                                  |  469 +++-
 mm/mlock.c                                   |   21 +-
 mm/mmap.c                                    |  124 +-
 mm/mremap.c                                  |   13 +-
 mm/nommu.c                                   |   17 +-
 mm/shmem.c                                   |   14 +-
 mm/util.c                                    |    4 +-
 mm/vas.c                                     | 3466 ++++++++++++++++++++++++++
 sound/core/pcm_native.c                      |    3 +-
 96 files changed, 5927 insertions(+), 545 deletions(-)
 create mode 100644 include/linux/vas.h
 create mode 100644 include/linux/vas_types.h
 create mode 100644 include/uapi/linux/vas.h
 create mode 100644 mm/vas.c

-- 
2.12.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2017-03-16 17:50 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-13 22:14 [RFC PATCH 00/13] Introduce first class virtual address spaces Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 01/13] mm: Add mm_struct argument to 'mmap_region' Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 02/13] mm: Add mm_struct argument to 'do_mmap' and 'do_mmap_pgoff' Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 03/13] mm: Rename 'unmap_region' and add mm_struct argument Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 04/13] mm: Add mm_struct argument to 'get_unmapped_area' and 'vm_unmapped_area' Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 05/13] mm: Add mm_struct argument to 'mm_populate' and '__mm_populate' Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 06/13] mm/mmap: Export 'vma_link' and 'find_vma_links' to mm subsystem Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 07/13] kernel/fork: Split and export 'mm_alloc' and 'mm_init' Till Smejkal
2017-03-14 10:18   ` David Laight
2017-03-14 16:18     ` Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 08/13] kernel/fork: Define explicitly which mm_struct to duplicate during fork Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 09/13] mm/memory: Add function to one-to-one duplicate page ranges Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 10/13] mm: Introduce first class virtual address spaces Till Smejkal
2017-03-13 23:52   ` Greg Kroah-Hartman
2017-03-14  0:24     ` Till Smejkal
2017-03-14  1:35   ` Vineet Gupta
2017-03-14  2:34     ` Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 11/13] mm/vas: Introduce VAS segments - shareable address space regions Till Smejkal
2017-03-13 22:27   ` Matthew Wilcox
2017-03-13 22:45     ` Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 12/13] mm/vas: Add lazy-attach support for first class virtual address spaces Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 13/13] fs/proc: Add procfs " Till Smejkal
2017-03-14  0:18 ` [RFC PATCH 00/13] Introduce " Richard Henderson
2017-03-14  0:39   ` Till Smejkal
2017-03-14  1:02     ` Richard Henderson
2017-03-14  1:31       ` Till Smejkal
2017-03-14  0:58 ` Andy Lutomirski
2017-03-14  2:07   ` Till Smejkal
2017-03-14  5:37     ` Andy Lutomirski
2017-03-14 16:12       ` Till Smejkal
2017-03-14 19:53         ` Chris Metcalf
2017-03-14 21:14           ` Till Smejkal
2017-03-15 16:51         ` Andy Lutomirski
2017-03-15 16:57           ` Matthew Wilcox
2017-03-15 19:44           ` Till Smejkal
2017-03-15 19:47             ` Rich Felker
2017-03-15 21:30               ` Till Smejkal
2017-03-15 20:06             ` Andy Lutomirski
2017-03-15 22:02               ` Till Smejkal
2017-03-15 22:09                 ` Luck, Tony
2017-03-15 23:18                   ` Till Smejkal
2017-03-16  8:21                 ` Thomas Gleixner
2017-03-16 17:29                   ` Till Smejkal
2017-03-16 17:42                     ` Thomas Gleixner
2017-03-16 17:50                       ` Till Smejkal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).