* [RFC PATCH 00/13] Introduce first class virtual address spaces
@ 2017-03-13 22:14 Till Smejkal
  2017-03-13 22:14 ` [RFC PATCH 01/13] mm: Add mm_struct argument to 'mmap_region' Till Smejkal
                   ` (14 more replies)
  0 siblings, 15 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-13 22:14 UTC (permalink / raw)
  To: Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai
  Cc: linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

First class virtual address spaces (also called VAS) are a new feature of
the Linux kernel that allows address spaces to exist independently of
processes. The general idea behind this feature is described in the
ASPLOS'16 paper 'SpaceJMP: Programming with Multiple Virtual Address
Spaces' [1].

This patchset extends the kernel memory management subsystem with a new
type of address space (called VAS) which can be created and destroyed
independently of processes by a user of the system. During its lifetime
such a VAS can be attached to processes, which allows a process to have
multiple address spaces and thereby multiple, potentially different,
views of the system's main memory. The threads belonging to the process
can switch freely between the attached VAS and the process' original AS
during execution, enabling them to utilize the different available views
of memory. Having multiple virtual address spaces per process and being
able to switch between them freely can be used in multiple interesting
ways, as also outlined in the mentioned paper. Some of the many possible
applications are, for example, compartmentalizing a process for security
reasons, improving the performance of data-centric applications, and
introducing new application models [1].
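
To illustrate the intended usage from user space, here is a minimal
sketch. The wrapper names vas_create(), vas_attach() and vas_switch()
are placeholders for the new system calls introduced later in this
series; the actual names, prototypes and flag values are defined in the
patches themselves and may differ from this sketch.

/* Hypothetical user-space sketch -- the VAS syscall wrappers below are
 * placeholders, not the authoritative API introduced by this series. */
#include <stdio.h>
#include <stdlib.h>

extern int vas_create(const char *name, int mode); /* returns a VAS id */
extern int vas_attach(int pid, int vid, int type); /* attach a VAS to a process */
extern int vas_switch(int vid);                    /* switch the calling thread */

int main(void)
{
	int vid = vas_create("scratch", 0600);

	if (vid < 0) {
		perror("vas_create");
		return EXIT_FAILURE;
	}

	/* Make the VAS available to the calling process (pid 0 is taken
	 * to mean "current process" in this sketch). */
	if (vas_attach(0, vid, 0) < 0 || vas_switch(vid) < 0) {
		perror("vas_attach/vas_switch");
		return EXIT_FAILURE;
	}

	/* The calling thread now runs in the attached VAS while other
	 * threads of the process keep using the original AS. */

	/* Switching to id 0 is assumed to return to the original AS. */
	vas_switch(0);
	return EXIT_SUCCESS;
}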

In addition to the concept of first class virtual address spaces, this
patchset introduces a second feature called VAS segments. VAS segments
are memory regions that have a fixed size and position in the virtual
address space and can be shared between multiple first class virtual
address spaces. Such shareable memory regions are especially useful for
in-memory pointer-based data structures and other purely in-memory data.
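
The main benefit of the fixed position is that raw pointers stored
inside a VAS segment stay valid in every address space that attaches the
segment. A small illustration (plain C, independent of the actual VAS
segment API; the base address is an arbitrary example value):

#include <stdint.h>

/* Example only: assume the segment is attached at this fixed virtual
 * address in every VAS that maps it. */
#define SEG_BASE ((uintptr_t)0x700000000000ULL)

struct node {
	int value;
	struct node *next;	/* raw pointer into the same segment */
};

/* Because the segment occupies the same address range in every VAS, a
 * list built in one address space can be walked unchanged in another
 * one -- no pointer swizzling or base-offset arithmetic is needed. */
static int sum_list(void)
{
	const struct node *n = (const struct node *)SEG_BASE;
	int sum = 0;

	while (n) {
		sum += n->value;
		n = n->next;
	}
	return sum;
}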

First class virtual address spaces have a significant advantage over
forking a process and using inter-process communication mechanisms,
namely that creating and switching between VAS is significantly faster
than creating and switching between processes. As can be seen in the
following table, measured on an Intel Xeon E5620 CPU at 2.40 GHz,
creating a VAS is about 7 times faster than forking and switching
between VAS is up to 4 times faster than switching between processes.

            |     VAS     |  processes  |
    -------------------------------------
    switch  |       468ns |      1944ns |
    create  |     20003ns |    150491ns |

Hence, first class virtual address spaces provide a fast mechanism for
applications to utilize multiple virtual address spaces in parallel,
with higher performance than splitting the application into multiple
independent processes.

Both VAS and VAS segments have another significant advantage when
combined with non-volatile memory. Because their life cycle is
independent of processes and other kernel data structures, they can be
used to save special memory regions or even whole AS in non-volatile
memory, making it possible to reuse them across multiple system reboots.

At the current state of development, first class virtual address spaces
have one limitation that we have not been able to solve so far. The
feature allows different threads of the same process to execute in
different AS at the same time. This is possible because the VAS-switch
operation only changes the active mm_struct of the calling thread's
task_struct. However, when a thread switches into a first class virtual
address space, some parts of its original AS are duplicated into the new
one so that the thread can continue its execution in its current state.
Accordingly, parts of the process's AS (e.g. the code section, data
section, heap section and stack sections) exist in multiple AS if the
process has a VAS attached to it. Changes to these shared memory regions
are synchronized between the address spaces whenever a thread switches
between two of them. Unfortunately, in some scenarios the kernel is not
able to properly synchronize all of these shared memory regions because
of conflicting changes. One such example occurs with two threads, one
executing in an attached first class virtual address space and the other
in the task's original address space. If both threads make changes to
the heap section that cause the underlying vm_area_struct to expand, the
kernel cannot correctly synchronize these changes, because doing so
would overwrite parts of the virtual address space with unrelated data.
In the current implementation such conflicts are only detected but not
resolved, and result in an error code being returned by the kernel
during the VAS-switch operation. Unfortunately, that means the thread
that attempted the switch can never switch again in the future and
accordingly has to be killed.
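
For illustration, the conflicting scenario described above roughly
corresponds to the following sketch (hypothetical user-space code;
vas_switch() is the assumed wrapper from the earlier sketch and the VAS
id is a placeholder):

#include <pthread.h>
#include <stdlib.h>

extern int vas_switch(int vid);	/* assumed wrapper, see above */

/* Both threads allocate and touch enough memory to force the heap
 * vm_area_struct to grow. */
static void *grow_heap(void *arg)
{
	(void)arg;
	for (int i = 0; i < 256; i++) {
		char *p = malloc(1 << 20);

		if (p)
			p[0] = 1;
	}
	return NULL;
}

static void *grow_heap_in_vas(void *arg)
{
	/* This thread runs in an attached VAS ... */
	vas_switch(*(int *)arg);
	grow_heap(NULL);
	/* ... and the switch back may now fail with an error, because
	 * the heap grew differently in the VAS and in the original AS
	 * and the two copies can no longer be synchronized. */
	vas_switch(0);
	return NULL;
}

int main(void)
{
	pthread_t a, b;
	int vid = 1;	/* id of an already attached VAS (example) */

	pthread_create(&a, NULL, grow_heap_in_vas, &vid);
	pthread_create(&b, NULL, grow_heap, NULL);	/* original AS */
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}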

This code was developed during an internship at Hewlett Packard Enterprise.

[1] http://impact.crhc.illinois.edu/shared/Papers/ASPLOS16-SpaceJMP.pdf

Till Smejkal (13):
  mm: Add mm_struct argument to 'mmap_region'
  mm: Add mm_struct argument to 'do_mmap' and 'do_mmap_pgoff'
  mm: Rename 'unmap_region' and add mm_struct argument
  mm: Add mm_struct argument to 'get_unmapped_area' and
    'vm_unmapped_area'
  mm: Add mm_struct argument to 'mm_populate' and '__mm_populate'
  mm/mmap: Export 'vma_link' and 'find_vma_links' to mm subsystem
  kernel/fork: Split and export 'mm_alloc' and 'mm_init'
  kernel/fork: Define explicitly which mm_struct to duplicate during
    fork
  mm/memory: Add function to one-to-one duplicate page ranges
  mm: Introduce first class virtual address spaces
  mm/vas: Introduce VAS segments - shareable address space regions
  mm/vas: Add lazy-attach support for first class virtual address spaces
  fs/proc: Add procfs support for first class virtual address spaces

 MAINTAINERS                                  |   10 +
 arch/alpha/kernel/osf_sys.c                  |   19 +-
 arch/arc/mm/mmap.c                           |    8 +-
 arch/arm/kernel/process.c                    |    2 +-
 arch/arm/mach-rpc/ecard.c                    |    2 +-
 arch/arm/mm/mmap.c                           |   19 +-
 arch/arm64/kernel/vdso.c                     |    2 +-
 arch/blackfin/include/asm/pgtable.h          |    3 +-
 arch/blackfin/kernel/sys_bfin.c              |    5 +-
 arch/frv/mm/elf-fdpic.c                      |   11 +-
 arch/hexagon/kernel/vdso.c                   |    2 +-
 arch/ia64/kernel/perfmon.c                   |    3 +-
 arch/ia64/kernel/sys_ia64.c                  |    6 +-
 arch/ia64/mm/hugetlbpage.c                   |    7 +-
 arch/metag/mm/hugetlbpage.c                  |   11 +-
 arch/mips/kernel/vdso.c                      |    4 +-
 arch/mips/mm/mmap.c                          |   27 +-
 arch/parisc/kernel/sys_parisc.c              |   19 +-
 arch/parisc/mm/hugetlbpage.c                 |    7 +-
 arch/powerpc/include/asm/book3s/64/hugetlb.h |    6 +-
 arch/powerpc/include/asm/page_64.h           |    3 +-
 arch/powerpc/kernel/vdso.c                   |    2 +-
 arch/powerpc/mm/hugetlbpage-radix.c          |    9 +-
 arch/powerpc/mm/hugetlbpage.c                |    9 +-
 arch/powerpc/mm/mmap.c                       |   17 +-
 arch/powerpc/mm/slice.c                      |   25 +-
 arch/s390/kernel/vdso.c                      |    3 +-
 arch/s390/mm/mmap.c                          |   42 +-
 arch/sh/kernel/vsyscall/vsyscall.c           |    2 +-
 arch/sh/mm/mmap.c                            |   19 +-
 arch/sparc/include/asm/pgtable_64.h          |    4 +-
 arch/sparc/kernel/sys_sparc_32.c             |    6 +-
 arch/sparc/kernel/sys_sparc_64.c             |   31 +-
 arch/sparc/mm/hugetlbpage.c                  |   26 +-
 arch/tile/kernel/vdso.c                      |    2 +-
 arch/tile/mm/elf.c                           |    2 +-
 arch/tile/mm/hugetlbpage.c                   |   26 +-
 arch/x86/entry/syscalls/syscall_32.tbl       |   16 +
 arch/x86/entry/syscalls/syscall_64.tbl       |   16 +
 arch/x86/entry/vdso/vma.c                    |    2 +-
 arch/x86/kernel/sys_x86_64.c                 |   19 +-
 arch/x86/mm/hugetlbpage.c                    |   26 +-
 arch/x86/mm/mpx.c                            |    6 +-
 arch/xtensa/kernel/syscall.c                 |    7 +-
 drivers/char/mem.c                           |   15 +-
 drivers/dax/dax.c                            |   10 +-
 drivers/media/usb/uvc/uvc_v4l2.c             |    6 +-
 drivers/media/v4l2-core/v4l2-dev.c           |    8 +-
 drivers/media/v4l2-core/videobuf2-v4l2.c     |    5 +-
 drivers/mtd/mtdchar.c                        |    3 +-
 drivers/usb/gadget/function/uvc_v4l2.c       |    3 +-
 fs/aio.c                                     |    4 +-
 fs/exec.c                                    |    5 +-
 fs/hugetlbfs/inode.c                         |    8 +-
 fs/proc/base.c                               |  528 ++++
 fs/proc/inode.c                              |   11 +-
 fs/proc/internal.h                           |    1 +
 fs/ramfs/file-mmu.c                          |    5 +-
 fs/ramfs/file-nommu.c                        |   10 +-
 fs/romfs/mmap-nommu.c                        |    3 +-
 include/linux/fs.h                           |    2 +-
 include/linux/huge_mm.h                      |   12 +-
 include/linux/hugetlb.h                      |   10 +-
 include/linux/mm.h                           |   53 +-
 include/linux/mm_types.h                     |   16 +-
 include/linux/sched.h                        |   34 +-
 include/linux/shmem_fs.h                     |    5 +-
 include/linux/syscalls.h                     |   21 +
 include/linux/vas.h                          |  322 +++
 include/linux/vas_types.h                    |  173 ++
 include/media/v4l2-dev.h                     |    3 +-
 include/media/videobuf2-v4l2.h               |    5 +-
 include/uapi/asm-generic/unistd.h            |   34 +-
 include/uapi/linux/Kbuild                    |    1 +
 include/uapi/linux/vas.h                     |   28 +
 init/main.c                                  |    2 +
 ipc/shm.c                                    |   22 +-
 kernel/events/uprobes.c                      |    2 +-
 kernel/exit.c                                |    2 +
 kernel/fork.c                                |   99 +-
 kernel/sys_ni.c                              |   18 +
 mm/Kconfig                                   |   47 +
 mm/Makefile                                  |    1 +
 mm/gup.c                                     |    4 +-
 mm/huge_memory.c                             |   83 +-
 mm/hugetlb.c                                 |  205 +-
 mm/internal.h                                |   19 +
 mm/memory.c                                  |  469 +++-
 mm/mlock.c                                   |   21 +-
 mm/mmap.c                                    |  124 +-
 mm/mremap.c                                  |   13 +-
 mm/nommu.c                                   |   17 +-
 mm/shmem.c                                   |   14 +-
 mm/util.c                                    |    4 +-
 mm/vas.c                                     | 3466 ++++++++++++++++++++++++++
 sound/core/pcm_native.c                      |    3 +-
 96 files changed, 5927 insertions(+), 545 deletions(-)
 create mode 100644 include/linux/vas.h
 create mode 100644 include/linux/vas_types.h
 create mode 100644 include/uapi/linux/vas.h
 create mode 100644 mm/vas.c

-- 
2.12.0


* [RFC PATCH 01/13] mm: Add mm_struct argument to 'mmap_region'
  2017-03-13 22:14 [RFC PATCH 00/13] Introduce first class virtual address spaces Till Smejkal
@ 2017-03-13 22:14 ` Till Smejkal
  2017-03-13 22:14 ` [RFC PATCH 02/13] mm: Add mm_struct argument to 'do_mmap' and 'do_mmap_pgoff' Till Smejkal
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-13 22:14 UTC (permalink / raw)
  To: Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai
  Cc: linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

Add the mm_struct that 'mmap_region' should operate on as an additional
argument to the function. Previously, the function simply used the
memory map of the current task. However, with the introduction of first
class virtual address spaces, mmap_region also needs to be able to
operate on memory maps other than the current task's. By passing the
memory map as an argument we can now define explicitly which one to use.

Signed-off-by: Till Smejkal <till.smejkal@gmail.com>
---
 arch/mips/kernel/vdso.c |  2 +-
 arch/tile/mm/elf.c      |  2 +-
 include/linux/mm.h      |  5 +++--
 mm/mmap.c               | 10 +++++-----
 4 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
index f9dbfb14af33..9631b42908f3 100644
--- a/arch/mips/kernel/vdso.c
+++ b/arch/mips/kernel/vdso.c
@@ -108,7 +108,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 		return -EINTR;
 
 	/* Map delay slot emulation page */
-	base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
+	base = mmap_region(mm, NULL, STACK_TOP, PAGE_SIZE,
 			   VM_READ|VM_WRITE|VM_EXEC|
 			   VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
 			   0);
diff --git a/arch/tile/mm/elf.c b/arch/tile/mm/elf.c
index 6225cc998db1..a22768059b7a 100644
--- a/arch/tile/mm/elf.c
+++ b/arch/tile/mm/elf.c
@@ -141,7 +141,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
 	 */
 	if (!retval) {
 		unsigned long addr = MEM_USER_INTRPT;
-		addr = mmap_region(NULL, addr, INTRPT_SIZE,
+		addr = mmap_region(mm, NULL, addr, INTRPT_SIZE,
 				   VM_READ|VM_EXEC|
 				   VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, 0);
 		if (addr > (unsigned long) -PAGE_SIZE)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b84615b0f64c..fa483d2ff3eb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2016,8 +2016,9 @@ extern int install_special_mapping(struct mm_struct *mm,
 
 extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
 
-extern unsigned long mmap_region(struct file *file, unsigned long addr,
-	unsigned long len, vm_flags_t vm_flags, unsigned long pgoff);
+extern unsigned long mmap_region(struct mm_struct *mm, struct file *file,
+				 unsigned long addr, unsigned long len,
+				 vm_flags_t vm_flags, unsigned long pgoff);
 extern unsigned long do_mmap(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot, unsigned long flags,
 	vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate);
diff --git a/mm/mmap.c b/mm/mmap.c
index dc4291dcc99b..5ac276ac9807 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1447,7 +1447,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 			vm_flags |= VM_NORESERVE;
 	}
 
-	addr = mmap_region(file, addr, len, vm_flags, pgoff);
+	addr = mmap_region(mm, file, addr, len, vm_flags, pgoff);
 	if (!IS_ERR_VALUE(addr) &&
 	    ((vm_flags & VM_LOCKED) ||
 	     (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE))
@@ -1582,10 +1582,10 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)
 	return (vm_flags & (VM_NORESERVE | VM_SHARED | VM_WRITE)) == VM_WRITE;
 }
 
-unsigned long mmap_region(struct file *file, unsigned long addr,
-		unsigned long len, vm_flags_t vm_flags, unsigned long pgoff)
+unsigned long mmap_region(struct mm_struct *mm, struct file *file,
+		unsigned long addr, unsigned long len, vm_flags_t vm_flags,
+		unsigned long pgoff)
 {
-	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma, *prev;
 	int error;
 	struct rb_node **rb_link, *rb_parent;
@@ -1704,7 +1704,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	vm_stat_account(mm, vm_flags, len >> PAGE_SHIFT);
 	if (vm_flags & VM_LOCKED) {
 		if (!((vm_flags & VM_SPECIAL) || is_vm_hugetlb_page(vma) ||
-					vma == get_gate_vma(current->mm)))
+					vma == get_gate_vma(mm)))
 			mm->locked_vm += (len >> PAGE_SHIFT);
 		else
 			vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
-- 
2.12.0


* [RFC PATCH 02/13] mm: Add mm_struct argument to 'do_mmap' and 'do_mmap_pgoff'
  2017-03-13 22:14 [RFC PATCH 00/13] Introduce first class virtual address spaces Till Smejkal
  2017-03-13 22:14 ` [RFC PATCH 01/13] mm: Add mm_struct argument to 'mmap_region' Till Smejkal
@ 2017-03-13 22:14 ` Till Smejkal
  2017-03-13 22:14 ` [RFC PATCH 03/13] mm: Rename 'unmap_region' and add mm_struct argument Till Smejkal
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-13 22:14 UTC (permalink / raw)
  To: Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai
  Cc: linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

Add the mm_struct that 'do_mmap' and 'do_mmap_pgoff' should operate on
as an additional argument to both functions. Previously, both functions
simply used the memory map of the current task. However, with the
introduction of first class virtual address spaces, these functions also
need to be usable for memory maps other than just the current process's.
Hence, the memory map to use is now defined explicitly at the call site.

Signed-off-by: Till Smejkal <till.smejkal@gmail.com>
---
 arch/x86/mm/mpx.c  |  4 ++--
 fs/aio.c           |  4 ++--
 include/linux/mm.h | 11 ++++++-----
 ipc/shm.c          |  3 ++-
 mm/mmap.c          | 16 ++++++++--------
 mm/nommu.c         |  7 ++++---
 mm/util.c          |  2 +-
 7 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index af59f808742f..99c664a97c35 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -50,8 +50,8 @@ static unsigned long mpx_mmap(unsigned long len)
 		return -EINVAL;
 
 	down_write(&mm->mmap_sem);
-	addr = do_mmap(NULL, 0, len, PROT_READ | PROT_WRITE,
-			MAP_ANONYMOUS | MAP_PRIVATE, VM_MPX, 0, &populate);
+	addr = do_mmap(mm, NULL, 0, len, PROT_READ | PROT_WRITE,
+		       MAP_ANONYMOUS | MAP_PRIVATE, VM_MPX, 0, &populate);
 	up_write(&mm->mmap_sem);
 	if (populate)
 		mm_populate(addr, populate);
diff --git a/fs/aio.c b/fs/aio.c
index 873b4ca82ccb..df9bba5a2aff 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -510,8 +510,8 @@ static int aio_setup_ring(struct kioctx *ctx)
 		return -EINTR;
 	}
 
-	ctx->mmap_base = do_mmap_pgoff(ctx->aio_ring_file, 0, ctx->mmap_size,
-				       PROT_READ | PROT_WRITE,
+	ctx->mmap_base = do_mmap_pgoff(current->mm, ctx->aio_ring_file, 0,
+				       ctx->mmap_size, PROT_READ | PROT_WRITE,
 				       MAP_SHARED, 0, &unused);
 	up_write(&mm->mmap_sem);
 	if (IS_ERR((void *)ctx->mmap_base)) {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index fa483d2ff3eb..fb11be77545f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2019,17 +2019,18 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo
 extern unsigned long mmap_region(struct mm_struct *mm, struct file *file,
 				 unsigned long addr, unsigned long len,
 				 vm_flags_t vm_flags, unsigned long pgoff);
-extern unsigned long do_mmap(struct file *file, unsigned long addr,
-	unsigned long len, unsigned long prot, unsigned long flags,
-	vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate);
+extern unsigned long do_mmap(struct mm_struct *mm, struct file *file,
+	unsigned long addr, unsigned long len, unsigned long prot,
+	unsigned long flags, vm_flags_t vm_flags, unsigned long pgoff,
+	unsigned long *populate);
 extern int do_munmap(struct mm_struct *, unsigned long, size_t);
 
 static inline unsigned long
-do_mmap_pgoff(struct file *file, unsigned long addr,
+do_mmap_pgoff(struct mm_struct *mm, struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot, unsigned long flags,
 	unsigned long pgoff, unsigned long *populate)
 {
-	return do_mmap(file, addr, len, prot, flags, 0, pgoff, populate);
+	return do_mmap(mm, file, addr, len, prot, flags, 0, pgoff, populate);
 }
 
 #ifdef CONFIG_MMU
diff --git a/ipc/shm.c b/ipc/shm.c
index 81203e8ba013..64c21fb32ca9 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1222,7 +1222,8 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr,
 			goto invalid;
 	}
 
-	addr = do_mmap_pgoff(file, addr, size, prot, flags, 0, &populate);
+	addr = do_mmap_pgoff(mm, file, addr, size, prot, flags, 0,
+			     &populate);
 	*raddr = addr;
 	err = 0;
 	if (IS_ERR_VALUE(addr))
diff --git a/mm/mmap.c b/mm/mmap.c
index 5ac276ac9807..70028bf7b58d 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1299,14 +1299,14 @@ static inline int mlock_future_check(struct mm_struct *mm,
 }
 
 /*
- * The caller must hold down_write(&current->mm->mmap_sem).
+ * The caller must hold down_write(&mm->mmap_sem).
  */
-unsigned long do_mmap(struct file *file, unsigned long addr,
-			unsigned long len, unsigned long prot,
-			unsigned long flags, vm_flags_t vm_flags,
-			unsigned long pgoff, unsigned long *populate)
+unsigned long do_mmap(struct mm_struct *mm, struct file *file,
+		      unsigned long addr, unsigned long len,
+		      unsigned long prot, unsigned long flags,
+		      vm_flags_t vm_flags, unsigned long pgoff,
+		      unsigned long *populate)
 {
-	struct mm_struct *mm = current->mm;
 	int pkey = 0;
 
 	*populate = 0;
@@ -2779,8 +2779,8 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 	}
 
 	file = get_file(vma->vm_file);
-	ret = do_mmap_pgoff(vma->vm_file, start, size,
-			prot, flags, pgoff, &populate);
+	ret = do_mmap_pgoff(mm, vma->vm_file, start, size,
+			    prot, flags, pgoff, &populate);
 	fput(file);
 out:
 	up_write(&mm->mmap_sem);
diff --git a/mm/nommu.c b/mm/nommu.c
index 24f9f5f39145..54825d29f50b 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1198,7 +1198,8 @@ static int do_mmap_private(struct vm_area_struct *vma,
 /*
  * handle mapping creation for uClinux
  */
-unsigned long do_mmap(struct file *file,
+unsigned long do_mmap(struct mm_struct *mm,
+			struct file *file,
 			unsigned long addr,
 			unsigned long len,
 			unsigned long prot,
@@ -1375,10 +1376,10 @@ unsigned long do_mmap(struct file *file,
 	/* okay... we have a mapping; now we have to register it */
 	result = vma->vm_start;
 
-	current->mm->total_vm += len >> PAGE_SHIFT;
+	mm->total_vm += len >> PAGE_SHIFT;
 
 share:
-	add_vma_to_mm(current->mm, vma);
+	add_vma_to_mm(mm, vma);
 
 	/* we flush the region from the icache only when the first executable
 	 * mapping of it is made  */
diff --git a/mm/util.c b/mm/util.c
index 3cb2164f4099..46d05eef9a6b 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -302,7 +302,7 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 	if (!ret) {
 		if (down_write_killable(&mm->mmap_sem))
 			return -EINTR;
-		ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff,
+		ret = do_mmap_pgoff(mm, file, addr, len, prot, flag, pgoff,
 				    &populate);
 		up_write(&mm->mmap_sem);
 		if (populate)
-- 
2.12.0


* [RFC PATCH 03/13] mm: Rename 'unmap_region' and add mm_struct argument
  2017-03-13 22:14 [RFC PATCH 00/13] Introduce first class virtual address spaces Till Smejkal
  2017-03-13 22:14 ` [RFC PATCH 01/13] mm: Add mm_struct argument to 'mmap_region' Till Smejkal
  2017-03-13 22:14 ` [RFC PATCH 02/13] mm: Add mm_struct argument to 'do_mmap' and 'do_mmap_pgoff' Till Smejkal
@ 2017-03-13 22:14 ` Till Smejkal
  2017-03-13 22:14 ` [RFC PATCH 04/13] mm: Add mm_struct argument to 'get_unmapped_area' and 'vm_unmapped_area' Till Smejkal
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-13 22:14 UTC (permalink / raw)
  To: Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai
  Cc: linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

Rename the 'unmap_region' function to 'munmap_region' so that it follows
the same naming pattern as the do_mmap <-> mmap_region pair. In
addition, make the new 'munmap_region' function available to all other
kernel sources.

Also add the mm_struct that the function should operate on as an
additional argument. Previously, the function simply used the memory map
of the current task. However, with the introduction of first class
virtual address spaces, munmap_region also needs to be able to operate
on memory maps other than just the current task's. Accordingly, the new
argument allows callers to define explicitly which memory map should be
used.

Signed-off-by: Till Smejkal <till.smejkal@gmail.com>
---
 include/linux/mm.h |  4 ++++
 mm/mmap.c          | 14 +++++---------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index fb11be77545f..71a90604d21f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2023,6 +2023,10 @@ extern unsigned long do_mmap(struct mm_struct *mm, struct file *file,
 	unsigned long addr, unsigned long len, unsigned long prot,
 	unsigned long flags, vm_flags_t vm_flags, unsigned long pgoff,
 	unsigned long *populate);
+
+extern void munmap_region(struct mm_struct *mm, struct vm_area_struct *vma,
+			  struct vm_area_struct *prev, unsigned long start,
+			  unsigned long end);
 extern int do_munmap(struct mm_struct *, unsigned long, size_t);
 
 static inline unsigned long
diff --git a/mm/mmap.c b/mm/mmap.c
index 70028bf7b58d..ea79bc4da5b7 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -70,10 +70,6 @@ int mmap_rnd_compat_bits __read_mostly = CONFIG_ARCH_MMAP_RND_COMPAT_BITS;
 static bool ignore_rlimit_data;
 core_param(ignore_rlimit_data, ignore_rlimit_data, bool, 0644);
 
-static void unmap_region(struct mm_struct *mm,
-		struct vm_area_struct *vma, struct vm_area_struct *prev,
-		unsigned long start, unsigned long end);
-
 /* description of effects of mapping type and prot in current implementation.
  * this is due to the limited x86 page protection hardware.  The expected
  * behavior is in parens:
@@ -1731,7 +1727,7 @@ unsigned long mmap_region(struct mm_struct *mm, struct file *file,
 	fput(file);
 
 	/* Undo any partial mapping done by a device driver. */
-	unmap_region(mm, vma, prev, vma->vm_start, vma->vm_end);
+	munmap_region(mm, vma, prev, vma->vm_start, vma->vm_end);
 	charged = 0;
 	if (vm_flags & VM_SHARED)
 		mapping_unmap_writable(file->f_mapping);
@@ -2447,9 +2443,9 @@ static void remove_vma_list(struct mm_struct *mm, struct vm_area_struct *vma)
  *
  * Called with the mm semaphore held.
  */
-static void unmap_region(struct mm_struct *mm,
-		struct vm_area_struct *vma, struct vm_area_struct *prev,
-		unsigned long start, unsigned long end)
+void munmap_region(struct mm_struct *mm, struct vm_area_struct *vma,
+		struct vm_area_struct *prev, unsigned long start,
+		unsigned long end)
 {
 	struct vm_area_struct *next = prev ? prev->vm_next : mm->mmap;
 	struct mmu_gather tlb;
@@ -2654,7 +2650,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
 	 * Remove the vma's, and unmap the actual pages
 	 */
 	detach_vmas_to_be_unmapped(mm, vma, prev, end);
-	unmap_region(mm, vma, prev, start, end);
+	munmap_region(mm, vma, prev, start, end);
 
 	arch_unmap(mm, vma, start, end);
 
-- 
2.12.0


* [RFC PATCH 04/13] mm: Add mm_struct argument to 'get_unmapped_area' and 'vm_unmapped_area'
  2017-03-13 22:14 [RFC PATCH 00/13] Introduce first class virtual address spaces Till Smejkal
                   ` (2 preceding siblings ...)
  2017-03-13 22:14 ` [RFC PATCH 03/13] mm: Rename 'unmap_region' and add mm_struct argument Till Smejkal
@ 2017-03-13 22:14 ` Till Smejkal
  2017-03-13 22:14 ` [RFC PATCH 05/13] mm: Add mm_struct argument to 'mm_populate' and '__mm_populate' Till Smejkal
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-13 22:14 UTC (permalink / raw)
  To: Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai
  Cc: linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

Add the mm_struct in which an unmapped area should be found as an
explicit argument to the 'get_unmapped_area' function. Previously, the
function simply searched for an unmapped area in the memory map of the
current task. However, with the introduction of first class virtual
address spaces, get_unmapped_area must also be able to look for unmapped
areas in memory maps other than the current task's.

Changing the signature of the generic 'get_unmapped_area' function also
requires that all the 'arch_get_unmapped_area' functions, as well as the
'vm_unmapped_area' function and its dependents, take the memory map they
should work on as an additional argument. Simply using the current
task's memory map, as these functions did before, is no longer correct
and leads to incorrect results.

Signed-off-by: Till Smejkal <till.smejkal@gmail.com>
---
 arch/alpha/kernel/osf_sys.c                  | 19 ++++++------
 arch/arc/mm/mmap.c                           |  8 ++---
 arch/arm/kernel/process.c                    |  2 +-
 arch/arm/mm/mmap.c                           | 19 ++++++------
 arch/arm64/kernel/vdso.c                     |  2 +-
 arch/blackfin/include/asm/pgtable.h          |  3 +-
 arch/blackfin/kernel/sys_bfin.c              |  5 ++--
 arch/frv/mm/elf-fdpic.c                      | 11 +++----
 arch/hexagon/kernel/vdso.c                   |  2 +-
 arch/ia64/kernel/perfmon.c                   |  3 +-
 arch/ia64/kernel/sys_ia64.c                  |  6 ++--
 arch/ia64/mm/hugetlbpage.c                   |  7 +++--
 arch/metag/mm/hugetlbpage.c                  | 11 +++----
 arch/mips/kernel/vdso.c                      |  2 +-
 arch/mips/mm/mmap.c                          | 27 +++++++++--------
 arch/parisc/kernel/sys_parisc.c              | 19 ++++++------
 arch/parisc/mm/hugetlbpage.c                 |  7 +++--
 arch/powerpc/include/asm/book3s/64/hugetlb.h |  6 ++--
 arch/powerpc/include/asm/page_64.h           |  3 +-
 arch/powerpc/kernel/vdso.c                   |  2 +-
 arch/powerpc/mm/hugetlbpage-radix.c          |  9 +++---
 arch/powerpc/mm/hugetlbpage.c                |  9 +++---
 arch/powerpc/mm/mmap.c                       | 17 +++++------
 arch/powerpc/mm/slice.c                      | 25 ++++++++--------
 arch/s390/kernel/vdso.c                      |  3 +-
 arch/s390/mm/mmap.c                          | 42 +++++++++++++-------------
 arch/sh/kernel/vsyscall/vsyscall.c           |  2 +-
 arch/sh/mm/mmap.c                            | 19 ++++++------
 arch/sparc/include/asm/pgtable_64.h          |  4 +--
 arch/sparc/kernel/sys_sparc_32.c             |  6 ++--
 arch/sparc/kernel/sys_sparc_64.c             | 31 +++++++++++---------
 arch/sparc/mm/hugetlbpage.c                  | 26 ++++++++--------
 arch/tile/kernel/vdso.c                      |  2 +-
 arch/tile/mm/hugetlbpage.c                   | 26 ++++++++--------
 arch/x86/entry/vdso/vma.c                    |  2 +-
 arch/x86/kernel/sys_x86_64.c                 | 19 ++++++------
 arch/x86/mm/hugetlbpage.c                    | 26 ++++++++--------
 arch/xtensa/kernel/syscall.c                 |  7 +++--
 drivers/char/mem.c                           | 15 ++++++----
 drivers/dax/dax.c                            | 10 +++----
 drivers/media/usb/uvc/uvc_v4l2.c             |  6 ++--
 drivers/media/v4l2-core/v4l2-dev.c           |  8 ++---
 drivers/media/v4l2-core/videobuf2-v4l2.c     |  5 ++--
 drivers/mtd/mtdchar.c                        |  3 +-
 drivers/usb/gadget/function/uvc_v4l2.c       |  3 +-
 fs/hugetlbfs/inode.c                         |  8 ++---
 fs/proc/inode.c                              | 10 +++----
 fs/ramfs/file-mmu.c                          |  5 ++--
 fs/ramfs/file-nommu.c                        | 10 ++++---
 fs/romfs/mmap-nommu.c                        |  3 +-
 include/linux/fs.h                           |  2 +-
 include/linux/huge_mm.h                      |  6 ++--
 include/linux/hugetlb.h                      |  5 ++--
 include/linux/mm.h                           | 16 ++++++----
 include/linux/mm_types.h                     |  7 +++--
 include/linux/sched.h                        | 10 +++----
 include/linux/shmem_fs.h                     |  5 ++--
 include/media/v4l2-dev.h                     |  3 +-
 include/media/videobuf2-v4l2.h               |  5 ++--
 ipc/shm.c                                    | 10 +++----
 kernel/events/uprobes.c                      |  2 +-
 mm/huge_memory.c                             | 18 +++++++-----
 mm/mmap.c                                    | 44 ++++++++++++++--------------
 mm/mremap.c                                  | 11 +++----
 mm/nommu.c                                   | 10 ++++---
 mm/shmem.c                                   | 14 ++++-----
 sound/core/pcm_native.c                      |  3 +-
 67 files changed, 370 insertions(+), 326 deletions(-)

diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index 54d8616644e2..281109bcdc5d 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -1308,8 +1308,8 @@ SYSCALL_DEFINE1(old_adjtimex, struct timex32 __user *, txc_p)
    generic version except that we know how to honor ADDR_LIMIT_32BIT.  */
 
 static unsigned long
-arch_get_unmapped_area_1(unsigned long addr, unsigned long len,
-		         unsigned long limit)
+arch_get_unmapped_area_1(struct mm_struct *mm, unsigned long addr,
+			 unsigned long len, unsigned long limit)
 {
 	struct vm_unmapped_area_info info;
 
@@ -1319,13 +1319,13 @@ arch_get_unmapped_area_1(unsigned long addr, unsigned long len,
 	info.high_limit = limit;
 	info.align_mask = 0;
 	info.align_offset = 0;
-	return vm_unmapped_area(&info);
+	return vm_unmapped_area(mm, &info);
 }
 
 unsigned long
-arch_get_unmapped_area(struct file *filp, unsigned long addr,
-		       unsigned long len, unsigned long pgoff,
-		       unsigned long flags)
+arch_get_unmapped_area(struct mm_struct *mm, struct file *filp,
+		       unsigned long addr, unsigned long len,
+		       unsigned long pgoff, unsigned long flags)
 {
 	unsigned long limit;
 
@@ -1352,19 +1352,20 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
 	   this feature should be incorporated into all ports?  */
 
 	if (addr) {
-		addr = arch_get_unmapped_area_1 (PAGE_ALIGN(addr), len, limit);
+		addr = arch_get_unmapped_area_1 (mm, PAGE_ALIGN(addr), len,
+						 limit);
 		if (addr != (unsigned long) -ENOMEM)
 			return addr;
 	}
 
 	/* Next, try allocating at TASK_UNMAPPED_BASE.  */
-	addr = arch_get_unmapped_area_1 (PAGE_ALIGN(TASK_UNMAPPED_BASE),
+	addr = arch_get_unmapped_area_1 (mm, PAGE_ALIGN(TASK_UNMAPPED_BASE),
 					 len, limit);
 	if (addr != (unsigned long) -ENOMEM)
 		return addr;
 
 	/* Finally, try allocating in low memory.  */
-	addr = arch_get_unmapped_area_1 (PAGE_SIZE, len, limit);
+	addr = arch_get_unmapped_area_1 (mm, PAGE_SIZE, len, limit);
 
 	return addr;
 }
diff --git a/arch/arc/mm/mmap.c b/arch/arc/mm/mmap.c
index 2e06d56e987b..2fa95d302b02 100644
--- a/arch/arc/mm/mmap.c
+++ b/arch/arc/mm/mmap.c
@@ -28,10 +28,10 @@
  * SHMLBA bytes.
  */
 unsigned long
-arch_get_unmapped_area(struct file *filp, unsigned long addr,
-		unsigned long len, unsigned long pgoff, unsigned long flags)
+arch_get_unmapped_area(struct mm_struct *mm, struct file *filp,
+		unsigned long addr, unsigned long len, unsigned long pgoff,
+		unsigned long flags)
 {
-	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	int do_align = 0;
 	int aliasing = cache_is_vipt_aliasing();
@@ -74,5 +74,5 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
 	info.high_limit = TASK_SIZE;
 	info.align_mask = do_align ? (PAGE_MASK & (SHMLBA - 1)) : 0;
 	info.align_offset = pgoff << PAGE_SHIFT;
-	return vm_unmapped_area(&info);
+	return vm_unmapped_area(mm, &info);
 }
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 91d2d5b01414..4c6f9595fdbe 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -426,7 +426,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	if (down_write_killable(&mm->mmap_sem))
 		return -EINTR;
 	hint = sigpage_addr(mm, npages);
-	addr = get_unmapped_area(NULL, hint, npages << PAGE_SHIFT, 0, 0);
+	addr = get_unmapped_area(mm, NULL, hint, npages << PAGE_SHIFT, 0, 0);
 	if (IS_ERR_VALUE(addr)) {
 		ret = addr;
 		goto up_fail;
diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c
index 66353caa35b9..48b49bfdd1d7 100644
--- a/arch/arm/mm/mmap.c
+++ b/arch/arm/mm/mmap.c
@@ -52,10 +52,10 @@ static unsigned long mmap_base(unsigned long rnd)
  * in the VIVT case, we optimise out the alignment rules.
  */
 unsigned long
-arch_get_unmapped_area(struct file *filp, unsigned long addr,
-		unsigned long len, unsigned long pgoff, unsigned long flags)
+arch_get_unmapped_area(struct mm_struct *mm, struct file *filp,
+		unsigned long addr, unsigned long len, unsigned long pgoff,
+		unsigned long flags)
 {
-	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	int do_align = 0;
 	int aliasing = cache_is_vipt_aliasing();
@@ -99,16 +99,15 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
 	info.high_limit = TASK_SIZE;
 	info.align_mask = do_align ? (PAGE_MASK & (SHMLBA - 1)) : 0;
 	info.align_offset = pgoff << PAGE_SHIFT;
-	return vm_unmapped_area(&info);
+	return vm_unmapped_area(mm, &info);
 }
 
 unsigned long
-arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
-			const unsigned long len, const unsigned long pgoff,
-			const unsigned long flags)
+arch_get_unmapped_area_topdown(struct mm_struct *mm, struct file *filp,
+			const unsigned long addr0, const unsigned long len,
+			const unsigned long pgoff, const unsigned long flags)
 {
 	struct vm_area_struct *vma;
-	struct mm_struct *mm = current->mm;
 	unsigned long addr = addr0;
 	int do_align = 0;
 	int aliasing = cache_is_vipt_aliasing();
@@ -150,7 +149,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	info.high_limit = mm->mmap_base;
 	info.align_mask = do_align ? (PAGE_MASK & (SHMLBA - 1)) : 0;
 	info.align_offset = pgoff << PAGE_SHIFT;
-	addr = vm_unmapped_area(&info);
+	addr = vm_unmapped_area(mm, &info);
 
 	/*
 	 * A failed mmap() very likely causes application failure,
@@ -163,7 +162,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 		info.flags = 0;
 		info.low_limit = mm->mmap_base;
 		info.high_limit = TASK_SIZE;
-		addr = vm_unmapped_area(&info);
+		addr = vm_unmapped_area(mm, &info);
 	}
 
 	return addr;
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index a2c2478e7d78..403e71456297 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -166,7 +166,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
 
 	if (down_write_killable(&mm->mmap_sem))
 		return -EINTR;
-	vdso_base = get_unmapped_area(NULL, 0, vdso_mapping_len, 0, 0);
+	vdso_base = get_unmapped_area(mm, NULL, 0, vdso_mapping_len, 0, 0);
 	if (IS_ERR_VALUE(vdso_base)) {
 		ret = ERR_PTR(vdso_base);
 		goto up_fail;
diff --git a/arch/blackfin/include/asm/pgtable.h b/arch/blackfin/include/asm/pgtable.h
index c1ee3d6533fb..fe2c5d41d62a 100644
--- a/arch/blackfin/include/asm/pgtable.h
+++ b/arch/blackfin/include/asm/pgtable.h
@@ -92,7 +92,8 @@ extern char empty_zero_page[];
 #define	VMALLOC_END	0xffffffff
 
 /* provide a special get_unmapped_area for framebuffer mmaps of nommu */
-extern unsigned long get_fb_unmapped_area(struct file *filp, unsigned long,
+extern unsigned long get_fb_unmapped_area(struct mm_struct *mm,
+					  struct file *filp, unsigned long,
 					  unsigned long, unsigned long,
 					  unsigned long);
 #define HAVE_ARCH_FB_UNMAPPED_AREA
diff --git a/arch/blackfin/kernel/sys_bfin.c b/arch/blackfin/kernel/sys_bfin.c
index d998383cb956..65acfd2bc8d1 100644
--- a/arch/blackfin/kernel/sys_bfin.c
+++ b/arch/blackfin/kernel/sys_bfin.c
@@ -42,8 +42,9 @@ asmlinkage void *sys_dma_memcpy(void *dest, const void *src, size_t len)
 #if defined(CONFIG_FB) || defined(CONFIG_FB_MODULE)
 #include <linux/fb.h>
 #include <linux/export.h>
-unsigned long get_fb_unmapped_area(struct file *filp, unsigned long orig_addr,
-	unsigned long len, unsigned long pgoff, unsigned long flags)
+unsigned long get_fb_unmapped_area(struct mm_struct *mm, struct file *filp,
+	unsigned long orig_addr, unsigned long len, unsigned long pgoff,
+	unsigned long flags)
 {
 	struct fb_info *info = filp->private_data;
 	return (unsigned long)info->screen_base;
diff --git a/arch/frv/mm/elf-fdpic.c b/arch/frv/mm/elf-fdpic.c
index 836f14707a62..c7158d8883e2 100644
--- a/arch/frv/mm/elf-fdpic.c
+++ b/arch/frv/mm/elf-fdpic.c
@@ -56,7 +56,8 @@ void elf_fdpic_arch_lay_out_mm(struct elf_fdpic_params *exec_params,
  * place non-fixed mmaps firstly in the bottom part of memory, working up, and then in the top part
  * of memory, working down
  */
-unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len,
+unsigned long arch_get_unmapped_area(struct mm_struct *mm, struct file *filp,
+				     unsigned long addr, unsigned long len,
 				     unsigned long pgoff, unsigned long flags)
 {
 	struct vm_area_struct *vma;
@@ -72,7 +73,7 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsi
 	/* only honour a hint if we're not going to clobber something doing so */
 	if (addr) {
 		addr = PAGE_ALIGN(addr);
-		vma = find_vma(current->mm, addr);
+		vma = find_vma(mm, addr);
 		if (TASK_SIZE - len >= addr &&
 		    (!vma || addr + len <= vma->vm_start))
 			goto success;
@@ -82,10 +83,10 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsi
 	info.flags = 0;
 	info.length = len;
 	info.low_limit = PAGE_SIZE;
-	info.high_limit = (current->mm->start_stack - 0x00200000);
+	info.high_limit = (mm->start_stack - 0x00200000);
 	info.align_mask = 0;
 	info.align_offset = 0;
-	addr = vm_unmapped_area(&info);
+	addr = vm_unmapped_area(mm, &info);
 	if (!(addr & ~PAGE_MASK))
 		goto success;
 	VM_BUG_ON(addr != -ENOMEM);
@@ -93,7 +94,7 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsi
 	/* search from just above the WorkRAM area to the top of memory */
 	info.low_limit = PAGE_ALIGN(0x80000000);
 	info.high_limit = TASK_SIZE;
-	addr = vm_unmapped_area(&info);
+	addr = vm_unmapped_area(mm, &info);
 	if (!(addr & ~PAGE_MASK))
 		goto success;
 	VM_BUG_ON(addr != -ENOMEM);
diff --git a/arch/hexagon/kernel/vdso.c b/arch/hexagon/kernel/vdso.c
index 3ea968415539..1ec46d4dd0e6 100644
--- a/arch/hexagon/kernel/vdso.c
+++ b/arch/hexagon/kernel/vdso.c
@@ -71,7 +71,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	/* Try to get it loaded right near ld.so/glibc. */
 	vdso_base = STACK_TOP;
 
-	vdso_base = get_unmapped_area(NULL, vdso_base, PAGE_SIZE, 0, 0);
+	vdso_base = get_unmapped_area(mm, NULL, vdso_base, PAGE_SIZE, 0, 0);
 	if (IS_ERR_VALUE(vdso_base)) {
 		ret = vdso_base;
 		goto up_fail;
diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c
index 677a86826771..8a5cb50e698f 100644
--- a/arch/ia64/kernel/perfmon.c
+++ b/arch/ia64/kernel/perfmon.c
@@ -2308,7 +2308,8 @@ pfm_smpl_buffer_alloc(struct task_struct *task, struct file *filp, pfm_context_t
 	down_write(&task->mm->mmap_sem);
 
 	/* find some free area in address space, must have mmap sem held */
-	vma->vm_start = get_unmapped_area(NULL, 0, size, 0, MAP_PRIVATE|MAP_ANONYMOUS);
+	vma->vm_start = get_unmapped_area(mm, NULL, 0, size, 0,
+					  MAP_PRIVATE|MAP_ANONYMOUS);
 	if (IS_ERR_VALUE(vma->vm_start)) {
 		DPRINT(("Cannot find unmapped area for size %ld\n", size));
 		up_write(&task->mm->mmap_sem);
diff --git a/arch/ia64/kernel/sys_ia64.c b/arch/ia64/kernel/sys_ia64.c
index a09c12230bc5..2f33e1c9d002 100644
--- a/arch/ia64/kernel/sys_ia64.c
+++ b/arch/ia64/kernel/sys_ia64.c
@@ -21,12 +21,12 @@
 #include <linux/uaccess.h>
 
 unsigned long
-arch_get_unmapped_area (struct file *filp, unsigned long addr, unsigned long len,
+arch_get_unmapped_area (struct mm_struct *mm, struct file *filp,
+			unsigned long addr, unsigned long len,
 			unsigned long pgoff, unsigned long flags)
 {
 	long map_shared = (flags & MAP_SHARED);
 	unsigned long align_mask = 0;
-	struct mm_struct *mm = current->mm;
 	struct vm_unmapped_area_info info;
 
 	if (len > RGN_MAP_LIMIT)
@@ -61,7 +61,7 @@ arch_get_unmapped_area (struct file *filp, unsigned long addr, unsigned long len
 	info.high_limit = TASK_SIZE;
 	info.align_mask = align_mask;
 	info.align_offset = 0;
-	return vm_unmapped_area(&info);
+	return vm_unmapped_area(mm, &info);
 }
 
 asmlinkage long
diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c
index 85de86d36fdf..fd9ff12b37f4 100644
--- a/arch/ia64/mm/hugetlbpage.c
+++ b/arch/ia64/mm/hugetlbpage.c
@@ -134,8 +134,9 @@ void hugetlb_free_pgd_range(struct mmu_gather *tlb,
 	free_pgd_range(tlb, addr, end, floor, ceiling);
 }
 
-unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
-		unsigned long pgoff, unsigned long flags)
+unsigned long hugetlb_get_unmapped_area(struct mm_struct *mm, struct file *file,
+		unsigned long addr, unsigned long len, unsigned long pgoff,
+		unsigned long flags)
 {
 	struct vm_unmapped_area_info info;
 
@@ -161,7 +162,7 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr, u
 	info.high_limit = HPAGE_REGION_BASE + RGN_MAP_LIMIT;
 	info.align_mask = PAGE_MASK & (HPAGE_SIZE - 1);
 	info.align_offset = 0;
-	return vm_unmapped_area(&info);
+	return vm_unmapped_area(mm, &info);
 }
 
 static int __init hugetlb_setup_sz(char *str)
diff --git a/arch/metag/mm/hugetlbpage.c b/arch/metag/mm/hugetlbpage.c
index db1b7da91e4f..4bf5b3c1ae7e 100644
--- a/arch/metag/mm/hugetlbpage.c
+++ b/arch/metag/mm/hugetlbpage.c
@@ -179,7 +179,7 @@ hugetlb_get_unmapped_area_existing(unsigned long len)
 
 /* Do a full search to find an area without any nearby normal pages. */
 static unsigned long
-hugetlb_get_unmapped_area_new_pmd(unsigned long len)
+hugetlb_get_unmapped_area_new_pmd(struct mm_struct *mm, unsigned long len)
 {
 	struct vm_unmapped_area_info info;
 
@@ -189,12 +189,13 @@ hugetlb_get_unmapped_area_new_pmd(unsigned long len)
 	info.high_limit = TASK_SIZE;
 	info.align_mask = PAGE_MASK & HUGEPT_MASK;
 	info.align_offset = 0;
-	return vm_unmapped_area(&info);
+	return vm_unmapped_area(mm, &info);
 }
 
 unsigned long
-hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
-		unsigned long len, unsigned long pgoff, unsigned long flags)
+hugetlb_get_unmapped_area(struct mm_struct *mm, struct file *file,
+		unsigned long addr, unsigned long len, unsigned long pgoff,
+		unsigned long flags)
 {
 	struct hstate *h = hstate_file(file);
 
@@ -227,7 +228,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 	 * Find an unmapped naturally aligned set of 4MB blocks that we can use
 	 * for huge pages.
 	 */
-	return hugetlb_get_unmapped_area_new_pmd(len);
+	return hugetlb_get_unmapped_area_new_pmd(mm, len);
 }
 
 #endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/
diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
index 9631b42908f3..05d87680a05a 100644
--- a/arch/mips/kernel/vdso.c
+++ b/arch/mips/kernel/vdso.c
@@ -129,7 +129,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	vvar_size = gic_size + PAGE_SIZE;
 	size = vvar_size + image->size;
 
-	base = get_unmapped_area(NULL, 0, size, 0, 0);
+	base = get_unmapped_area(mm, NULL, 0, size, 0, 0);
 	if (IS_ERR_VALUE(base)) {
 		ret = base;
 		goto out;
diff --git a/arch/mips/mm/mmap.c b/arch/mips/mm/mmap.c
index d08ea3ff0f53..86a8cb07a584 100644
--- a/arch/mips/mm/mmap.c
+++ b/arch/mips/mm/mmap.c
@@ -51,11 +51,11 @@ static unsigned long mmap_base(unsigned long rnd)
 
 enum mmap_allocation_direction {UP, DOWN};
 
-static unsigned long arch_get_unmapped_area_common(struct file *filp,
-	unsigned long addr0, unsigned long len, unsigned long pgoff,
-	unsigned long flags, enum mmap_allocation_direction dir)
+static unsigned long arch_get_unmapped_area_common(struct mm_struct *mm,
+	struct file *filp, unsigned long addr0, unsigned long len,
+	unsigned long pgoff, unsigned long flags,
+	enum mmap_allocation_direction dir)
 {
-	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	unsigned long addr = addr0;
 	int do_color_align;
@@ -104,7 +104,7 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
 		info.flags = VM_UNMAPPED_AREA_TOPDOWN;
 		info.low_limit = PAGE_SIZE;
 		info.high_limit = mm->mmap_base;
-		addr = vm_unmapped_area(&info);
+		addr = vm_unmapped_area(mm, &info);
 
 		if (!(addr & ~PAGE_MASK))
 			return addr;
@@ -120,13 +120,14 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
 	info.flags = 0;
 	info.low_limit = mm->mmap_base;
 	info.high_limit = TASK_SIZE;
-	return vm_unmapped_area(&info);
+	return vm_unmapped_area(mm, &info);
 }
 
-unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr0,
-	unsigned long len, unsigned long pgoff, unsigned long flags)
+unsigned long arch_get_unmapped_area(struct mm_struct *mm, struct file *filp,
+	unsigned long addr0, unsigned long len, unsigned long pgoff,
+	unsigned long flags)
 {
-	return arch_get_unmapped_area_common(filp,
+	return arch_get_unmapped_area_common(mm, filp,
 			addr0, len, pgoff, flags, UP);
 }
 
@@ -134,11 +135,11 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr0,
  * There is no need to export this but sched.h declares the function as
  * extern so making it static here results in an error.
  */
-unsigned long arch_get_unmapped_area_topdown(struct file *filp,
-	unsigned long addr0, unsigned long len, unsigned long pgoff,
-	unsigned long flags)
+unsigned long arch_get_unmapped_area_topdown(struct mm_struct *mm,
+	struct file *filp, unsigned long addr0, unsigned long len,
+	unsigned long pgoff, unsigned long flags)
 {
-	return arch_get_unmapped_area_common(filp,
+	return arch_get_unmapped_area_common(mm, filp,
 			addr0, len, pgoff, flags, DOWN);
 }
 
diff --git a/arch/parisc/kernel/sys_parisc.c b/arch/parisc/kernel/sys_parisc.c
index bf3294171230..68664e9a3317 100644
--- a/arch/parisc/kernel/sys_parisc.c
+++ b/arch/parisc/kernel/sys_parisc.c
@@ -84,10 +84,10 @@ static unsigned long mmap_upper_limit(void)
 }
 
 
-unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
-		unsigned long len, unsigned long pgoff, unsigned long flags)
+unsigned long arch_get_unmapped_area(struct mm_struct *mm, struct file *filp,
+		unsigned long addr, unsigned long len, unsigned long pgoff,
+		unsigned long flags)
 {
-	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	unsigned long task_size = TASK_SIZE;
 	int do_color_align, last_mmap;
@@ -127,7 +127,7 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
 	info.high_limit = mmap_upper_limit();
 	info.align_mask = last_mmap ? (PAGE_MASK & (SHM_COLOUR - 1)) : 0;
 	info.align_offset = shared_align_offset(last_mmap, pgoff);
-	addr = vm_unmapped_area(&info);
+	addr = vm_unmapped_area(mm, &info);
 
 found_addr:
 	if (do_color_align && !last_mmap && !(addr & ~PAGE_MASK))
@@ -137,12 +137,11 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
 }
 
 unsigned long
-arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
-			  const unsigned long len, const unsigned long pgoff,
-			  const unsigned long flags)
+arch_get_unmapped_area_topdown(struct mm_struct *mm, struct file *filp,
+			  const unsigned long addr0, const unsigned long len,
+			  const unsigned long pgoff, const unsigned long flags)
 {
 	struct vm_area_struct *vma;
-	struct mm_struct *mm = current->mm;
 	unsigned long addr = addr0;
 	int do_color_align, last_mmap;
 	struct vm_unmapped_area_info info;
@@ -187,7 +186,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	info.high_limit = mm->mmap_base;
 	info.align_mask = last_mmap ? (PAGE_MASK & (SHM_COLOUR - 1)) : 0;
 	info.align_offset = shared_align_offset(last_mmap, pgoff);
-	addr = vm_unmapped_area(&info);
+	addr = vm_unmapped_area(mm, &info);
 	if (!(addr & ~PAGE_MASK))
 		goto found_addr;
 	VM_BUG_ON(addr != -ENOMEM);
@@ -198,7 +197,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	 * can happen with large stack limits and large mmap()
 	 * allocations.
 	 */
-	return arch_get_unmapped_area(filp, addr0, len, pgoff, flags);
+	return arch_get_unmapped_area(mm, filp, addr0, len, pgoff, flags);
 
 found_addr:
 	if (do_color_align && !last_mmap && !(addr & ~PAGE_MASK))
diff --git a/arch/parisc/mm/hugetlbpage.c b/arch/parisc/mm/hugetlbpage.c
index 5d6eea925cf4..a0f93122e142 100644
--- a/arch/parisc/mm/hugetlbpage.c
+++ b/arch/parisc/mm/hugetlbpage.c
@@ -21,8 +21,9 @@
 
 
 unsigned long
-hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
-		unsigned long len, unsigned long pgoff, unsigned long flags)
+hugetlb_get_unmapped_area(struct mm_struct *mm, struct file *file,
+		unsigned long addr, unsigned long len, unsigned long pgoff,
+		unsigned long flags)
 {
 	struct hstate *h = hstate_file(file);
 
@@ -39,7 +40,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 		addr = ALIGN(addr, huge_page_size(h));
 
 	/* we need to make sure the colouring is OK */
-	return arch_get_unmapped_area(file, addr, len, pgoff, flags);
+	return arch_get_unmapped_area(mm, file, addr, len, pgoff, flags);
 }
 
 
diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h b/arch/powerpc/include/asm/book3s/64/hugetlb.h
index c62f14d0bec1..5ff88a3d946e 100644
--- a/arch/powerpc/include/asm/book3s/64/hugetlb.h
+++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
@@ -8,9 +8,9 @@
 void radix__flush_hugetlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
 void radix__local_flush_hugetlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
 extern unsigned long
-radix__hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
-				unsigned long len, unsigned long pgoff,
-				unsigned long flags);
+radix__hugetlb_get_unmapped_area(struct mm_struct *mm, struct file *file,
+				unsigned long addr, unsigned long len,
+				unsigned long pgoff, unsigned long flags);
 
 static inline int hstate_get_psize(struct hstate *hstate)
 {
diff --git a/arch/powerpc/include/asm/page_64.h b/arch/powerpc/include/asm/page_64.h
index dd5f0712afa2..1ac2c5814d14 100644
--- a/arch/powerpc/include/asm/page_64.h
+++ b/arch/powerpc/include/asm/page_64.h
@@ -115,7 +115,8 @@ struct slice_mask {
 
 struct mm_struct;
 
-extern unsigned long slice_get_unmapped_area(unsigned long addr,
+extern unsigned long slice_get_unmapped_area(struct mm_struct *mm,
+					     unsigned long addr,
 					     unsigned long len,
 					     unsigned long flags,
 					     unsigned int psize,
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 4111d30badfa..3f7c4b334620 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -198,7 +198,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	 */
 	if (down_write_killable(&mm->mmap_sem))
 		return -EINTR;
-	vdso_base = get_unmapped_area(NULL, vdso_base,
+	vdso_base = get_unmapped_area(mm, NULL, vdso_base,
 				      (vdso_pages << PAGE_SHIFT) +
 				      ((VDSO_ALIGNMENT - 1) & PAGE_MASK),
 				      0, 0);
diff --git a/arch/powerpc/mm/hugetlbpage-radix.c b/arch/powerpc/mm/hugetlbpage-radix.c
index 35254a678456..86e99682033a 100644
--- a/arch/powerpc/mm/hugetlbpage-radix.c
+++ b/arch/powerpc/mm/hugetlbpage-radix.c
@@ -41,11 +41,10 @@ void radix__flush_hugetlb_tlb_range(struct vm_area_struct *vma, unsigned long st
  * ie, use topdown or not based on mmap_is_legacy check ?
  */
 unsigned long
-radix__hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
-				unsigned long len, unsigned long pgoff,
-				unsigned long flags)
+radix__hugetlb_get_unmapped_area(struct mm_struct *mm, struct file *file,
+				unsigned long addr, unsigned long len,
+				unsigned long pgoff, unsigned long flags)
 {
-	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	struct hstate *h = hstate_file(file);
 	struct vm_unmapped_area_info info;
@@ -78,5 +77,5 @@ radix__hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 	info.high_limit = current->mm->mmap_base;
 	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
 	info.align_offset = 0;
-	return vm_unmapped_area(&info);
+	return vm_unmapped_area(mm, &info);
 }
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 8c3389cbcd12..817bf97e59cc 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -698,17 +698,18 @@ int gup_huge_pd(hugepd_t hugepd, unsigned long addr, unsigned pdshift,
 }
 
 #ifdef CONFIG_PPC_MM_SLICES
-unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
-					unsigned long len, unsigned long pgoff,
+unsigned long hugetlb_get_unmapped_area(struct mm_struct *mm, struct file *file,
+					unsigned long addr, unsigned long len,
+					unsigned long pgoff,
 					unsigned long flags)
 {
 	struct hstate *hstate = hstate_file(file);
 	int mmu_psize = shift_to_mmu_psize(huge_page_shift(hstate));
 
 	if (radix_enabled())
-		return radix__hugetlb_get_unmapped_area(file, addr, len,
+		return radix__hugetlb_get_unmapped_area(mm, file, addr, len,
 						       pgoff, flags);
-	return slice_get_unmapped_area(addr, len, flags, mmu_psize, 1);
+	return slice_get_unmapped_area(mm, addr, len, flags, mmu_psize, 1);
 }
 #endif
 
diff --git a/arch/powerpc/mm/mmap.c b/arch/powerpc/mm/mmap.c
index 2f1e44362198..b05ad3e33c40 100644
--- a/arch/powerpc/mm/mmap.c
+++ b/arch/powerpc/mm/mmap.c
@@ -88,11 +88,10 @@ static inline unsigned long mmap_base(unsigned long rnd)
  * HAVE_ARCH_UNMAPPED_AREA
  */
 static unsigned long
-radix__arch_get_unmapped_area(struct file *filp, unsigned long addr,
-			     unsigned long len, unsigned long pgoff,
-			     unsigned long flags)
+radix__arch_get_unmapped_area(struct mm_struct *mm, struct file *filp,
+			     unsigned long addr, unsigned long len,
+			     unsigned long pgoff, unsigned long flags)
 {
-	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	struct vm_unmapped_area_info info;
 
@@ -115,18 +114,18 @@ radix__arch_get_unmapped_area(struct file *filp, unsigned long addr,
 	info.low_limit = mm->mmap_base;
 	info.high_limit = TASK_SIZE;
 	info.align_mask = 0;
-	return vm_unmapped_area(&info);
+	return vm_unmapped_area(mm, &info);
 }
 
 static unsigned long
-radix__arch_get_unmapped_area_topdown(struct file *filp,
+radix__arch_get_unmapped_area_topdown(struct mm_struct *mm,
+				     struct file *filp,
 				     const unsigned long addr0,
 				     const unsigned long len,
 				     const unsigned long pgoff,
 				     const unsigned long flags)
 {
 	struct vm_area_struct *vma;
-	struct mm_struct *mm = current->mm;
 	unsigned long addr = addr0;
 	struct vm_unmapped_area_info info;
 
@@ -151,7 +150,7 @@ radix__arch_get_unmapped_area_topdown(struct file *filp,
 	info.low_limit = max(PAGE_SIZE, mmap_min_addr);
 	info.high_limit = mm->mmap_base;
 	info.align_mask = 0;
-	addr = vm_unmapped_area(&info);
+	addr = vm_unmapped_area(mm, &info);
 
 	/*
 	 * A failed mmap() very likely causes application failure,
@@ -164,7 +163,7 @@ radix__arch_get_unmapped_area_topdown(struct file *filp,
 		info.flags = 0;
 		info.low_limit = TASK_UNMAPPED_BASE;
 		info.high_limit = TASK_SIZE;
-		addr = vm_unmapped_area(&info);
+		addr = vm_unmapped_area(mm, &info);
 	}
 
 	return addr;
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 2b27458902ee..ebcb47c62d47 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -296,7 +296,7 @@ static unsigned long slice_find_area_bottomup(struct mm_struct *mm,
 		}
 		info.high_limit = addr;
 
-		found = vm_unmapped_area(&info);
+		found = vm_unmapped_area(mm, &info);
 		if (!(found & ~PAGE_MASK))
 			return found;
 	}
@@ -339,7 +339,7 @@ static unsigned long slice_find_area_topdown(struct mm_struct *mm,
 		}
 		info.low_limit = addr;
 
-		found = vm_unmapped_area(&info);
+		found = vm_unmapped_area(mm, &info);
 		if (!(found & ~PAGE_MASK))
 			return found;
 	}
@@ -380,9 +380,9 @@ static unsigned long slice_find_area(struct mm_struct *mm, unsigned long len,
 #define MMU_PAGE_BASE	MMU_PAGE_4K
 #endif
 
-unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
-				      unsigned long flags, unsigned int psize,
-				      int topdown)
+unsigned long slice_get_unmapped_area(struct mm_struct *mm, unsigned long addr,
+				      unsigned long len, unsigned long flags,
+				      unsigned int psize, int topdown)
 {
 	struct slice_mask mask = {0, 0};
 	struct slice_mask good_mask;
@@ -390,7 +390,6 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	struct slice_mask compat_mask = {0, 0};
 	int fixed = (flags & MAP_FIXED);
 	int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
-	struct mm_struct *mm = current->mm;
 	unsigned long newaddr;
 
 	/* Sanity checks */
@@ -544,24 +543,26 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 }
 EXPORT_SYMBOL_GPL(slice_get_unmapped_area);
 
-unsigned long arch_get_unmapped_area(struct file *filp,
+unsigned long arch_get_unmapped_area(struct mm_struct *mm,
+				     struct file *filp,
 				     unsigned long addr,
 				     unsigned long len,
 				     unsigned long pgoff,
 				     unsigned long flags)
 {
-	return slice_get_unmapped_area(addr, len, flags,
-				       current->mm->context.user_psize, 0);
+	return slice_get_unmapped_area(mm, addr, len, flags,
+				       mm->context.user_psize, 0);
 }
 
-unsigned long arch_get_unmapped_area_topdown(struct file *filp,
+unsigned long arch_get_unmapped_area_topdown(struct mm_struct *mm,
+					     struct file *filp,
 					     const unsigned long addr0,
 					     const unsigned long len,
 					     const unsigned long pgoff,
 					     const unsigned long flags)
 {
-	return slice_get_unmapped_area(addr0, len, flags,
-				       current->mm->context.user_psize, 1);
+	return slice_get_unmapped_area(mm, addr0, len, flags,
+				       mm->context.user_psize, 1);
 }
 
 unsigned int get_slice_psize(struct mm_struct *mm, unsigned long addr)
diff --git a/arch/s390/kernel/vdso.c b/arch/s390/kernel/vdso.c
index 5904abf6b1ae..2e8d5d6d7806 100644
--- a/arch/s390/kernel/vdso.c
+++ b/arch/s390/kernel/vdso.c
@@ -218,7 +218,8 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	 */
 	if (down_write_killable(&mm->mmap_sem))
 		return -EINTR;
-	vdso_base = get_unmapped_area(NULL, 0, vdso_pages << PAGE_SHIFT, 0, 0);
+	vdso_base = get_unmapped_area(mm, NULL, 0, vdso_pages << PAGE_SHIFT,
+				      0, 0);
 	if (IS_ERR_VALUE(vdso_base)) {
 		rc = vdso_base;
 		goto out_up;
diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c
index eb9df2822da1..e9622a8bc740 100644
--- a/arch/s390/mm/mmap.c
+++ b/arch/s390/mm/mmap.c
@@ -81,10 +81,10 @@ static inline unsigned long mmap_base(unsigned long rnd)
 }
 
 unsigned long
-arch_get_unmapped_area(struct file *filp, unsigned long addr,
-		unsigned long len, unsigned long pgoff, unsigned long flags)
+arch_get_unmapped_area(struct mm_struct *mm, struct file *filp,
+		unsigned long addr, unsigned long len, unsigned long pgoff,
+		unsigned long flags)
 {
-	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	struct vm_unmapped_area_info info;
 
@@ -111,16 +111,15 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
 	else
 		info.align_mask = 0;
 	info.align_offset = pgoff << PAGE_SHIFT;
-	return vm_unmapped_area(&info);
+	return vm_unmapped_area(mm, &info);
 }
 
 unsigned long
-arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
-			  const unsigned long len, const unsigned long pgoff,
-			  const unsigned long flags)
+arch_get_unmapped_area_topdown(struct mm_struct *mm, struct file *filp,
+			  const unsigned long addr0, const unsigned long len,
+			  const unsigned long pgoff, const unsigned long flags)
 {
 	struct vm_area_struct *vma;
-	struct mm_struct *mm = current->mm;
 	unsigned long addr = addr0;
 	struct vm_unmapped_area_info info;
 
@@ -149,7 +148,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	else
 		info.align_mask = 0;
 	info.align_offset = pgoff << PAGE_SHIFT;
-	addr = vm_unmapped_area(&info);
+	addr = vm_unmapped_area(mm, &info);
 
 	/*
 	 * A failed mmap() very likely causes application failure,
@@ -162,7 +161,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 		info.flags = 0;
 		info.low_limit = TASK_UNMAPPED_BASE;
 		info.high_limit = TASK_SIZE;
-		addr = vm_unmapped_area(&info);
+		addr = vm_unmapped_area(mm, &info);
 	}
 
 	return addr;
@@ -180,14 +179,14 @@ int s390_mmap_check(unsigned long addr, unsigned long len, unsigned long flags)
 }
 
 static unsigned long
-s390_get_unmapped_area(struct file *filp, unsigned long addr,
-		unsigned long len, unsigned long pgoff, unsigned long flags)
+s390_get_unmapped_area(struct mm_struct *mm, struct file *filp,
+		unsigned long addr, unsigned long len, unsigned long pgoff,
+		unsigned long flags)
 {
-	struct mm_struct *mm = current->mm;
 	unsigned long area;
 	int rc;
 
-	area = arch_get_unmapped_area(filp, addr, len, pgoff, flags);
+	area = arch_get_unmapped_area(mm, filp, addr, len, pgoff, flags);
 	if (!(area & ~PAGE_MASK))
 		return area;
 	if (area == -ENOMEM && !is_compat_task() && TASK_SIZE < TASK_MAX_SIZE) {
@@ -195,21 +194,22 @@ s390_get_unmapped_area(struct file *filp, unsigned long addr,
 		rc = crst_table_upgrade(mm);
 		if (rc)
 			return (unsigned long) rc;
-		area = arch_get_unmapped_area(filp, addr, len, pgoff, flags);
+		area = arch_get_unmapped_area(mm, filp, addr, len, pgoff,
+					      flags);
 	}
 	return area;
 }
 
 static unsigned long
-s390_get_unmapped_area_topdown(struct file *filp, const unsigned long addr,
-			  const unsigned long len, const unsigned long pgoff,
-			  const unsigned long flags)
+s390_get_unmapped_area_topdown(struct mm_struct *mm, struct file *filp,
+			  const unsigned long addr, const unsigned long len,
+			  const unsigned long pgoff, const unsigned long flags)
 {
-	struct mm_struct *mm = current->mm;
 	unsigned long area;
 	int rc;
 
-	area = arch_get_unmapped_area_topdown(filp, addr, len, pgoff, flags);
+	area = arch_get_unmapped_area_topdown(mm, filp, addr, len, pgoff,
+					      flags);
 	if (!(area & ~PAGE_MASK))
 		return area;
 	if (area == -ENOMEM && !is_compat_task() && TASK_SIZE < TASK_MAX_SIZE) {
@@ -217,7 +217,7 @@ s390_get_unmapped_area_topdown(struct file *filp, const unsigned long addr,
 		rc = crst_table_upgrade(mm);
 		if (rc)
 			return (unsigned long) rc;
-		area = arch_get_unmapped_area_topdown(filp, addr, len,
+		area = arch_get_unmapped_area_topdown(mm, filp, addr, len,
 						      pgoff, flags);
 	}
 	return area;
diff --git a/arch/sh/kernel/vsyscall/vsyscall.c b/arch/sh/kernel/vsyscall/vsyscall.c
index cc0cc5b4ff18..81f68365b511 100644
--- a/arch/sh/kernel/vsyscall/vsyscall.c
+++ b/arch/sh/kernel/vsyscall/vsyscall.c
@@ -67,7 +67,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	if (down_write_killable(&mm->mmap_sem))
 		return -EINTR;
 
-	addr = get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0);
+	addr = get_unmapped_area(mm, NULL, 0, PAGE_SIZE, 0, 0);
 	if (IS_ERR_VALUE(addr)) {
 		ret = addr;
 		goto up_fail;
diff --git a/arch/sh/mm/mmap.c b/arch/sh/mm/mmap.c
index 6777177807c2..ab873264c260 100644
--- a/arch/sh/mm/mmap.c
+++ b/arch/sh/mm/mmap.c
@@ -30,10 +30,10 @@ static inline unsigned long COLOUR_ALIGN(unsigned long addr,
 	return base + off;
 }
 
-unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
-	unsigned long len, unsigned long pgoff, unsigned long flags)
+unsigned long arch_get_unmapped_area(struct mm_struct *mm, struct file *filp,
+	unsigned long addr, unsigned long len, unsigned long pgoff,
+	unsigned long flags)
 {
-	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	int do_colour_align;
 	struct vm_unmapped_area_info info;
@@ -73,16 +73,15 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
 	info.high_limit = TASK_SIZE;
 	info.align_mask = do_colour_align ? (PAGE_MASK & shm_align_mask) : 0;
 	info.align_offset = pgoff << PAGE_SHIFT;
-	return vm_unmapped_area(&info);
+	return vm_unmapped_area(mm, &info);
 }
 
 unsigned long
-arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
-			  const unsigned long len, const unsigned long pgoff,
-			  const unsigned long flags)
+arch_get_unmapped_area_topdown(struct mm_struct *mm, struct file *filp,
+			  const unsigned long addr0, const unsigned long len,
+			  const unsigned long pgoff, const unsigned long flags)
 {
 	struct vm_area_struct *vma;
-	struct mm_struct *mm = current->mm;
 	unsigned long addr = addr0;
 	int do_colour_align;
 	struct vm_unmapped_area_info info;
@@ -123,7 +122,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	info.high_limit = mm->mmap_base;
 	info.align_mask = do_colour_align ? (PAGE_MASK & shm_align_mask) : 0;
 	info.align_offset = pgoff << PAGE_SHIFT;
-	addr = vm_unmapped_area(&info);
+	addr = vm_unmapped_area(mm, &info);
 
 	/*
 	 * A failed mmap() very likely causes application failure,
@@ -136,7 +135,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 		info.flags = 0;
 		info.low_limit = TASK_UNMAPPED_BASE;
 		info.high_limit = TASK_SIZE;
-		addr = vm_unmapped_area(&info);
+		addr = vm_unmapped_area(mm, &info);
 	}
 
 	return addr;
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 314b66851348..e5e47115ba43 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -1013,8 +1013,8 @@ static inline int io_remap_pfn_range(struct vm_area_struct *vma,
 /* We provide a special get_unmapped_area for framebuffer mmaps to try and use
  * the largest alignment possible such that larget PTEs can be used.
  */
-unsigned long get_fb_unmapped_area(struct file *filp, unsigned long,
-				   unsigned long, unsigned long,
+unsigned long get_fb_unmapped_area(struct mm_struct *mm, struct file *filp,
+				   unsigned long, unsigned long, unsigned long,
 				   unsigned long);
 #define HAVE_ARCH_FB_UNMAPPED_AREA
 
diff --git a/arch/sparc/kernel/sys_sparc_32.c b/arch/sparc/kernel/sys_sparc_32.c
index fb7b185ee941..52319c194f96 100644
--- a/arch/sparc/kernel/sys_sparc_32.c
+++ b/arch/sparc/kernel/sys_sparc_32.c
@@ -36,7 +36,9 @@ asmlinkage unsigned long sys_getpagesize(void)
 	return PAGE_SIZE; /* Possibly older binaries want 8192 on sun4's? */
 }
 
-unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags)
+unsigned long arch_get_unmapped_area(struct mm_struct *mm, struct file *filp,
+				     unsigned long addr, unsigned long len,
+				     unsigned long pgoff, unsigned long flags)
 {
 	struct vm_unmapped_area_info info;
 
@@ -63,7 +65,7 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsi
 	info.align_mask = (flags & MAP_SHARED) ?
 		(PAGE_MASK & (SHMLBA - 1)) : 0;
 	info.align_offset = pgoff << PAGE_SHIFT;
-	return vm_unmapped_area(&info);
+	return vm_unmapped_area(mm, &info);
 }
 
 /*
diff --git a/arch/sparc/kernel/sys_sparc_64.c b/arch/sparc/kernel/sys_sparc_64.c
index 884c70331345..3e77c04700ef 100644
--- a/arch/sparc/kernel/sys_sparc_64.c
+++ b/arch/sparc/kernel/sys_sparc_64.c
@@ -83,9 +83,10 @@ static inline unsigned long COLOR_ALIGN(unsigned long addr,
 	return base + off;
 }
 
-unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags)
+unsigned long arch_get_unmapped_area(struct mm_struct *mm, struct file *filp,
+				     unsigned long addr, unsigned long len,
+				     unsigned long pgoff, unsigned long flags)
 {
-	struct mm_struct *mm = current->mm;
 	struct vm_area_struct * vma;
 	unsigned long task_size = TASK_SIZE;
 	int do_color_align;
@@ -128,25 +129,24 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsi
 	info.high_limit = min(task_size, VA_EXCLUDE_START);
 	info.align_mask = do_color_align ? (PAGE_MASK & (SHMLBA - 1)) : 0;
 	info.align_offset = pgoff << PAGE_SHIFT;
-	addr = vm_unmapped_area(&info);
+	addr = vm_unmapped_area(mm, &info);
 
 	if ((addr & ~PAGE_MASK) && task_size > VA_EXCLUDE_END) {
 		VM_BUG_ON(addr != -ENOMEM);
 		info.low_limit = VA_EXCLUDE_END;
 		info.high_limit = task_size;
-		addr = vm_unmapped_area(&info);
+		addr = vm_unmapped_area(mm, &info);
 	}
 
 	return addr;
 }
 
 unsigned long
-arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
-			  const unsigned long len, const unsigned long pgoff,
-			  const unsigned long flags)
+arch_get_unmapped_area_topdown(struct mm_struct *mm, struct file *filp,
+			  const unsigned long addr0, const unsigned long len,
+			  const unsigned long pgoff, const unsigned long flags)
 {
 	struct vm_area_struct *vma;
-	struct mm_struct *mm = current->mm;
 	unsigned long task_size = STACK_TOP32;
 	unsigned long addr = addr0;
 	int do_color_align;
@@ -191,7 +191,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	info.high_limit = mm->mmap_base;
 	info.align_mask = do_color_align ? (PAGE_MASK & (SHMLBA - 1)) : 0;
 	info.align_offset = pgoff << PAGE_SHIFT;
-	addr = vm_unmapped_area(&info);
+	addr = vm_unmapped_area(mm, &info);
 
 	/*
 	 * A failed mmap() very likely causes application failure,
@@ -204,20 +204,23 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 		info.flags = 0;
 		info.low_limit = TASK_UNMAPPED_BASE;
 		info.high_limit = STACK_TOP32;
-		addr = vm_unmapped_area(&info);
+		addr = vm_unmapped_area(mm, &info);
 	}
 
 	return addr;
 }
 
 /* Try to align mapping such that we align it as much as possible. */
-unsigned long get_fb_unmapped_area(struct file *filp, unsigned long orig_addr, unsigned long len, unsigned long pgoff, unsigned long flags)
+unsigned long get_fb_unmapped_area(struct mm_struct *mm, struct file *filp,
+				   unsigned long orig_addr, unsigned long len,
+				   unsigned long pgoff, unsigned long flags)
 {
 	unsigned long align_goal, addr = -ENOMEM;
-	unsigned long (*get_area)(struct file *, unsigned long,
-				  unsigned long, unsigned long, unsigned long);
+	unsigned long (*get_area)(struct mm_struct *, struct file *,
+				  unsigned long, unsigned long, unsigned long,
+				  unsigned long);
 
-	get_area = current->mm->get_unmapped_area;
+	get_area = mm->get_unmapped_area;
 
 	if (flags & MAP_FIXED) {
 		/* Ok, don't mess with it. */
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index 988acc8b1b80..92841d9ddd59 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -22,7 +22,8 @@
  * definition we don't have to worry about any page coloring stuff
  */
 
-static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *filp,
+static unsigned long hugetlb_get_unmapped_area_bottomup(struct mm_struct *mm,
+							struct file *filp,
 							unsigned long addr,
 							unsigned long len,
 							unsigned long pgoff,
@@ -40,25 +41,26 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *filp,
 	info.high_limit = min(task_size, VA_EXCLUDE_START);
 	info.align_mask = PAGE_MASK & ~HPAGE_MASK;
 	info.align_offset = 0;
-	addr = vm_unmapped_area(&info);
+	addr = vm_unmapped_area(mm, &info);
 
 	if ((addr & ~PAGE_MASK) && task_size > VA_EXCLUDE_END) {
 		VM_BUG_ON(addr != -ENOMEM);
 		info.low_limit = VA_EXCLUDE_END;
 		info.high_limit = task_size;
-		addr = vm_unmapped_area(&info);
+		addr = vm_unmapped_area(mm, &info);
 	}
 
 	return addr;
 }
 
 static unsigned long
-hugetlb_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
+hugetlb_get_unmapped_area_topdown(struct mm_struct *mm,
+				  struct file *filp,
+				  const unsigned long addr0,
 				  const unsigned long len,
 				  const unsigned long pgoff,
 				  const unsigned long flags)
 {
-	struct mm_struct *mm = current->mm;
 	unsigned long addr = addr0;
 	struct vm_unmapped_area_info info;
 
@@ -71,7 +73,7 @@ hugetlb_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	info.high_limit = mm->mmap_base;
 	info.align_mask = PAGE_MASK & ~HPAGE_MASK;
 	info.align_offset = 0;
-	addr = vm_unmapped_area(&info);
+	addr = vm_unmapped_area(mm, &info);
 
 	/*
 	 * A failed mmap() very likely causes application failure,
@@ -84,17 +86,17 @@ hugetlb_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 		info.flags = 0;
 		info.low_limit = TASK_UNMAPPED_BASE;
 		info.high_limit = STACK_TOP32;
-		addr = vm_unmapped_area(&info);
+		addr = vm_unmapped_area(mm, &info);
 	}
 
 	return addr;
 }
 
 unsigned long
-hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
-		unsigned long len, unsigned long pgoff, unsigned long flags)
+hugetlb_get_unmapped_area(struct mm_struct *mm, struct file *file,
+		unsigned long addr, unsigned long len, unsigned long pgoff,
+		unsigned long flags)
 {
-	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	unsigned long task_size = TASK_SIZE;
 
@@ -120,10 +122,10 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 			return addr;
 	}
 	if (mm->get_unmapped_area == arch_get_unmapped_area)
-		return hugetlb_get_unmapped_area_bottomup(file, addr, len,
+		return hugetlb_get_unmapped_area_bottomup(mm, file, addr, len,
 				pgoff, flags);
 	else
-		return hugetlb_get_unmapped_area_topdown(file, addr, len,
+		return hugetlb_get_unmapped_area_topdown(mm, file, addr, len,
 				pgoff, flags);
 }
 
diff --git a/arch/tile/kernel/vdso.c b/arch/tile/kernel/vdso.c
index 5bc51d7dfdcb..74aa55aa11ec 100644
--- a/arch/tile/kernel/vdso.c
+++ b/arch/tile/kernel/vdso.c
@@ -150,7 +150,7 @@ int setup_vdso_pages(void)
 	if (pages == 0)
 		return 0;
 
-	vdso_base = get_unmapped_area(NULL, vdso_base,
+	vdso_base = get_unmapped_area(mm, NULL, vdso_base,
 				      (pages << PAGE_SHIFT) +
 				      ((VDSO_ALIGNMENT - 1) & PAGE_MASK),
 				      0, 0);
diff --git a/arch/tile/mm/hugetlbpage.c b/arch/tile/mm/hugetlbpage.c
index 77ceaa343fce..24e8c99b5b3f 100644
--- a/arch/tile/mm/hugetlbpage.c
+++ b/arch/tile/mm/hugetlbpage.c
@@ -161,7 +161,8 @@ int pud_huge(pud_t pud)
 }
 
 #ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
-static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
+static unsigned long hugetlb_get_unmapped_area_bottomup(struct mm_struct *mm,
+		struct file *file,
 		unsigned long addr, unsigned long len,
 		unsigned long pgoff, unsigned long flags)
 {
@@ -174,10 +175,11 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
 	info.high_limit = TASK_SIZE;
 	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
 	info.align_offset = 0;
-	return vm_unmapped_area(&info);
+	return vm_unmapped_area(mm, &info);
 }
 
-static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
+static unsigned long hugetlb_get_unmapped_area_topdown(struct mm_struct *mm,
+		struct file *file,
 		unsigned long addr0, unsigned long len,
 		unsigned long pgoff, unsigned long flags)
 {
@@ -188,10 +190,10 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
 	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
 	info.length = len;
 	info.low_limit = PAGE_SIZE;
-	info.high_limit = current->mm->mmap_base;
+	info.high_limit = mm->mmap_base;
 	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
 	info.align_offset = 0;
-	addr = vm_unmapped_area(&info);
+	addr = vm_unmapped_area(mm, &info);
 
 	/*
 	 * A failed mmap() very likely causes application failure,
@@ -204,17 +206,17 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
 		info.flags = 0;
 		info.low_limit = TASK_UNMAPPED_BASE;
 		info.high_limit = TASK_SIZE;
-		addr = vm_unmapped_area(&info);
+		addr = vm_unmapped_area(mm, &info);
 	}
 
 	return addr;
 }
 
-unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
-		unsigned long len, unsigned long pgoff, unsigned long flags)
+unsigned long hugetlb_get_unmapped_area(struct mm_struct *mm, struct file *file,
+		unsigned long addr, unsigned long len, unsigned long pgoff,
+		unsigned long flags)
 {
 	struct hstate *h = hstate_file(file);
-	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 
 	if (len & ~huge_page_mask(h))
@@ -235,11 +237,11 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 		    (!vma || addr + len <= vma->vm_start))
 			return addr;
 	}
-	if (current->mm->get_unmapped_area == arch_get_unmapped_area)
-		return hugetlb_get_unmapped_area_bottomup(file, addr, len,
+	if (mm->get_unmapped_area == arch_get_unmapped_area)
+		return hugetlb_get_unmapped_area_bottomup(mm, file, addr, len,
 				pgoff, flags);
 	else
-		return hugetlb_get_unmapped_area_topdown(file, addr, len,
+		return hugetlb_get_unmapped_area_topdown(mm, file, addr, len,
 				pgoff, flags);
 }
 #endif /* HAVE_ARCH_HUGETLB_UNMAPPED_AREA */
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 10820f6cefbf..8f728f3a1698 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -153,7 +153,7 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
 	if (down_write_killable(&mm->mmap_sem))
 		return -EINTR;
 
-	addr = get_unmapped_area(NULL, addr,
+	addr = get_unmapped_area(mm, NULL, addr,
 				 image->size - image->sym_vvar_start, 0, 0);
 	if (IS_ERR_VALUE(addr)) {
 		ret = addr;
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index a55ed63b9f91..bf9198998895 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -120,10 +120,10 @@ static void find_start_end(unsigned long flags, unsigned long *begin,
 }
 
 unsigned long
-arch_get_unmapped_area(struct file *filp, unsigned long addr,
-		unsigned long len, unsigned long pgoff, unsigned long flags)
+arch_get_unmapped_area(struct mm_struct *mm, struct file *filp,
+		unsigned long addr, unsigned long len, unsigned long pgoff,
+		unsigned long flags)
 {
-	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	struct vm_unmapped_area_info info;
 	unsigned long begin, end;
@@ -154,16 +154,15 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
 		info.align_mask = get_align_mask();
 		info.align_offset += get_align_bits();
 	}
-	return vm_unmapped_area(&info);
+	return vm_unmapped_area(mm, &info);
 }
 
 unsigned long
-arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
-			  const unsigned long len, const unsigned long pgoff,
-			  const unsigned long flags)
+arch_get_unmapped_area_topdown(struct mm_struct *mm, struct file *filp,
+			  const unsigned long addr0, const unsigned long len,
+			  const unsigned long pgoff, const unsigned long flags)
 {
 	struct vm_area_struct *vma;
-	struct mm_struct *mm = current->mm;
 	unsigned long addr = addr0;
 	struct vm_unmapped_area_info info;
 
@@ -197,7 +196,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 		info.align_mask = get_align_mask();
 		info.align_offset += get_align_bits();
 	}
-	addr = vm_unmapped_area(&info);
+	addr = vm_unmapped_area(mm, &info);
 	if (!(addr & ~PAGE_MASK))
 		return addr;
 	VM_BUG_ON(addr != -ENOMEM);
@@ -209,5 +208,5 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	 * can happen with large stack limits and large mmap()
 	 * allocations.
 	 */
-	return arch_get_unmapped_area(filp, addr0, len, pgoff, flags);
+	return arch_get_unmapped_area(mm, filp, addr0, len, pgoff, flags);
 }
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 2ae8584b44c7..93f0dccd0fcd 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -72,7 +72,8 @@ int pud_huge(pud_t pud)
 #endif
 
 #ifdef CONFIG_HUGETLB_PAGE
-static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
+static unsigned long hugetlb_get_unmapped_area_bottomup(struct mm_struct *mm,
+		struct file *file,
 		unsigned long addr, unsigned long len,
 		unsigned long pgoff, unsigned long flags)
 {
@@ -81,14 +82,15 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
 
 	info.flags = 0;
 	info.length = len;
-	info.low_limit = current->mm->mmap_legacy_base;
+	info.low_limit = mm->mmap_legacy_base;
 	info.high_limit = TASK_SIZE;
 	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
 	info.align_offset = 0;
-	return vm_unmapped_area(&info);
+	return vm_unmapped_area(mm, &info);
 }
 
-static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
+static unsigned long hugetlb_get_unmapped_area_topdown(struct mm_struct *mm,
+		struct file *file,
 		unsigned long addr0, unsigned long len,
 		unsigned long pgoff, unsigned long flags)
 {
@@ -99,10 +101,10 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
 	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
 	info.length = len;
 	info.low_limit = PAGE_SIZE;
-	info.high_limit = current->mm->mmap_base;
+	info.high_limit = mm->mmap_base;
 	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
 	info.align_offset = 0;
-	addr = vm_unmapped_area(&info);
+	addr = vm_unmapped_area(mm, &info);
 
 	/*
 	 * A failed mmap() very likely causes application failure,
@@ -115,18 +117,18 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
 		info.flags = 0;
 		info.low_limit = TASK_UNMAPPED_BASE;
 		info.high_limit = TASK_SIZE;
-		addr = vm_unmapped_area(&info);
+		addr = vm_unmapped_area(mm, &info);
 	}
 
 	return addr;
 }
 
 unsigned long
-hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
-		unsigned long len, unsigned long pgoff, unsigned long flags)
+hugetlb_get_unmapped_area(struct mm_struct *mm, struct file *file,
+		unsigned long addr, unsigned long len, unsigned long pgoff,
+		unsigned long flags)
 {
 	struct hstate *h = hstate_file(file);
-	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 
 	if (len & ~huge_page_mask(h))
@@ -148,10 +150,10 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 			return addr;
 	}
 	if (mm->get_unmapped_area == arch_get_unmapped_area)
-		return hugetlb_get_unmapped_area_bottomup(file, addr, len,
+		return hugetlb_get_unmapped_area_bottomup(mm, file, addr, len,
 				pgoff, flags);
 	else
-		return hugetlb_get_unmapped_area_topdown(file, addr, len,
+		return hugetlb_get_unmapped_area_topdown(mm, file, addr, len,
 				pgoff, flags);
 }
 #endif /* CONFIG_HUGETLB_PAGE */
diff --git a/arch/xtensa/kernel/syscall.c b/arch/xtensa/kernel/syscall.c
index d3fd100dffc9..4a17828917df 100644
--- a/arch/xtensa/kernel/syscall.c
+++ b/arch/xtensa/kernel/syscall.c
@@ -58,8 +58,9 @@ asmlinkage long xtensa_fadvise64_64(int fd, int advice,
 }
 
 #ifdef CONFIG_MMU
-unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
-		unsigned long len, unsigned long pgoff, unsigned long flags)
+unsigned long arch_get_unmapped_area(struct mm_struct *mm, struct file *filp,
+		unsigned long addr, unsigned long len, unsigned long pgoff,
+		unsigned long flags)
 {
 	struct vm_area_struct *vmm;
 
@@ -83,7 +84,7 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
 	else
 		addr = PAGE_ALIGN(addr);
 
-	for (vmm = find_vma(current->mm, addr); ; vmm = vmm->vm_next) {
+	for (vmm = find_vma(mm, addr); ; vmm = vmm->vm_next) {
 		/* At this point:  (!vmm || addr < vmm->vm_end). */
 		if (TASK_SIZE - len < addr)
 			return -ENOMEM;
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 6d9cc2d39d22..4b6ac3f373df 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -273,7 +273,8 @@ static pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 #endif
 
 #ifndef CONFIG_MMU
-static unsigned long get_unmapped_area_mem(struct file *file,
+static unsigned long get_unmapped_area_mem(struct mm_struct *mm,
+					   struct file *file,
 					   unsigned long addr,
 					   unsigned long len,
 					   unsigned long pgoff,
@@ -662,9 +663,10 @@ static int mmap_zero(struct file *file, struct vm_area_struct *vma)
 	return 0;
 }
 
-static unsigned long get_unmapped_area_zero(struct file *file,
-				unsigned long addr, unsigned long len,
-				unsigned long pgoff, unsigned long flags)
+static unsigned long get_unmapped_area_zero(struct mm_struct *mm,
+				struct file *file, unsigned long addr,
+				unsigned long len, unsigned long pgoff,
+				unsigned long flags)
 {
 #ifdef CONFIG_MMU
 	if (flags & MAP_SHARED) {
@@ -674,11 +676,12 @@ static unsigned long get_unmapped_area_zero(struct file *file,
 		 * and pass NULL for file as in mmap.c's get_unmapped_area(),
 		 * so as not to confuse shmem with our handle on "/dev/zero".
 		 */
-		return shmem_get_unmapped_area(NULL, addr, len, pgoff, flags);
+		return shmem_get_unmapped_area(mm, NULL, addr, len, pgoff,
+					       flags);
 	}
 
 	/* Otherwise flags & MAP_PRIVATE: with no shmem object beneath it */
-	return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+	return mm->get_unmapped_area(mm, file, addr, len, pgoff, flags);
 #else
 	return -ENOSYS;
 #endif
diff --git a/drivers/dax/dax.c b/drivers/dax/dax.c
index ed758b74ddf0..ab60b8c05b7f 100644
--- a/drivers/dax/dax.c
+++ b/drivers/dax/dax.c
@@ -552,9 +552,9 @@ static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
 }
 
 /* return an unmapped area aligned to the dax region specified alignment */
-static unsigned long dax_get_unmapped_area(struct file *filp,
-		unsigned long addr, unsigned long len, unsigned long pgoff,
-		unsigned long flags)
+static unsigned long dax_get_unmapped_area(struct mm_struct *mm,
+		struct file *filp, unsigned long addr, unsigned long len,
+		unsigned long pgoff, unsigned long flags)
 {
 	unsigned long off, off_end, off_align, len_align, addr_align, align;
 	struct dax_dev *dax_dev = filp ? filp->private_data : NULL;
@@ -576,14 +576,14 @@ static unsigned long dax_get_unmapped_area(struct file *filp,
 	if ((off + len_align) < off)
 		goto out;
 
-	addr_align = current->mm->get_unmapped_area(filp, addr, len_align,
+	addr_align = mm->get_unmapped_area(mm, filp, addr, len_align,
 			pgoff, flags);
 	if (!IS_ERR_VALUE(addr_align)) {
 		addr_align += (off - addr_align) & (align - 1);
 		return addr_align;
 	}
  out:
-	return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
+	return mm->get_unmapped_area(mm, filp, addr, len, pgoff, flags);
 }
 
 static int dax_open(struct inode *inode, struct file *filp)
diff --git a/drivers/media/usb/uvc/uvc_v4l2.c b/drivers/media/usb/uvc/uvc_v4l2.c
index 3e7e283a44a8..2224c5ebb459 100644
--- a/drivers/media/usb/uvc/uvc_v4l2.c
+++ b/drivers/media/usb/uvc/uvc_v4l2.c
@@ -1434,9 +1434,9 @@ static unsigned int uvc_v4l2_poll(struct file *file, poll_table *wait)
 }
 
 #ifndef CONFIG_MMU
-static unsigned long uvc_v4l2_get_unmapped_area(struct file *file,
-		unsigned long addr, unsigned long len, unsigned long pgoff,
-		unsigned long flags)
+static unsigned long uvc_v4l2_get_unmapped_area(struct mm_struct *mm,
+		struct file *file, unsigned long addr, unsigned long len,
+		unsigned long pgoff, unsigned long flags)
 {
 	struct uvc_fh *handle = file->private_data;
 	struct uvc_streaming *stream = handle->stream;
diff --git a/drivers/media/v4l2-core/v4l2-dev.c b/drivers/media/v4l2-core/v4l2-dev.c
index fa2124cb31bd..f227ebc369f7 100644
--- a/drivers/media/v4l2-core/v4l2-dev.c
+++ b/drivers/media/v4l2-core/v4l2-dev.c
@@ -369,9 +369,9 @@ static long v4l2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 #ifdef CONFIG_MMU
 #define v4l2_get_unmapped_area NULL
 #else
-static unsigned long v4l2_get_unmapped_area(struct file *filp,
-		unsigned long addr, unsigned long len, unsigned long pgoff,
-		unsigned long flags)
+static unsigned long v4l2_get_unmapped_area(struct mm_struct *mm,
+		struct file *filp, unsigned long addr, unsigned long len,
+		unsigned long pgoff, unsigned long flags)
 {
 	struct video_device *vdev = video_devdata(filp);
 	int ret;
@@ -380,7 +380,7 @@ static unsigned long v4l2_get_unmapped_area(struct file *filp,
 		return -ENOSYS;
 	if (!video_is_registered(vdev))
 		return -ENODEV;
-	ret = vdev->fops->get_unmapped_area(filp, addr, len, pgoff, flags);
+	ret = vdev->fops->get_unmapped_area(mm, filp, addr, len, pgoff, flags);
 	if (vdev->dev_debug & V4L2_DEV_DEBUG_FOP)
 		printk(KERN_DEBUG "%s: get_unmapped_area (%d)\n",
 			video_device_node_name(vdev), ret);
diff --git a/drivers/media/v4l2-core/videobuf2-v4l2.c b/drivers/media/v4l2-core/videobuf2-v4l2.c
index 3529849d2218..953417fb4914 100644
--- a/drivers/media/v4l2-core/videobuf2-v4l2.c
+++ b/drivers/media/v4l2-core/videobuf2-v4l2.c
@@ -932,8 +932,9 @@ unsigned int vb2_fop_poll(struct file *file, poll_table *wait)
 EXPORT_SYMBOL_GPL(vb2_fop_poll);
 
 #ifndef CONFIG_MMU
-unsigned long vb2_fop_get_unmapped_area(struct file *file, unsigned long addr,
-		unsigned long len, unsigned long pgoff, unsigned long flags)
+unsigned long vb2_fop_get_unmapped_area(struct mm_struct *mm, struct file *file,
+		unsigned long addr, unsigned long len, unsigned long pgoff,
+		unsigned long flags)
 {
 	struct video_device *vdev = video_devdata(file);
 
diff --git a/drivers/mtd/mtdchar.c b/drivers/mtd/mtdchar.c
index ce5ccc573a9c..8774808c69e5 100644
--- a/drivers/mtd/mtdchar.c
+++ b/drivers/mtd/mtdchar.c
@@ -1156,7 +1156,8 @@ static long mtdchar_compat_ioctl(struct file *file, unsigned int cmd,
  *   mappings)
  */
 #ifndef CONFIG_MMU
-static unsigned long mtdchar_get_unmapped_area(struct file *file,
+static unsigned long mtdchar_get_unmapped_area(struct mm_struct *mm,
+					   struct file *file,
 					   unsigned long addr,
 					   unsigned long len,
 					   unsigned long pgoff,
diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
index 3e22b45687d3..96ad4de14dc7 100644
--- a/drivers/usb/gadget/function/uvc_v4l2.c
+++ b/drivers/usb/gadget/function/uvc_v4l2.c
@@ -343,7 +343,8 @@ uvc_v4l2_poll(struct file *file, poll_table *wait)
 }
 
 #ifndef CONFIG_MMU
-static unsigned long uvcg_v4l2_get_unmapped_area(struct file *file,
+static unsigned long uvcg_v4l2_get_unmapped_area(struct mm_struct *mm,
+		struct file *file,
 		unsigned long addr, unsigned long len, unsigned long pgoff,
 		unsigned long flags)
 {
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 54de77e78775..69037eb6a728 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -168,10 +168,10 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 
 #ifndef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
 static unsigned long
-hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
-		unsigned long len, unsigned long pgoff, unsigned long flags)
+hugetlb_get_unmapped_area(struct mm_struct *mm, struct file *file,
+		unsigned long addr, unsigned long len, unsigned long pgoff,
+		unsigned long flags)
 {
-	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	struct hstate *h = hstate_file(file);
 	struct vm_unmapped_area_info info;
@@ -201,7 +201,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 	info.high_limit = TASK_SIZE;
 	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
 	info.align_offset = 0;
-	return vm_unmapped_area(&info);
+	return vm_unmapped_area(mm, &info);
 }
 #endif
 
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 842a5ff5b85c..cb2d5702bdce 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -291,9 +291,9 @@ static int proc_reg_mmap(struct file *file, struct vm_area_struct *vma)
 }
 
 static unsigned long
-proc_reg_get_unmapped_area(struct file *file, unsigned long orig_addr,
-			   unsigned long len, unsigned long pgoff,
-			   unsigned long flags)
+proc_reg_get_unmapped_area(struct mm_struct *mm, struct file *file,
+			   unsigned long orig_addr, unsigned long len,
+			   unsigned long pgoff, unsigned long flags)
 {
 	struct proc_dir_entry *pde = PDE(file_inode(file));
 	unsigned long rv = -EIO;
@@ -304,11 +304,11 @@ proc_reg_get_unmapped_area(struct file *file, unsigned long orig_addr,
 		get_area = pde->proc_fops->get_unmapped_area;
 #ifdef CONFIG_MMU
 		if (!get_area)
-			get_area = current->mm->get_unmapped_area;
+			get_area = mm->get_unmapped_area;
 #endif
 
 		if (get_area)
-			rv = get_area(file, orig_addr, len, pgoff, flags);
+			rv = get_area(mm, file, orig_addr, len, pgoff, flags);
 		else
 			rv = orig_addr;
 		unuse_pde(pde);
diff --git a/fs/ramfs/file-mmu.c b/fs/ramfs/file-mmu.c
index 12af0490322f..a4584130234a 100644
--- a/fs/ramfs/file-mmu.c
+++ b/fs/ramfs/file-mmu.c
@@ -31,11 +31,12 @@
 
 #include "internal.h"
 
-static unsigned long ramfs_mmu_get_unmapped_area(struct file *file,
+static unsigned long ramfs_mmu_get_unmapped_area(struct mm_struct *mm,
+		struct file *file,
 		unsigned long addr, unsigned long len, unsigned long pgoff,
 		unsigned long flags)
 {
-	return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+	return mm->get_unmapped_area(mm, file, addr, len, pgoff, flags);
 }
 
 const struct file_operations ramfs_file_operations = {
diff --git a/fs/ramfs/file-nommu.c b/fs/ramfs/file-nommu.c
index 2ef7ce75c062..564ba84d61b6 100644
--- a/fs/ramfs/file-nommu.c
+++ b/fs/ramfs/file-nommu.c
@@ -27,7 +27,8 @@
 #include "internal.h"
 
 static int ramfs_nommu_setattr(struct dentry *, struct iattr *);
-static unsigned long ramfs_nommu_get_unmapped_area(struct file *file,
+static unsigned long ramfs_nommu_get_unmapped_area(struct mm_struct *mm,
+						   struct file *file,
 						   unsigned long addr,
 						   unsigned long len,
 						   unsigned long pgoff,
@@ -202,9 +203,10 @@ static int ramfs_nommu_setattr(struct dentry *dentry, struct iattr *ia)
  *   - the pages to be mapped must exist
  *   - the pages be physically contiguous in sequence
  */
-static unsigned long ramfs_nommu_get_unmapped_area(struct file *file,
-					    unsigned long addr, unsigned long len,
-					    unsigned long pgoff, unsigned long flags)
+static unsigned long ramfs_nommu_get_unmapped_area(struct mm_struct *mm,
+					struct file *file, unsigned long addr,
+					unsigned long len, unsigned long pgoff,
+					unsigned long flags)
 {
 	unsigned long maxpages, lpages, nr, loop, ret;
 	struct inode *inode = file_inode(file);
diff --git a/fs/romfs/mmap-nommu.c b/fs/romfs/mmap-nommu.c
index 1118a0dc6b45..0bfebcafb077 100644
--- a/fs/romfs/mmap-nommu.c
+++ b/fs/romfs/mmap-nommu.c
@@ -19,7 +19,8 @@
  *   mappings)
  * - attempts to map through to the underlying MTD device
  */
-static unsigned long romfs_get_unmapped_area(struct file *file,
+static unsigned long romfs_get_unmapped_area(struct mm_struct *mm,
+					     struct file *file,
 					     unsigned long addr,
 					     unsigned long len,
 					     unsigned long pgoff,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2ba074328894..45f8cf874ea1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1662,7 +1662,7 @@ struct file_operations {
 	int (*fasync) (int, struct file *, int);
 	int (*lock) (struct file *, int, struct file_lock *);
 	ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
-	unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
+	unsigned long (*get_unmapped_area)(struct mm_struct *, struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
 	int (*check_flags)(int);
 	int (*flock) (struct file *, int, struct file_lock *);
 	ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 97e478d6b690..94a0e680b7d7 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -87,9 +87,9 @@ extern bool is_vma_temporary_stack(struct vm_area_struct *vma);
 
 extern unsigned long transparent_hugepage_flags;
 
-extern unsigned long thp_get_unmapped_area(struct file *filp,
-		unsigned long addr, unsigned long len, unsigned long pgoff,
-		unsigned long flags);
+extern unsigned long thp_get_unmapped_area(struct mm_struct *mm,
+		struct file *filp, unsigned long addr, unsigned long len,
+		unsigned long pgoff, unsigned long flags);
 
 extern void prep_transhuge_page(struct page *page);
 extern void free_transhuge_page(struct page *page);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 48c76d612d40..72260cc252f2 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -287,8 +287,9 @@ hugetlb_file_setup(const char *name, size_t size, vm_flags_t acctflag,
 #endif /* !CONFIG_HUGETLBFS */
 
 #ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
-unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
-					unsigned long len, unsigned long pgoff,
+unsigned long hugetlb_get_unmapped_area(struct mm_struct *mm, struct file *file,
+					unsigned long addr, unsigned long len,
+					unsigned long pgoff,
 					unsigned long flags);
 #endif /* HAVE_ARCH_HUGETLB_UNMAPPED_AREA */
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 71a90604d21f..1520da8f9c67 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2014,7 +2014,9 @@ extern int install_special_mapping(struct mm_struct *mm,
 				   unsigned long addr, unsigned long len,
 				   unsigned long flags, struct page **pages);
 
-extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
+extern unsigned long get_unmapped_area(struct mm_struct *, struct file *,
+				       unsigned long, unsigned long,
+				       unsigned long, unsigned long);
 
 extern unsigned long mmap_region(struct mm_struct *mm, struct file *file,
 				 unsigned long addr, unsigned long len,
@@ -2066,8 +2068,10 @@ struct vm_unmapped_area_info {
 	unsigned long align_offset;
 };
 
-extern unsigned long unmapped_area(struct vm_unmapped_area_info *info);
-extern unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info);
+extern unsigned long unmapped_area(struct mm_struct *mm,
+				   struct vm_unmapped_area_info *info);
+extern unsigned long unmapped_area_topdown(struct mm_struct *mm,
+					   struct vm_unmapped_area_info *info);
 
 /*
  * Search for an unmapped address range.
@@ -2079,12 +2083,12 @@ extern unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info);
  * - satisfies (begin_addr & align_mask) == (align_offset & align_mask)
  */
 static inline unsigned long
-vm_unmapped_area(struct vm_unmapped_area_info *info)
+vm_unmapped_area(struct mm_struct *mm, struct vm_unmapped_area_info *info)
 {
 	if (info->flags & VM_UNMAPPED_AREA_TOPDOWN)
-		return unmapped_area_topdown(info);
+		return unmapped_area_topdown(mm, info);
 	else
-		return unmapped_area(info);
+		return unmapped_area(mm, info);
 }
 
 /* truncate.c */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 808751d7b737..6aa03e88dcff 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -398,9 +398,10 @@ struct mm_struct {
 	struct rb_root mm_rb;
 	u32 vmacache_seqnum;                   /* per-thread vmacache */
 #ifdef CONFIG_MMU
-	unsigned long (*get_unmapped_area) (struct file *filp,
-				unsigned long addr, unsigned long len,
-				unsigned long pgoff, unsigned long flags);
+	unsigned long (*get_unmapped_area) (struct mm_struct *mm,
+				struct file *filp, unsigned long addr,
+				unsigned long len, unsigned long pgoff,
+				unsigned long flags);
 #endif
 	unsigned long mmap_base;		/* base of mmap area */
 	unsigned long mmap_legacy_base;         /* base of mmap area in bottom-up allocations */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ad3ec9ec61f7..42b9b93a50ac 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -476,12 +476,12 @@ struct user_namespace;
 #ifdef CONFIG_MMU
 extern void arch_pick_mmap_layout(struct mm_struct *mm);
 extern unsigned long
-arch_get_unmapped_area(struct file *, unsigned long, unsigned long,
-		       unsigned long, unsigned long);
+arch_get_unmapped_area(struct mm_struct *mm, struct file *, unsigned long,
+		       unsigned long, unsigned long, unsigned long);
 extern unsigned long
-arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
-			  unsigned long len, unsigned long pgoff,
-			  unsigned long flags);
+arch_get_unmapped_area_topdown(struct mm_struct *mm, struct file *filp,
+			       unsigned long addr, unsigned long len,
+			       unsigned long pgoff, unsigned long flags);
 #else
 static inline void arch_pick_mmap_layout(struct mm_struct *mm) {}
 #endif
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index ff078e7043b6..37d27369bb42 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -54,8 +54,9 @@ extern struct file *shmem_file_setup(const char *name,
 extern struct file *shmem_kernel_file_setup(const char *name, loff_t size,
 					    unsigned long flags);
 extern int shmem_zero_setup(struct vm_area_struct *);
-extern unsigned long shmem_get_unmapped_area(struct file *, unsigned long addr,
-		unsigned long len, unsigned long pgoff, unsigned long flags);
+extern unsigned long shmem_get_unmapped_area(struct mm_struct *, struct file *,
+		unsigned long addr, unsigned long len, unsigned long pgoff,
+		unsigned long flags);
 extern int shmem_lock(struct file *file, int lock, struct user_struct *user);
 extern bool shmem_mapping(struct address_space *mapping);
 extern void shmem_unlock_mapping(struct address_space *mapping);
diff --git a/include/media/v4l2-dev.h b/include/media/v4l2-dev.h
index e657614521e3..d7287a53f590 100644
--- a/include/media/v4l2-dev.h
+++ b/include/media/v4l2-dev.h
@@ -156,7 +156,8 @@ struct v4l2_file_operations {
 #ifdef CONFIG_COMPAT
 	long (*compat_ioctl32) (struct file *, unsigned int, unsigned long);
 #endif
-	unsigned long (*get_unmapped_area) (struct file *, unsigned long,
+	unsigned long (*get_unmapped_area) (struct mm_struct *mm,
+				struct file *filep, unsigned long,
 				unsigned long, unsigned long, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
 	int (*open) (struct file *);
diff --git a/include/media/videobuf2-v4l2.h b/include/media/videobuf2-v4l2.h
index 036127c54bbf..43f4d0a3b333 100644
--- a/include/media/videobuf2-v4l2.h
+++ b/include/media/videobuf2-v4l2.h
@@ -264,8 +264,9 @@ ssize_t vb2_fop_read(struct file *file, char __user *buf,
 		size_t count, loff_t *ppos);
 unsigned int vb2_fop_poll(struct file *file, poll_table *wait);
 #ifndef CONFIG_MMU
-unsigned long vb2_fop_get_unmapped_area(struct file *file, unsigned long addr,
-		unsigned long len, unsigned long pgoff, unsigned long flags);
+unsigned long vb2_fop_get_unmapped_area(struct mm_struct *mm, struct file *file,
+		unsigned long addr, unsigned long len, unsigned long pgoff,
+		unsigned long flags);
 #endif
 
 /**
diff --git a/ipc/shm.c b/ipc/shm.c
index 64c21fb32ca9..2fd73cda4ec8 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -465,14 +465,14 @@ static long shm_fallocate(struct file *file, int mode, loff_t offset,
 	return sfd->file->f_op->fallocate(file, mode, offset, len);
 }
 
-static unsigned long shm_get_unmapped_area(struct file *file,
-	unsigned long addr, unsigned long len, unsigned long pgoff,
-	unsigned long flags)
+static unsigned long shm_get_unmapped_area(struct mm_struct *mm,
+	struct file *file, unsigned long addr, unsigned long len,
+	unsigned long pgoff, unsigned long flags)
 {
 	struct shm_file_data *sfd = shm_file_data(file);
 
-	return sfd->file->f_op->get_unmapped_area(sfd->file, addr, len,
-						pgoff, flags);
+	return sfd->file->f_op->get_unmapped_area(mm, sfd->file, addr, len,
+						  pgoff, flags);
 }
 
 static const struct file_operations shm_file_operations = {
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index d416f3baf392..ed95aae0f386 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1142,7 +1142,7 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
 
 	if (!area->vaddr) {
 		/* Try to map as high as possible, this is only a hint. */
-		area->vaddr = get_unmapped_area(NULL, TASK_SIZE - PAGE_SIZE,
+		area->vaddr = get_unmapped_area(mm, NULL, TASK_SIZE - PAGE_SIZE,
 						PAGE_SIZE, 0, 0);
 		if (area->vaddr & ~PAGE_MASK) {
 			ret = area->vaddr;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5f3ad65c85de..d5b2604867e5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -499,8 +499,9 @@ void prep_transhuge_page(struct page *page)
 	set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
 }
 
-unsigned long __thp_get_unmapped_area(struct file *filp, unsigned long len,
-		loff_t off, unsigned long flags, unsigned long size)
+unsigned long __thp_get_unmapped_area(struct mm_struct *mm, struct file *filp,
+		unsigned long len, loff_t off, unsigned long flags,
+		unsigned long size)
 {
 	unsigned long addr;
 	loff_t off_end = off + len;
@@ -514,8 +515,8 @@ unsigned long __thp_get_unmapped_area(struct file *filp, unsigned long len,
 	if (len_pad < len || (off + len_pad) < off)
 		return 0;
 
-	addr = current->mm->get_unmapped_area(filp, 0, len_pad,
-					      off >> PAGE_SHIFT, flags);
+	addr = mm->get_unmapped_area(mm, filp, 0, len_pad, off >> PAGE_SHIFT,
+				     flags);
 	if (IS_ERR_VALUE(addr))
 		return 0;
 
@@ -523,8 +524,9 @@ unsigned long __thp_get_unmapped_area(struct file *filp, unsigned long len,
 	return addr;
 }
 
-unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
-		unsigned long len, unsigned long pgoff, unsigned long flags)
+unsigned long thp_get_unmapped_area(struct mm_struct *mm, struct file *filp,
+		unsigned long addr, unsigned long len, unsigned long pgoff,
+		unsigned long flags)
 {
 	loff_t off = (loff_t)pgoff << PAGE_SHIFT;
 
@@ -533,12 +535,12 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
 	if (!IS_DAX(filp->f_mapping->host) || !IS_ENABLED(CONFIG_FS_DAX_PMD))
 		goto out;
 
-	addr = __thp_get_unmapped_area(filp, len, off, flags, PMD_SIZE);
+	addr = __thp_get_unmapped_area(mm, filp, len, off, flags, PMD_SIZE);
 	if (addr)
 		return addr;
 
  out:
-	return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
+	return mm->get_unmapped_area(mm, filp, addr, len, pgoff, flags);
 }
 EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
 
diff --git a/mm/mmap.c b/mm/mmap.c
index ea79bc4da5b7..8a73dc458e69 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1339,7 +1339,7 @@ unsigned long do_mmap(struct mm_struct *mm, struct file *file,
 	/* Obtain the address to map to. we verify (or select) it and ensure
 	 * that it represents a valid section of the address space.
 	 */
-	addr = get_unmapped_area(file, addr, len, pgoff, flags);
+	addr = get_unmapped_area(mm, file, addr, len, pgoff, flags);
 	if (offset_in_page(addr))
 		return addr;
 
@@ -1742,7 +1742,8 @@ unsigned long mmap_region(struct mm_struct *mm, struct file *file,
 	return error;
 }
 
-unsigned long unmapped_area(struct vm_unmapped_area_info *info)
+unsigned long unmapped_area(struct mm_struct *mm,
+			    struct vm_unmapped_area_info *info)
 {
 	/*
 	 * We implement the search by looking for an rbtree node that
@@ -1752,7 +1753,6 @@ unsigned long unmapped_area(struct vm_unmapped_area_info *info)
 	 * - gap_end - gap_start >= length
 	 */
 
-	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	unsigned long length, low_limit, high_limit, gap_start, gap_end;
 
@@ -1844,9 +1844,9 @@ unsigned long unmapped_area(struct vm_unmapped_area_info *info)
 	return gap_start;
 }
 
-unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info)
+unsigned long unmapped_area_topdown(struct mm_struct *mm,
+				    struct vm_unmapped_area_info *info)
 {
-	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	unsigned long length, low_limit, high_limit, gap_start, gap_end;
 
@@ -1955,10 +1955,10 @@ unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info)
  */
 #ifndef HAVE_ARCH_UNMAPPED_AREA
 unsigned long
-arch_get_unmapped_area(struct file *filp, unsigned long addr,
-		unsigned long len, unsigned long pgoff, unsigned long flags)
+arch_get_unmapped_area(struct mm_struct *mm, struct file *filp,
+		unsigned long addr, unsigned long len, unsigned long pgoff,
+		unsigned long flags)
 {
-	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	struct vm_unmapped_area_info info;
 
@@ -1981,7 +1981,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
 	info.low_limit = mm->mmap_base;
 	info.high_limit = TASK_SIZE;
 	info.align_mask = 0;
-	return vm_unmapped_area(&info);
+	return vm_unmapped_area(mm, &info);
 }
 #endif
 
@@ -1991,12 +1991,11 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
  */
 #ifndef HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
 unsigned long
-arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
-			  const unsigned long len, const unsigned long pgoff,
-			  const unsigned long flags)
+arch_get_unmapped_area_topdown(struct mm_struct *mm, struct file *filp,
+			  const unsigned long addr0, const unsigned long len,
+			  const unsigned long pgoff, const unsigned long flags)
 {
 	struct vm_area_struct *vma;
-	struct mm_struct *mm = current->mm;
 	unsigned long addr = addr0;
 	struct vm_unmapped_area_info info;
 
@@ -2021,7 +2020,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	info.low_limit = max(PAGE_SIZE, mmap_min_addr);
 	info.high_limit = mm->mmap_base;
 	info.align_mask = 0;
-	addr = vm_unmapped_area(&info);
+	addr = vm_unmapped_area(mm, &info);
 
 	/*
 	 * A failed mmap() very likely causes application failure,
@@ -2034,7 +2033,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 		info.flags = 0;
 		info.low_limit = TASK_UNMAPPED_BASE;
 		info.high_limit = TASK_SIZE;
-		addr = vm_unmapped_area(&info);
+		addr = vm_unmapped_area(mm, &info);
 	}
 
 	return addr;
@@ -2042,11 +2041,12 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 #endif
 
 unsigned long
-get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
-		unsigned long pgoff, unsigned long flags)
+get_unmapped_area(struct mm_struct *mm, struct file *file, unsigned long addr,
+		  unsigned long len, unsigned long pgoff, unsigned long flags)
 {
-	unsigned long (*get_area)(struct file *, unsigned long,
-				  unsigned long, unsigned long, unsigned long);
+	unsigned long (*get_area)(struct mm_struct *, struct file *,
+				  unsigned long, unsigned long, unsigned long,
+				  unsigned long);
 
 	unsigned long error = arch_mmap_check(addr, len, flags);
 	if (error)
@@ -2056,7 +2056,7 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
 	if (len > TASK_SIZE)
 		return -ENOMEM;
 
-	get_area = current->mm->get_unmapped_area;
+	get_area = mm->get_unmapped_area;
 	if (file) {
 		if (file->f_op->get_unmapped_area)
 			get_area = file->f_op->get_unmapped_area;
@@ -2070,7 +2070,7 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
 		get_area = shmem_get_unmapped_area;
 	}
 
-	addr = get_area(file, addr, len, pgoff, flags);
+	addr = get_area(mm, file, addr, len, pgoff, flags);
 	if (IS_ERR_VALUE(addr))
 		return addr;
 
@@ -2819,7 +2819,7 @@ static int do_brk(unsigned long addr, unsigned long request)
 
 	flags = VM_DATA_DEFAULT_FLAGS | VM_ACCOUNT | mm->def_flags;
 
-	error = get_unmapped_area(NULL, addr, len, 0, MAP_FIXED);
+	error = get_unmapped_area(mm, NULL, addr, len, 0, MAP_FIXED);
 	if (offset_in_page(error))
 		return error;
 
diff --git a/mm/mremap.c b/mm/mremap.c
index 30d7d2482eea..f085eed57bac 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -452,8 +452,9 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 	if (vma->vm_flags & VM_MAYSHARE)
 		map_flags |= MAP_SHARED;
 
-	ret = get_unmapped_area(vma->vm_file, new_addr, new_len, vma->vm_pgoff +
-				((addr - vma->vm_start) >> PAGE_SHIFT),
+	ret = get_unmapped_area(mm, vma->vm_file, new_addr, new_len,
+				vma->vm_pgoff +
+					((addr - vma->vm_start) >> PAGE_SHIFT),
 				map_flags);
 	if (offset_in_page(ret))
 		goto out1;
@@ -475,8 +476,8 @@ static int vma_expandable(struct vm_area_struct *vma, unsigned long delta)
 		return 0;
 	if (vma->vm_next && vma->vm_next->vm_start < end) /* intersection */
 		return 0;
-	if (get_unmapped_area(NULL, vma->vm_start, end - vma->vm_start,
-			      0, MAP_FIXED) & ~PAGE_MASK)
+	if (get_unmapped_area(vma->vm_mm, NULL, vma->vm_start,
+			      end - vma->vm_start, 0, MAP_FIXED) & ~PAGE_MASK)
 		return 0;
 	return 1;
 }
@@ -583,7 +584,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 		if (vma->vm_flags & VM_MAYSHARE)
 			map_flags |= MAP_SHARED;
 
-		new_addr = get_unmapped_area(vma->vm_file, 0, new_len,
+		new_addr = get_unmapped_area(mm, vma->vm_file, 0, new_len,
 					vma->vm_pgoff +
 					((addr - vma->vm_start) >> PAGE_SHIFT),
 					map_flags);
diff --git a/mm/nommu.c b/mm/nommu.c
index 54825d29f50b..5075a56b75ec 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1333,8 +1333,9 @@ unsigned long do_mmap(struct mm_struct *mm,
 		 *   tell us the location of a shared mapping
 		 */
 		if (capabilities & NOMMU_MAP_DIRECT) {
-			addr = file->f_op->get_unmapped_area(file, addr, len,
-							     pgoff, flags);
+			addr = file->f_op->get_unmapped_area(mm, file, addr,
+							     len, pgoff,
+							     flags);
 			if (IS_ERR_VALUE(addr)) {
 				ret = addr;
 				if (ret != -ENOSYS)
@@ -1782,8 +1783,9 @@ int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
 }
 EXPORT_SYMBOL(remap_vmalloc_range);
 
-unsigned long arch_get_unmapped_area(struct file *file, unsigned long addr,
-	unsigned long len, unsigned long pgoff, unsigned long flags)
+unsigned long arch_get_unmapped_area(struct mm_struct *mm, struct file *file,
+	unsigned long addr, unsigned long len, unsigned long pgoff,
+	unsigned long flags)
 {
 	return -ENOMEM;
 }
diff --git a/mm/shmem.c b/mm/shmem.c
index 3a7587a0314d..5e27727c05a8 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1971,11 +1971,11 @@ static int shmem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	return ret;
 }
 
-unsigned long shmem_get_unmapped_area(struct file *file,
+unsigned long shmem_get_unmapped_area(struct mm_struct *mm, struct file *file,
 				      unsigned long uaddr, unsigned long len,
 				      unsigned long pgoff, unsigned long flags)
 {
-	unsigned long (*get_area)(struct file *,
+	unsigned long (*get_area)(struct mm_struct *mm, struct file *,
 		unsigned long, unsigned long, unsigned long, unsigned long);
 	unsigned long addr;
 	unsigned long offset;
@@ -1986,8 +1986,8 @@ unsigned long shmem_get_unmapped_area(struct file *file,
 	if (len > TASK_SIZE)
 		return -ENOMEM;
 
-	get_area = current->mm->get_unmapped_area;
-	addr = get_area(file, uaddr, len, pgoff, flags);
+	get_area = mm->get_unmapped_area;
+	addr = get_area(mm, file, uaddr, len, pgoff, flags);
 
 	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE))
 		return addr;
@@ -2043,7 +2043,7 @@ unsigned long shmem_get_unmapped_area(struct file *file,
 	if (inflated_len < len)
 		return addr;
 
-	inflated_addr = get_area(NULL, 0, inflated_len, 0, flags);
+	inflated_addr = get_area(mm, NULL, 0, inflated_len, 0, flags);
 	if (IS_ERR_VALUE(inflated_addr))
 		return addr;
 	if (inflated_addr & ~PAGE_MASK)
@@ -3971,11 +3971,11 @@ void shmem_unlock_mapping(struct address_space *mapping)
 }
 
 #ifdef CONFIG_MMU
-unsigned long shmem_get_unmapped_area(struct file *file,
+unsigned long shmem_get_unmapped_area(struct mm_struct *mm, struct file *file,
 				      unsigned long addr, unsigned long len,
 				      unsigned long pgoff, unsigned long flags)
 {
-	return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+	return mm->get_unmapped_area(mm, file, addr, len, pgoff, flags);
 }
 #endif
 
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index 9d33c1e85c79..f3d2ee8ecf9f 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -3649,7 +3649,8 @@ static int snd_pcm_hw_params_old_user(struct snd_pcm_substream *substream,
 #endif /* CONFIG_SND_SUPPORT_OLD_API */
 
 #ifndef CONFIG_MMU
-static unsigned long snd_pcm_get_unmapped_area(struct file *file,
+static unsigned long snd_pcm_get_unmapped_area(struct mm_struct *mm,
+					       struct file *file,
 					       unsigned long addr,
 					       unsigned long len,
 					       unsigned long pgoff,
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 05/13] mm: Add mm_struct argument to 'mm_populate' and '__mm_populate'
  2017-03-13 22:14 [RFC PATCH 00/13] Introduce first class virtual address spaces Till Smejkal
                   ` (3 preceding siblings ...)
  2017-03-13 22:14 ` [RFC PATCH 04/13] mm: Add mm_struct argument to 'get_unmapped_area' and 'vm_unmapped_area' Till Smejkal
@ 2017-03-13 22:14 ` Till Smejkal
  2017-03-13 22:14 ` [RFC PATCH 06/13] mm/mmap: Export 'vma_link' and 'find_vma_links' to mm subsystem Till Smejkal
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-13 22:14 UTC (permalink / raw)
  To: Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai
  Cc: linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

Add an additional argument to the 'mm_populate' and '__mm_populate'
functions that specifies which mm_struct they should use during their
execution. Before, these functions simply used the memory map of the
current task. However, with the introduction of first class virtual
address spaces, both functions also need to be able to operate on
memory maps other than the one of the current task. Accordingly, the
memory map these functions should use can now be specified explicitly
via the additional argument.
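
As an illustration only (not part of this patch), a minimal sketch of a
caller that populates a range in an explicitly chosen memory map; the
helper and its 'vas_mm' argument are hypothetical and merely stand in
for whatever mm_struct the caller obtained elsewhere:

/*
 * Fault in the pages of [addr, addr + len) in the given memory map and
 * report errors instead of ignoring them.  The mmap_sem of 'vas_mm'
 * must not be held, as required by __mm_populate().
 */
static int vas_populate_range(struct mm_struct *vas_mm,
			      unsigned long addr, unsigned long len)
{
	return __mm_populate(vas_mm, addr, len, 0);
}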

Signed-off-by: Till Smejkal <till.smejkal@gmail.com>
---
 arch/x86/mm/mpx.c  |  2 +-
 include/linux/mm.h | 13 ++++++++-----
 ipc/shm.c          |  9 +++++----
 mm/gup.c           |  4 ++--
 mm/mlock.c         | 21 +++++++++++----------
 mm/mmap.c          |  6 +++---
 mm/mremap.c        |  2 +-
 mm/util.c          |  2 +-
 8 files changed, 32 insertions(+), 27 deletions(-)

diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 99c664a97c35..b46f7cdbdad8 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -54,7 +54,7 @@ static unsigned long mpx_mmap(unsigned long len)
 		       MAP_ANONYMOUS | MAP_PRIVATE, VM_MPX, 0, &populate);
 	up_write(&mm->mmap_sem);
 	if (populate)
-		mm_populate(addr, populate);
+		mm_populate(mm, addr, populate);
 
 	return addr;
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1520da8f9c67..92925d97da20 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2040,15 +2040,18 @@ do_mmap_pgoff(struct mm_struct *mm, struct file *file, unsigned long addr,
 }
 
 #ifdef CONFIG_MMU
-extern int __mm_populate(unsigned long addr, unsigned long len,
-			 int ignore_errors);
-static inline void mm_populate(unsigned long addr, unsigned long len)
+extern int __mm_populate(struct mm_struct *mm, unsigned long addr,
+			 unsigned long len, int ignore_errors);
+static inline void mm_populate(struct mm_struct *mm, unsigned long addr,
+			       unsigned long len)
 {
 	/* Ignore errors */
-	(void) __mm_populate(addr, len, 1);
+	(void) __mm_populate(mm, addr, len, 1);
 }
 #else
-static inline void mm_populate(unsigned long addr, unsigned long len) {}
+static inline void mm_populate(struct mm_struct *mm, unsigned long addr,
+			       unsigned long len)
+{}
 #endif
 
 /* These take the mm semaphore themselves */
diff --git a/ipc/shm.c b/ipc/shm.c
index 2fd73cda4ec8..be692e0abe79 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1106,6 +1106,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr,
 	struct shm_file_data *sfd;
 	struct path path;
 	fmode_t f_mode;
+	struct mm_struct *mm = current->mm;
 	unsigned long populate = 0;
 
 	err = -EINVAL;
@@ -1208,7 +1209,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr,
 	if (err)
 		goto out_fput;
 
-	if (down_write_killable(&current->mm->mmap_sem)) {
+	if (down_write_killable(&mm->mmap_sem)) {
 		err = -EINTR;
 		goto out_fput;
 	}
@@ -1218,7 +1219,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr,
 		if (addr + size < addr)
 			goto invalid;
 
-		if (find_vma_intersection(current->mm, addr, addr + size))
+		if (find_vma_intersection(mm, addr, addr + size))
 			goto invalid;
 	}
 
@@ -1229,9 +1230,9 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr,
 	if (IS_ERR_VALUE(addr))
 		err = (long)addr;
 invalid:
-	up_write(&current->mm->mmap_sem);
+	up_write(&mm->mmap_sem);
 	if (populate)
-		mm_populate(addr, populate);
+		mm_populate(mm, addr, populate);
 
 out_fput:
 	fput(file);
diff --git a/mm/gup.c b/mm/gup.c
index 55315555489d..ca5ba2703b40 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1053,9 +1053,9 @@ long populate_vma_page_range(struct vm_area_struct *vma,
  * flags. VMAs must be already marked with the desired vm_flags, and
  * mmap_sem must not be held.
  */
-int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
+int __mm_populate(struct mm_struct *mm, unsigned long start, unsigned long len,
+		  int ignore_errors)
 {
-	struct mm_struct *mm = current->mm;
 	unsigned long end, nstart, nend;
 	struct vm_area_struct *vma = NULL;
 	int locked = 0;
diff --git a/mm/mlock.c b/mm/mlock.c
index cdbed8aaa426..9d74948c7b22 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -664,6 +664,7 @@ static int count_mm_mlocked_page_nr(struct mm_struct *mm,
 
 static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t flags)
 {
+	struct mm_struct *mm = current->mm;
 	unsigned long locked;
 	unsigned long lock_limit;
 	int error = -ENOMEM;
@@ -680,10 +681,10 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla
 	lock_limit >>= PAGE_SHIFT;
 	locked = len >> PAGE_SHIFT;
 
-	if (down_write_killable(&current->mm->mmap_sem))
+	if (down_write_killable(&mm->mmap_sem))
 		return -EINTR;
 
-	locked += current->mm->locked_vm;
+	locked += mm->locked_vm;
 	if ((locked > lock_limit) && (!capable(CAP_IPC_LOCK))) {
 		/*
 		 * It is possible that the regions requested intersect with
@@ -691,19 +692,18 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla
 		 * should not be counted to new mlock increment count. So check
 		 * and adjust locked count if necessary.
 		 */
-		locked -= count_mm_mlocked_page_nr(current->mm,
-				start, len);
+		locked -= count_mm_mlocked_page_nr(mm, start, len);
 	}
 
 	/* check against resource limits */
 	if ((locked <= lock_limit) || capable(CAP_IPC_LOCK))
 		error = apply_vma_lock_flags(start, len, flags);
 
-	up_write(&current->mm->mmap_sem);
+	up_write(&mm->mmap_sem);
 	if (error)
 		return error;
 
-	error = __mm_populate(start, len, 0);
+	error = __mm_populate(mm, start, len, 0);
 	if (error)
 		return __mlock_posix_error_return(error);
 	return 0;
@@ -790,6 +790,7 @@ static int apply_mlockall_flags(int flags)
 
 SYSCALL_DEFINE1(mlockall, int, flags)
 {
+	struct mm_struct *mm = current->mm;
 	unsigned long lock_limit;
 	int ret;
 
@@ -805,16 +806,16 @@ SYSCALL_DEFINE1(mlockall, int, flags)
 	lock_limit = rlimit(RLIMIT_MEMLOCK);
 	lock_limit >>= PAGE_SHIFT;
 
-	if (down_write_killable(&current->mm->mmap_sem))
+	if (down_write_killable(&mm->mmap_sem))
 		return -EINTR;
 
 	ret = -ENOMEM;
-	if (!(flags & MCL_CURRENT) || (current->mm->total_vm <= lock_limit) ||
+	if (!(flags & MCL_CURRENT) || (mm->total_vm <= lock_limit) ||
 	    capable(CAP_IPC_LOCK))
 		ret = apply_mlockall_flags(flags);
-	up_write(&current->mm->mmap_sem);
+	up_write(&mm->mmap_sem);
 	if (!ret && (flags & MCL_CURRENT))
-		mm_populate(0, TASK_SIZE);
+		mm_populate(mm, 0, TASK_SIZE);
 
 	return ret;
 }
diff --git a/mm/mmap.c b/mm/mmap.c
index 8a73dc458e69..3f60c8ebd6b6 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -236,7 +236,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
 	populate = newbrk > oldbrk && (mm->def_flags & VM_LOCKED) != 0;
 	up_write(&mm->mmap_sem);
 	if (populate)
-		mm_populate(oldbrk, newbrk - oldbrk);
+		mm_populate(mm, oldbrk, newbrk - oldbrk);
 	return brk;
 
 out:
@@ -2781,7 +2781,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 out:
 	up_write(&mm->mmap_sem);
 	if (populate)
-		mm_populate(ret, populate);
+		mm_populate(mm, ret, populate);
 	if (!IS_ERR_VALUE(ret))
 		ret = 0;
 	return ret;
@@ -2898,7 +2898,7 @@ int vm_brk(unsigned long addr, unsigned long len)
 	populate = ((mm->def_flags & VM_LOCKED) != 0);
 	up_write(&mm->mmap_sem);
 	if (populate && !ret)
-		mm_populate(addr, len);
+		mm_populate(mm, addr, len);
 	return ret;
 }
 EXPORT_SYMBOL(vm_brk);
diff --git a/mm/mremap.c b/mm/mremap.c
index f085eed57bac..0f18ec452441 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -602,6 +602,6 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	}
 	up_write(&current->mm->mmap_sem);
 	if (locked && new_len > old_len)
-		mm_populate(new_addr + old_len, new_len - old_len);
+		mm_populate(mm, new_addr + old_len, new_len - old_len);
 	return ret;
 }
diff --git a/mm/util.c b/mm/util.c
index 46d05eef9a6b..7b1116b400a3 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -306,7 +306,7 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 				    &populate);
 		up_write(&mm->mmap_sem);
 		if (populate)
-			mm_populate(ret, populate);
+			mm_populate(mm, ret, populate);
 	}
 	return ret;
 }
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 06/13] mm/mmap: Export 'vma_link' and 'find_vma_links' to mm subsystem
  2017-03-13 22:14 [RFC PATCH 00/13] Introduce first class virtual address spaces Till Smejkal
                   ` (4 preceding siblings ...)
  2017-03-13 22:14 ` [RFC PATCH 05/13] mm: Add mm_struct argument to 'mm_populate' and '__mm_populate' Till Smejkal
@ 2017-03-13 22:14 ` Till Smejkal
  2017-03-13 22:14 ` [RFC PATCH 07/13] kernel/fork: Split and export 'mm_alloc' and 'mm_init' Till Smejkal
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-13 22:14 UTC (permalink / raw)
  To: Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai
  Cc: linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

Make the functions 'vma_link' and 'find_vma_links' accessible to the
other source files in the kernel's mm/ directory so that they can also
perform low-level changes to mm_struct data structures.
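
As an illustration only (not part of this patch), a minimal sketch of
the insertion pattern these two helpers make available to other mm/
files; the wrapper is hypothetical and assumes the caller already holds
the target mm's mmap_sem for writing:

/* Insert a fully prepared VMA into 'mm'. */
static int vas_insert_vma(struct mm_struct *mm, struct vm_area_struct *vma)
{
	struct vm_area_struct *prev;
	struct rb_node **rb_link, *rb_parent;

	/*
	 * Find the insertion point; a non-zero return means the range
	 * overlaps an existing mapping.
	 */
	if (find_vma_links(mm, vma->vm_start, vma->vm_end,
			   &prev, &rb_link, &rb_parent))
		return -ENOMEM;

	/*
	 * Link the VMA into the mm's VMA list and rbtree and, for file
	 * mappings, into the address_space interval tree.
	 */
	vma_link(mm, vma, prev, rb_link, rb_parent);

	return 0;
}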

Signed-off-by: Till Smejkal <till.smejkal@gmail.com>
---
 mm/internal.h | 11 +++++++++++
 mm/mmap.c     | 12 ++++++------
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 7aa2ea0a8623..e22cb031b45b 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -76,6 +76,17 @@ static inline void set_page_refcounted(struct page *page)
 extern unsigned long highest_memmap_pfn;
 
 /*
+ * in mm/mmap.c
+ */
+extern void vma_link(struct mm_struct *mm, struct vm_area_struct *vma,
+		     struct vm_area_struct *prev, struct rb_node **rb_link,
+		     struct rb_node *rb_parent);
+extern int find_vma_links(struct mm_struct *mm, unsigned long addr,
+			  unsigned long end, struct vm_area_struct **pprev,
+			  struct rb_node ***rb_link,
+			  struct rb_node **rb_parent);
+
+/*
  * in mm/vmscan.c:
  */
 extern int isolate_lru_page(struct page *page);
diff --git a/mm/mmap.c b/mm/mmap.c
index 3f60c8ebd6b6..d35c6b51cadf 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -466,9 +466,9 @@ anon_vma_interval_tree_post_update_vma(struct vm_area_struct *vma)
 		anon_vma_interval_tree_insert(avc, &avc->anon_vma->rb_root);
 }
 
-static int find_vma_links(struct mm_struct *mm, unsigned long addr,
-		unsigned long end, struct vm_area_struct **pprev,
-		struct rb_node ***rb_link, struct rb_node **rb_parent)
+int find_vma_links(struct mm_struct *mm, unsigned long addr,
+		   unsigned long end, struct vm_area_struct **pprev,
+		   struct rb_node ***rb_link, struct rb_node **rb_parent)
 {
 	struct rb_node **__rb_link, *__rb_parent, *rb_prev;
 
@@ -580,9 +580,9 @@ __vma_link(struct mm_struct *mm, struct vm_area_struct *vma,
 	__vma_link_rb(mm, vma, rb_link, rb_parent);
 }
 
-static void vma_link(struct mm_struct *mm, struct vm_area_struct *vma,
-			struct vm_area_struct *prev, struct rb_node **rb_link,
-			struct rb_node *rb_parent)
+void vma_link(struct mm_struct *mm, struct vm_area_struct *vma,
+	      struct vm_area_struct *prev, struct rb_node **rb_link,
+	      struct rb_node *rb_parent)
 {
 	struct address_space *mapping = NULL;
 
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 07/13] kernel/fork: Split and export 'mm_alloc' and 'mm_init'
  2017-03-13 22:14 [RFC PATCH 00/13] Introduce first class virtual address spaces Till Smejkal
                   ` (5 preceding siblings ...)
  2017-03-13 22:14 ` [RFC PATCH 06/13] mm/mmap: Export 'vma_link' and 'find_vma_links' to mm subsystem Till Smejkal
@ 2017-03-13 22:14 ` Till Smejkal
  2017-03-14 10:18   ` David Laight
  2017-03-13 22:14 ` [RFC PATCH 08/13] kernel/fork: Define explicitly which mm_struct to duplicate during fork Till Smejkal
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 45+ messages in thread
From: Till Smejkal @ 2017-03-13 22:14 UTC (permalink / raw)
  To: Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai
  Cc: linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

Until now, the only way to create a new memory map was via the exported
function 'mm_alloc'. Unfortunately, this function not only allocates a
new memory map, but also completely initializes it. However, with the
introduction of first class virtual address spaces, some initialization
steps done in 'mm_alloc' are not applicable to the memory maps needed
for this feature and hence would lead to errors in the kernel code.

Instead of introducing a new function that can allocate and initialize
memory maps for first class virtual address spaces, and thereby
potentially duplicating code, I decided to split the 'mm_alloc'
function as well as the 'mm_init' function that it uses.

Now there are four exported functions instead of only one. The new
'mm_alloc' function only allocates a new mm_struct and zeros it out. If
one wants the old behavior of 'mm_alloc', one can use the newly
introduced function 'mm_alloc_and_setup', which not only allocates a
new mm_struct but also fully initializes it.

The old 'mm_init' function, which fully initialized a mm_struct, was
split up into two separate functions. The first one, 'mm_setup', does
all the initialization of the mm_struct that is not related to a
particular task. The task-related part of the initialization is done in
the 'mm_set_task' function. This way it is possible to create memory
maps that don't carry any task-specific information, as needed by the
first class virtual address space feature. Both functions, 'mm_setup'
and 'mm_set_task', are also exported so that they can be used
throughout the source tree.
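
As an illustration only (not part of this patch), a sketch of the two
call sequences the split enables; both wrapper functions are
hypothetical:

/* Old behaviour: a fully initialized, task-bound memory map. */
static struct mm_struct *alloc_task_mm(void)
{
	/* Equivalent to what the old 'mm_alloc' used to do. */
	return mm_alloc_and_setup();
}

/*
 * A memory map without any task-specific state, as needed for a first
 * class virtual address space.
 */
static struct mm_struct *alloc_vas_mm(void)
{
	struct mm_struct *mm;

	mm = mm_alloc();	/* allocate and zero only */
	if (!mm)
		return NULL;

	/* Task-unrelated initialization; frees mm on failure. */
	return mm_setup(mm);
}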

Signed-off-by: Till Smejkal <till.smejkal@gmail.com>
---
 arch/arm/mach-rpc/ecard.c |  2 +-
 fs/exec.c                 |  2 +-
 include/linux/sched.h     |  7 +++++-
 kernel/fork.c             | 64 +++++++++++++++++++++++++++++++++++++----------
 4 files changed, 59 insertions(+), 16 deletions(-)

diff --git a/arch/arm/mach-rpc/ecard.c b/arch/arm/mach-rpc/ecard.c
index dc67a7fb3831..15845e8abd7e 100644
--- a/arch/arm/mach-rpc/ecard.c
+++ b/arch/arm/mach-rpc/ecard.c
@@ -245,7 +245,7 @@ static void ecard_init_pgtables(struct mm_struct *mm)
 
 static int ecard_init_mm(void)
 {
-	struct mm_struct * mm = mm_alloc();
+	struct mm_struct *mm = mm_alloc_and_setup();
 	struct mm_struct *active_mm = current->active_mm;
 
 	if (!mm)
diff --git a/fs/exec.c b/fs/exec.c
index e57946610733..68d7908a1e5a 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -380,7 +380,7 @@ static int bprm_mm_init(struct linux_binprm *bprm)
 	int err;
 	struct mm_struct *mm = NULL;
 
-	bprm->mm = mm = mm_alloc();
+	bprm->mm = mm = mm_alloc_and_setup();
 	err = -ENOMEM;
 	if (!mm)
 		goto err;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 42b9b93a50ac..7955adc00397 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2922,7 +2922,12 @@ static inline unsigned long sigsp(unsigned long sp, struct ksignal *ksig)
 /*
  * Routines for handling mm_structs
  */
-extern struct mm_struct * mm_alloc(void);
+extern struct mm_struct *mm_setup(struct mm_struct *mm);
+extern struct mm_struct *mm_set_task(struct mm_struct *mm,
+				     struct task_struct *p,
+				     struct user_namespace *user_ns);
+extern struct mm_struct *mm_alloc(void);
+extern struct mm_struct *mm_alloc_and_setup(void);
 
 /* mmdrop drops the mm and the page tables */
 extern void __mmdrop(struct mm_struct *);
diff --git a/kernel/fork.c b/kernel/fork.c
index 11c5c8ab827c..9209f6d5d7c0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -747,8 +747,10 @@ static void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
 #endif
 }
 
-static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
-	struct user_namespace *user_ns)
+/**
+ * Initialize all the task-unrelated fields of a mm_struct.
+ **/
+struct mm_struct *mm_setup(struct mm_struct *mm)
 {
 	mm->mmap = NULL;
 	mm->mm_rb = RB_ROOT;
@@ -767,24 +769,37 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	spin_lock_init(&mm->page_table_lock);
 	mm_init_cpumask(mm);
 	mm_init_aio(mm);
-	mm_init_owner(mm, p);
 	mmu_notifier_mm_init(mm);
 	clear_tlb_flush_pending(mm);
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
 	mm->pmd_huge_pte = NULL;
 #endif
 
+	mm->flags = default_dump_filter;
+	mm->def_flags = 0;
+
+	if (mm_alloc_pgd(mm))
+		goto fail_nopgd;
+
+	return mm;
+
+fail_nopgd:
+	free_mm(mm);
+	return NULL;
+}
+
+/**
+ * Initialize all the task-related fields of a mm_struct.
+ **/
+struct mm_struct *mm_set_task(struct mm_struct *mm, struct task_struct *p,
+			      struct user_namespace *user_ns)
+{
 	if (current->mm) {
 		mm->flags = current->mm->flags & MMF_INIT_MASK;
 		mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK;
-	} else {
-		mm->flags = default_dump_filter;
-		mm->def_flags = 0;
 	}
 
-	if (mm_alloc_pgd(mm))
-		goto fail_nopgd;
-
+	mm_init_owner(mm, p);
 	if (init_new_context(p, mm))
 		goto fail_nocontext;
 
@@ -793,11 +808,21 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 
 fail_nocontext:
 	mm_free_pgd(mm);
-fail_nopgd:
 	free_mm(mm);
 	return NULL;
 }
 
+static struct mm_struct *mm_setup_all(struct mm_struct *mm,
+				      struct task_struct *p,
+				      struct user_namespace *user_ns)
+{
+	mm = mm_setup(mm);
+	if (!mm)
+		return NULL;
+
+	return mm_set_task(mm, p, user_ns);
+}
+
 static void check_mm(struct mm_struct *mm)
 {
 	int i;
@@ -822,10 +847,22 @@ static void check_mm(struct mm_struct *mm)
 #endif
 }
 
+struct mm_struct *mm_alloc(void)
+{
+	struct mm_struct *mm;
+
+	mm = allocate_mm();
+	if (!mm)
+		return NULL;
+
+	memset(mm, 0, sizeof(*mm));
+	return mm;
+}
+
 /*
  * Allocate and initialize an mm_struct.
  */
-struct mm_struct *mm_alloc(void)
+struct mm_struct *mm_alloc_and_setup(void)
 {
 	struct mm_struct *mm;
 
@@ -834,9 +871,10 @@ struct mm_struct *mm_alloc(void)
 		return NULL;
 
 	memset(mm, 0, sizeof(*mm));
-	return mm_init(mm, current, current_user_ns());
+	return mm_setup_all(mm, current, current_user_ns());
 }
 
+
 /*
  * Called when the last reference to the mm
  * is dropped: either by a lazy thread or by
@@ -1131,7 +1169,7 @@ static struct mm_struct *dup_mm(struct task_struct *tsk)
 
 	memcpy(mm, oldmm, sizeof(*mm));
 
-	if (!mm_init(mm, tsk, mm->user_ns))
+	if (!mm_setup_all(mm, tsk, mm->user_ns))
 		goto fail_nomem;
 
 	err = dup_mmap(mm, oldmm);
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 08/13] kernel/fork: Define explicitly which mm_struct to duplicate during fork
  2017-03-13 22:14 [RFC PATCH 00/13] Introduce first class virtual address spaces Till Smejkal
                   ` (6 preceding siblings ...)
  2017-03-13 22:14 ` [RFC PATCH 07/13] kernel/fork: Split and export 'mm_alloc' and 'mm_init' Till Smejkal
@ 2017-03-13 22:14 ` Till Smejkal
  2017-03-13 22:14 ` [RFC PATCH 09/13] mm/memory: Add function to one-to-one duplicate page ranges Till Smejkal
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-13 22:14 UTC (permalink / raw)
  To: Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai
  Cc: linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

The 'dup_mm' function, used during 'do_fork' to duplicate the current
task's mm_struct for the newly forked task, always implicitly uses
current->mm for this purpose. However, 'copy_mm' has already decided
which mm_struct to copy/duplicate by the time it calls 'dup_mm'. So
pass this mm_struct to 'dup_mm' instead of deciding again which
mm_struct to use.

Signed-off-by: Till Smejkal <till.smejkal@gmail.com>
---
 kernel/fork.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 9209f6d5d7c0..d3087d870855 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1158,9 +1158,10 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
  * Allocate a new mm structure and copy contents from the
  * mm structure of the passed in task structure.
  */
-static struct mm_struct *dup_mm(struct task_struct *tsk)
+static struct mm_struct *dup_mm(struct task_struct *tsk,
+				struct mm_struct *oldmm)
 {
-	struct mm_struct *mm, *oldmm = current->mm;
+	struct mm_struct *mm;
 	int err;
 
 	mm = allocate_mm();
@@ -1226,7 +1227,7 @@ static int copy_mm(unsigned long clone_flags, struct task_struct *tsk)
 	}
 
 	retval = -ENOMEM;
-	mm = dup_mm(tsk);
+	mm = dup_mm(tsk, oldmm);
 	if (!mm)
 		goto fail_nomem;
 
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 09/13] mm/memory: Add function to one-to-one duplicate page ranges
  2017-03-13 22:14 [RFC PATCH 00/13] Introduce first class virtual address spaces Till Smejkal
                   ` (7 preceding siblings ...)
  2017-03-13 22:14 ` [RFC PATCH 08/13] kernel/fork: Define explicitly which mm_struct to duplicate during fork Till Smejkal
@ 2017-03-13 22:14 ` Till Smejkal
  2017-03-13 22:14 ` [RFC PATCH 10/13] mm: Introduce first class virtual address spaces Till Smejkal
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-13 22:14 UTC (permalink / raw)
  To: Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai
  Cc: linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

Add a new function to duplicate a page table range of one memory map
one-to-one into another memory map. The new function 'dup_page_range'
copies the page table entries for the specified region from the page
table of the source memory map to the page table of the destination
memory map and thereby allows actual sharing of the referenced memory
pages, instead of relying on copy-on-write for anonymous memory pages
or on page faults for read-only memory pages as the existing function
'copy_page_range' does. Hence, 'dup_page_range' produces pages shared
between two address spaces, whereas 'copy_page_range' results in copies
of pages where necessary.

Preexisting mappings in the page table of the destination memory map
that differ from the ones in the source memory map are properly zapped
by 'dup_page_range' before they are replaced with the new ones.
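
As an illustration only (not part of this patch), a sketch of how a
caller might use 'dup_page_range' to mirror the mappings of one memory
map into another; the helper itself and the assumption that matching
VMAs already exist in the destination are hypothetical:

/* Share the pages backing every VMA of 'src' with 'dst'. */
static int mirror_mm(struct mm_struct *dst, struct mm_struct *src)
{
	struct vm_area_struct *src_vma, *dst_vma;
	int ret;

	for (src_vma = src->mmap; src_vma; src_vma = src_vma->vm_next) {
		/* Assume a matching VMA was already created in dst. */
		dst_vma = find_vma(dst, src_vma->vm_start);
		if (!dst_vma)
			return -EFAULT;

		/* Map the same pages instead of marking them COW. */
		ret = dup_page_range(dst, dst_vma, src, src_vma);
		if (ret)
			return ret;
	}

	return 0;
}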

Signed-off-by: Till Smejkal <till.smejkal@gmail.com>
---
 include/linux/huge_mm.h |   6 +
 include/linux/hugetlb.h |   5 +
 include/linux/mm.h      |   2 +
 mm/huge_memory.c        |  65 +++++++
 mm/hugetlb.c            | 205 +++++++++++++++------
 mm/memory.c             | 461 +++++++++++++++++++++++++++++++++++++++++-------
 6 files changed, 620 insertions(+), 124 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 94a0e680b7d7..52c0498426ef 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -5,6 +5,12 @@ extern int do_huge_pmd_anonymous_page(struct vm_fault *vmf);
 extern int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			 pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
 			 struct vm_area_struct *vma);
+extern int dup_huge_pmd(struct mm_struct *dst_mm,
+			struct vm_area_struct *dst_vma,
+			struct mm_struct *src_mm,
+			struct vm_area_struct *src_vma,
+			struct mmu_gather *tlb, pmd_t *dst_pmd, pmd_t *src_pmd,
+			unsigned long addr);
 extern void huge_pmd_set_accessed(struct vm_fault *vmf, pmd_t orig_pmd);
 extern int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd);
 extern struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 72260cc252f2..d8eb682e39a1 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -63,6 +63,10 @@ int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int,
 #endif
 
 int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
+int dup_hugetlb_page_range(struct mm_struct *dst_mm,
+			   struct vm_area_struct *dst_vma,
+			   struct mm_struct *src_mm,
+			   struct vm_area_struct *src_vma);
 long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 			 struct page **, struct vm_area_struct **,
 			 unsigned long *, unsigned long *, long, unsigned int);
@@ -134,6 +138,7 @@ static inline unsigned long hugetlb_total_pages(void)
 #define follow_hugetlb_page(m,v,p,vs,a,b,i,w)	({ BUG(); 0; })
 #define follow_huge_addr(mm, addr, write)	ERR_PTR(-EINVAL)
 #define copy_hugetlb_page_range(src, dst, vma)	({ BUG(); 0; })
+#define dup_hugetlb_page_range(dst, dst_vma, src, src_vma) ({ BUG(); 0; })
 static inline void hugetlb_report_meminfo(struct seq_file *m)
 {
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 92925d97da20..b39ec795f64c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1208,6 +1208,8 @@ void free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
 		unsigned long end, unsigned long floor, unsigned long ceiling);
 int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
 			struct vm_area_struct *vma);
+int dup_page_range(struct mm_struct *dst, struct vm_area_struct *dst_vma,
+		   struct mm_struct *src, struct vm_area_struct *src_vma);
 void unmap_mapping_range(struct address_space *mapping,
 		loff_t const holebegin, loff_t const holelen, int even_cows);
 int follow_pte_pmd(struct mm_struct *mm, unsigned long address,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d5b2604867e5..1edf8c6d1814 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -887,6 +887,71 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	return ret;
 }
 
+int dup_huge_pmd(struct mm_struct *dst_mm, struct vm_area_struct *dst_vma,
+		 struct mm_struct *src_mm, struct vm_area_struct *src_vma,
+		 struct mmu_gather *tlb, pmd_t *dst_pmd, pmd_t *src_pmd,
+		 unsigned long addr)
+{
+	spinlock_t *dst_ptl, *src_ptl;
+	struct page *page;
+	pmd_t pmd;
+	pgtable_t pgtable;
+	int ret;
+
+	pgtable = pte_alloc_one(dst_mm, addr);
+	if (!pgtable)
+		return -ENOMEM;
+
+	if (!pmd_none_or_clear_bad(dst_pmd) &&
+	    unlikely(zap_huge_pmd(tlb, dst_vma, dst_pmd, addr)))
+		return -ENOMEM;
+
+	dst_ptl = pmd_lock(dst_mm, dst_pmd);
+	src_ptl = pmd_lockptr(src_mm, src_pmd);
+	spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
+
+	if (!pmd_trans_huge(*src_pmd)) {
+		pte_free(dst_mm, pgtable);
+		ret = -EAGAIN;
+		goto out_unlock;
+	}
+
+	if (is_huge_zero_pmd(*src_pmd)) {
+		struct page *zero_page;
+
+		zero_page = mm_get_huge_zero_page(dst_mm);
+		set_huge_zero_page(pgtable, dst_mm, dst_vma, addr, dst_pmd,
+				   zero_page);
+
+		ret = 0;
+		goto out_unlock;
+	}
+
+	pmd = *src_pmd;
+
+	page = pmd_page(pmd);
+	VM_BUG_ON_PAGE(!PageHead(page), page);
+	get_page(page);
+	page_dup_rmap(page, true);
+
+	add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+	atomic_long_inc(&dst_mm->nr_ptes);
+	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
+
+	if (!(dst_vma->vm_flags & VM_WRITE))
+		pmd = pmd_wrprotect(pmd);
+	pmd = pmd_mkold(pmd);
+
+	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
+	ret = 0;
+
+out_unlock:
+	spin_unlock(src_ptl);
+	spin_unlock(dst_ptl);
+
+	return ret;
+}
+
 void huge_pmd_set_accessed(struct vm_fault *vmf, pmd_t orig_pmd)
 {
 	pmd_t entry;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c7025c132670..776c024de7c1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3286,6 +3286,74 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	return ret;
 }
 
+static inline int
+unmap_one_hugepage(struct mmu_gather *tlb, struct vm_area_struct *vma,
+		   pte_t *ptep, unsigned long addr, struct page *ref_page)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pte_t pte;
+	spinlock_t *ptl;
+	struct page *page;
+	struct hstate *h = hstate_vma(vma);
+
+	ptl = huge_pte_lock(h, mm, ptep);
+	if (huge_pmd_unshare(mm, &addr, ptep)) {
+		spin_unlock(ptl);
+		return 0;
+	}
+
+	pte = huge_ptep_get(ptep);
+	if (huge_pte_none(pte)) {
+		spin_unlock(ptl);
+		return 0;
+	}
+
+	/*
+	 * Migrating hugepage or HWPoisoned hugepage is already
+	 * unmapped and its refcount is dropped, so just clear pte here.
+	 */
+	if (unlikely(!pte_present(pte))) {
+		huge_pte_clear(mm, addr, ptep);
+		spin_unlock(ptl);
+		return 0;
+	}
+
+	page = pte_page(pte);
+	/*
+	 * If a reference page is supplied, it is because a specific
+	 * page is being unmapped, not a range. Ensure the page we
+	 * are about to unmap is the actual page of interest.
+	 */
+	if (ref_page) {
+		if (page != ref_page) {
+			spin_unlock(ptl);
+			return 0;
+		}
+		/*
+		 * Mark the VMA as having unmapped its page so that
+		 * future faults in this VMA will fail rather than
+		 * looking like data was lost
+		 */
+		set_vma_resv_flags(vma, HPAGE_RESV_UNMAPPED);
+	}
+
+	pte = huge_ptep_get_and_clear(mm, addr, ptep);
+	tlb_remove_huge_tlb_entry(h, tlb, ptep, addr);
+	if (huge_pte_dirty(pte))
+		set_page_dirty(page);
+
+	hugetlb_count_sub(pages_per_huge_page(h), mm);
+	page_remove_rmap(page, true);
+
+	spin_unlock(ptl);
+	tlb_remove_page_size(tlb, page, huge_page_size(h));
+
+	/*
+	 * Bail out after unmapping reference page if supplied
+	 */
+	return ref_page ? 1 : 0;
+}
+
 void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			    unsigned long start, unsigned long end,
 			    struct page *ref_page)
@@ -3293,9 +3361,6 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
 	pte_t *ptep;
-	pte_t pte;
-	spinlock_t *ptl;
-	struct page *page;
 	struct hstate *h = hstate_vma(vma);
 	unsigned long sz = huge_page_size(h);
 	const unsigned long mmun_start = start;	/* For mmu_notifiers */
@@ -3318,62 +3383,10 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		if (!ptep)
 			continue;
 
-		ptl = huge_pte_lock(h, mm, ptep);
-		if (huge_pmd_unshare(mm, &address, ptep)) {
-			spin_unlock(ptl);
-			continue;
-		}
-
-		pte = huge_ptep_get(ptep);
-		if (huge_pte_none(pte)) {
-			spin_unlock(ptl);
-			continue;
-		}
-
-		/*
-		 * Migrating hugepage or HWPoisoned hugepage is already
-		 * unmapped and its refcount is dropped, so just clear pte here.
-		 */
-		if (unlikely(!pte_present(pte))) {
-			huge_pte_clear(mm, address, ptep);
-			spin_unlock(ptl);
-			continue;
-		}
-
-		page = pte_page(pte);
-		/*
-		 * If a reference page is supplied, it is because a specific
-		 * page is being unmapped, not a range. Ensure the page we
-		 * are about to unmap is the actual page of interest.
-		 */
-		if (ref_page) {
-			if (page != ref_page) {
-				spin_unlock(ptl);
-				continue;
-			}
-			/*
-			 * Mark the VMA as having unmapped its page so that
-			 * future faults in this VMA will fail rather than
-			 * looking like data was lost
-			 */
-			set_vma_resv_flags(vma, HPAGE_RESV_UNMAPPED);
-		}
-
-		pte = huge_ptep_get_and_clear(mm, address, ptep);
-		tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
-		if (huge_pte_dirty(pte))
-			set_page_dirty(page);
-
-		hugetlb_count_sub(pages_per_huge_page(h), mm);
-		page_remove_rmap(page, true);
-
-		spin_unlock(ptl);
-		tlb_remove_page_size(tlb, page, huge_page_size(h));
-		/*
-		 * Bail out after unmapping reference page if supplied
-		 */
-		if (ref_page)
+		if (unlikely(unmap_one_hugepage(tlb, vma, ptep, address,
+						ref_page)))
 			break;
+
 	}
 	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
 	tlb_end_vma(tlb, vma);
@@ -3411,6 +3424,82 @@ void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 	tlb_finish_mmu(&tlb, start, end);
 }
 
+int dup_hugetlb_page_range(struct mm_struct *dst_mm,
+			   struct vm_area_struct *dst_vma,
+			   struct mm_struct *src_mm,
+			   struct vm_area_struct *src_vma)
+{
+	pte_t *dst_pte, *src_pte;
+	struct mmu_gather tlb;
+	struct hstate *h = hstate_vma(dst_vma);
+	unsigned long addr;
+	unsigned long mmu_start = dst_vma->vm_start;
+	unsigned long mmu_end = dst_vma->vm_end;
+	unsigned long size = huge_page_size(h);
+	int ret;
+
+	tlb_gather_mmu(&tlb, dst_mm, mmu_start, mmu_end);
+	tlb_remove_check_page_size_change(&tlb, size);
+	mmu_notifier_invalidate_range_start(dst_mm, mmu_start, mmu_end);
+
+	for (addr = src_vma->vm_start; addr < src_vma->vm_end; addr += size) {
+		pte_t pte;
+		spinlock_t *dst_ptl, *src_ptl;
+		struct page *page;
+
+		dst_pte = huge_pte_offset(dst_mm, addr);
+		src_pte = huge_pte_offset(src_mm, addr);
+
+		if (dst_pte == src_pte)
+			/* Just continue if the ptes are already equal. */
+			continue;
+		else if (dst_pte && !huge_pte_none(*dst_pte))
+			/*
+			 * ptes are not equal, so we have to get rid of the old
+			 * mapping in the destination page table.
+			 */
+			unmap_one_hugepage(&tlb, dst_vma, dst_pte, addr, NULL);
+
+		if (!src_pte || huge_pte_none(*src_pte))
+			continue;
+
+		dst_pte = huge_pte_alloc(dst_mm, addr, size);
+		if (!dst_pte) {
+			ret = -ENOMEM;
+			break;
+		}
+
+		dst_ptl = huge_pte_lock(h, dst_mm, dst_pte);
+		src_ptl = huge_pte_lockptr(h, src_mm, src_pte);
+		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
+
+		pte = huge_ptep_get(src_pte);
+		page = pte_page(pte);
+		if (page) {
+			get_page(page);
+			page_dup_rmap(page, true);
+		}
+
+		if (likely(!is_hugetlb_entry_migration(pte) &&
+			   !is_hugetlb_entry_hwpoisoned(pte)))
+			hugetlb_count_add(pages_per_huge_page(h), dst_mm);
+
+		if (!(dst_vma->vm_flags & VM_WRITE))
+			pte = pte_wrprotect(pte);
+		pte = pte_mkold(pte);
+
+		set_huge_pte_at(dst_mm, addr, dst_pte, pte);
+
+		spin_unlock(src_ptl);
+		spin_unlock(dst_ptl);
+	}
+
+	mmu_notifier_invalidate_range_end(dst_mm, mmu_start, mmu_end);
+	tlb_finish_mmu(&tlb, mmu_start, mmu_end);
+
+	return ret;
+}
+
 /*
  * This is called when the original mapper is failing to COW a MAP_PRIVATE
  * mappping it owns the reserve page for. The intention is to unmap the page
diff --git a/mm/memory.c b/mm/memory.c
index 6bf2b471e30c..7026f2146fcd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1108,6 +1108,82 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	return ret;
 }
 
+static unsigned long zap_one_pte(struct mmu_gather *tlb,
+				 struct vm_area_struct *vma, pte_t *pte,
+				 unsigned long *paddr, int *force_flush,
+				 int *rss, struct zap_details *details)
+{
+	unsigned long addr = *paddr;
+	struct mm_struct *mm = tlb->mm;
+	swp_entry_t entry;
+	pte_t ptent = *pte;
+
+	if (pte_present(ptent)) {
+		struct page *page;
+
+		page = vm_normal_page(vma, addr, ptent);
+		if (unlikely(details) && page) {
+			/*
+			 * unmap_shared_mapping_pages() wants to
+			 * invalidate cache without truncating:
+			 * unmap shared but keep private pages.
+			 */
+			if (details->check_mapping &&
+			    details->check_mapping != page_rmapping(page))
+				return 0;
+		}
+		ptent = ptep_get_and_clear_full(mm, addr, pte,
+						tlb->fullmm);
+		tlb_remove_tlb_entry(tlb, pte, addr);
+		if (unlikely(!page))
+			return 0;
+
+		if (!PageAnon(page)) {
+			if (pte_dirty(ptent)) {
+				/*
+				 * oom_reaper cannot tear down dirty
+				 * pages
+				 */
+				if (unlikely(details && details->ignore_dirty))
+					return 0;
+				*force_flush = 1;
+				set_page_dirty(page);
+			}
+			if (pte_young(ptent) &&
+			    likely(!(vma->vm_flags & VM_SEQ_READ)))
+				mark_page_accessed(page);
+		}
+		rss[mm_counter(page)]--;
+		page_remove_rmap(page, false);
+		if (unlikely(page_mapcount(page) < 0))
+			print_bad_pte(vma, addr, ptent, page);
+		if (unlikely(__tlb_remove_page(tlb, page))) {
+			*force_flush = 1;
+			*paddr += PAGE_SIZE;
+			return 1;
+		}
+		return 0;
+	}
+	/* only check swap_entries if explicitly asked for in details */
+	if (unlikely(details && !details->check_swap_entries))
+		return 0;
+
+	entry = pte_to_swp_entry(ptent);
+	if (!non_swap_entry(entry))
+		rss[MM_SWAPENTS]--;
+	else if (is_migration_entry(entry)) {
+		struct page *page;
+
+		page = migration_entry_to_page(entry);
+		rss[mm_counter(page)]--;
+	}
+	if (unlikely(!free_swap_and_cache(entry)))
+		print_bad_pte(vma, addr, ptent, NULL);
+	pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+
+	return 0;
+}
+
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma, pmd_t *pmd,
 				unsigned long addr, unsigned long end,
@@ -1119,7 +1195,6 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 	spinlock_t *ptl;
 	pte_t *start_pte;
 	pte_t *pte;
-	swp_entry_t entry;
 
 	tlb_remove_check_page_size_change(tlb, PAGE_SIZE);
 again:
@@ -1128,73 +1203,12 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 	pte = start_pte;
 	arch_enter_lazy_mmu_mode();
 	do {
-		pte_t ptent = *pte;
-		if (pte_none(ptent)) {
-			continue;
-		}
-
-		if (pte_present(ptent)) {
-			struct page *page;
-
-			page = vm_normal_page(vma, addr, ptent);
-			if (unlikely(details) && page) {
-				/*
-				 * unmap_shared_mapping_pages() wants to
-				 * invalidate cache without truncating:
-				 * unmap shared but keep private pages.
-				 */
-				if (details->check_mapping &&
-				    details->check_mapping != page_rmapping(page))
-					continue;
-			}
-			ptent = ptep_get_and_clear_full(mm, addr, pte,
-							tlb->fullmm);
-			tlb_remove_tlb_entry(tlb, pte, addr);
-			if (unlikely(!page))
-				continue;
-
-			if (!PageAnon(page)) {
-				if (pte_dirty(ptent)) {
-					/*
-					 * oom_reaper cannot tear down dirty
-					 * pages
-					 */
-					if (unlikely(details && details->ignore_dirty))
-						continue;
-					force_flush = 1;
-					set_page_dirty(page);
-				}
-				if (pte_young(ptent) &&
-				    likely(!(vma->vm_flags & VM_SEQ_READ)))
-					mark_page_accessed(page);
-			}
-			rss[mm_counter(page)]--;
-			page_remove_rmap(page, false);
-			if (unlikely(page_mapcount(page) < 0))
-				print_bad_pte(vma, addr, ptent, page);
-			if (unlikely(__tlb_remove_page(tlb, page))) {
-				force_flush = 1;
-				addr += PAGE_SIZE;
-				break;
-			}
-			continue;
-		}
-		/* only check swap_entries if explicitly asked for in details */
-		if (unlikely(details && !details->check_swap_entries))
+		if (pte_none(*pte))
 			continue;
 
-		entry = pte_to_swp_entry(ptent);
-		if (!non_swap_entry(entry))
-			rss[MM_SWAPENTS]--;
-		else if (is_migration_entry(entry)) {
-			struct page *page;
-
-			page = migration_entry_to_page(entry);
-			rss[mm_counter(page)]--;
-		}
-		if (unlikely(!free_swap_and_cache(entry)))
-			print_bad_pte(vma, addr, ptent, NULL);
-		pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+		if (unlikely(zap_one_pte(tlb, vma, pte, &addr, &force_flush,
+					 rss, details)))
+			break;
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 
 	add_mm_rss_vec(mm, rss);
@@ -1445,6 +1459,321 @@ int zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL_GPL(zap_vma_ptes);
 
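+/*
+ * Duplicate a single pte from src_mm into dst_mm. In contrast to
+ * copy_one_pte(), no copy-on-write is set up: the destination simply
+ * references the same page (or swap entry) as the source. A pre-existing,
+ * differing destination mapping is zapped first.
+ */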
+static inline int
+dup_one_pte(struct mm_struct *dst_mm, struct vm_area_struct *dst_vma,
+	    struct mm_struct *src_mm, struct vm_area_struct *src_vma,
+	    struct mmu_gather *tlb, pte_t *dst_pte, pte_t *src_pte,
+	    unsigned long addr, int *force_flush, int *rss)
+{
+	unsigned long raddr = addr;
+	pte_t pte = *src_pte;
+	struct page *page;
+
+	/*
+	 * If the ptes are already exactly the same, we don't have to do
+	 * anything.
+	 */
+	if (likely(src_pte == dst_pte))
+		return 0;
+
+	/* Remove the old mapping first */
+	if (!pte_none(*dst_pte) &&
+	    unlikely(zap_one_pte(tlb, dst_vma, dst_pte, &raddr, force_flush,
+				 rss, NULL)))
+		return -ENOMEM;
+
+	/* pte contains position in swap or file, so copy. */
+	if (unlikely(!pte_present(pte))) {
+		swp_entry_t entry = pte_to_swp_entry(pte);
+
+		if (likely(!non_swap_entry(entry))) {
+			if (swap_duplicate(entry) < 0)
+				return entry.val;
+
+			/* make sure dst_mm is on swapoff's mmlist. */
+			if (unlikely(list_empty(&dst_mm->mmlist))) {
+				spin_lock(&mmlist_lock);
+				if (list_empty(&dst_mm->mmlist))
+					list_add(&dst_mm->mmlist,
+							&src_mm->mmlist);
+				spin_unlock(&mmlist_lock);
+			}
+			rss[MM_SWAPENTS]++;
+		} else if (is_migration_entry(entry)) {
+			page = migration_entry_to_page(entry);
+
+			rss[mm_counter(page)]++;
+		}
+		goto out_set_pte;
+	}
+
+	pte = pte_mkold(pte);
+
+	page = vm_normal_page(src_vma, addr, pte);
+	if (page) {
+		get_page(page);
+		page_dup_rmap(page, false);
+		rss[mm_counter(page)]++;
+	}
+
+out_set_pte:
+	if (!(dst_vma->vm_flags & VM_WRITE))
+		pte = pte_wrprotect(pte);
+
+	set_pte_at(dst_mm, addr, dst_pte, pte);
+	return 0;
+}
+
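+/*
+ * Walk one pte page of source and destination in lockstep, duplicating
+ * present and swap entries and zapping destination entries that have no
+ * counterpart in the source. Mirrors the structure of copy_pte_range(),
+ * including the periodic lock break driven by the 'progress' counter.
+ */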
+static inline int
+dup_pte_range(struct mm_struct *dst_mm, struct vm_area_struct *dst_vma,
+	      struct mm_struct *src_mm, struct vm_area_struct *src_vma,
+	      struct mmu_gather *tlb, pmd_t *dst_pmd, pmd_t *src_pmd,
+	      unsigned long addr, unsigned long end)
+{
+	pte_t *orig_dst_pte, *orig_src_pte;
+	pte_t *dst_pte, *src_pte;
+	spinlock_t *dst_ptl, *src_ptl;
+	int force_flush = 0;
+	int progress = 0;
+	int rss[NR_MM_COUNTERS];
+	swp_entry_t entry = (swp_entry_t){0};
+
+again:
+	init_rss_vec(rss);
+
+	dst_pte = pte_alloc_map_lock(dst_mm, dst_pmd, addr, &dst_ptl);
+	if (!dst_pte)
+		return -ENOMEM;
+	src_pte = pte_offset_map(src_pmd, addr);
+	src_ptl = pte_lockptr(src_mm, src_pmd);
+	spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
+	orig_dst_pte = dst_pte;
+	orig_src_pte = src_pte;
+
+	arch_enter_lazy_mmu_mode();
+
+	do {
+		/* Make sure that we are not holding the locks too long. */
+		if (progress >= 32) {
+			progress = 0;
+			if (need_resched() || spin_needbreak(src_ptl) ||
+			    spin_needbreak(dst_ptl))
+				break;
+		}
+
+		if (pte_none(*src_pte) && pte_none(*dst_pte)) {
+			progress++;
+			continue;
+		} else if (pte_none(*src_pte)) {
+			unsigned long raddr = addr;
+			int ret;
+
+			ret = zap_one_pte(tlb, dst_vma, dst_pte, &raddr,
+					  &force_flush, rss, NULL);
+			pte_clear(dst_mm, addr, dst_pte);
+
+			progress += 8;
+			if (ret)
+				break;
+
+			continue;
+		}
+
+		entry.val = dup_one_pte(dst_mm, dst_vma, src_mm, src_vma,
+					tlb, dst_pte, src_pte, addr,
+					&force_flush, rss);
+
+		if (entry.val)
+			break;
+		progress += 8;
+	} while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end);
+
+	arch_leave_lazy_mmu_mode();
+	spin_unlock(src_ptl);
+	pte_unmap(orig_src_pte);
+	add_mm_rss_vec(dst_mm, rss);
+
+	/* Do the TLB flush before unlocking the destination ptl */
+	if (force_flush)
+		tlb_flush_mmu_tlbonly(tlb);
+	pte_unmap_unlock(orig_dst_pte, dst_ptl);
+
+	/* If we forced a TLB flush, also free the batched memory now. */
+	if (force_flush) {
+		force_flush = 0;
+		tlb_flush_mmu_free(tlb);
+	}
+
+	cond_resched();
+	if (entry.val) {
+		if (add_swap_count_continuation(entry, GFP_KERNEL) < 0)
+			return -ENOMEM;
+		progress = 0;
+	}
+	if (addr != end)
+		goto again;
+
+	return 0;
+}
+
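+/*
+ * Duplicate one pmd level. Ranges that are only mapped in the destination
+ * are freed, transparent huge pmds are handed to dup_huge_pmd() (after any
+ * conflicting regular mapping in the destination has been torn down), and
+ * everything else is handled by dup_pte_range().
+ */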
+static inline int
+dup_pmd_range(struct mm_struct *dst_mm, struct vm_area_struct *dst_vma,
+	      struct mm_struct *src_mm, struct vm_area_struct *src_vma,
+	      struct mmu_gather *tlb, pud_t *dst_pud, pud_t *src_pud,
+	      unsigned long addr, unsigned long end)
+{
+	pmd_t *dst_pmd, *src_pmd;
+	unsigned long next;
+
+	dst_pmd = pmd_alloc(dst_mm, dst_pud, addr);
+	if (!dst_pmd)
+		return -ENOMEM;
+	src_pmd = pmd_offset(src_pud, addr);
+	do {
+		next = pmd_addr_end(addr, end);
+
+		if (pmd_none_or_clear_bad(src_pmd) &&
+		    pmd_none_or_clear_bad(dst_pmd)) {
+			continue;
+		} else if (pmd_none_or_clear_bad(src_pmd)) {
+			/* src unmapped, but dst not --> free dst too */
+			zap_pte_range(tlb, dst_vma, dst_pmd, addr, next, NULL);
+			free_pte_range(tlb, dst_pmd, addr);
+
+			continue;
+		} else if (pmd_trans_huge(*src_pmd) || pmd_devmap(*src_pmd)) {
+			int err;
+
+			VM_BUG_ON(next-addr != HPAGE_PMD_SIZE);
+
+			/*
+			 * We may need to unmap the content of the destination
+			 * page table first. So check this here, because
+			 * inside dup_huge_pmd we cannot do it anymore.
+			 */
+			if (unlikely(!pmd_trans_huge(*dst_pmd) &&
+				     !pmd_devmap(*dst_pmd) &&
+				     !pmd_none_or_clear_bad(dst_pmd))) {
+				zap_pte_range(tlb, dst_vma, dst_pmd, addr, next,
+					      NULL);
+				free_pte_range(tlb, dst_pmd, addr);
+			}
+
+			err = dup_huge_pmd(dst_mm, dst_vma, src_mm, src_vma,
+					   tlb, dst_pmd, src_pmd, addr);
+
+			if (err == -ENOMEM)
+				return -ENOMEM;
+			if (!err)
+				continue;
+			/* explicit fall through */
+
+		}
+
+		if (unlikely(dup_pte_range(dst_mm, dst_vma, src_mm, src_vma,
+					   tlb, dst_pmd, src_pmd, addr, next)))
+			return -ENOMEM;
+	} while (dst_pmd++, src_pmd++, addr = next, addr != end);
+
+	return 0;
+}
+
+static inline int
+dup_pud_range(struct mm_struct *dst_mm, struct vm_area_struct *dst_vma,
+	      struct mm_struct *src_mm, struct vm_area_struct *src_vma,
+	      struct mmu_gather *tlb, pgd_t *dst_pgd, pgd_t *src_pgd,
+	      unsigned long addr, unsigned long end)
+{
+	pud_t *dst_pud, *src_pud;
+	unsigned long next;
+
+	dst_pud = pud_alloc(dst_mm, dst_pgd, addr);
+	if (!dst_pud)
+		return -ENOMEM;
+	src_pud = pud_offset(src_pgd, addr);
+	do {
+		next = pud_addr_end(addr, end);
+		if (pud_none_or_clear_bad(src_pud) &&
+		    pud_none_or_clear_bad(dst_pud)) {
+			continue;
+		} else if (pud_none_or_clear_bad(src_pud)) {
+			/* src is unmapped, but dst not --> free dst too */
+			zap_pmd_range(tlb, dst_vma, dst_pud, addr, next, NULL);
+			free_pmd_range(tlb, dst_pud, addr, next, addr, next);
+
+			continue;
+		}
+
+		if (unlikely(dup_pmd_range(dst_mm, dst_vma, src_mm, src_vma,
+					   tlb, dst_pud, src_pud, addr, next)))
+			return -ENOMEM;
+	} while (dst_pud++, src_pud++, addr = next, addr != end);
+
+	return 0;
+}
+
+/**
+ * One-to-one duplicate the page table entries of one memory map into another
+ * memory map. After this function, the destination memory map has exactly the
+ * same page table entries for the specified region as the source memory map.
+ * Preexisting mappings in the destination memory map that differ from the
+ * source are removed before they are overwritten.
+ *
+ * The difference between this function and copy_page_range() is that
+ * copy_page_range() copies the underlying memory pages where necessary (e.g.
+ * for anonymous memory) with the help of copy-on-write, while dup_page_range()
+ * only duplicates the page table entries and hence lets both memory maps
+ * share the referenced memory pages.
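+ *
+ * Note that both VMAs are expected to describe the same virtual address
+ * range: the page table walk is driven by src_vma's range, while the
+ * mmu_gather and mmu_notifier window are taken from dst_vma's range.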
+ **/
+int dup_page_range(struct mm_struct *dst_mm, struct vm_area_struct *dst_vma,
+		   struct mm_struct *src_mm, struct vm_area_struct *src_vma)
+{
+	pgd_t *dst_pgd, *src_pgd;
+	struct mmu_gather tlb;
+	unsigned long next;
+	unsigned long addr = src_vma->vm_start;
+	unsigned long end = src_vma->vm_end;
+	unsigned long mmu_start = dst_vma->vm_start;
+	unsigned long mmu_end = dst_vma->vm_end;
+	int ret = 0;
+
+	if (is_vm_hugetlb_page(src_vma))
+		return dup_hugetlb_page_range(dst_mm, dst_vma, src_mm,
+					      src_vma);
+
+	tlb_gather_mmu(&tlb, dst_mm, mmu_start, mmu_end);
+	mmu_notifier_invalidate_range_start(dst_mm, mmu_start, mmu_end);
+
+	dst_pgd = pgd_offset(dst_mm, addr);
+	src_pgd = pgd_offset(src_mm, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none_or_clear_bad(src_pgd) &&
+		    pgd_none_or_clear_bad(dst_pgd)) {
+			continue;
+		} else if (pgd_none_or_clear_bad(src_pgd)) {
+			/* src is unmapped, but dst not --> free dst too */
+			zap_pud_range(&tlb, dst_vma, dst_pgd, addr, next, NULL);
+			free_pud_range(&tlb, dst_pgd, addr, next, addr, next);
+
+			continue;
+		}
+
+		if (unlikely(dup_pud_range(dst_mm, dst_vma, src_mm, src_vma,
+					   &tlb, dst_pgd, src_pgd, addr,
+					   next))) {
+			ret = -ENOMEM;
+			break;
+		}
+	} while (dst_pgd++, src_pgd++, addr = next, addr != end);
+
+	mmu_notifier_invalidate_range_end(dst_mm, mmu_start, mmu_end);
+	tlb_finish_mmu(&tlb, mmu_start, mmu_end);
+
+	return ret;
+}
+
 pte_t *__get_locked_pte(struct mm_struct *mm, unsigned long addr,
 			spinlock_t **ptl)
 {
-- 
2.12.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 10/13] mm: Introduce first class virtual address spaces
  2017-03-13 22:14 [RFC PATCH 00/13] Introduce first class virtual address spaces Till Smejkal
                   ` (8 preceding siblings ...)
  2017-03-13 22:14 ` [RFC PATCH 09/13] mm/memory: Add function to one-to-one duplicate page ranges Till Smejkal
@ 2017-03-13 22:14 ` Till Smejkal
  2017-03-13 23:52   ` Greg Kroah-Hartman
  2017-03-14  1:35   ` Vineet Gupta
  2017-03-13 22:14 ` [RFC PATCH 11/13] mm/vas: Introduce VAS segments - shareable address space regions Till Smejkal
                   ` (4 subsequent siblings)
  14 siblings, 2 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-13 22:14 UTC (permalink / raw)
  To: Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai
  Cc: linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

Introduce a new type of address space which is a first class citizen in the
OS. That means that the kernel now handles two types of AS: those which are
closely coupled with a process and those which aren't. While the former are
created and destroyed together with the process by the kernel and are the
default type of AS in the Linux kernel, the latter have to be managed
explicitly by the user and are the newly introduced type.

Accordingly, a first class AS (also called VAS == virtual address space)
can exist in the OS independently of any process. Users have to create and
destroy them explicitly. Processes and VAS are combined by attaching a
previously created VAS to a process, which adds an additional AS that the
process' threads are able to execute in. Hence, VAS allow a process to have
different views onto the system's main memory (its original AS and the
attached VAS) between which its threads can switch arbitrarily during their
lifetime.

The functionality made available through first class virtual address spaces
can be used in various ways. One possible way to utilize VAS is to
compartmentalize a process for security reasons. Another is to improve the
performance of data-centric applications by managing different sets of data
in memory without the need to map or unmap them.

Furthermore, a first class virtual address space can be attached to multiple
processes at the same time if the underlying memory is only readable. This
mechanism allows whole address spaces to be shared between multiple
processes, all of which can execute in them and use the contained memory.
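
To illustrate the intended usage, here is a minimal, purely illustrative
user-space sketch (not part of this patch). It assumes the x86-64 syscall
numbers from the table added below and that the 'type' argument of
vas_attach() takes the usual O_* access-mode bits, as suggested by
__build_vas_access_type() in mm/vas.c:

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	/* Syscall numbers from arch/x86/entry/syscalls/syscall_64.tbl below. */
	#define NR_vas_create	332
	#define NR_vas_attach	335
	#define NR_vas_switch	337

	int main(void)
	{
		/* Create a new VAS named "example" with mode 0600. */
		long vid = syscall(NR_vas_create, "example", 0600);

		if (vid < 0) {
			perror("vas_create");
			return 1;
		}

		/* Attach the VAS read-write to the calling process ... */
		if (syscall(NR_vas_attach, getpid(), (int)vid, O_RDWR) < 0) {
			perror("vas_attach");
			return 1;
		}

		/* ... and switch the calling thread into it. */
		if (syscall(NR_vas_switch, (int)vid) < 0) {
			perror("vas_switch");
			return 1;
		}

		return 0;
	}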

Signed-off-by: Till Smejkal <till.smejkal@gmail.com>
Signed-off-by: Marco Benatto <marco.antonio.780@gmail.com>
---
 MAINTAINERS                            |   10 +
 arch/x86/entry/syscalls/syscall_32.tbl |    9 +
 arch/x86/entry/syscalls/syscall_64.tbl |    9 +
 fs/exec.c                              |    3 +
 include/linux/mm_types.h               |    8 +
 include/linux/sched.h                  |   17 +
 include/linux/syscalls.h               |   11 +
 include/linux/vas.h                    |  182 +++
 include/linux/vas_types.h              |   88 ++
 include/uapi/asm-generic/unistd.h      |   20 +-
 include/uapi/linux/Kbuild              |    1 +
 include/uapi/linux/vas.h               |   16 +
 init/main.c                            |    2 +
 kernel/exit.c                          |    2 +
 kernel/fork.c                          |   28 +-
 kernel/sys_ni.c                        |   11 +
 mm/Kconfig                             |   20 +
 mm/Makefile                            |    1 +
 mm/internal.h                          |    8 +
 mm/memory.c                            |    3 +
 mm/mmap.c                              |   22 +
 mm/vas.c                               | 2188 ++++++++++++++++++++++++++++++++
 22 files changed, 2657 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/vas.h
 create mode 100644 include/linux/vas_types.h
 create mode 100644 include/uapi/linux/vas.h
 create mode 100644 mm/vas.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 527d13759ecc..060b1c64e67a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5040,6 +5040,16 @@ F:	Documentation/firmware_class/
 F:	drivers/base/firmware*.c
 F:	include/linux/firmware.h
 
+FIRST CLASS VIRTUAL ADDRESS SPACES
+M:	Till Smejkal <till.smejkal@gmail.com>
+L:	linux-kernel@vger.kernel.org
+L:	linux-mm@kvack.org
+S:	Maintained
+F:	include/linux/vas_types.h
+F:	include/linux/vas.h
+F:	include/uapi/linux/vas.h
+F:	mm/vas.c
+
 FLASH ADAPTER DRIVER (IBM Flash Adapter 900GB Full Height PCI Flash Card)
 M:	Joshua Morris <josh.h.morris@us.ibm.com>
 M:	Philip Kelleher <pjk1939@linux.vnet.ibm.com>
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 2b3618542544..8c553eef8c44 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -389,3 +389,12 @@
 380	i386	pkey_mprotect		sys_pkey_mprotect
 381	i386	pkey_alloc		sys_pkey_alloc
 382	i386	pkey_free		sys_pkey_free
+383	i386	vas_create		sys_vas_create
+384	i386	vas_delete		sys_vas_delete
+385	i386	vas_find		sys_vas_find
+386	i386	vas_attach		sys_vas_attach
+387	i386	vas_detach		sys_vas_detach
+388	i386	vas_switch		sys_vas_switch
+389	i386	active_vas		sys_active_vas
+390	i386	vas_getattr		sys_vas_getattr
+391	i386	vas_setattr		sys_vas_setattr
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index e93ef0b38db8..72f1f0495710 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -338,6 +338,15 @@
 329	common	pkey_mprotect		sys_pkey_mprotect
 330	common	pkey_alloc		sys_pkey_alloc
 331	common	pkey_free		sys_pkey_free
+332	common	vas_create		sys_vas_create
+333	common	vas_delete		sys_vas_delete
+334	common	vas_find		sys_vas_find
+335	common	vas_attach		sys_vas_attach
+336	common	vas_detach		sys_vas_detach
+337	common	vas_switch		sys_vas_switch
+338	common	active_vas		sys_active_vas
+339	common	vas_getattr		sys_vas_getattr
+340	common	vas_setattr		sys_vas_setattr
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/exec.c b/fs/exec.c
index 68d7908a1e5a..e1ac0a8c76bf 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1020,6 +1020,9 @@ static int exec_mmap(struct mm_struct *mm)
 	active_mm = tsk->active_mm;
 	tsk->mm = mm;
 	tsk->active_mm = mm;
+#ifdef CONFIG_VAS
+	tsk->original_mm = mm;
+#endif
 	activate_mm(active_mm, mm);
 	tsk->mm->vmacache_seqnum = 0;
 	vmacache_flush(tsk);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6aa03e88dcff..82bf78ea83ee 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -13,6 +13,7 @@
 #include <linux/uprobes.h>
 #include <linux/page-flags-layout.h>
 #include <linux/workqueue.h>
+#include <linux/ktime.h>
 #include <asm/page.h>
 #include <asm/mmu.h>
 
@@ -358,6 +359,10 @@ struct vm_area_struct {
 	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
 #endif
 	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
+#ifdef CONFIG_VAS
+	struct mm_struct *vas_reference;
+	ktime_t vas_last_update;
+#endif
 };
 
 struct core_thread {
@@ -514,6 +519,9 @@ struct mm_struct {
 	atomic_long_t hugetlb_usage;
 #endif
 	struct work_struct async_put_work;
+#ifdef CONFIG_VAS
+	ktime_t vas_last_update;
+#endif
 };
 
 static inline void mm_init_cpumask(struct mm_struct *mm)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7955adc00397..216876912e77 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1508,6 +1508,18 @@ struct tlbflush_unmap_batch {
 	bool writable;
 };
 
+/* Shared information of attached VASes between processes */
+#ifdef CONFIG_VAS
+struct vas_context {
+	spinlock_t lock;
+	u16 refcount;			/* < the number of tasks using this   *
+					 *   VAS context.                     */
+
+	struct list_head vases;		/* < the list of attached VASes which *
+					 *   are handled by this VAS context. */
+};
+#endif
+
 struct task_struct {
 #ifdef CONFIG_THREAD_INFO_IN_TASK
 	/*
@@ -1583,6 +1595,11 @@ struct task_struct {
 #endif
 
 	struct mm_struct *mm, *active_mm;
+#ifdef CONFIG_VAS
+	struct mm_struct *original_mm;
+	struct vas_context *vas_ctx;
+	int active_vas;
+#endif
 	/* per-thread vma caching */
 	u32 vmacache_seqnum;
 	struct vm_area_struct *vmacache[VMACACHE_SIZE];
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 91a740f6b884..fdea27d37c96 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -65,6 +65,7 @@ struct old_linux_dirent;
 struct perf_event_attr;
 struct file_handle;
 struct sigaltstack;
+struct vas_attr;
 union bpf_attr;
 
 #include <linux/types.h>
@@ -903,4 +904,14 @@ asmlinkage long sys_pkey_mprotect(unsigned long start, size_t len,
 asmlinkage long sys_pkey_alloc(unsigned long flags, unsigned long init_val);
 asmlinkage long sys_pkey_free(int pkey);
 
+asmlinkage long sys_vas_create(const char __user *name, umode_t mode);
+asmlinkage long sys_vas_delete(int vid);
+asmlinkage long sys_vas_find(const char __user *name);
+asmlinkage long sys_vas_attach(pid_t pid, int vid, int type);
+asmlinkage long sys_vas_detach(pid_t pid, int vid);
+asmlinkage long sys_vas_switch(int vid);
+asmlinkage long sys_active_vas(void);
+asmlinkage long sys_vas_getattr(int vid, struct vas_attr __user *attr);
+asmlinkage long sys_vas_setattr(int vid, struct vas_attr __user *attr);
+
 #endif
diff --git a/include/linux/vas.h b/include/linux/vas.h
new file mode 100644
index 000000000000..6a72e42f96d2
--- /dev/null
+++ b/include/linux/vas.h
@@ -0,0 +1,182 @@
+#ifndef _LINUX_VAS_H
+#define _LINUX_VAS_H
+
+
+#include <linux/sched.h>
+#include <linux/vas_types.h>
+
+
+/***
+ * General management of the VAS subsystem
+ ***/
+
+#ifdef CONFIG_VAS
+
+/***
+ * Management of VASes
+ ***/
+
+/**
+ * Lock and unlock helper for VAS.
+ **/
+#define vas_lock(vas) mutex_lock(&(vas)->mtx)
+#define vas_unlock(vas) mutex_unlock(&(vas)->mtx)
+
+/**
+ * Create a new VAS.
+ *
+ * @param[in] name:		The name of the new VAS.
+ * @param[in] mode:		The access rights for the VAS.
+ *
+ * @returns:			The VAS ID on success, -ERRNO otherwise.
+ **/
+extern int vas_create(const char *name, umode_t mode);
+
+/**
+ * Get a pointer to a VAS data structure.
+ *
+ * @param[in] vid:		The ID of the VAS whose data structure should be
+ *				returned.
+ *
+ * @returns:			The pointer to the VAS data structure on
+ *				success, or NULL otherwise.
+ **/
+extern struct vas *vas_get(int vid);
+
+/**
+ * Release a reference to a VAS data structure previously obtained via
+ * vas_get().
+ *
+ * @param[in] vas:		The pointer to the VAS data structure whose
+ *				reference should be dropped.
+ **/
+extern void vas_put(struct vas *vas);
+
+/**
+ * Get the ID of the VAS belonging to the given name.
+ *
+ * @param[in] name:		The name of the VAS for which the ID should be
+ *				returned.
+ *
+ * @returns:			The VAS ID on success, -ERRNO otherwise.
+ **/
+extern int vas_find(const char *name);
+
+/**
+ * Delete the given VAS.
+ *
+ * @param[in] vid:		The ID of the VAS which should be deleted.
+ *
+ * @returns:			0 on success, -ERRNO otherwise.
+ **/
+extern int vas_delete(int vid);
+
+/**
+ * Attach a VAS to a process.
+ *
+ * @param[in] tsk:		The task to which the VAS should be attached to.
+ * @param[in] vid:		The ID of the VAS which should be attached.
+ * @param[in] type:		The access type with which the VAS should be
+ *				attached (read-only, write-only or read-write).
+ *
+ * @returns:			0 on success, -ERRNO otherwise.
+ **/
+extern int vas_attach(struct task_struct *tsk, int vid, int type);
+
+/**
+ * Detach a VAS from a process.
+ *
+ * @param[in] tsk:		The task from which the VAS should be detached.
+ * @param[in] vid:		The ID of the VAS which should be detached.
+ *
+ * @returns:			0 on success, -ERRNO otherwise.
+ **/
+extern int vas_detach(struct task_struct *tsk, int vid);
+
+/**
+ * Switch to a different VAS.
+ *
+ * @param[in] tsk:		The task for which the VAS should be switched.
+ * @param[in] vid:		The ID of the VAS which should be activated.
+ *
+ * @returns:			0 on success, -ERRNO otherwise.
+ **/
+extern int vas_switch(struct task_struct *tsk, int vid);
+
+/**
+ * Get attributes of a VAS.
+ *
+ * @param[in] vid:		The ID of the VAS for which the attributes
+ *				should be returned.
+ * @param[out] attr:		The pointer to the struct where the attributes
+ *				should be saved.
+ *
+ * @returns:			0 on success, -ERRNO otherwise.
+ **/
+extern int vas_getattr(int vid, struct vas_attr *attr);
+
+/**
+ * Set attributes of a VAS.
+ *
+ * @param[in] vid:		The ID of the VAS for which the attributes
+ *				should be updated.
+ * @param[in] attr:		The pointer to the struct containing the new
+ *				attributes.
+ *
+ * @returns:			0 on success, -ERRNO otherwise.
+ **/
+extern int vas_setattr(int vid, struct vas_attr *attr);
+
+
+/***
+ * Management of VAS contexts
+ ***/
+
+/**
+ * Lock and unlock helper for VAS contexts.
+ **/
+#define vas_context_lock(ctx) spin_lock(&(ctx)->lock)
+#define vas_context_unlock(ctx) spin_unlock(&(ctx)->lock)
+
+
+/***
+ * Management of the VAS subsystem
+ ***/
+
+/**
+ * Initialize the VAS subsystem
+ **/
+extern void vas_init(void);
+
+
+/***
+ * Management of the VAS subsystem during fork and exit
+ ***/
+
+/**
+ * Initialize the task-specific VAS data structures during the clone system
+ * call.
+ *
+ * @param[in] clone_flags:	The flags which were given to the system call by
+ *				the user.
+ * @param[in] tsk:		The new task which should be initialized.
+ *
+ * @returns:			0 on success, -ERRNO otherwise.
+ **/
+extern int vas_clone(int clone_flags, struct task_struct *tsk);
+
+/**
+ * Destroy the task-specific VAS data structures during the exit system call.
+ *
+ * @param[in] tsk:		The task for which data structures should be
+ *				destructed.
+ **/
+extern void vas_exit(struct task_struct *tsk);
+
+#else /* CONFIG_VAS */
+
+static inline void __init vas_init(void) {}
+static inline int vas_clone(int cf, struct task_struct *tsk) { return 0; }
+static inline void vas_exit(struct task_struct *tsk) {}
+
+#endif /* CONFIG_VAS */
+
+#endif
diff --git a/include/linux/vas_types.h b/include/linux/vas_types.h
new file mode 100644
index 000000000000..f06bfa9ef729
--- /dev/null
+++ b/include/linux/vas_types.h
@@ -0,0 +1,88 @@
+#ifndef _LINUX_VAS_TYPES_H
+#define _LINUX_VAS_TYPES_H
+
+#include <uapi/linux/vas.h>
+
+#include <linux/kobject.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/spinlock_types.h>
+#include <linux/types.h>
+
+
+#define VAS_MAX_NAME_LENGTH 256
+
+#define VAS_IS_ERROR(id) ((id) < 0)
+
+/**
+ * Forward declare various important shared data structures.
+ **/
+struct mm_struct;
+struct task_struct;
+
+/**
+ * The struct representing a Virtual Address Space (VAS).
+ *
+ * This data structure contains all the necessary information about a VAS,
+ * such as its name and ID, as well as access rights and other management
+ * information.
+ **/
+struct vas {
+	struct kobject kobj;		/* < the internal kobject that we use *
+					 *   for reference counting and sysfs *
+					 *   handling.                        */
+
+	int id;				/* < ID                               */
+	char name[VAS_MAX_NAME_LENGTH];	/* < name                             */
+
+	struct mutex mtx;		/* < lock for parallel access.        */
+
+	struct mm_struct *mm;		/* < a partial memory map containing  *
+					 *   all mappings of this VAS.        */
+
+	struct list_head link;		/* < the link in the global VAS list. */
+	struct rcu_head rcu;		/* < the RCU helper used for          *
+					 *   asynchronous VAS deletion.       */
+
+	u16 refcount;			/* < how often is the VAS attached.   */
+	struct list_head attaches;	/* < the list of tasks which have     *
+					 *   this VAS attached.               */
+
+	spinlock_t share_lock;		/* < lock for protecting sharing      *
+					 *   state.                           */
+	u32 sharing;			/* < the variable used to keep track  *
+					 *   of the current sharing state of  *
+					 *   the VAS.                         */
+
+	umode_t mode;			/* < the access rights to this VAS.   */
+	kuid_t uid;			/* < the UID of the owning user of    *
+					 *   this VAS.                        */
+	kgid_t gid;			/* < the GID of the owning group of   *
+					 *   this VAS.                        */
+};
+
+/**
+ * The struct representing a VAS being attached to a process.
+ *
+ * Once a VAS is attached to a process, additional per-attachment information
+ * is necessary. This data structure contains all of this information, which
+ * makes using a VAS fast and easy.
+ **/
+struct att_vas {
+	struct vas *vas;		/* < the reference to the actual VAS  *
+					 *   containing all the information.  */
+
+	struct task_struct *tsk;	/* < the reference to the task to     *
+					 *   which the VAS is attached to.    */
+
+	struct mm_struct *mm;		/* < the backing memory map.          */
+
+	struct list_head tsk_link;	/* < the link in the list managed     *
+					 *   inside the task.                 */
+	struct list_head vas_link;	/* < the link in the list managed     *
+					 *   inside the VAS.                  */
+
+	int type;			/* < the type of attaching (RO/RW).   */
+};
+
+#endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 9b1462e38b82..35df7d40a443 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -730,9 +730,27 @@ __SYSCALL(__NR_pkey_mprotect, sys_pkey_mprotect)
 __SYSCALL(__NR_pkey_alloc,    sys_pkey_alloc)
 #define __NR_pkey_free 290
 __SYSCALL(__NR_pkey_free,     sys_pkey_free)
+#define __NR_vas_create 291
+__SYSCALL(__NR_vas_create, sys_vas_create)
+#define __NR_vas_delete 292
+__SYSCALL(__NR_vas_delete, sys_vas_delete)
+#define __NR_vas_find 293
+__SYSCALL(__NR_vas_find, sys_vas_find)
+#define __NR_vas_attach 294
+__SYSCALL(__NR_vas_attach, sys_vas_attach)
+#define __NR_vas_detach 295
+__SYSCALL(__NR_vas_detach, sys_vas_detach)
+#define __NR_vas_switch 296
+__SYSCALL(__NR_vas_switch, sys_vas_switch)
+#define __NR_active_vas 297
+__SYSCALL(__NR_active_vas, sys_active_vas)
+#define __NR_vas_getattr 298
+__SYSCALL(__NR_vas_getattr, sys_vas_getattr)
+#define __NR_vas_setattr 299
+__SYSCALL(__NR_vas_setattr, sys_vas_setattr)
 
 #undef __NR_syscalls
-#define __NR_syscalls 291
+#define __NR_syscalls 300
 
 /*
  * All syscalls below here should go away really,
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index f330ba4547cf..5666900bdf06 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -446,6 +446,7 @@ header-y += v4l2-controls.h
 header-y += v4l2-dv-timings.h
 header-y += v4l2-mediabus.h
 header-y += v4l2-subdev.h
+header-y += vas.h
 header-y += veth.h
 header-y += vfio.h
 header-y += vhost.h
diff --git a/include/uapi/linux/vas.h b/include/uapi/linux/vas.h
new file mode 100644
index 000000000000..02f70f88bdcb
--- /dev/null
+++ b/include/uapi/linux/vas.h
@@ -0,0 +1,16 @@
+#ifndef _UAPI_LINUX_VAS_H
+#define _UAPI_LINUX_VAS_H
+
+#include <linux/types.h>
+
+
+/**
+ * The struct containing attributes of a VAS.
+ **/
+struct vas_attr {
+	__kernel_mode_t mode;		/* < the access rights to the VAS.    */
+	__kernel_uid_t user;		/* < the owning user of the VAS.      */
+	__kernel_gid_t group;		/* < the owning group of the VAS.     */
+};
+
+#endif
diff --git a/init/main.c b/init/main.c
index b0c9d6facef9..16f33b04f8ea 100644
--- a/init/main.c
+++ b/init/main.c
@@ -82,6 +82,7 @@
 #include <linux/proc_ns.h>
 #include <linux/io.h>
 #include <linux/cache.h>
+#include <linux/vas.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -538,6 +539,7 @@ asmlinkage __visible void __init start_kernel(void)
 	sort_main_extable();
 	trap_init();
 	mm_init();
+	vas_init();
 
 	/*
 	 * Set up the scheduler prior starting any interrupts (such as the
diff --git a/kernel/exit.c b/kernel/exit.c
index 8f14b866f9f6..b9687ea70a5b 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -55,6 +55,7 @@
 #include <linux/shm.h>
 #include <linux/kcov.h>
 #include <linux/random.h>
+#include <linux/vas.h>
 
 #include <linux/uaccess.h>
 #include <asm/unistd.h>
@@ -823,6 +824,7 @@ void __noreturn do_exit(long code)
 	tsk->exit_code = code;
 	taskstats_exit(tsk, group_dead);
 
+	vas_exit(tsk);
 	exit_mm(tsk);
 
 	if (group_dead)
diff --git a/kernel/fork.c b/kernel/fork.c
index d3087d870855..292299c7995e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -76,6 +76,8 @@
 #include <linux/compiler.h>
 #include <linux/sysctl.h>
 #include <linux/kcov.h>
+#include <linux/vas.h>
+#include <linux/timekeeping.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -781,6 +783,10 @@ struct mm_struct *mm_setup(struct mm_struct *mm)
 	if (mm_alloc_pgd(mm))
 		goto fail_nopgd;
 
+#ifdef CONFIG_VAS
+	mm->vas_last_update = ktime_get();
+#endif
+
 	return mm;
 
 fail_nopgd:
@@ -1207,6 +1213,9 @@ static int copy_mm(unsigned long clone_flags, struct task_struct *tsk)
 
 	tsk->mm = NULL;
 	tsk->active_mm = NULL;
+#ifdef CONFIG_VAS
+	tsk->original_mm = NULL;
+#endif
 
 	/*
 	 * Are we cloning a kernel thread?
@@ -1217,6 +1226,15 @@ static int copy_mm(unsigned long clone_flags, struct task_struct *tsk)
 	if (!oldmm)
 		return 0;
 
+#ifdef CONFIG_VAS
+	/*
+	 * Never fork the address space of a VAS but use the process'
+	 * original one.
+	 */
+	if (oldmm != current->original_mm)
+		oldmm = current->original_mm;
+#endif
+
 	/* initialize the new vmacache entries */
 	vmacache_flush(tsk);
 
@@ -1234,6 +1252,9 @@ static int copy_mm(unsigned long clone_flags, struct task_struct *tsk)
 good_mm:
 	tsk->mm = mm;
 	tsk->active_mm = mm;
+#ifdef CONFIG_VAS
+	tsk->original_mm = mm;
+#endif
 	return 0;
 
 fail_nomem:
@@ -1700,9 +1721,12 @@ static __latent_entropy struct task_struct *copy_process(
 	retval = copy_mm(clone_flags, p);
 	if (retval)
 		goto bad_fork_cleanup_signal;
-	retval = copy_namespaces(clone_flags, p);
+	retval = vas_clone(clone_flags, p);
 	if (retval)
 		goto bad_fork_cleanup_mm;
+	retval = copy_namespaces(clone_flags, p);
+	if (retval)
+		goto bad_fork_cleanup_vas;
 	retval = copy_io(clone_flags, p);
 	if (retval)
 		goto bad_fork_cleanup_namespaces;
@@ -1885,6 +1909,8 @@ static __latent_entropy struct task_struct *copy_process(
 		exit_io_context(p);
 bad_fork_cleanup_namespaces:
 	exit_task_namespaces(p);
+bad_fork_cleanup_vas:
+	vas_exit(p);
 bad_fork_cleanup_mm:
 	if (p->mm)
 		mmput(p->mm);
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 8acef8576ce9..f6f83c5ec1a1 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -258,3 +258,14 @@ cond_syscall(sys_membarrier);
 cond_syscall(sys_pkey_mprotect);
 cond_syscall(sys_pkey_alloc);
 cond_syscall(sys_pkey_free);
+
+/* first class virtual address spaces */
+cond_syscall(sys_vas_create);
+cond_syscall(sys_vas_delete);
+cond_syscall(sys_vas_find);
+cond_syscall(sys_vas_attach);
+cond_syscall(sys_vas_detach);
+cond_syscall(sys_vas_switch);
+cond_syscall(sys_active_vas);
+cond_syscall(sys_vas_getattr);
+cond_syscall(sys_vas_setattr);
diff --git a/mm/Kconfig b/mm/Kconfig
index 9b8fccb969dc..9a80877f3536 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -707,3 +707,23 @@ config ARCH_USES_HIGH_VMA_FLAGS
 	bool
 config ARCH_HAS_PKEYS
 	bool
+
+config VAS
+	bool "Support for First Class Virtual Address Spaces"
+	default n
+	help
+	  Support for First Class Virtual Address Spaces, which are address
+	  spaces that are not bound to the lifetime of any process but can
+	  exist independently in the system. With this feature, processes are
+	  allowed to have multiple address spaces between which their threads
+	  can switch arbitrarily.
+
+	  If unsure, say N.
+
+config VAS_DEBUG
+	bool "Debugging output for First Class Virtual Address Spaces"
+	depends on VAS
+	default n
+	help
+	  Enable extensive debugging output for the First Class Virtual Address
+	  Spaces feature.
diff --git a/mm/Makefile b/mm/Makefile
index 295bd7a9f76b..ba8995e944d7 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
 obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
 obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
 obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
+obj-$(CONFIG_VAS)	+= vas.o
diff --git a/mm/internal.h b/mm/internal.h
index e22cb031b45b..f947e8c50bae 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -499,4 +499,12 @@ extern const struct trace_print_flags pageflag_names[];
 extern const struct trace_print_flags vmaflag_names[];
 extern const struct trace_print_flags gfpflag_names[];
 
+#ifdef CONFIG_VAS
+void mm_updated(struct mm_struct *mm);
+void vm_area_updated(struct vm_area_struct *vma);
+#else
+static inline void mm_updated(struct mm_struct *mm) {}
+static inline void vm_area_updated(struct vm_area_struct *vma) {}
+#endif
+
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/memory.c b/mm/memory.c
index 7026f2146fcd..e4747b3fd5b9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4042,6 +4042,9 @@ int handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 				&& test_bit(MMF_UNSTABLE, &vma->vm_mm->flags)))
 		ret = VM_FAULT_SIGBUS;
 
+	if (ret)
+		vm_area_updated(vma);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(handle_mm_fault);
diff --git a/mm/mmap.c b/mm/mmap.c
index d35c6b51cadf..1d82b2260448 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -44,6 +44,7 @@
 #include <linux/userfaultfd_k.h>
 #include <linux/moduleparam.h>
 #include <linux/pkeys.h>
+#include <linux/timekeeping.h>
 
 #include <linux/uaccess.h>
 #include <asm/cacheflush.h>
@@ -942,6 +943,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 		uprobe_mmap(insert);
 
 	validate_mm(mm);
+	vm_area_updated(vma);
 
 	return 0;
 }
@@ -1135,6 +1137,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 		if (err)
 			return NULL;
 		khugepaged_enter_vma_merge(prev, vm_flags);
+		vm_area_updated(prev);
 		return prev;
 	}
 
@@ -1162,6 +1165,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 		if (err)
 			return NULL;
 		khugepaged_enter_vma_merge(area, vm_flags);
+		vm_area_updated(area);
 		return area;
 	}
 
@@ -1719,6 +1723,7 @@ unsigned long mmap_region(struct mm_struct *mm, struct file *file,
 	vma->vm_flags |= VM_SOFTDIRTY;
 
 	vma_set_page_prot(vma);
+	vm_area_updated(vma);
 
 	return addr;
 
@@ -2263,6 +2268,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
 	anon_vma_unlock_write(vma->anon_vma);
 	khugepaged_enter_vma_merge(vma, vma->vm_flags);
 	validate_mm(mm);
+	vm_area_updated(vma);
 	return error;
 }
 #endif /* CONFIG_STACK_GROWSUP || CONFIG_IA64 */
@@ -2332,6 +2338,7 @@ int expand_downwards(struct vm_area_struct *vma,
 	anon_vma_unlock_write(vma->anon_vma);
 	khugepaged_enter_vma_merge(vma, vma->vm_flags);
 	validate_mm(mm);
+	vm_area_updated(vma);
 	return error;
 }
 
@@ -2457,6 +2464,7 @@ void munmap_region(struct mm_struct *mm, struct vm_area_struct *vma,
 	free_pgtables(&tlb, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
 				 next ? next->vm_start : USER_PGTABLES_CEILING);
 	tlb_finish_mmu(&tlb, start, end);
+	mm_updated(mm);
 }
 
 /*
@@ -2656,6 +2664,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
 
 	/* Fix up all other VM information */
 	remove_vma_list(mm, vma);
+	mm_updated(mm);
 
 	return 0;
 }
@@ -2882,6 +2891,7 @@ static int do_brk(unsigned long addr, unsigned long request)
 	if (flags & VM_LOCKED)
 		mm->locked_vm += (len >> PAGE_SHIFT);
 	vma->vm_flags |= VM_SOFTDIRTY;
+	vm_area_updated(vma);
 	return 0;
 }
 
@@ -3058,6 +3068,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 		vma_link(mm, new_vma, prev, rb_link, rb_parent);
 		*need_rmap_locks = false;
 	}
+	vm_area_updated(new_vma);
 	return new_vma;
 
 out_free_mempol:
@@ -3204,6 +3215,7 @@ static struct vm_area_struct *__install_special_mapping(
 	vm_stat_account(mm, vma->vm_flags, len >> PAGE_SHIFT);
 
 	perf_event_mmap(vma);
+	vm_area_updated(vma);
 
 	return vma;
 
@@ -3550,3 +3562,13 @@ static int __meminit init_reserve_notifier(void)
 	return 0;
 }
 subsys_initcall(init_reserve_notifier);
+
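+/*
+ * Record that a memory map, or one of its VMAs, has been modified. The
+ * resulting timestamps are consumed by the VAS code in mm/vas.c, which keeps
+ * track of when the memory maps and VMAs it duplicates were last changed.
+ */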
+void mm_updated(struct mm_struct *mm)
+{
+	mm->vas_last_update = ktime_get();
+}
+
+void vm_area_updated(struct vm_area_struct *vma)
+{
+	vma->vas_last_update = vma->vm_mm->vas_last_update = ktime_get();
+}
diff --git a/mm/vas.c b/mm/vas.c
new file mode 100644
index 000000000000..447d61e1da79
--- /dev/null
+++ b/mm/vas.c
@@ -0,0 +1,2188 @@
+/*
+ *  First Class Virtual Address Spaces
+ *  Copyright (c) 2016-2017, Hewlett Packard Enterprise
+ *
+ *  Code Authors:
+ *     Marco A Benatto <marco.antonio.780@gmail.com>
+ *     Till Smejkal <till.smejkal@gmail.com>
+ */
+
+
+#include <linux/vas.h>
+
+#include <linux/atomic.h>
+#include <linux/cred.h>
+#include <linux/errno.h>
+#include <linux/export.h>
+#include <linux/fcntl.h>
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/kobject.h>
+#include <linux/ktime.h>
+#include <linux/list.h>
+#include <linux/lockdep.h>
+#include <linux/mempolicy.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/mmu_context.h>
+#include <linux/mmu_notifier.h>
+#include <linux/mutex.h>
+#include <linux/printk.h>
+#include <linux/rbtree.h>
+#include <linux/rcupdate.h>
+#include <linux/rmap.h>
+#include <linux/rwsem.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/stat.h>
+#include <linux/string.h>
+#include <linux/syscalls.h>
+#include <linux/uidgid.h>
+#include <linux/uaccess.h>
+#include <linux/vmacache.h>
+
+#include <asm/mman.h>
+#include <asm/processor.h>
+
+#include "internal.h"
+
+
+/***
+ * Internally used defines
+ ***/
+
+/**
+ * Make sure we are not overflowing the VAS sharing variable.
+ **/
+#define VAS_MAX_SHARES U16_MAX
+
+#define VAS_MAX_ID INT_MAX
+
+/**
+ * Masks and bits to implement sharing of VAS: the lower 16 bits of the
+ * 'sharing' counter track readers, the upper 16 bits track writers.
+ **/
+#define VAS_SHARE_READABLE (1 << 0)
+#define VAS_SHARE_WRITABLE (1 << 16)
+#define VAS_SHARE_READ_MASK 0xffff
+#define VAS_SHARE_WRITE_MASK 0xffff0000
+#define VAS_SHARE_READ_WRITE_MASK (VAS_SHARE_READ_MASK | VAS_SHARE_WRITE_MASK)
+
+/**
+ * Safely get the next vm_area_struct in the memory map's VMA list (handles a
+ * NULL starting VMA).
+ **/
+#define next_vma_safe(vma) ((vma) ? (vma)->vm_next : NULL)
+
+/**
+ * Get a string representation of the access type to a VAS.
+ **/
+#define access_type_str(type) ((type) & MAY_WRITE ?			\
+			       ((type) & MAY_READ ? "rw" : "wo") : "ro")
+
+
+/***
+ * Debugging functions
+ ***/
+
+#ifdef CONFIG_VAS_DEBUG
+
+/**
+ * Dump the content of the given memory map.
+ *
+ * @param title:	The title that should be printed above the dump.
+ * @param mm:		The memory map which should be dumped.
+ **/
+static void __dump_memory_map(const char *title, struct mm_struct *mm)
+{
+	int count;
+	struct vm_area_struct *vma;
+
+	down_read(&mm->mmap_sem);
+
+	/* Dump some general information. */
+	pr_info("-- %s [%p] --\n"
+		"> General information <\n"
+		"  PGD value: %#lx\n"
+		"  Task size: %#lx\n"
+		"  Map count: %d\n"
+		"  Last update: %lld\n"
+		"  Code:  %#lx - %#lx\n"
+		"  Data:  %#lx - %#lx\n"
+		"  Heap:  %#lx - %#lx\n"
+		"  Stack: %#lx\n"
+		"  Args:  %#lx - %#lx\n"
+		"  Env:   %#lx - %#lx\n",
+		title, mm, pgd_val(*mm->pgd), mm->task_size, mm->map_count,
+		mm->vas_last_update, mm->start_code, mm->end_code,
+		mm->start_data, mm->end_data, mm->start_brk, mm->brk,
+		mm->start_stack, mm->arg_start, mm->arg_end, mm->env_start,
+		mm->env_end);
+
+	/* Dump current RSS state counters of the memory map. */
+	pr_cont("> RSS Counter <\n");
+	for (count = 0; count < NR_MM_COUNTERS; ++count)
+		pr_cont(" %d: %lu\n", count, get_mm_counter(mm, count));
+
+	/* Dump the information for each region. */
+	pr_cont("> Mapped Regions <\n");
+	for (vma = mm->mmap, count = 0; vma; vma = vma->vm_next, ++count) {
+		pr_cont("  VMA %3d: %#14lx - %#-14lx", count, vma->vm_start,
+			vma->vm_end);
+
+		if (is_exec_mapping(vma->vm_flags))
+			pr_cont(" EXEC  ");
+		else if (is_data_mapping(vma->vm_flags))
+			pr_cont(" DATA  ");
+		else if (is_stack_mapping(vma->vm_flags))
+			pr_cont(" STACK ");
+		else
+			pr_cont(" OTHER ");
+
+		pr_cont("%c%c%c%c [%c]",
+			vma->vm_flags & VM_READ ? 'r' : '-',
+			vma->vm_flags & VM_WRITE ? 'w' : '-',
+			vma->vm_flags & VM_EXEC ? 'x' : '-',
+			vma->vm_flags & VM_MAYSHARE ? 's' : 'p',
+			vma->vas_reference ? 'v' : '-');
+
+		if (vma->vm_file) {
+			struct file *f = vma->vm_file;
+			char *buf;
+
+			buf = kmalloc(PATH_MAX, GFP_TEMPORARY);
+			if (buf) {
+				char *p;
+
+				p = file_path(f, buf, PATH_MAX);
+				if (IS_ERR(p))
+					p = "?";
+
+				pr_cont(" --> %s @%lu\n", p, vma->vm_pgoff);
+				kfree(buf);
+			} else {
+				pr_cont(" --> NA @%lu\n", vma->vm_pgoff);
+			}
+		} else if (vma->vm_ops && vma->vm_ops->name) {
+			pr_cont(" --> %s\n", vma->vm_ops->name(vma));
+		} else {
+			pr_cont(" ANON\n");
+		}
+	}
+	if (count == 0)
+		pr_cont("  EMPTY\n");
+
+	up_read(&mm->mmap_sem);
+}
+
+#define pr_vas_debug(fmt, args...) pr_info("[VAS] %s - " fmt, __func__, ##args)
+#define dump_memory_map(title, mm) __dump_memory_map(title, mm)
+
+#else /* CONFIG_VAS_DEBUG */
+
+#define pr_vas_debug(...) do {} while (0)
+#define dump_memory_map(...) do {} while (0)
+
+#endif /* CONFIG_VAS_DEBUG */
+
+/***
+ * Internally used variables
+ ***/
+
+/**
+ * All SLAB caches used to improve allocation performance.
+ **/
+static struct kmem_cache *vas_cachep;
+static struct kmem_cache *att_vas_cachep;
+static struct kmem_cache *vas_context_cachep;
+
+/**
+ * Global management data structures and their associated locks.
+ **/
+static struct idr vases;
+static spinlock_t vases_lock;
+
+/**
+ * The placeholder variables that are used to identify to-be-deleted items in
+ * our global management data structures.
+ **/
+static struct vas *INVALID_VAS;
+
+/**
+ * Kernel 'ksets' where all objects will be managed.
+ **/
+static struct kset *vases_kset;
+
+
+/***
+ * Constructors and destructors for the data structures.
+ ***/
+static inline struct vm_area_struct *__new_vm_area(void)
+{
+	return kmem_cache_zalloc(vm_area_cachep, GFP_ATOMIC);
+}
+
+static inline void __delete_vm_area(struct vm_area_struct *vma)
+{
+	kmem_cache_free(vm_area_cachep, vma);
+}
+
+static inline struct vas *__new_vas(void)
+{
+	return kmem_cache_zalloc(vas_cachep, GFP_KERNEL);
+}
+
+static inline void __delete_vas(struct vas *vas)
+{
+	WARN_ON(vas->refcount != 0);
+
+	mutex_destroy(&vas->mtx);
+
+	if (vas->mm)
+		mmput_async(vas->mm);
+	kmem_cache_free(vas_cachep, vas);
+}
+
+static inline void __delete_vas_rcu(struct rcu_head *rp)
+{
+	struct vas *vas = container_of(rp, struct vas, rcu);
+
+	__delete_vas(vas);
+}
+
+static inline struct att_vas *__new_att_vas(void)
+{
+	return kmem_cache_zalloc(att_vas_cachep, GFP_ATOMIC);
+}
+
+static inline void __delete_att_vas(struct att_vas *avas)
+{
+	if (avas->mm)
+		mmput_async(avas->mm);
+	kmem_cache_free(att_vas_cachep, avas);
+}
+
+static inline struct vas_context *__new_vas_context(void)
+{
+	return kmem_cache_zalloc(vas_context_cachep, GFP_KERNEL);
+}
+
+static inline void __delete_vas_context(struct vas_context *ctx)
+{
+	WARN_ON(ctx->refcount != 0);
+
+	kmem_cache_free(vas_context_cachep, ctx);
+}
+
+/***
+ * Kobject management of data structures
+ ***/
+
+/**
+ * Correctly take and put VAS.
+ **/
+static inline struct vas *__vas_get(struct vas *vas)
+{
+	return container_of(kobject_get(&vas->kobj), struct vas, kobj);
+}
+
+static inline void __vas_put(struct vas *vas)
+{
+	kobject_put(&vas->kobj);
+}
+
+/**
+ * The sysfs structure we need to handle attributes of a VAS.
+ **/
+struct vas_sysfs_attr {
+	struct attribute attr;
+	ssize_t (*show)(struct vas *vas, struct vas_sysfs_attr *vsattr,
+			char *buf);
+	ssize_t (*store)(struct vas *vas, struct vas_sysfs_attr *vsattr,
+			 const char *buf, size_t count);
+};
+
+#define VAS_SYSFS_ATTR(NAME, MODE, SHOW, STORE)				\
+static struct vas_sysfs_attr vas_sysfs_attr_##NAME =			\
+	__ATTR(NAME, MODE, SHOW, STORE)
+
+/**
+ * Functions for all the sysfs operations.
+ **/
+static ssize_t __vas_sysfs_attr_show(struct kobject *kobj,
+				     struct attribute *attr,
+				     char *buf)
+{
+	struct vas *vas;
+	struct vas_sysfs_attr *vsattr;
+
+	vas = container_of(kobj, struct vas, kobj);
+	vsattr = container_of(attr, struct vas_sysfs_attr, attr);
+
+	if (!vsattr->show)
+		return -EIO;
+
+	return vsattr->show(vas, vsattr, buf);
+}
+
+static ssize_t __vas_sysfs_attr_store(struct kobject *kobj,
+				      struct attribute *attr,
+				      const char *buf, size_t count)
+{
+	struct vas *vas;
+	struct vas_sysfs_attr *vsattr;
+
+	vas = container_of(kobj, struct vas, kobj);
+	vsattr = container_of(attr, struct vas_sysfs_attr, attr);
+
+	if (!vsattr->store)
+		return -EIO;
+
+	return vsattr->store(vas, vsattr, buf, count);
+}
+
+/**
+ * The sysfs operations structure for a VAS.
+ **/
+static const struct sysfs_ops vas_sysfs_ops = {
+	.show = __vas_sysfs_attr_show,
+	.store = __vas_sysfs_attr_store,
+};
+
+/**
+ * Default attributes of a VAS.
+ **/
+static ssize_t __show_vas_name(struct vas *vas, struct vas_sysfs_attr *vsattr,
+			       char *buf)
+{
+	return scnprintf(buf, PAGE_SIZE, "%s", vas->name);
+}
+VAS_SYSFS_ATTR(name, 0444, __show_vas_name, NULL);
+
+static ssize_t __show_vas_mode(struct vas *vas, struct vas_sysfs_attr *vsattr,
+			       char *buf)
+{
+	return scnprintf(buf, PAGE_SIZE, "%#03o", vas->mode);
+}
+VAS_SYSFS_ATTR(mode, 0444, __show_vas_mode, NULL);
+
+static ssize_t __show_vas_user(struct vas *vas, struct vas_sysfs_attr *vsattr,
+			       char *buf)
+{
+	struct user_namespace *ns = current_user_ns();
+
+	return scnprintf(buf, PAGE_SIZE, "%d", from_kuid(ns, vas->uid));
+}
+VAS_SYSFS_ATTR(user, 0444, __show_vas_user, NULL);
+
+static ssize_t __show_vas_group(struct vas *vas, struct vas_sysfs_attr *vsattr,
+				char *buf)
+{
+	struct user_namespace *ns = current_user_ns();
+
+	return scnprintf(buf, PAGE_SIZE, "%d", from_kgid(ns, vas->gid));
+}
+VAS_SYSFS_ATTR(group, 0444, __show_vas_group, NULL);
+
+static struct attribute *vas_default_attr[] = {
+	&vas_sysfs_attr_name.attr,
+	&vas_sysfs_attr_mode.attr,
+	&vas_sysfs_attr_user.attr,
+	&vas_sysfs_attr_group.attr,
+	NULL
+};
+
+/**
+ * Function to release the VAS after its kobject is gone.
+ **/
+static void __vas_release(struct kobject *kobj)
+{
+	struct vas *vas = container_of(kobj, struct vas, kobj);
+
+	spin_lock(&vases_lock);
+	idr_remove(&vases, vas->id);
+	spin_unlock(&vases_lock);
+
+	/*
+	 * Wait for the full RCU grace period before actually deleting the VAS
+	 * data structure since we haven't done it earlier.
+	 */
+	call_rcu(&vas->rcu, __delete_vas_rcu);
+}
+
+/**
+ * The ktype data structure representing a VAS.
+ **/
+static struct kobj_type vas_ktype = {
+	.sysfs_ops = &vas_sysfs_ops,
+	.release = __vas_release,
+	.default_attrs = vas_default_attr,
+};
+
+
+/***
+ * Internally visible functions
+ ***/
+
+/**
+ * Working with the global VAS list.
+ **/
+static inline void vas_remove(struct vas *vas)
+{
+	spin_lock(&vases_lock);
+
+	/*
+	 * Only put the to-be-deleted placeholder into the IDR here; the actual
+	 * removal from the IDR and the freeing of the object are done when the
+	 * kobject is released. We have to do it this way to keep the ID
+	 * reserved. Otherwise it could happen that we try to create a new VAS
+	 * with a reused ID in the sysfs before the current VAS has been removed
+	 * from the sysfs.
+	 */
+	idr_replace(&vases, INVALID_VAS, vas->id);
+	spin_unlock(&vases_lock);
+
+	/*
+	 * No need to wait for the RCU period here, we will do it before
+	 * actually deleting the VAS in the 'vas_release' function.
+	 */
+	__vas_put(vas);
+}
+
+static inline int vas_insert(struct vas *vas)
+{
+	int ret;
+
+	/* Add the VAS in the IDR cache. */
+	spin_lock(&vases_lock);
+
+	ret = idr_alloc(&vases, vas, 1, VAS_MAX_ID, GFP_KERNEL);
+
+	spin_unlock(&vases_lock);
+
+	if (ret < 0) {
+		__delete_vas(vas);
+		return ret;
+	}
+
+	/* Add the last data to the VAS' data structure. */
+	vas->id = ret;
+	vas->kobj.kset = vases_kset;
+
+	/* Initialize the kobject and add it to the sysfs. */
+	ret = kobject_init_and_add(&vas->kobj, &vas_ktype, NULL, "%d", vas->id);
+	if (ret != 0) {
+		vas_remove(vas);
+		return ret;
+	}
+
+	/* The VAS is ready, trigger the corresponding UEVENT. */
+	kobject_uevent(&vas->kobj, KOBJ_ADD);
+
+	/*
+	 * We don't put or get the VAS again, because its reference count will
+	 * be initialized with '1'. This will be reduced to 0 when we remove the
+	 * VAS again from the internal global management list.
+	 */
+	return 0;
+}
+
+static inline struct vas *vas_lookup(int id)
+{
+	struct vas *vas;
+
+	rcu_read_lock();
+
+	vas = idr_find(&vases, id);
+	if (vas == INVALID_VAS)
+		vas = NULL;
+	if (vas)
+		vas = __vas_get(vas);
+
+	rcu_read_unlock();
+
+	return vas;
+}
+
+static inline struct vas *vas_lookup_by_name(const char *name)
+{
+	struct vas *vas;
+	int id;
+
+	rcu_read_lock();
+
+	idr_for_each_entry(&vases, vas, id) {
+		if (vas == INVALID_VAS)
+			continue;
+
+		if (strcmp(vas->name, name) == 0)
+			break;
+	}
+
+	if (vas)
+		vas = __vas_get(vas);
+
+	rcu_read_unlock();
+
+	return vas;
+}
+
+/**
+ * Management of the sharing of VAS: a VAS can be shared writable by at most
+ * one attacher at a time (and only while there are no readers), or readable
+ * by arbitrarily many as long as there is no writer. vas_take_share() returns
+ * 1 if the requested share could be taken and 0 otherwise.
+ **/
+static inline int vas_take_share(int type, struct vas *vas)
+{
+	int ret;
+
+	spin_lock(&vas->share_lock);
+	if (type & MAY_WRITE) {
+		if ((vas->sharing & VAS_SHARE_READ_WRITE_MASK) == 0) {
+			vas->sharing += VAS_SHARE_WRITABLE;
+			ret = 1;
+		} else
+			ret = 0;
+	} else {
+		if ((vas->sharing & VAS_SHARE_WRITE_MASK) == 0) {
+			vas->sharing += VAS_SHARE_READABLE;
+			ret = 1;
+		} else
+			ret = 0;
+	}
+	spin_unlock(&vas->share_lock);
+
+	return ret;
+}
+
+static inline void vas_put_share(int type, struct vas *vas)
+{
+	spin_lock(&vas->share_lock);
+	if (type & MAY_WRITE)
+		vas->sharing -= VAS_SHARE_WRITABLE;
+	else
+		vas->sharing -= VAS_SHARE_READABLE;
+	spin_unlock(&vas->share_lock);
+}
+
+/**
+ * Management of the memory maps.
+ **/
+static int init_vas_mm(struct vas *vas)
+{
+	struct mm_struct *mm;
+
+	mm = mm_alloc();
+	if (!mm)
+		return -ENOMEM;
+
+	mm = mm_setup(mm);
+	if (!mm)
+		return -ENOMEM;
+
+	arch_pick_mmap_layout(mm);
+
+	vas->mm = mm;
+	return 0;
+}
+
+static int init_att_vas_mm(struct att_vas *avas, struct task_struct *owner)
+{
+	struct mm_struct *mm, *orig_mm = owner->original_mm;
+
+	mm = mm_alloc();
+	if (!mm)
+		return -ENOMEM;
+
+	mm = mm_setup(mm);
+	if (!mm)
+		return -ENOMEM;
+
+	mm = mm_set_task(mm, owner, orig_mm->user_ns);
+	if (!mm)
+		return -ENOMEM;
+
+	arch_pick_mmap_layout(mm);
+
+	/* Additional setup of the memory map. */
+	set_mm_exe_file(mm, get_mm_exe_file(orig_mm));
+	mm->vas_last_update = orig_mm->vas_last_update;
+
+	avas->mm = mm;
+	return 0;
+}
+
+/**
+ * Lookup the corresponding vm_area in the referenced memory map.
+ *
+ * The function is very similar to 'find_exact_vma'. However, it can also handle
+ * cases where a VMA was resized while the referenced one wasn't, or vice versa.
+ **/
+static struct vm_area_struct *vas_find_reference(struct mm_struct *mm,
+						 struct vm_area_struct *vma)
+{
+	struct vm_area_struct *ref;
+
+	ref = find_vma(mm, vma->vm_start);
+	if (ref) {
+		/*
+		 * OK, we found a VMA in the other memory map. Let's check
+		 * whether this is really the VMA we are referring to.
+		 */
+		if (ref->vm_start == vma->vm_start &&
+		    ref->vm_end == vma->vm_end)
+			/* This is an exact match. */
+			return ref;
+
+		if (ref->vm_start != vma->vm_start &&
+		    ref->vm_end == vma->vm_end &&
+		    vma->vm_flags & VM_GROWSDOWN)
+			/* This might be the stack VMA. */
+			return ref;
+	}
+
+	return NULL;
+}
+
+/**
+ * Translate a bit field with O_* bits into fs-like bit field with MAY_* bits.
+ **/
+static inline int __build_vas_access_type(int acc_type)
+{
+	/* We are only interested in access modes. */
+	acc_type &= O_ACCMODE;
+
+	if (acc_type == O_RDONLY)
+		return MAY_READ;
+	else if (acc_type == O_WRONLY)
+		return MAY_WRITE;
+	else if (acc_type == O_RDWR)
+		return MAY_READ | MAY_WRITE;
+
+	return -1;
+}
+
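+/**
+ * Check whether the current user may access a VAS with the given MAY_* access
+ * type, based on the VAS' owner, group and mode bits.
+ **/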
+static inline int __check_permission(kuid_t uid, kgid_t gid, umode_t mode,
+				     int type)
+{
+	kuid_t cur_uid = current_uid();
+
+	/* root can do anything with a VAS. */
+	if (unlikely(uid_eq(cur_uid, GLOBAL_ROOT_UID)))
+		return 0;
+
+	if (likely(uid_eq(cur_uid, uid)))
+		mode >>= 6;
+	else if (in_group_p(gid))
+		mode >>= 3;
+
+	if ((type & ~mode & (MAY_READ | MAY_WRITE)) == 0)
+		return 0;
+	return -EACCES;
+}
+
+/**
+ * Copy a vm_area from one memory map into another one.
+ *
+ * Requires that the semaphore of the destination memory map is taken in
+ * write-mode and the one of the source memory map at least in read-mode.
+ *
+ * @param[in] src_mm:	The memory map to which the vm_area belongs that should
+ *			be copied.
+ * @param[in] src_vma:	The vm_area that should be copied.
+ * @param[in] dst_mm:	The memory map to which the vm_area should be copied.
+ * @param[in] vm_flags:	The vm_flags that should be used for the new vm_area.
+ *
+ * @returns:		A pointer to the new vm_area on success, NULL
+ *			otherwise.
+ **/
+static inline
+struct vm_area_struct *__copy_vm_area(struct mm_struct *src_mm,
+				      struct vm_area_struct *src_vma,
+				      struct mm_struct *dst_mm,
+				      unsigned long vm_flags)
+{
+	struct vm_area_struct *vma, *prev;
+	struct rb_node **rb_link, *rb_parent;
+	int ret;
+
+	pr_vas_debug("Copying VMA - addr: %#lx - %#lx - to %p\n",
+		     src_vma->vm_start, src_vma->vm_end, dst_mm);
+
+	ret = find_vma_links(dst_mm, src_vma->vm_start, src_vma->vm_end,
+			     &prev, &rb_link, &rb_parent);
+	if (ret != 0) {
+		pr_vas_debug("Could not map VMA in the new memory map because of a conflict with a different mapping\n");
+		return NULL;
+	}
+
+	vma = __new_vm_area();
+	if (!vma)
+		return NULL;
+	*vma = *src_vma;
+
+	INIT_LIST_HEAD(&vma->anon_vma_chain);
+	ret = vma_dup_policy(src_vma, vma);
+	if (ret != 0)
+		goto out_free_vma;
+	ret = anon_vma_clone(vma, src_vma);
+	if (ret != 0)
+		goto out_free_vma;
+	vma->vm_mm = dst_mm;
+	vma->vm_flags = vm_flags;
+	vma_set_page_prot(vma);
+	vma->vm_next = vma->vm_prev = NULL;
+	if (vma->vm_file)
+		get_file(vma->vm_file);
+	if (vma->vm_ops && vma->vm_ops->open)
+		vma->vm_ops->open(vma);
+	vma->vas_last_update = src_vma->vas_last_update;
+
+	vma_link(dst_mm, vma, prev, rb_link, rb_parent);
+
+	vm_stat_account(dst_mm, vma->vm_flags, vma_pages(vma));
+	if (unlikely(dup_page_range(dst_mm, vma, src_mm, src_vma)))
+		pr_vas_debug("Failed to copy page table for VMA %p from %p\n",
+			     vma, src_vma);
+
+	return vma;
+
+out_free_vma:
+	__delete_vm_area(vma);
+	return NULL;
+}
+
+/**
+ * Remove a vm_area from a given memory map.
+ *
+ * Requires that the semaphore of the memory map is taken in write-mode.
+ *
+ * @param mm:		The memory map from which the vm_area should be
+ *			removed.
+ * @param vma:		The vm_area that should be removed.
+ *
+ * @returns:		0 on success, -ERRNO otherwise.
+ **/
+static inline int __remove_vm_area(struct mm_struct *mm,
+				   struct vm_area_struct *vma)
+{
+	pr_vas_debug("Removing VMA - addr: %#lx - %#lx - from %p\n",
+		     vma->vm_start, vma->vm_end, mm);
+
+	return do_munmap(mm, vma->vm_start, vma->vm_end - vma->vm_start);
+}
+
+/**
+ * Update the information of a vm_area in one particular memory map with the
+ * information of the corresponding one in another memory map.
+ *
+ * Requires that the semaphores of both memory maps are taken in write-mode.
+ *
+ * @param[in] src_mm:	The memory map containing the vm_area from which the
+ *			information should be copied.
+ * @param[in] src_vma:	The vm_area from which the information should be
+ *			copied.
+ * @param[in] dst_mm:	The memory map containing the vm_area to which the
+ *			information should be copied.
+ * @param[in] dst_vma:	The vm_area that should be updated if already known,
+ *			otherwise this can be NULL and will be looked up in the
+ *			destination memory map.
+ *
+ * @returns:		A pointer to the updated vm_area on success, NULL
+ *			otherwise.
+ **/
+static inline
+struct vm_area_struct *__update_vm_area(struct mm_struct *src_mm,
+					struct vm_area_struct *src_vma,
+					struct mm_struct *dst_mm,
+					struct vm_area_struct *dst_vma)
+{
+	pr_vas_debug("Updating VMA - addr: %#lx - %#lx - in %p\n",
+		     src_vma->vm_start, src_vma->vm_end, dst_mm);
+
+	/* Lookup the destination vm_area if not yet known. */
+	if (!dst_vma)
+		dst_vma = vas_find_reference(dst_mm, src_vma);
+
+	if (!dst_vma) {
+		pr_vas_debug("Cannot find corresponding memory region in destination memory map -- Abort\n");
+		dst_vma = NULL;
+	} else if (ktime_compare(src_vma->vas_last_update,
+				 dst_vma->vas_last_update) == 0) {
+		pr_vas_debug("Memory region is unchanged -- Skip\n");
+	} else if (ktime_compare(src_vma->vas_last_update,
+				 dst_vma->vas_last_update) == -1) {
+		pr_vas_debug("Memory region is stale (%lld vs %lld) -- Abort\n",
+			     src_vma->vas_last_update,
+			     dst_vma->vas_last_update);
+		dst_vma = NULL;
+	} else if (src_vma->vm_start != dst_vma->vm_start ||
+		   src_vma->vm_end != dst_vma->vm_end) {
+		/*
+		 * The VMA changed completely. We have to represent this change
+		 * in the destination memory region.
+		 */
+		struct mm_struct *orig_vas_ref = dst_vma->vas_reference;
+		unsigned long orig_vm_flags = dst_vma->vm_flags;
+
+		if (__remove_vm_area(dst_mm, dst_vma) != 0) {
+			dst_vma = NULL;
+			goto out;
+		}
+
+		dst_vma = __copy_vm_area(src_mm, src_vma, dst_mm,
+					 orig_vm_flags);
+		if (!dst_vma)
+			goto out;
+
+		dst_vma->vas_reference = orig_vas_ref;
+	} else {
+		/*
+		 * The VMA itself did not change. However, mappings might have
+		 * changed. So at least update the page table entries belonging
+		 * to the VMA in the destination memory region.
+		 */
+		if (unlikely(dup_page_range(dst_mm, dst_vma, src_mm, src_vma)))
+			pr_vas_debug("Cannot update page table entries\n");
+
+		dst_vma->vas_last_update = src_vma->vas_last_update;
+	}
+
+out:
+	return dst_vma;
+}
+
+/**
+ * Merge the memory regions of the VAS into the attached-VAS memory map.
+ *
+ * Requires that the VAS is already locked.
+ *
+ * @param[in] avas:	The pointer to the attached-VAS data structure that
+ *			contains all the information for this attachment.
+ * @param[in] vas:	The pointer to the VAS of which the memory map should
+ *			be merged.
+ * @param[in] type:	The type of attaching (see vas_attach for more
+ *			information).
+ *
+ * @returns:		0 on success, -ERRNO otherwise.
+ **/
+static int vas_merge(struct att_vas *avas, struct vas *vas, int type)
+{
+	struct vm_area_struct *vma, *new_vma;
+	struct mm_struct *vas_mm, *avas_mm;
+	int ret;
+
+	vas_mm = vas->mm;
+	avas_mm = avas->mm;
+
+	dump_memory_map("Before VAS MM", vas_mm);
+
+	if (down_write_killable(&avas_mm->mmap_sem))
+		return -EINTR;
+	down_read_nested(&vas_mm->mmap_sem, SINGLE_DEPTH_NESTING);
+
+	/* Try to copy all VMAs of the VAS into the AS of the attached-VAS. */
+	for (vma = vas_mm->mmap; vma; vma = vma->vm_next) {
+		unsigned long merged_vm_flags = vma->vm_flags;
+
+		pr_vas_debug("Merging a VAS memory region (%#lx - %#lx)\n",
+			     vma->vm_start, vma->vm_end);
+
+		/*
+		 * Remove the writable bit from the vm_flags if the VAS is
+		 * attached only readable.
+		 */
+		if (!(type & MAY_WRITE))
+			merged_vm_flags &= ~(VM_WRITE | VM_MAYWRITE);
+
+		new_vma = __copy_vm_area(vas_mm, vma, avas_mm,
+					 merged_vm_flags);
+		if (!new_vma) {
+			pr_vas_debug("Failed to merge a VAS memory region (%#lx - %#lx)\n",
+				     vma->vm_start, vma->vm_end);
+			ret = -EFAULT;
+			goto out_unlock;
+		}
+
+		/*
+		 * Remember for the VMA that we just added to the
+		 * attached-VAS that it actually belongs to the VAS.
+		 */
+		new_vma->vas_reference = vas_mm;
+	}
+
+	ret = 0;
+
+out_unlock:
+	up_read(&vas_mm->mmap_sem);
+	up_write(&avas_mm->mmap_sem);
+
+	dump_memory_map("After VAS MM", vas_mm);
+	dump_memory_map("After Attached-VAS MM", avas_mm);
+
+	return ret;
+}
+
+/**
+ * Unmerge the VAS-related parts of an attached-VAS memory map back into the
+ * VAS' memory map.
+ *
+ * Requires that the VAS is already locked.
+ *
+ * @param[in] avas:	The pointer to the attached-VAS data structure that
+ *			contains all the information for this attachment.
+ * @param[in] vas:	The pointer to the VAS for which the memory map should
+ *			be updated again.
+ *
+ * @returns:		0 on success, -ERRNO otherwise.
+ **/
+static int vas_unmerge(struct att_vas *avas, struct vas *vas)
+{
+	struct vm_area_struct *vma, *next;
+	struct mm_struct *vas_mm, *avas_mm;
+	int ret;
+
+	vas_mm = vas->mm;
+	avas_mm = avas->mm;
+
+	dump_memory_map("Before Attached-VAS MM", avas_mm);
+	dump_memory_map("Before VAS MM", vas_mm);
+
+	if (down_write_killable(&avas_mm->mmap_sem))
+		return -EINTR;
+	down_write_nested(&vas_mm->mmap_sem, SINGLE_DEPTH_NESTING);
+
+	/* Update all VMAs of the VAS if they changed in the attached-VAS. */
+	for (vma = avas_mm->mmap, next = next_vma_safe(vma); vma;
+	     vma = next, next = next_vma_safe(next)) {
+		struct mm_struct *ref_mm = vma->vas_reference;
+
+		if (!ref_mm) {
+			struct vm_area_struct *new_vma;
+
+			/*
+			 * This is a VMA which was created while the VAS was
+			 * attached to the process and which does not yet exist
+			 * in the VAS. Copy it into the VAS' mm_struct.
+			 */
+			pr_vas_debug("Unmerging a new VAS memory region (%#lx - %#lx)\n",
+				     vma->vm_start, vma->vm_end);
+
+			new_vma = __copy_vm_area(avas_mm, vma, vas_mm,
+						 vma->vm_flags);
+			if (!new_vma) {
+				pr_vas_debug("Failed to unmerge a new VAS memory region (%#lx - %#lx)\n",
+					     vma->vm_start, vma->vm_end);
+				ret = -EFAULT;
+				goto out_unlock;
+			}
+
+			new_vma->vas_reference = NULL;
+		} else {
+			struct vm_area_struct *upd_vma;
+
+			/*
+			 * This VMA was previously copied into the memory map
+			 * when the VAS was attached to the process. So check if
+			 * we need to update the corresponding VMA in the VAS'
+			 * memory map.
+			 */
+			pr_vas_debug("Unmerging an existing VAS memory region (%#lx - %#lx)\n",
+				     vma->vm_start, vma->vm_end);
+
+			upd_vma = __update_vm_area(avas_mm, vma, vas_mm, NULL);
+			if (!upd_vma) {
+				pr_vas_debug("Failed to unmerge a VAS memory region (%#lx - %#lx)\n",
+					     vma->vm_start, vma->vm_end);
+				ret = -EFAULT;
+				goto out_unlock;
+			}
+		}
+
+		/* Remove the current VMA from the attached-VAS memory map. */
+		__remove_vm_area(avas_mm, vma);
+	}
+
+	ret = 0;
+
+out_unlock:
+	up_write(&vas_mm->mmap_sem);
+	up_write(&avas_mm->mmap_sem);
+
+	dump_memory_map("After VAS MM", vas_mm);
+
+	return ret;
+}
+
+/**
+ * Merge the memory regions of the task into the attached-VAS memory map.
+ *
+ * @param[in] avas:	The pointer to the attached-VAS data structure that
+ *			contains all the information for this attachment.
+ * @param[in] tsk:	The pointer to the task of which the memory map
+ *			should be merged.
+ *
+ * @returns:		0 on success, -ERRNO otherwise.
+ **/
+static int task_merge(struct att_vas *avas, struct task_struct *tsk)
+{
+	struct vm_area_struct *vma, *new_vma;
+	struct mm_struct *avas_mm, *tsk_mm;
+	int ret;
+
+	tsk_mm = tsk->original_mm;
+	avas_mm = avas->mm;
+
+	dump_memory_map("Before Task MM", tsk_mm);
+	dump_memory_map("Before Attached-VAS MM", avas_mm);
+
+	if (down_write_killable(&avas_mm->mmap_sem))
+		return -EINTR;
+	down_read_nested(&tsk_mm->mmap_sem, SINGLE_DEPTH_NESTING);
+
+	/*
+	 * Try to copy all necessary memory regions from the task's memory
+	 * map to the attached-VAS memory map.
+	 */
+	for (vma = tsk_mm->mmap; vma; vma = vma->vm_next) {
+		pr_vas_debug("Merging a task memory region (%#lx - %#lx)\n",
+			     vma->vm_start, vma->vm_end);
+
+		new_vma = __copy_vm_area(tsk_mm, vma, avas_mm, vma->vm_flags);
+		if (!new_vma) {
+			pr_vas_debug("Failed to merge a task memory region (%#lx - %#lx)\n",
+				     vma->vm_start, vma->vm_end);
+			ret = -EFAULT;
+			goto out_unlock;
+		}
+
+		/*
+		 * Remember for the VMA that we just added to the
+		 * attached-VAS that it actually belongs to the task.
+		 */
+		new_vma->vas_reference = tsk_mm;
+	}
+
+	ret = 0;
+
+out_unlock:
+	up_read(&tsk_mm->mmap_sem);
+	up_write(&avas_mm->mmap_sem);
+
+	dump_memory_map("After Task MM", tsk_mm);
+	dump_memory_map("After Attached-VAS MM", avas_mm);
+
+	return ret;
+}
+
+/**
+ * Unmerge task-related parts of an attached-VAS memory map back into the
+ * task's memory map.
+ *
+ * @param[in] avas:	The pointer to the attached-VAS data structure that
+ *			contains all the information for this attachment.
+ * @param[in] tsk:	The pointer to the task for which the memory map
+ *			should be updated again.
+ *
+ * @returns:		0 on success, -ERRNO otherwise.
+ **/
+static int task_unmerge(struct att_vas *avas, struct task_struct *tsk)
+{
+	struct vm_area_struct *vma, *next;
+	struct mm_struct *avas_mm, *tsk_mm;
+
+	tsk_mm = tsk->original_mm;
+	avas_mm = avas->mm;
+
+	dump_memory_map("Before Task MM", tsk_mm);
+	dump_memory_map("Before Attached-VAS MM", avas_mm);
+
+	if (down_write_killable(&avas_mm->mmap_sem))
+		return -EINTR;
+
+	/*
+	 * Since we are always syncing with the task's memory map at every
+	 * switch, unmerging the task's memory regions basically just means
+	 * removing them.
+	 */
+	for (vma = avas_mm->mmap, next = next_vma_safe(vma); vma;
+	     vma = next, next = next_vma_safe(next)) {
+		struct mm_struct *ref_mm = vma->vas_reference;
+
+		if (ref_mm != tsk_mm) {
+			pr_vas_debug("Skipping memory region (%#lx - %#lx) during task unmerging\n",
+				     vma->vm_start, vma->vm_end);
+			continue;
+		}
+
+		pr_vas_debug("Unmerging a task memory region (%#lx - %#lx)\n",
+			     vma->vm_start, vma->vm_end);
+
+		/* Remove the current VMA from the attached-VAS memory map. */
+		__remove_vm_area(avas_mm, vma);
+	}
+
+	up_write(&avas_mm->mmap_sem);
+
+	dump_memory_map("After Task MM", tsk_mm);
+	dump_memory_map("After Attached-VAS MM", avas_mm);
+
+	return 0;
+}
+
+/**
+ * Attach a VAS to a task -- update internal information ONLY
+ *
+ * Requires that the VAS is already locked.
+ *
+ * @param[in] avas:	The pointer to the attached-VAS data structure
+ *			containing all the information of this attaching.
+ * @param[in] tsk:	The pointer to the task to which the VAS should be
+ *			attached.
+ * @param[in] vas:	The pointer to the VAS which should be attached.
+ *
+ * @returns:		0 on success, -ERRNO otherwise.
+ **/
+static int __vas_attach(struct att_vas *avas, struct task_struct *tsk,
+			struct vas *vas)
+{
+	int ret;
+
+	/* Before doing anything, synchronize the RSS-stat of the task. */
+	sync_mm_rss(tsk->mm);
+
+	/*
+	 * Try to acquire the VAS share with the proper type. This will ensure
+	 * that the different sharing possibilities of VAS are respected.
+	 */
+	if (!vas_take_share(avas->type, vas)) {
+		pr_vas_debug("VAS is already attached exclusively\n");
+		return -EBUSY;
+	}
+
+	ret = vas_merge(avas, vas, avas->type);
+	if (ret != 0)
+		goto out_put_share;
+
+	ret = task_merge(avas, tsk);
+	if (ret != 0)
+		goto out_put_share;
+
+	vas->refcount++;
+
+	return 0;
+
+out_put_share:
+	vas_put_share(avas->type, vas);
+	return ret;
+}
+
+/**
+ * Detach a VAS from a task -- update internal information ONLY
+ *
+ * Requires that the VAS is already locked.
+ *
+ * @param[in] avas:	The pointer to the attached-VAS data structure
+ *			containing all the information of this attaching.
+ * @param[in] tsk:	The pointer to the task from which the VAS should be
+ *			detached.
+ * @param[in] vas:	The pointer to the VAS which should be detached.
+ *
+ * @returns:		0 on success, -ERRNO otherwise.
+ **/
+static int __vas_detach(struct att_vas *avas, struct task_struct *tsk,
+			struct vas *vas)
+{
+	int ret;
+
+	/* Before detaching the VAS, synchronize the RSS-stat of the task. */
+	sync_mm_rss(tsk->mm);
+
+	ret = task_unmerge(avas, tsk);
+	if (ret != 0)
+		return ret;
+
+	ret = vas_unmerge(avas, vas);
+	if (ret != 0)
+		return ret;
+
+	vas->refcount--;
+
+	/* Release our share of the VAS here to preserve the sharing rules. */
+	vas_put_share(avas->type, vas);
+
+	return 0;
+}
+
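+/**
+ * Copy or update all memory regions of the task's memory map in the
+ * attached-VAS memory map (sync direction: task -> attached-VAS).
+ *
+ * Requires that the semaphore of @avas_mm is taken in write-mode and the one
+ * of @tsk_mm at least in read-mode.
+ **/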
+static int __sync_from_task(struct mm_struct *avas_mm, struct mm_struct *tsk_mm)
+{
+	struct vm_area_struct *vma;
+	int ret;
+
+	ret = 0;
+	for (vma = tsk_mm->mmap; vma; vma = vma->vm_next) {
+		struct vm_area_struct *ref;
+
+		ref = vas_find_reference(avas_mm, vma);
+		if (!ref) {
+			ref = __copy_vm_area(tsk_mm, vma, avas_mm,
+					     vma->vm_flags);
+
+			if (!ref) {
+				pr_vas_debug("Failed to copy memory region (%#lx - %#lx) during task sync\n",
+					     vma->vm_start, vma->vm_end);
+				ret = -EFAULT;
+				break;
+			}
+
+			/*
+			 * Remember for the newly added memory region where we
+			 * copied it from.
+			 */
+			ref->vas_reference = tsk_mm;
+		} else {
+			ref = __update_vm_area(tsk_mm, vma, avas_mm, ref);
+			if (!ref) {
+				pr_vas_debug("Failed to update memory region (%#lx - %#lx) during task sync\n",
+					     vma->vm_start, vma->vm_end);
+				ret = -EFAULT;
+				break;
+			}
+		}
+	}
+
+	return ret;
+}
+
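+/**
+ * Update all task-related memory regions of the attached-VAS memory map in
+ * the task's memory map (sync direction: attached-VAS -> task).
+ *
+ * Requires that the semaphore of @tsk_mm is taken in write-mode and the one
+ * of @avas_mm at least in read-mode.
+ **/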
+static int __sync_to_task(struct mm_struct *avas_mm, struct mm_struct *tsk_mm)
+{
+	struct vm_area_struct *vma;
+	int ret;
+
+	ret = 0;
+	for (vma = avas_mm->mmap; vma; vma = vma->vm_next) {
+		if (vma->vas_reference != tsk_mm) {
+			pr_vas_debug("Skip unrelated memory region (%#lx - %#lx) during task resync\n",
+				     vma->vm_start, vma->vm_end);
+		} else {
+			struct vm_area_struct *ref;
+
+			ref = __update_vm_area(avas_mm, vma, tsk_mm, NULL);
+			if (!ref) {
+				pr_vas_debug("Failed to update memory region (%#lx - %#lx) during task resync\n",
+					     vma->vm_start, vma->vm_end);
+				ret = -EFAULT;
+				break;
+			}
+		}
+	}
+
+	return ret;
+}
+
+/**
+ * Synchronize all task related parts of the memory maps to reflect the latest
+ * state.
+ *
+ * @param[in] avas_mm:	The memory map of the attached-VAS.
+ * @param[in] tsk_mm:	The memory map of the task.
+ * @param[in] dir:	The direction in which the sync should happen:
+ *				1 => tsk -> avas
+ *			       -1 => avas -> tsk
+ *
+ * @returns:		0 on success, -ERRNO otherwise.
+ **/
+static int synchronize_task(struct mm_struct *avas_mm, struct mm_struct *tsk_mm,
+			    int dir)
+{
+	struct mm_struct *src_mm, *dst_mm;
+	int ret;
+
+	src_mm = dir == 1 ? tsk_mm : avas_mm;
+	dst_mm = dir == 1 ? avas_mm : tsk_mm;
+
+	/*
+	 * We have nothing to do if nothing has changed the memory maps since
+	 * the last sync.
+	 */
+	if (ktime_compare(src_mm->vas_last_update,
+			  dst_mm->vas_last_update) == 0) {
+		pr_vas_debug("Nothing to do during switch, memory map is up-to-date\n");
+		return 0;
+	}
+
+	pr_vas_debug("Synchronize memory map from %s to %s\n",
+		     dir == 1 ? "Task" : "Attached-VAS",
+		     dir == 1 ? "Attached-VAS" : "Task");
+
+	dump_memory_map("Before Task MM", tsk_mm);
+	dump_memory_map("Before Attached-VAS MM", avas_mm);
+
+	if (down_write_killable(&dst_mm->mmap_sem))
+		return -EINTR;
+	down_read_nested(&src_mm->mmap_sem, SINGLE_DEPTH_NESTING);
+
+	if (dir == 1)
+		ret = __sync_from_task(avas_mm, tsk_mm);
+	else
+		ret = __sync_to_task(avas_mm, tsk_mm);
+
+	if (ret != 0)
+		goto out_unlock;
+
+	/*
+	 * Also update the information about where the different memory regions
+	 * such as code, data and stack start and end.
+	 */
+	dst_mm->start_code = src_mm->start_code;
+	dst_mm->end_code = src_mm->end_code;
+	dst_mm->start_data = src_mm->start_data;
+	dst_mm->end_data = src_mm->end_data;
+	dst_mm->start_brk = src_mm->start_brk;
+	dst_mm->brk = src_mm->brk;
+	dst_mm->start_stack = src_mm->start_stack;
+	dst_mm->arg_start = src_mm->arg_start;
+	dst_mm->arg_end = src_mm->arg_end;
+	dst_mm->env_start = src_mm->env_start;
+	dst_mm->env_end = src_mm->env_end;
+	dst_mm->task_size = src_mm->task_size;
+
+	dst_mm->vas_last_update = src_mm->vas_last_update;
+
+	ret = 0;
+
+out_unlock:
+	up_read(&src_mm->mmap_sem);
+	up_write(&dst_mm->mmap_sem);
+
+	dump_memory_map("After Task MM", tsk_mm);
+	dump_memory_map("After Attached-VAS MM", avas_mm);
+
+	return ret;
+}
+
+/**
+ * Properly update and setup the memory maps before performing the actual
+ * switch to a different address space.
+ *
+ * @param[in] from:	The attached-VAS that we are switching away from, or
+ *			NULL if we are switching away from the task's original
+ *			AS.
+ * @param[in] to:	The attached-VAS that we are switching to, or NULL if
+ *			we are switching to the task's original AS.
+ * @param[in] tsk:	The pointer to the task for which the switch should
+ *			happen.
+ *
+ * @returns:		0 on success, -ERRNO otherwise.
+ **/
+static int vas_prepare_switch(struct att_vas *from, struct att_vas *to,
+			      struct task_struct *tsk)
+{
+	int ret;
+
+	/* Before doing anything, synchronize the RSS-stat of the task. */
+	sync_mm_rss(tsk->mm);
+
+	/*
+	 * When switching away from a VAS we first have to update the task's
+	 * memory map so that it is always up-to-date.
+	 */
+	if (from) {
+		ret = synchronize_task(from->mm, tsk->original_mm, -1);
+		if (ret != 0)
+			return ret;
+	}
+
+	/*
+	 * When switching to a VAS we have to update the VAS' memory map so that
+	 * it contains all the up-to-date information of the task.
+	 */
+	if (to) {
+		ret = synchronize_task(to->mm, tsk->original_mm, 1);
+		if (ret != 0)
+			return ret;
+	}
+
+	return 0;
+}
+
+/**
+ * Switch a task's address space to the given one.
+ *
+ * @param[in] tsk:	The pointer to the task for which the AS should be
+ *			switched.
+ * @param[in] vid:	The ID of the VAS to which the task should switch, or
+ *			0 if the task should switch to its original AS.
+ *
+ * @returns:		0 on success, -ERRNO otherwise.
+ **/
+static int __vas_switch(struct task_struct *tsk, int vid)
+{
+	struct vas_context *ctx = tsk->vas_ctx;
+	struct att_vas *next_avas, *old_avas;
+	struct mm_struct *nextmm, *oldmm;
+	bool is_attached;
+	int ret;
+
+	vas_context_lock(ctx);
+
+	if (vid == 0) {
+		pr_vas_debug("Switching to original mm\n");
+		next_avas = NULL;
+		nextmm = tsk->original_mm;
+	} else {
+		is_attached = false;
+		list_for_each_entry(next_avas, &ctx->vases, tsk_link) {
+			if (next_avas->vas->id == vid) {
+				is_attached = true;
+				break;
+			}
+		}
+		if (!is_attached) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
+		pr_vas_debug("Switching to VAS - name: %s\n",
+			     next_avas->vas->name);
+		nextmm = next_avas->mm;
+	}
+
+	if (tsk->active_vas == 0) {
+		pr_vas_debug("Switching from original mm\n");
+		old_avas = NULL;
+		oldmm = tsk->active_mm;
+	} else {
+		is_attached = false;
+		list_for_each_entry(old_avas, &ctx->vases, tsk_link) {
+			if (old_avas->vas->id == tsk->active_vas) {
+				is_attached = true;
+				break;
+			}
+		}
+		if (!is_attached) {
+			WARN(!is_attached, "Could not find the task's active VAS.\n");
+			old_avas = NULL;
+			oldmm = tsk->mm;
+		} else {
+			pr_vas_debug("Switching from VAS - name: %s\n",
+				     old_avas->vas->name);
+			oldmm = old_avas->mm;
+		}
+	}
+
+	vas_context_unlock(ctx);
+
+	/* Check if we are already running on the specified mm. */
+	if (oldmm == nextmm)
+		return 0;
+
+	/*
+	 * Prepare the mm_struct data structure we are switching to. Update the
+	 * mappings for stack, code, data and other recent changes.
+	 */
+	ret = vas_prepare_switch(old_avas, next_avas, tsk);
+	if (ret != 0) {
+		pr_vas_debug("Failed to prepare memory maps for switch\n");
+		return ret;
+	}
+
+	task_lock(tsk);
+
+	/* Perform the actual switch to the new address space. */
+	vmacache_flush(tsk);
+	switch_mm(oldmm, nextmm, tsk);
+
+	tsk->mm = nextmm;
+	tsk->active_mm = nextmm;
+	tsk->active_vas = vid;
+
+	task_unlock(tsk);
+
+	return 0;
+
+out_unlock:
+	vas_context_unlock(ctx);
+
+	return ret;
+}
+
+
+/***
+ * Externally visible functions
+ ***/
+
+int vas_create(const char *name, umode_t mode)
+{
+	struct vas *vas;
+	int ret;
+
+	if (!name)
+		return -EINVAL;
+
+	if (vas_find(name) > 0)
+		return -EEXIST;
+
+	pr_vas_debug("Creating a new VAS - name: %s\n", name);
+
+	/* Allocate and initialize the VAS. */
+	vas = __new_vas();
+	if (!vas)
+		return -ENOMEM;
+
+	if (strscpy(vas->name, name, VAS_MAX_NAME_LENGTH) < 0) {
+		ret = -EINVAL;
+		goto out_free;
+	}
+
+	mutex_init(&vas->mtx);
+
+	ret = init_vas_mm(vas);
+	if (ret != 0)
+		goto out_free;
+
+	vas->refcount = 0;
+
+	INIT_LIST_HEAD(&vas->attaches);
+	spin_lock_init(&vas->share_lock);
+	vas->sharing = 0;
+
+	vas->mode = mode & 0666;
+	vas->uid = current_uid();
+	vas->gid = current_gid();
+
+	ret = vas_insert(vas);
+	if (ret != 0)
+		/*
+		 * We don't need to do anything here. @vas_insert will take
+		 * care of deleting the VAS before returning with an error.
+		 */
+		return ret;
+
+	return vas->id;
+
+out_free:
+	__delete_vas(vas);
+	return ret;
+}
+EXPORT_SYMBOL(vas_create);
+
+struct vas *vas_get(int vid)
+{
+	return vas_lookup(vid);
+}
+EXPORT_SYMBOL(vas_get);
+
+void vas_put(struct vas *vas)
+{
+	if (!vas)
+		return;
+
+	__vas_put(vas);
+}
+EXPORT_SYMBOL(vas_put);
+
+int vas_find(const char *name)
+{
+	struct vas *vas;
+
+	vas = vas_lookup_by_name(name);
+	if (vas) {
+		int vid = vas->id;
+
+		vas_put(vas);
+		return vid;
+	}
+
+	return -ESRCH;
+}
+EXPORT_SYMBOL(vas_find);
+
+int vas_delete(int vid)
+{
+	struct vas *vas;
+	int ret;
+
+	vas = vas_get(vid);
+	if (!vas)
+		return -EINVAL;
+
+	pr_vas_debug("Deleting VAS - name: %s\n", vas->name);
+
+	vas_lock(vas);
+
+	if (vas->refcount != 0) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	/* The user needs write permission to the VAS to delete it. */
+	ret = __check_permission(vas->uid, vas->gid, vas->mode, MAY_WRITE);
+	if (ret != 0) {
+		pr_vas_debug("User doesn't have the appropriate permissions to delete the VAS\n");
+		goto out_unlock;
+	}
+
+	vas_unlock(vas);
+
+	vas_remove(vas);
+	vas_put(vas);
+
+	return 0;
+
+out_unlock:
+	vas_unlock(vas);
+	vas_put(vas);
+
+	return ret;
+}
+EXPORT_SYMBOL(vas_delete);
+
+int vas_attach(struct task_struct *tsk, int vid, int type)
+{
+	struct vas_context *ctx;
+	struct vas *vas;
+	struct att_vas *avas;
+	int ret;
+
+	if (!tsk)
+		return -EINVAL;
+
+	ctx = tsk->vas_ctx;
+	type &= (MAY_READ | MAY_WRITE);
+
+	vas = vas_get(vid);
+	if (!vas)
+		return -EINVAL;
+
+	pr_vas_debug("Attaching VAS - name: %s - to task - pid: %d - %s\n",
+		     vas->name, tsk->pid, access_type_str(type));
+
+	vas_lock(vas);
+
+	/*
+	 * Before we can attach the VAS to the task we first have to make some
+	 * sanity checks.
+	 */
+
+	/*
+	 * 1: Check that the user has adequate permissions to attach the VAS in
+	 * the given way.
+	 */
+	ret = __check_permission(vas->uid, vas->gid, vas->mode, type);
+	if (ret != 0) {
+		pr_vas_debug("User doesn't have the appropriate permissions to attach the VAS\n");
+		goto out_unlock;
+	}
+
+	/*
+	 * 2: Check if this VAS is already attached to the given task. If it
+	 * is, there is nothing left to do.
+	 */
+	list_for_each_entry(avas, &vas->attaches, vas_link) {
+		if (avas->tsk == tsk) {
+			pr_vas_debug("VAS is already attached to the task\n");
+			ret = 0;
+			goto out_unlock;
+		}
+	}
+
+	/* 3: Check if we reached the maximum number of shares for this VAS. */
+	if (vas->refcount == VAS_MAX_SHARES) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	/*
+	 * All sanity checks are done. We can now safely attach the VAS to the
+	 * given task.
+	 */
+
+	/* Allocate and initialize the attached-VAS data structure. */
+	avas = __new_att_vas();
+	if (!avas) {
+		ret = -ENOMEM;
+		goto out_unlock;
+	}
+
+	ret = init_att_vas_mm(avas, tsk);
+	if (ret != 0)
+		goto out_free_avas;
+
+	avas->vas = vas;
+	avas->tsk = tsk;
+	avas->type = type;
+
+	ret = __vas_attach(avas, tsk, vas);
+	if (ret != 0)
+		goto out_free_avas;
+
+	vas_context_lock(ctx);
+
+	list_add(&avas->tsk_link, &ctx->vases);
+	list_add(&avas->vas_link, &vas->attaches);
+
+	vas_context_unlock(ctx);
+
+	ret = 0;
+
+out_unlock:
+	vas_unlock(vas);
+	vas_put(vas);
+
+	return ret;
+
+out_free_avas:
+	__delete_att_vas(avas);
+	goto out_unlock;
+}
+EXPORT_SYMBOL(vas_attach);
+
+int vas_detach(struct task_struct *tsk, int vid)
+{
+	struct vas_context *ctx;
+	struct vas *vas;
+	struct att_vas *avas;
+	bool is_attached;
+	int ret;
+
+	if (!tsk)
+		return -EINVAL;
+
+	ctx = tsk->vas_ctx;
+
+	task_lock(tsk);
+	vas_context_lock(ctx);
+
+	is_attached = false;
+	list_for_each_entry(avas, &ctx->vases, tsk_link) {
+		if (avas->vas->id == vid) {
+			is_attached = true;
+			break;
+		}
+	}
+	if (!is_attached) {
+		pr_vas_debug("VAS is not attached to the given task\n");
+		ret = -EINVAL;
+		goto out_unlock_tsk;
+	}
+
+	vas = avas->vas;
+
+	/*
+	 * Make sure that the VAS cannot be deleted while we are still
+	 * working with it.
+	 */
+	__vas_get(vas);
+
+	pr_vas_debug("Detaching VAS - name: %s - from task - pid: %d\n",
+		     vas->name, tsk->pid);
+
+	/*
+	 * Before we can detach the VAS from the task we have to perform some
+	 * sanity checks.
+	 */
+
+	/*
+	 * 1: Check that the VAS we want to detach is not the task's currently
+	 * active VAS. An active VAS must not be detached; the user first has
+	 * to switch away from it.
+	 */
+	if (tsk->active_vas == vid) {
+		pr_vas_debug("VAS is currently in use by the task\n");
+		ret = -EBUSY;
+		goto out_put_vas;
+	}
+
+	/*
+	 * We are done with the sanity checks. It is now safe to detach the VAS
+	 * from the given task.
+	 */
+	list_del(&avas->tsk_link);
+
+	vas_context_unlock(ctx);
+	task_unlock(tsk);
+
+	vas_lock(vas);
+
+	list_del(&avas->vas_link);
+
+	ret = __vas_detach(avas, tsk, vas);
+	if (ret != 0)
+		goto out_reinsert;
+
+	__delete_att_vas(avas);
+
+	vas_unlock(vas);
+	__vas_put(vas);
+
+	return 0;
+
+out_reinsert:
+	vas_context_lock(ctx);
+
+	list_add(&avas->tsk_link, &ctx->vases);
+	list_add(&avas->vas_link, &vas->attaches);
+
+	vas_context_unlock(ctx);
+	vas_unlock(vas);
+	__vas_put(vas);
+
+	return ret;
+
+out_put_vas:
+	__vas_put(vas);
+
+out_unlock_tsk:
+	vas_context_unlock(ctx);
+	task_unlock(tsk);
+
+	return ret;
+}
+EXPORT_SYMBOL(vas_detach);
+
+int vas_switch(struct task_struct *tsk, int vid)
+{
+	if (!tsk)
+		return -EINVAL;
+
+	return __vas_switch(tsk, vid);
+}
+EXPORT_SYMBOL(vas_switch);
+
+int vas_getattr(int vid, struct vas_attr *attr)
+{
+	struct vas *vas;
+	struct user_namespace *ns = current_user_ns();
+
+	if (!attr)
+		return -EINVAL;
+
+	vas = vas_get(vid);
+	if (!vas)
+		return -EINVAL;
+
+	pr_vas_debug("Getting attributes for VAS - name: %s\n", vas->name);
+
+	vas_lock(vas);
+
+	memset(attr, 0, sizeof(struct vas_attr));
+	attr->mode = vas->mode;
+	attr->user = from_kuid(ns, vas->uid);
+	attr->group = from_kgid(ns, vas->gid);
+
+	vas_unlock(vas);
+	vas_put(vas);
+
+	return 0;
+}
+EXPORT_SYMBOL(vas_getattr);
+
+int vas_setattr(int vid, struct vas_attr *attr)
+{
+	struct vas *vas;
+	struct user_namespace *ns = current_user_ns();
+	int ret;
+
+	if (!attr)
+		return -EINVAL;
+
+	vas = vas_get(vid);
+	if (!vas)
+		return -EINVAL;
+
+	pr_vas_debug("Setting attributes for VAS - name: %s\n", vas->name);
+
+	vas_lock(vas);
+
+	/* The user needs write permission to change attributes for the VAS. */
+	ret = __check_permission(vas->uid, vas->gid, vas->mode, MAY_WRITE);
+	if (ret != 0) {
+		pr_vas_debug("User doesn't have the appropriate permissions to set attributes for the VAS\n");
+		goto out_unlock;
+	}
+
+	vas->mode = attr->mode & 0666;
+	vas->uid = make_kuid(ns, attr->user);
+	vas->gid = make_kgid(ns, attr->group);
+
+	ret = 0;
+
+out_unlock:
+	vas_unlock(vas);
+	vas_put(vas);
+
+	return ret;
+}
+EXPORT_SYMBOL(vas_setattr);
+
+void __init vas_init(void)
+{
+	/* Create the SLAB caches for our data structures. */
+	vas_cachep = KMEM_CACHE(vas, SLAB_PANIC|SLAB_NOTRACK);
+	att_vas_cachep = KMEM_CACHE(att_vas, SLAB_PANIC|SLAB_NOTRACK);
+	vas_context_cachep = KMEM_CACHE(vas_context, SLAB_PANIC|SLAB_NOTRACK);
+
+	/* Initialize the internal management data structures. */
+	idr_init(&vases);
+	spin_lock_init(&vases_lock);
+
+	/* Initialize the placeholder variables. */
+	INVALID_VAS = __new_vas();
+
+	/* Initialize the VAS context of the init task. */
+	vas_clone(0, &init_task);
+}
+
+/*
+ * We need to use a postcore_initcall to initialize the sysfs directories,
+ * because the 'sys/kernel' directory will be initialized in a core_initcall.
+ * Hence, we have to queue the initialization of the VAS sysfs directories after
+ * this.
+ */
+static int __init vas_sysfs_init(void)
+{
+	/* Setup the sysfs base directories. */
+	vases_kset = kset_create_and_add("vas", NULL, kernel_kobj);
+	if (!vases_kset) {
+		pr_err("Failed to initialize the VAS sysfs directory\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+postcore_initcall(vas_sysfs_init);
+
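+/**
+ * Set up the VAS context for a newly forked task. If the new task shares its
+ * memory with the parent (CLONE_VM), it also shares the parent's VAS context;
+ * otherwise it gets a fresh, empty one.
+ **/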
+int vas_clone(int clone_flags, struct task_struct *tsk)
+{
+	int ret = 0;
+
+	struct vas_context *ctx;
+
+	if (clone_flags & CLONE_VM) {
+		ctx = current->vas_ctx;
+
+		pr_vas_debug("Copy VAS context (%p -- %d) for task - %p - from task - %p\n",
+			     ctx, ctx->refcount, tsk, current);
+
+		vas_context_lock(ctx);
+		ctx->refcount++;
+		vas_context_unlock(ctx);
+	} else {
+		pr_vas_debug("Create a new VAS context for task - %p\n",
+			     tsk);
+
+		ctx = __new_vas_context();
+		if (!ctx) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		spin_lock_init(&ctx->lock);
+		ctx->refcount = 1;
+		INIT_LIST_HEAD(&ctx->vases);
+	}
+
+	tsk->vas_ctx = ctx;
+
+out:
+	return ret;
+}
+
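+/**
+ * Tear down the VAS state of an exiting task: switch back to the original
+ * memory map if necessary, drop the task's reference to its VAS context and,
+ * if this was the last reference, detach all still-attached VAS and delete
+ * the context.
+ **/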
+void vas_exit(struct task_struct *tsk)
+{
+	struct vas_context *ctx = tsk->vas_ctx;
+
+	if (tsk->active_vas != 0) {
+		int error;
+
+		pr_vas_debug("Switch to original MM before exit for task - %p\n",
+			     tsk);
+
+		error = __vas_switch(tsk, 0);
+		if (error != 0)
+			pr_alert("Switching back to original MM failed with %d\n",
+				 error);
+	}
+
+	pr_vas_debug("Exiting VAS context (%p -- %d) for task - %p\n", ctx,
+		     ctx->refcount, tsk);
+
+	vas_context_lock(ctx);
+
+	ctx->refcount--;
+	tsk->vas_ctx = NULL;
+
+	vas_context_unlock(ctx);
+
+	if (ctx->refcount == 0) {
+		/*
+		 * We have to detach all the VAS that are still attached to this
+		 * VAS context before it is safe to delete it. There is no need
+		 * to hold the lock while doing this since we are the last one
+		 * holding a reference to this particular VAS context.
+		 */
+		struct att_vas *avas, *s_avas;
+
+		list_for_each_entry_safe(avas, s_avas, &ctx->vases, tsk_link) {
+			struct vas *vas = avas->vas;
+			int error;
+
+			pr_vas_debug("Detaching VAS - name: %s - from exiting task - pid: %d\n",
+				     vas->name, tsk->pid);
+
+			/*
+			 * Make sure the VAS is not deleted while we are
+			 * still working with it.
+			 */
+			__vas_get(vas);
+
+			vas_lock(vas);
+
+			error = __vas_detach(avas, tsk, vas);
+			if (error != 0)
+				pr_alert("Detaching VAS from task failed with %d\n",
+					 error);
+
+			list_del(&avas->tsk_link);
+			list_del(&avas->vas_link);
+			__delete_att_vas(avas);
+
+			vas_unlock(vas);
+			__vas_put(vas);
+		}
+
+		/*
+		 * All the attached VAS are detached. Now it is safe to remove
+		 * this VAS context.
+		 */
+		__delete_vas_context(ctx);
+
+		pr_vas_debug("Deleted VAS context\n");
+	}
+}
+
+/***
+ * System Calls
+ ***/
+
+SYSCALL_DEFINE2(vas_create, const char __user *, name, umode_t, mode)
+{
+	char vas_name[VAS_MAX_NAME_LENGTH];
+	long len;
+
+	if (!name)
+		return -EINVAL;
+
+	len = strncpy_from_user(vas_name, name, VAS_MAX_NAME_LENGTH);
+	if (len < 0)
+		return -EFAULT;
+	if (len == VAS_MAX_NAME_LENGTH)
+		return -EINVAL;
+
+	return vas_create(vas_name, mode);
+}
+
+SYSCALL_DEFINE1(vas_delete, int, vid)
+{
+	if (vid < 0)
+		return -EINVAL;
+
+	return vas_delete(vid);
+}
+
+SYSCALL_DEFINE1(vas_find, const char __user *, name)
+{
+	char vas_name[VAS_MAX_NAME_LENGTH];
+	long len;
+
+	if (!name)
+		return -EINVAL;
+
+	len = strncpy_from_user(vas_name, name, VAS_MAX_NAME_LENGTH);
+	if (len < 0)
+		return -EFAULT;
+	if (len == VAS_MAX_NAME_LENGTH)
+		return -EINVAL;
+
+	return vas_find(vas_name);
+}
+
+SYSCALL_DEFINE3(vas_attach, pid_t, pid, int, vid, int, type)
+{
+	struct task_struct *tsk;
+	int vas_acc_type;
+
+	if (pid < 0 || vid < 0)
+		return -EINVAL;
+
+	tsk = pid == 0 ? current : find_task_by_vpid(pid);
+	if (!tsk)
+		return -ESRCH;
+
+	vas_acc_type = __build_vas_access_type(type);
+	if (vas_acc_type == -1)
+		return -EINVAL;
+
+	return vas_attach(tsk, vid, vas_acc_type);
+}
+
+SYSCALL_DEFINE2(vas_detach, pid_t, pid, int, vid)
+{
+	struct task_struct *tsk;
+
+	if (pid < 0 || vid < 0)
+		return -EINVAL;
+
+	tsk = pid == 0 ? current : find_task_by_vpid(pid);
+	if (!tsk)
+		return -ESRCH;
+
+	return vas_detach(tsk, vid);
+}
+
+SYSCALL_DEFINE1(vas_switch, int, vid)
+{
+	struct task_struct *tsk = current;
+
+	if (vid < 0)
+		return -EINVAL;
+
+	return vas_switch(tsk, vid);
+}
+
+SYSCALL_DEFINE0(active_vas)
+{
+	struct task_struct *tsk = current;
+
+	return tsk->active_vas;
+}
+
+SYSCALL_DEFINE2(vas_getattr, int, vid, struct vas_attr __user *, uattr)
+{
+	struct vas_attr attr;
+	int ret;
+
+	if (vid < 0 || !uattr)
+		return -EINVAL;
+
+	ret = vas_getattr(vid, &attr);
+	if (ret != 0)
+		return ret;
+
+	if (copy_to_user(uattr, &attr, sizeof(struct vas_attr)) != 0)
+		return -EFAULT;
+
+	return 0;
+}
+
+SYSCALL_DEFINE2(vas_setattr, int, vid, struct vas_attr __user *, uattr)
+{
+	struct vas_attr attr;
+
+	if (vid < 0 || !uattr)
+		return -EINVAL;
+
+	if (copy_from_user(&attr, uattr, sizeof(struct vas_attr)) != 0)
+		return -EFAULT;
+
+	return vas_setattr(vid, &attr);
+}
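
For illustration, here is a minimal user-space sketch of the interface added by
this patch (not part of the patch itself; it assumes the __NR_vas_* numbers
introduced by this series are available through the installed headers, that no
libc wrappers exist, and it shortens error handling):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
	long vid;

	/* Create a new, empty first class virtual address space. */
	vid = syscall(__NR_vas_create, "example-vas", 0600);
	if (vid < 0) {
		perror("vas_create");
		return 1;
	}

	/* Attach it read-write to the calling process (pid 0 == current). */
	if (syscall(__NR_vas_attach, 0, vid, O_RDWR) < 0) {
		perror("vas_attach");
		return 1;
	}

	/* Switch into the VAS; mappings created now belong to the VAS. */
	if (syscall(__NR_vas_switch, vid) < 0) {
		perror("vas_switch");
		return 1;
	}

	/* Switch back to the original address space (vid 0). */
	syscall(__NR_vas_switch, 0);

	/* Detach and delete the VAS again. */
	syscall(__NR_vas_detach, 0, vid);
	syscall(__NR_vas_delete, vid);

	return 0;
}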
-- 
2.12.0


* [RFC PATCH 11/13] mm/vas: Introduce VAS segments - shareable address space regions
  2017-03-13 22:14 [RFC PATCH 00/13] Introduce first class virtual address spaces Till Smejkal
                   ` (9 preceding siblings ...)
  2017-03-13 22:14 ` [RFC PATCH 10/13] mm: Introduce first class virtual address spaces Till Smejkal
@ 2017-03-13 22:14 ` Till Smejkal
  2017-03-13 22:27   ` Matthew Wilcox
  2017-03-13 22:14 ` [RFC PATCH 12/13] mm/vas: Add lazy-attach support for first class virtual address spaces Till Smejkal
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 45+ messages in thread
From: Till Smejkal @ 2017-03-13 22:14 UTC (permalink / raw)
  To: Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai
  Cc: linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

VAS segments are an extension to first class virtual address spaces that
can be used to share specific memory regions between multiple first class
virtual address spaces. VAS segments have a specific size and position in a
virtual address space and can thereby be used to share in-memory
pointer-based data structures between multiple address spaces as well as other
in-memory data without the need to represent them in mmap-able files or
use shmem.

Similar to first class virtual address spaces, VAS segments must be created
and destroyed explicitly by a user. The system will never automatically
destroy or create a VAS segment. By attaching a VAS segment to a first
class virtual address space, the memory that is contained in the VAS
segment can be accessed and changed.
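
As an illustration of the intended use, a minimal user-space sketch might look
as follows (hypothetical example, not part of the patch; it assumes the
__NR_vas_seg_* numbers added below, that the attach type follows the same
O_RDONLY/O_RDWR convention as vas_attach, and it omits error handling):

#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
	/* Example values: a 2 MiB segment at a fixed virtual address. */
	unsigned long start = 0x700000000000UL;
	unsigned long end = start + (2UL << 20);
	long sid, producer, consumer;

	/* Create the shareable region and two address spaces. */
	sid = syscall(__NR_vas_seg_create, "shared-data", start, end, 0600);
	producer = syscall(__NR_vas_create, "producer", 0600);
	consumer = syscall(__NR_vas_create, "consumer", 0600);

	/* The producer VAS gets the segment writable, the consumer read-only. */
	syscall(__NR_vas_seg_attach, producer, sid, O_RDWR);
	syscall(__NR_vas_seg_attach, consumer, sid, O_RDONLY);

	/*
	 * Processes that attach and switch to these VAS now see the segment
	 * at the same fixed address, so pointers into it remain valid across
	 * address spaces.
	 */
	return 0;
}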

Signed-off-by: Till Smejkal <till.smejkal@gmail.com>
Signed-off-by: Marco Benatto <marco.antonio.780@gmail.com>
---
 arch/x86/entry/syscalls/syscall_32.tbl |    7 +
 arch/x86/entry/syscalls/syscall_64.tbl |    7 +
 include/linux/syscalls.h               |   10 +
 include/linux/vas.h                    |  114 +++
 include/linux/vas_types.h              |   91 ++-
 include/uapi/asm-generic/unistd.h      |   16 +-
 include/uapi/linux/vas.h               |   12 +
 kernel/sys_ni.c                        |    7 +
 mm/vas.c                               | 1234 ++++++++++++++++++++++++++++++--
 9 files changed, 1451 insertions(+), 47 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 8c553eef8c44..a4f91d14a856 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -398,3 +398,10 @@
 389	i386	active_vas		sys_active_vas
 390	i386	vas_getattr		sys_vas_getattr
 391	i386	vas_setattr		sys_vas_setattr
+392	i386	vas_seg_create		sys_vas_seg_create
+393	i386	vas_seg_delete		sys_vas_seg_delete
+394	i386	vas_seg_find		sys_vas_seg_find
+395	i386	vas_seg_attach		sys_vas_seg_attach
+396	i386	vas_seg_detach		sys_vas_seg_detach
+397	i386	vas_seg_getattr		sys_vas_seg_getattr
+398	i386	vas_seg_setattr		sys_vas_seg_setattr
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 72f1f0495710..a0f9503c3d28 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -347,6 +347,13 @@
 338	common	active_vas		sys_active_vas
 339	common	vas_getattr		sys_vas_getattr
 340	common	vas_setattr		sys_vas_setattr
+341	common	vas_seg_create		sys_vas_seg_create
+342	common	vas_seg_delete		sys_vas_seg_delete
+343	common	vas_seg_find		sys_vas_seg_find
+344	common	vas_seg_attach		sys_vas_seg_attach
+345	common	vas_seg_detach		sys_vas_seg_detach
+346	common	vas_seg_getattr		sys_vas_seg_getattr
+347	common	vas_seg_setattr		sys_vas_seg_setattr
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index fdea27d37c96..7380dcdc4bc1 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -66,6 +66,7 @@ struct perf_event_attr;
 struct file_handle;
 struct sigaltstack;
 struct vas_attr;
+struct vas_seg_attr;
 union bpf_attr;
 
 #include <linux/types.h>
@@ -914,4 +915,13 @@ asmlinkage long sys_active_vas(void);
 asmlinkage long sys_vas_getattr(int vid, struct vas_attr __user *attr);
 asmlinkage long sys_vas_setattr(int vid, struct vas_attr __user *attr);
 
+asmlinkage long sys_vas_seg_create(const char __user *name, unsigned long start,
+				   unsigned long end, umode_t mode);
+asmlinkage long sys_vas_seg_delete(int sid);
+asmlinkage long sys_vas_seg_find(const char __user *name);
+asmlinkage long sys_vas_seg_attach(int vid, int sid, int type);
+asmlinkage long sys_vas_seg_detach(int vid, int sid);
+asmlinkage long sys_vas_seg_getattr(int sid, struct vas_seg_attr __user *attr);
+asmlinkage long sys_vas_seg_setattr(int sid, struct vas_seg_attr __user *attr);
+
 #endif
diff --git a/include/linux/vas.h b/include/linux/vas.h
index 6a72e42f96d2..376b9fa1ee27 100644
--- a/include/linux/vas.h
+++ b/include/linux/vas.h
@@ -138,6 +138,120 @@ extern int vas_setattr(int vid, struct vas_attr *attr);
 
 
 /***
+ * Management of VAS segments
+ ***/
+
+/**
+ * Lock and unlock helper for VAS segments.
+ **/
+#define vas_seg_lock(seg) mutex_lock(&(seg)->mtx)
+#define vas_seg_unlock(seg) mutex_unlock(&(seg)->mtx)
+
+/**
+ * Create a new VAS segment.
+ *
+ * @param[in] name:		The name of the new VAS segment.
+ * @param[in] start:		The address where the VAS segment begins.
+ * @param[in] end:		The address where the VAS segment ends.
+ * @param[in] mode:		The access rights for the VAS segment.
+ *
+ * @returns:			The VAS segment ID on success, -ERRNO otherwise.
+ **/
+extern int vas_seg_create(const char *name, unsigned long start,
+			  unsigned long end, umode_t mode);
+
+/**
+ * Get a pointer to a VAS segment data structure.
+ *
+ * @param[in] sid:		The ID of the VAS segment whose data structure
+ *				should be returned.
+ *
+ * @returns:			The pointer to the VAS segment data structure
+ *				on success, or NULL otherwise.
+ **/
+extern struct vas_seg *vas_seg_get(int sid);
+
+/**
+ * Release a reference to a VAS segment data structure taken via @vas_seg_get.
+ *
+ * @param[in] seg:		The pointer to the VAS segment data structure
+ *				whose reference should be released.
+ **/
+extern void vas_seg_put(struct vas_seg *seg);
+
+/**
+ * Get ID of the VAS segment belonging to a given name.
+ *
+ * @param[in] name:		The name of the VAS segment for which the ID
+ *				should be returned.
+ *
+ * @returns:			The VAS segment ID on success, -ERRNO
+ *				otherwise.
+ **/
+extern int vas_seg_find(const char *name);
+
+/**
+ * Delete the given VAS segment again.
+ *
+ * @param[in] id:		The ID of the VAS segment which should be
+ *				deleted.
+ *
+ * @returns:			0 on success, -ERRNO otherwise.
+ **/
+extern int vas_seg_delete(int id);
+
+/**
+ * Attach a VAS segment to a VAS.
+ *
+ * @param[in] vid:		The ID of the VAS to which the VAS segment
+ *				should be attached.
+ * @param[in] sid:		The ID of the VAS segment which should be
+ *				attached.
+ * @param[in] type:		The type how the VAS segment should be
+ *				attached.
+ *
+ * @returns:			0 on success, -ERRNO otherwise.
+ **/
+extern int vas_seg_attach(int vid, int sid, int type);
+
+/**
+ * Detach a VAS segment from a VAS.
+ *
+ * @param[in] vid:		The ID of the VAS from which the VAS segment
+ *				should be detached.
+ * @param[in] sid:		The ID of the VAS segment which should be
+ *				detached.
+ *
+ * @returns:			0 on success, -ERRNO otherwise.
+ **/
+extern int vas_seg_detach(int vid, int sid);
+
+/**
+ * Get attributes of a VAS segment.
+ *
+ * @param[in] sid:		The ID of the VAS segment for which the
+ *				attributes should be returned.
+ * @param[out] attr:		The pointer to the struct where the attributes
+ *				should be saved.
+ *
+ * @returns:			0 on success, -ERRNO otherwise.
+ **/
+extern int vas_seg_getattr(int sid, struct vas_seg_attr *attr);
+
+/**
+ * Set attributes of a VAS segment.
+ *
+ * @param[in] sid:		The ID of the VAS segment for which the
+ *				attributes should be updated.
+ * @param[in] attr:		The pointer to the struct containing the new
+ *				attributes.
+ *
+ * @returns:			0 on success, -ERRNO otherwise.
+ **/
+extern int vas_seg_setattr(int sid, struct vas_seg_attr *attr);
+
+
+/***
  * Management of the VAS subsystem
  ***/
 
diff --git a/include/linux/vas_types.h b/include/linux/vas_types.h
index f06bfa9ef729..a5291a18ea07 100644
--- a/include/linux/vas_types.h
+++ b/include/linux/vas_types.h
@@ -24,8 +24,8 @@ struct task_struct;
  * The struct representing a Virtual Address Space (VAS).
  *
  * This data structure contains all the necessary information of a VAS such as
- * its name, ID. It also contains access rights and other management
- * information.
+ * its name and ID, as well as the list of all VAS segments which are attached
+ * to it. It also contains access rights and other management information.
  **/
 struct vas {
 	struct kobject kobj;		/* < the internal kobject that we use *
@@ -38,7 +38,8 @@ struct vas {
 	struct mutex mtx;		/* < lock for parallel access.        */
 
 	struct mm_struct *mm;		/* < a partial memory map containing  *
-					 *   all mappings of this VAS.        */
+					 *   all mappings of this VAS and all *
+					 *   of its attached VAS segments.    */
 
 	struct list_head link;		/* < the link in the global VAS list. */
 	struct rcu_head rcu;		/* < the RCU helper used for          *
@@ -54,6 +55,11 @@ struct vas {
 					 *   of the current sharing state of  *
 					 *   the VAS.                         */
 
+	struct list_head segments;	/* < the list of attached VAS         *
+					 *   segments.                        */
+	u32 nr_segments;		/* < the number of VAS segments       *
+					 *   attached to this VAS.            */
+
 	umode_t mode;			/* < the access rights to this VAS.   */
 	kuid_t uid;			/* < the UID of the owning user of    *
 					 *   this VAS.                        */
@@ -85,4 +91,83 @@ struct att_vas {
 	int type;			/* < the type of attaching (RO/RW).   */
 };
 
+/**
+ * The struct representing a VAS segment.
+ *
+ * A VAS segment is a region in memory. Accordingly, it is very similar to a
+ * vm_area. However, in contrast to a vm_area it can only represent a memory
+ * region, not a file, and it also knows where it is mapped. In addition, VAS
+ * segments also have an ID, a name, access rights and a lock managing the way
+ * they can be shared between multiple VAS.
+ **/
+struct vas_seg {
+	struct kobject kobj;		/* < the internal kobject that we use *
+					 *   for reference counting and sysfs *
+					 *   handling.                        */
+
+	int id;				/* < ID                               */
+	char name[VAS_MAX_NAME_LENGTH];	/* < name                             */
+
+	struct mutex mtx;		/* < lock for parallel access.        */
+
+	unsigned long start;		/* < the virtual address where the    *
+					 *   VAS segment starts.              */
+	unsigned long end;		/* < the virtual address where the    *
+					 *   VAS segment ends.                */
+	unsigned long length;		/* < the size of the VAS segment in   *
+					 *   bytes.                           */
+
+	struct mm_struct *mm;		/* < a partial memory map containing  *
+					 *   all the mappings for this VAS    *
+					 *   segment.                         */
+
+	struct list_head link;		/* < the link in the global VAS       *
+					 *   segment list.                    */
+	struct rcu_head rcu;		/* < the RCU helper used for          *
+					 *   asynchronous VAS segment         *
+					 *   deletion.                        */
+
+	u16 refcount;			/* < how often is the VAS segment     *
+					 *   attached.                        */
+	struct list_head attaches;	/* < the list of VASes which have     *
+					 *   this VAS segment attached.       */
+
+	spinlock_t share_lock;		/* < lock for protecting sharing      *
+					 *   state.                           */
+	u32 sharing;			/* < the variable used to keep track  *
+					 *   of the current sharing state of  *
+					 *   the VAS segment.                 */
+
+	umode_t mode;			/* < the access rights to this VAS    *
+					 *   segment.                         */
+	kuid_t uid;			/* < the UID of the owning user of    *
+					 *   this VAS segment.                */
+	kgid_t gid;			/* < the GID of the owning group of   *
+					 *   this VAS segment.                */
+};
+
+/**
+ * The struct representing a VAS segment being attached to a VAS.
+ *
+ * Since a VAS segment can be attached to multiple VAS, this data structure is
+ * necessary. It forms the connection between the VAS and the VAS segment
+ * itself.
+ **/
+struct att_vas_seg {
+	struct vas_seg *seg;            /* < the reference to the actual VAS  *
+					 *   segment containing all the       *
+					 *   information.                     */
+
+	struct vas *vas;		/* < the reference to the VAS to      *
+					 *   which the VAS segment is         *
+					 *   attached to.                     */
+
+	struct list_head vas_link;	/* < the link in the list managed     *
+					 *   inside the VAS.                  */
+	struct list_head seg_link;	/* < the link in the list managed     *
+					 *   inside the VAS segment.          */
+
+	int type;			/* < the type of attaching (RO/RW).   */
+};
+
 #endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 35df7d40a443..4014b4bd2f18 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -748,9 +748,23 @@ __SYSCALL(__NR_active_vas, sys_active_vas)
 __SYSCALL(__NR_vas_getattr, sys_vas_getattr)
 #define __NR_vas_setattr 299
 __SYSCALL(__NR_vas_setattr, sys_vas_setattr)
+#define __NR_vas_seg_create 300
+__SYSCALL(__NR_vas_seg_create, sys_vas_seg_create)
+#define __NR_vas_seg_delete 301
+__SYSCALL(__NR_vas_seg_delete, sys_vas_seg_delete)
+#define __NR_vas_seg_find 302
+__SYSCALL(__NR_vas_seg_find, sys_vas_seg_find)
+#define __NR_vas_seg_attach 303
+__SYSCALL(__NR_vas_seg_attach, sys_vas_seg_attach)
+#define __NR_vas_seg_detach 304
+__SYSCALL(__NR_vas_seg_detach, sys_vas_seg_detach)
+#define __NR_vas_seg_getattr 305
+__SYSCALL(__NR_vas_seg_getattr, sys_vas_seg_getattr)
+#define __NR_vas_seg_setattr 306
+__SYSCALL(__NR_vas_seg_setattr, sys_vas_seg_setattr)
 
 #undef __NR_syscalls
-#define __NR_syscalls 300
+#define __NR_syscalls 307
 
 /*
  * All syscalls below here should go away really,
diff --git a/include/uapi/linux/vas.h b/include/uapi/linux/vas.h
index 02f70f88bdcb..a8858b013a44 100644
--- a/include/uapi/linux/vas.h
+++ b/include/uapi/linux/vas.h
@@ -13,4 +13,16 @@ struct vas_attr {
 	__kernel_gid_t group;		/* < the owning group of the VAS.     */
 };
 
+/**
+ * The struct containing attributes of a VAS segment.
+ **/
+struct vas_seg_attr {
+	__kernel_mode_t mode;		/* < the access rights to the VAS     *
+					 *   segment.                         */
+	__kernel_uid_t user;		/* < the owning user of the VAS       *
+					 *   segment.                         */
+	__kernel_gid_t group;		/* < the owning group of the VAS      *
+					 *   segment.                         */
+};
+
 #endif
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index f6f83c5ec1a1..659fe96afcfa 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -269,3 +269,10 @@ cond_syscall(sys_vas_switch);
 cond_syscall(sys_active_vas);
 cond_syscall(sys_vas_getattr);
 cond_syscall(sys_vas_setattr);
+cond_syscall(sys_vas_seg_create);
+cond_syscall(sys_vas_seg_delete);
+cond_syscall(sys_vas_seg_find);
+cond_syscall(sys_vas_seg_attach);
+cond_syscall(sys_vas_seg_detach);
+cond_syscall(sys_vas_seg_getattr);
+cond_syscall(sys_vas_seg_setattr);
diff --git a/mm/vas.c b/mm/vas.c
index 447d61e1da79..345b023c21aa 100644
--- a/mm/vas.c
+++ b/mm/vas.c
@@ -61,7 +61,7 @@
 #define VAS_MAX_ID INT_MAX
 
 /**
- * Masks and bits to implement sharing of VAS.
+ * Masks and bits to implement sharing of VAS and VAS segments.
  **/
 #define VAS_SHARE_READABLE (1 << 0)
 #define VAS_SHARE_WRITABLE (1 << 16)
@@ -194,6 +194,8 @@ static void __dump_memory_map(const char *title, struct mm_struct *mm)
 static struct kmem_cache *vas_cachep;
 static struct kmem_cache *att_vas_cachep;
 static struct kmem_cache *vas_context_cachep;
+static struct kmem_cache *seg_cachep;
+static struct kmem_cache *att_seg_cachep;
 
 /**
  * Global management data structures and their associated locks.
@@ -201,16 +203,21 @@ static struct kmem_cache *vas_context_cachep;
 static struct idr vases;
 static spinlock_t vases_lock;
 
+static struct idr vas_segs;
+static spinlock_t vas_segs_lock;
+
 /**
  * The place holder variables that are used to identify to-be-deleted items in
  * our global management data structures.
  **/
 static struct vas *INVALID_VAS;
+static struct vas_seg *INVALID_VAS_SEG;
 
 /**
  * Kernel 'ksets' where all objects will be managed.
  **/
 static struct kset *vases_kset;
+static struct kset *vas_segs_kset;
 
 
 /***
@@ -273,6 +280,40 @@ static inline void __delete_vas_context(struct vas_context *ctx)
 	kmem_cache_free(vas_context_cachep, ctx);
 }
 
+static inline struct vas_seg *__new_vas_seg(void)
+{
+	return kmem_cache_zalloc(seg_cachep, GFP_KERNEL);
+}
+
+static inline void __delete_vas_seg(struct vas_seg *seg)
+{
+	WARN_ON(seg->refcount != 0);
+
+	mutex_destroy(&seg->mtx);
+
+	if (seg->mm)
+		mmput_async(seg->mm);
+	kmem_cache_free(seg_cachep, seg);
+}
+
+static inline void __delete_vas_seg_rcu(struct rcu_head *rp)
+{
+	struct vas_seg *seg = container_of(rp, struct vas_seg, rcu);
+
+	__delete_vas_seg(seg);
+}
+
+static inline struct att_vas_seg *__new_att_vas_seg(void)
+{
+	return kmem_cache_zalloc(att_seg_cachep, GFP_ATOMIC);
+}
+
+static inline void __delete_att_vas_seg(struct att_vas_seg *aseg)
+{
+	kmem_cache_free(att_seg_cachep, aseg);
+}
+
+
 /***
  * Kobject management of data structures
  ***/
@@ -418,6 +459,161 @@ static struct kobj_type vas_ktype = {
 	.default_attrs = vas_default_attr,
 };
 
+/**
+ * Correctly get and put VAS segments.
+ **/
+static inline struct vas_seg *__vas_seg_get(struct vas_seg *seg)
+{
+	return container_of(kobject_get(&seg->kobj), struct vas_seg, kobj);
+}
+
+static inline void __vas_seg_put(struct vas_seg *seg)
+{
+	kobject_put(&seg->kobj);
+}
+
+/**
+ * The sysfs structure we need to handle attributes of a VAS segment.
+ **/
+struct vas_seg_sysfs_attr {
+	struct attribute attr;
+	ssize_t (*show)(struct vas_seg *seg, struct vas_seg_sysfs_attr *ssattr,
+			char *buf);
+	ssize_t (*store)(struct vas_seg *seg, struct vas_seg_sysfs_attr *ssattr,
+			 const char *buf, ssize_t count);
+};
+
+#define VAS_SEG_SYSFS_ATTR(NAME, MODE, SHOW, STORE)			\
+static struct vas_seg_sysfs_attr vas_seg_sysfs_attr_##NAME =		\
+	__ATTR(NAME, MODE, SHOW, STORE)
+
+/**
+ * Functions for all the sysfs operations for VAS segments.
+ **/
+static ssize_t __vas_seg_sysfs_attr_show(struct kobject *kobj,
+					 struct attribute *attr,
+					 char *buf)
+{
+	struct vas_seg *seg;
+	struct vas_seg_sysfs_attr *ssattr;
+
+	seg = container_of(kobj, struct vas_seg, kobj);
+	ssattr = container_of(attr, struct vas_seg_sysfs_attr, attr);
+
+	if (!ssattr->show)
+		return -EIO;
+
+	return ssattr->show(seg, ssattr, buf);
+}
+
+static ssize_t __vas_seg_sysfs_attr_store(struct kobject *kobj,
+					  struct attribute *attr,
+					  const char *buf, size_t count)
+{
+	struct vas_seg *seg;
+	struct vas_seg_sysfs_attr *ssattr;
+
+	seg = container_of(kobj, struct vas_seg, kobj);
+	ssattr = container_of(attr, struct vas_seg_sysfs_attr, attr);
+
+	if (!ssattr->store)
+		return -EIO;
+
+	return ssattr->store(seg, ssattr, buf, count);
+}
+
+/**
+ * The sysfs operations structure for a VAS segment.
+ **/
+static const struct sysfs_ops vas_seg_sysfs_ops = {
+	.show = __vas_seg_sysfs_attr_show,
+	.store = __vas_seg_sysfs_attr_store,
+};
+
+/**
+ * Default attributes of a VAS segment.
+ **/
+static ssize_t __show_vas_seg_name(struct vas_seg *seg,
+				   struct vas_seg_sysfs_attr *ssattr,
+				   char *buf)
+{
+	return scnprintf(buf, PAGE_SIZE, "%s", seg->name);
+}
+VAS_SEG_SYSFS_ATTR(name, 0444, __show_vas_seg_name, NULL);
+
+static ssize_t __show_vas_seg_mode(struct vas_seg *seg,
+				   struct vas_seg_sysfs_attr *ssattr,
+				   char *buf)
+{
+	return scnprintf(buf, PAGE_SIZE, "%#03o", seg->mode);
+}
+VAS_SEG_SYSFS_ATTR(mode, 0444, __show_vas_seg_mode, NULL);
+
+static ssize_t __show_vas_seg_user(struct vas_seg *seg,
+				   struct vas_seg_sysfs_attr *ssattr,
+				   char *buf)
+{
+	struct user_namespace *ns = current_user_ns();
+
+	return scnprintf(buf, PAGE_SIZE, "%d", from_kuid(ns, seg->uid));
+}
+VAS_SEG_SYSFS_ATTR(user, 0444, __show_vas_seg_user, NULL);
+
+static ssize_t __show_vas_seg_group(struct vas_seg *seg,
+				    struct vas_seg_sysfs_attr *ssattr,
+				    char *buf)
+{
+	struct user_namespace *ns = current_user_ns();
+
+	return scnprintf(buf, PAGE_SIZE, "%d", from_kgid(ns, seg->gid));
+}
+VAS_SEG_SYSFS_ATTR(group, 0444, __show_vas_seg_group, NULL);
+
+static ssize_t __show_vas_seg_region(struct vas_seg *seg,
+				     struct vas_seg_sysfs_attr *ssattr,
+				     char *buf)
+{
+	return scnprintf(buf, PAGE_SIZE, "%lx-%lx", seg->start, seg->end);
+}
+VAS_SEG_SYSFS_ATTR(region, 0444, __show_vas_seg_region, NULL);
+
+static struct attribute *vas_seg_default_attr[] = {
+	&vas_seg_sysfs_attr_name.attr,
+	&vas_seg_sysfs_attr_mode.attr,
+	&vas_seg_sysfs_attr_user.attr,
+	&vas_seg_sysfs_attr_group.attr,
+	&vas_seg_sysfs_attr_region.attr,
+	NULL
+};
+
+/**
+ * Function to release the VAS segment after its kobject is gone.
+ **/
+static void __vas_seg_release(struct kobject *kobj)
+{
+	struct vas_seg *seg = container_of(kobj, struct vas_seg, kobj);
+
+	/* Give up the ID in the IDR that was occupied by this VAS segment. */
+	spin_lock(&vas_segs_lock);
+	idr_remove(&vas_segs, seg->id);
+	spin_unlock(&vas_segs_lock);
+
+	/*
+	 * Wait a full RCU grace period before actually deleting the VAS segment
+	 * data structure since we haven't done it earlier.
+	 */
+	call_rcu(&seg->rcu, __delete_vas_seg_rcu);
+}
+
+/**
+ * The ktype data structure representing a VAS segment.
+ **/
+static struct kobj_type vas_seg_ktype = {
+	.sysfs_ops = &vas_seg_sysfs_ops,
+	.release = __vas_seg_release,
+	.default_attrs = vas_seg_default_attr,
+};
+
 
 /***
  * Internally visible functions
@@ -526,8 +722,99 @@ static inline struct vas *vas_lookup_by_name(const char *name)
 	return vas;
 }
 
+/**
+ * Working with the global VAS segments list.
+ **/
+static inline void vas_seg_remove(struct vas_seg *seg)
+{
+	spin_lock(&vas_segs_lock);
+
+	/*
+	 * We only put a to-be-deleted place holder in the IDR at this point.
+	 * See @vas_remove for more details.
+	 */
+	idr_replace(&vas_segs, INVALID_VAS_SEG, seg->id);
+	spin_unlock(&vas_segs_lock);
+
+	/* No need to wait for grace period. See @vas_remove why. */
+	__vas_seg_put(seg);
+}
+
+static inline int vas_seg_insert(struct vas_seg *seg)
+{
+	int ret;
+
+	/* Add the VAS segment in the IDR cache. */
+	spin_lock(&vas_segs_lock);
+
+	ret = idr_alloc(&vas_segs, seg, 1, VAS_MAX_ID, GFP_KERNEL);
+
+	spin_unlock(&vas_segs_lock);
+
+	if (ret < 0) {
+		__delete_vas_seg(seg);
+		return ret;
+	}
+
+	/* Add the remaining data to the VAS segment's data structure. */
+	seg->id = ret;
+	seg->kobj.kset = vas_segs_kset;
+
+	/* Initialize the kobject and add it to the sysfs. */
+	ret = kobject_init_and_add(&seg->kobj, &vas_seg_ktype, NULL,
+				   "%d", seg->id);
+	if (ret != 0) {
+		vas_seg_remove(seg);
+		return ret;
+	}
+
+	kobject_uevent(&seg->kobj, KOBJ_ADD);
+
+	return 0;
+}
+
+static inline struct vas_seg *vas_seg_lookup(int id)
+{
+	struct vas_seg *seg;
+
+	rcu_read_lock();
+
+	seg = idr_find(&vas_segs, id);
+	if (seg == INVALID_VAS_SEG)
+		seg = NULL;
+	if (seg)
+		seg = __vas_seg_get(seg);
+
+	rcu_read_unlock();
+
+	return seg;
+}
+
+static inline struct vas_seg *vas_seg_lookup_by_name(const char *name)
+{
+	struct vas_seg *seg;
+	int id;
+
+	rcu_read_lock();
+
+	idr_for_each_entry(&vas_segs, seg, id) {
+		if (seg == INVALID_VAS_SEG)
+			continue;
+
+		if (strcmp(seg->name, name) == 0)
+			break;
+	}
+
+	if (seg)
+		seg = __vas_seg_get(seg);
+
+	rcu_read_unlock();
+
+	return seg;
+}
+
  /**
-  * Management of the sharing of VAS.
+  * Management of the sharing of VAS and VAS segments.
   **/
 static inline int vas_take_share(int type, struct vas *vas)
 {
@@ -562,6 +849,39 @@ static inline void vas_put_share(int type, struct vas *vas)
 	spin_unlock(&vas->share_lock);
 }
 
+static inline int vas_seg_take_share(int type, struct vas_seg *seg)
+{
+	int ret;
+
+	spin_lock(&seg->share_lock);
+	if (type & MAY_WRITE) {
+		if ((seg->sharing & VAS_SHARE_READ_WRITE_MASK) == 0) {
+			seg->sharing += VAS_SHARE_WRITABLE;
+			ret = 1;
+		} else
+			ret = 0;
+	} else {
+		if ((seg->sharing & VAS_SHARE_WRITE_MASK) == 0) {
+			seg->sharing += VAS_SHARE_READABLE;
+			ret = 1;
+		} else
+			ret = 0;
+	}
+	spin_unlock(&seg->share_lock);
+
+	return ret;
+}
+
+static inline void vas_seg_put_share(int type, struct vas_seg *seg)
+{
+	spin_lock(&seg->share_lock);
+	if (type & MAY_WRITE)
+		seg->sharing -= VAS_SHARE_WRITABLE;
+	else
+		seg->sharing -= VAS_SHARE_READABLE;
+	spin_unlock(&seg->share_lock);
+}
+
 /**
  * Management of the memory maps.
  **/
@@ -609,6 +929,59 @@ static int init_att_vas_mm(struct att_vas *avas, struct task_struct *owner)
 	return 0;
 }
 
+static int init_vas_seg_mm(struct vas_seg *seg)
+{
+	struct mm_struct *mm;
+	unsigned long map_flags, page_prot_flags;
+	vm_flags_t vm_flags;
+	unsigned long map_addr;
+	int ret;
+
+	mm = mm_alloc();
+	if (!mm)
+		return -ENOMEM;
+
+	mm = mm_setup(mm);
+	if (!mm)
+		return -ENOMEM;
+
+	arch_pick_mmap_layout(mm);
+
+	map_flags = MAP_ANONYMOUS | MAP_FIXED;
+	page_prot_flags = PROT_READ | PROT_WRITE;
+	vm_flags = calc_vm_prot_bits(page_prot_flags, 0) |
+		calc_vm_flag_bits(map_flags) | mm->def_flags |
+		VM_DONTEXPAND | VM_DONTCOPY;
+
+	/* Find the possible mapping address for the VAS segment. */
+	map_addr = get_unmapped_area(mm, NULL, seg->start, seg->length,
+				     0, map_flags);
+	if (map_addr != seg->start) {
+		ret = -EFAULT;
+		goto out_free;
+	}
+
+	/* Insert the mapping into the mm_struct of the VAS segment. */
+	map_addr = mmap_region(mm, NULL, seg->start, seg->length,
+			       vm_flags, 0);
+	if (map_addr != seg->start) {
+		ret = -EFAULT;
+		goto out_free;
+	}
+
+	/* Populate the VAS segment's memory region. */
+	mm_populate(mm, seg->start, seg->length);
+
+	/* The mm_struct is properly setup. We are done here. */
+	seg->mm = mm;
+
+	return 0;
+
+out_free:
+	mmput(mm);
+	return ret;
+}
+
 /**
  * Lookup the corresponding vm_area in the referenced memory map.
  *
@@ -1126,61 +1499,200 @@ static int task_unmerge(struct att_vas *avas, struct task_struct *tsk)
 }
 
 /**
- * Attach a VAS to a task -- update internal information ONLY
+ * Merge a VAS segment's memory map into a VAS memory map.
  *
- * Requires that the VAS is already locked.
+ * Requires that the VAS and the VAS segment are already locked.
  *
- * @param[in] avas:	The pointer to the attached-VAS data structure
- *			containing all the information of this attaching.
- * @param[in] tsk:	The pointer to the task to which the VAS should be
- *			attached.
- * @param[in] vas:	The pointer to the VAS which should be attached.
+ * @param[in] vas:	The pointer to the VAS into which the VAS segment should
+ *			be merged.
+ * @param[in] seg:	The pointer to the VAS segment that should be merged.
+ * @param[in] type:	The type of attaching (see attach_segment for more
+ *			information).
  *
- * @returns:		0 on succes, -ERRNO otherwise.
+ * @returns:		0 on success, -ERRNO otherwise.
  **/
-static int __vas_attach(struct att_vas *avas, struct task_struct *tsk,
-			struct vas *vas)
+static int vas_seg_merge(struct vas *vas, struct vas_seg *seg, int type)
 {
+	struct vm_area_struct *vma, *new_vma;
+	struct mm_struct *vas_mm, *seg_mm;
 	int ret;
 
-	/* Before doing anything, synchronize the RSS-stat of the task. */
-	sync_mm_rss(tsk->mm);
+	vas_mm = vas->mm;
+	seg_mm = seg->mm;
 
-	/*
-	 * Try to acquire the VAS share with the proper type. This will ensure
-	 * that the different sharing possibilities of VAS are respected.
-	 */
-	if (!vas_take_share(avas->type, vas)) {
-		pr_vas_debug("VAS is already attached exclusively\n");
-		return -EBUSY;
-	}
+	dump_memory_map("Before VAS MM", vas_mm);
+	dump_memory_map("Before VAS segment MM", seg_mm);
 
-	ret = vas_merge(avas, vas, avas->type);
-	if (ret != 0)
-		goto out_put_share;
+	if (down_write_killable(&vas_mm->mmap_sem))
+		return -EINTR;
+	down_read_nested(&seg_mm->mmap_sem, SINGLE_DEPTH_NESTING);
 
-	ret = task_merge(avas, tsk);
-	if (ret != 0)
-		goto out_put_share;
+	/* Try to copy all VMAs of the VAS segment into the VAS. */
+	for (vma = seg_mm->mmap; vma; vma = vma->vm_next) {
+		unsigned long merged_vm_flags = vma->vm_flags;
 
-	vas->refcount++;
+		pr_vas_debug("Merging a VAS segment memory region (%#lx - %#lx)\n",
+			     vma->vm_start, vma->vm_end);
 
-	return 0;
+		/*
+		 * Remove the writable bit from the vm_flags if the VAS segment
+		 * is attached only readable.
+		 */
+		if (!(type & MAY_WRITE))
+			merged_vm_flags &= ~(VM_WRITE | VM_MAYWRITE);
 
-out_put_share:
-	vas_put_share(avas->type, vas);
-	return ret;
-}
+		new_vma = __copy_vm_area(seg_mm, vma, vas_mm, merged_vm_flags);
+		if (!new_vma) {
+			pr_vas_debug("Failed to merge a VAS segment memory region (%#lx - %#lx)\n",
+				     vma->vm_start, vma->vm_end);
+			ret = -EFAULT;
+			goto out_unlock;
+		}
 
-/**
- * Detach a VAS from a task -- update internal information ONLY
- *
- * Requires that the VAS is already locked.
- *
- * @param[in] avas:	The pointer to the attached-VAS data structure
- *			containing all the information of this attaching.
- * @param[in] tsk:	The pointer to the task from which the VAS should be
- *			detached.
+		/*
+		 * Remember for the VMA that we just added it to the VAS that it
+		 * Remember for the VMA that we just added to the VAS that it
+		 * actually belongs to the VAS segment.
+		new_vma->vas_reference = seg_mm;
+	}
+
+	ret = 0;
+
+out_unlock:
+	up_read(&seg_mm->mmap_sem);
+	up_write(&vas_mm->mmap_sem);
+
+	dump_memory_map("After VAS MM", vas_mm);
+	dump_memory_map("After VAS segment MM", seg_mm);
+
+	return ret;
+}
+
+/**
+ * Unmerge the VAS segment-related parts of a VAS' memory map back into the
+ * VAS segment's memory map.
+ *
+ * Requires that the VAS and the VAS segment are already locked.
+ *
+ * @param[in] vas:	The pointer to the VAS from which the VAS segment
+ *			related data should be taken.
+ * @param[in] seg:	The pointer to the VAS segment for which the memory map
+ *			should be updated again.
+ *
+ * @returns:		0 on success, -ERRNO otherwise.
+ **/
+static int vas_seg_unmerge(struct vas *vas, struct vas_seg *seg)
+{
+	struct vm_area_struct *vma, *next;
+	struct mm_struct *vas_mm, *seg_mm;
+	int ret;
+
+	vas_mm = vas->mm;
+	seg_mm = seg->mm;
+
+	dump_memory_map("Before VAS MM", vas_mm);
+	dump_memory_map("Before VAS segment MM", seg_mm);
+
+	if (down_write_killable(&vas_mm->mmap_sem))
+		return -EINTR;
+	down_write_nested(&seg_mm->mmap_sem, SINGLE_DEPTH_NESTING);
+
+	/* Update all memory regions which belonged to the VAS segment. */
+	for (vma = vas_mm->mmap, next = next_vma_safe(vma); vma;
+	     vma = next, next = next_vma_safe(next)) {
+		struct mm_struct *ref_mm = vma->vas_reference;
+
+		if (ref_mm != seg_mm) {
+			pr_vas_debug("Skipping memory region (%#lx - %#lx) during VAS segment unmerging\n",
+				     vma->vm_start, vma->vm_end);
+			continue;
+		} else {
+			struct vm_area_struct *upd_vma;
+
+			pr_vas_debug("Unmerging a VAS segment memory region (%#lx - %#lx)\n",
+				     vma->vm_start, vma->vm_end);
+
+			upd_vma = __update_vm_area(vas_mm, vma, seg_mm, NULL);
+			if (!upd_vma) {
+				pr_vas_debug("Failed to unmerge a VAS segment memory region (%#lx - %#lx)\n",
+					     vma->vm_start, vma->vm_end);
+				ret = -EFAULT;
+				goto out_unlock;
+			}
+		}
+
+		/* Remove the current VMA from the VAS memory map. */
+		__remove_vm_area(vas_mm, vma);
+	}
+
+	ret = 0;
+
+out_unlock:
+	up_write(&seg_mm->mmap_sem);
+	up_write(&vas_mm->mmap_sem);
+
+	dump_memory_map("After VAS MM", vas_mm);
+	dump_memory_map("After VAS segment MM", seg_mm);
+
+	return ret;
+}
+
+/**
+ * Attach a VAS to a task -- update internal information ONLY
+ *
+ * Requires that the VAS is already locked.
+ *
+ * @param[in] avas:	The pointer to the attached-VAS data structure
+ *			containing all the information of this attaching.
+ * @param[in] tsk:	The pointer to the task to which the VAS should be
+ *			attached.
+ * @param[in] vas:	The pointer to the VAS which should be attached.
+ *
+ * @returns:		0 on success, -ERRNO otherwise.
+ **/
+static int __vas_attach(struct att_vas *avas, struct task_struct *tsk,
+			struct vas *vas)
+{
+	int ret;
+
+	/* Before doing anything, synchronize the RSS-stat of the task. */
+	sync_mm_rss(tsk->mm);
+
+	/*
+	 * Try to acquire the VAS share with the proper type. This will ensure
+	 * that the different sharing possibilities of VAS are respected.
+	 */
+	if (!vas_take_share(avas->type, vas)) {
+		pr_vas_debug("VAS is already attached exclusively\n");
+		return -EBUSY;
+	}
+
+	ret = vas_merge(avas, vas, avas->type);
+	if (ret != 0)
+		goto out_put_share;
+
+	ret = task_merge(avas, tsk);
+	if (ret != 0)
+		goto out_put_share;
+
+	vas->refcount++;
+
+	return 0;
+
+out_put_share:
+	vas_put_share(avas->type, vas);
+	return ret;
+}
+
+/**
+ * Detach a VAS from a task -- update internal information ONLY
+ *
+ * Requires that the VAS is already locked.
+ *
+ * @param[in] avas:	The pointer to the attached-VAS data structure
+ *			containing all the information of this attaching.
+ * @param[in] tsk:	The pointer to the task from which the VAS should be
+ *			detached.
  * @param[in] vas:	The pointer to the VAS which should be detached.
  *
  * @returns:		0 on success, -ERRNO otherwise.
@@ -1209,6 +1721,83 @@ static int __vas_detach(struct att_vas *avas, struct task_struct *tsk,
 	return 0;
 }
 
+/**
+ * Attach a VAS segment to a VAS -- update internal information ONLY
+ *
+ * Requires that the VAS segment and the VAS are already locked.
+ *
+ * @param aseg:		The pointer to the attached VAS segment data structure
+ *			containing all the information of this attaching.
+ * @param vas:		The pointer to the VAS to which the VAS segment should
+ *			be attached.
+ * @param seg:		The pointer to the VAS segment which should be attached.
+ *
+ * @returns:		0 on success, -ERRNO otherwise.
+ **/
+static int __vas_seg_attach(struct att_vas_seg *aseg, struct vas *vas,
+			   struct vas_seg *seg)
+{
+	int ret;
+
+	/*
+	 * Try to acquire the VAS segment share with the proper type. This will
+	 * ensure that the different sharing possibilities of VAS segments are
+	 * respected.
+	 */
+	if (!vas_seg_take_share(aseg->type, seg)) {
+		pr_vas_debug("VAS segment is already attached writable to a VAS\n");
+		return -EBUSY;
+	}
+
+	/* Update the memory map of the VAS. */
+	ret = vas_seg_merge(vas, seg, aseg->type);
+	if (ret != 0)
+		goto out_put_share;
+
+	seg->refcount++;
+	vas->nr_segments++;
+
+	return 0;
+
+out_put_share:
+	vas_seg_put_share(aseg->type, seg);
+	return ret;
+}
+
+/**
+ * Detach a VAS segment from a VAS -- update internal information ONLY
+ *
+ * Requires that the VAS segment and the VAS are already locked.
+ *
+ * @param aseg:		The pointer to the attached VAS segment data structure
+ *			containing all the information of this attaching.
+ * @param vas:		The pointer to the VAS from which the VAS segment should
+ *			be detached.
+ * @param seg:		The pointer to the VAS segment which should be detached.
+ *
+ * @returns:		0 on success, -ERRNO otherwise.
+ **/
+static int __vas_seg_detach(struct att_vas_seg *aseg, struct vas *vas,
+			    struct vas_seg *seg)
+{
+	int ret;
+
+	/* Update the memory maps of the VAS segment and the VAS. */
+	ret = vas_seg_unmerge(vas, seg);
+	if (ret != 0)
+		return ret;
+
+	seg->refcount--;
+	vas->nr_segments--;
+
+	/*
+	 * Release the VAS segment share here so that the sharing semantics
+	 * stay consistent.
+	 */
+	vas_seg_put_share(aseg->type, seg);
+
+	return 0;
+}
+
 static int __sync_from_task(struct mm_struct *avas_mm, struct mm_struct *tsk_mm)
 {
 	struct vm_area_struct *vma;
@@ -1542,6 +2131,9 @@ int vas_create(const char *name, umode_t mode)
 	spin_lock_init(&vas->share_lock);
 	vas->sharing = 0;
 
+	INIT_LIST_HEAD(&vas->segments);
+	vas->nr_segments = 0;
+
 	vas->mode = mode & 0666;
 	vas->uid = current_uid();
 	vas->gid = current_gid();
@@ -1596,6 +2188,7 @@ EXPORT_SYMBOL(vas_find);
 int vas_delete(int vid)
 {
 	struct vas *vas;
+	struct att_vas_seg *aseg, *s_aseg;
 	int ret;
 
 	vas = vas_get(vid);
@@ -1618,6 +2211,39 @@ int vas_delete(int vid)
 		goto out_unlock;
 	}
 
+	/* Detach all still attached VAS segments. */
+	list_for_each_entry_safe(aseg, s_aseg, &vas->segments, vas_link) {
+		struct vas_seg *seg = aseg->seg;
+		int error;
+
+		pr_vas_debug("Detaching VAS segment - name: %s - from to-be-deleted VAS - name: %s\n",
+			     seg->name, vas->name);
+
+		/*
+		 * Make sure that our VAS segment reference is not removed while
+		 * we work with it.
+		 */
+		__vas_seg_get(seg);
+
+		/*
+		 * Since the VAS from which we detach this VAS segment is going
+		 * to be deleted anyway, we can shorten the detaching process.
+		 */
+		vas_seg_lock(seg);
+
+		error = __vas_seg_detach(aseg, vas, seg);
+		if (error != 0)
+			pr_alert("Detaching VAS segment from VAS failed with %d\n",
+				 error);
+
+		list_del(&aseg->seg_link);
+		list_del(&aseg->vas_link);
+		__delete_att_vas_seg(aseg);
+
+		vas_seg_unlock(seg);
+		__vas_seg_put(seg);
+	}
+
 	vas_unlock(vas);
 
 	vas_remove(vas);
@@ -1908,19 +2534,433 @@ int vas_setattr(int vid, struct vas_attr *attr)
 }
 EXPORT_SYMBOL(vas_setattr);
 
+int vas_seg_create(const char *name, unsigned long start, unsigned long end,
+		   umode_t mode)
+{
+	struct vas_seg *seg;
+	int ret;
+
+	if (!name || !PAGE_ALIGNED(start) || !PAGE_ALIGNED(end) ||
+	    (end <= start))
+		return -EINVAL;
+
+	if (vas_seg_find(name) > 0)
+		return -EEXIST;
+
+	pr_vas_debug("Creating a new VAS segment - name: %s start: %#lx end: %#lx\n",
+		     name, start, end);
+
+	/* Allocate and initialize the VAS segment. */
+	seg = __new_vas_seg();
+	if (!seg)
+		return -ENOMEM;
+
+	if (strscpy(seg->name, name, VAS_MAX_NAME_LENGTH) < 0) {
+		ret = -EINVAL;
+		goto out_free;
+	}
+
+	mutex_init(&seg->mtx);
+
+	seg->start = start;
+	seg->end = end;
+	seg->length = end - start;
+
+	ret = init_vas_seg_mm(seg);
+	if (ret != 0)
+		goto out_free;
+
+	seg->refcount = 0;
+
+	INIT_LIST_HEAD(&seg->attaches);
+	spin_lock_init(&seg->share_lock);
+	seg->sharing = 0;
+
+	seg->mode = mode & 0666;
+	seg->uid = current_uid();
+	seg->gid = current_gid();
+
+	ret = vas_seg_insert(seg);
+	if (ret != 0)
+		/*
+		 * We don't need to free anything here. @vas_seg_insert will
+		 * take care of the deletion if something went wrong.
+		 */
+		return ret;
+
+	return seg->id;
+
+out_free:
+	__delete_vas_seg(seg);
+	return ret;
+}
+EXPORT_SYMBOL(vas_seg_create);
+
+struct vas_seg *vas_seg_get(int sid)
+{
+	return vas_seg_lookup(sid);
+}
+EXPORT_SYMBOL(vas_seg_get);
+
+void vas_seg_put(struct vas_seg *seg)
+{
+	if (!seg)
+		return;
+
+	return __vas_seg_put(seg);
+}
+EXPORT_SYMBOL(vas_seg_put);
+
+int vas_seg_find(const char *name)
+{
+	struct vas_seg *seg;
+
+	seg = vas_seg_lookup_by_name(name);
+	if (seg) {
+		int sid = seg->id;
+
+		vas_seg_put(seg);
+		return sid;
+	}
+
+	return -ESRCH;
+}
+EXPORT_SYMBOL(vas_seg_find);
+
+int vas_seg_delete(int id)
+{
+	struct vas_seg *seg;
+	int ret;
+
+	seg = vas_seg_get(id);
+	if (!seg)
+		return -EINVAL;
+
+	pr_vas_debug("Deleting VAS segment - name: %s\n", seg->name);
+
+	vas_seg_lock(seg);
+
+	if (seg->refcount != 0) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	/* The user needs write permission to the VAS segment to delete it. */
+	ret = __check_permission(seg->uid, seg->gid, seg->mode, MAY_WRITE);
+	if (ret != 0) {
+		pr_vas_debug("User doesn't have the appropriate permissions to delete the VAS segment\n");
+		goto out_unlock;
+	}
+
+	vas_seg_unlock(seg);
+
+	vas_seg_remove(seg);
+	vas_seg_put(seg);
+
+	return 0;
+
+out_unlock:
+	vas_seg_unlock(seg);
+	vas_seg_put(seg);
+
+	return ret;
+}
+EXPORT_SYMBOL(vas_seg_delete);
+
+int vas_seg_attach(int vid, int sid, int type)
+{
+	struct vas *vas;
+	struct vas_seg *seg;
+	struct att_vas_seg *aseg;
+	int ret;
+
+	type &= (MAY_READ | MAY_WRITE);
+
+	vas = vas_get(vid);
+	if (!vas)
+		return -EINVAL;
+
+	seg = vas_seg_get(sid);
+	if (!seg) {
+		vas_put(vas);
+		return -EINVAL;
+	}
+
+	pr_vas_debug("Attaching VAS segment - name: %s - to VAS - name: %s - %s\n",
+		     seg->name, vas->name, access_type_str(type));
+
+	vas_lock(vas);
+	vas_seg_lock(seg);
+
+	/*
+	 * Before we can attach the VAS segment to the VAS we have to make some
+	 * sanity checks.
+	 */
+
+	/*
+	 * 1: Check that the user has adequate permissions to attach the VAS
+	 * segment in the given way.
+	 */
+	ret = __check_permission(seg->uid, seg->gid, seg->mode, type);
+	if (ret != 0) {
+		pr_vas_debug("User doesn't have the appropriate permissions to attach the VAS segment\n");
+		goto out_unlock;
+	}
+
+	/*
+	 * 2: The user needs write permission to the VAS to attach a VAS segment
+	 * to it. Check that this requirement is fulfilled.
+	 */
+	ret = __check_permission(vas->uid, vas->gid, vas->mode, MAY_WRITE);
+	if (ret != 0) {
+		pr_vas_debug("User doesn't have the appropriate permissions on the VAS to attach the VAS segment\n");
+		goto out_unlock;
+	}
+
+
+	/*
+	 * 3: Check if the VAS is attached to a process. We do not support
+	 * changes to an attached VAS. A VAS must not be attached to a process
+	 * to be able to make changes to it. This ensures that the page tables
+	 * are always properly initialized.
+	 */
+	if (vas->refcount != 0) {
+		pr_vas_debug("VAS is attached to a process\n");
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	/*
+	 * 4: Check if the VAS segment is already attached to this particular
+	 * VAS. Double-attaching would lead to unintended behavior.
+	 */
+	list_for_each_entry(aseg, &seg->attaches, seg_link) {
+		if (aseg->vas == vas) {
+			pr_vas_debug("VAS segment is already attached to the VAS\n");
+			ret = 0;
+			goto out_unlock;
+		}
+	}
+
+	/*
+	 * 5: Check if we reached the maximum number of shares for this VAS
+	 * segment.
+	 */
+	if (seg->refcount == VAS_MAX_SHARES) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	/*
+	 * All sanity checks are done. It is safe to attach this VAS segment to
+	 * the VAS now.
+	 */
+
+	/* Allocate and initialize the attached VAS segment data structure. */
+	aseg = __new_att_vas_seg();
+	if (!aseg) {
+		ret = -ENOMEM;
+		goto out_unlock;
+	}
+
+	aseg->seg = seg;
+	aseg->vas = vas;
+	aseg->type = type;
+
+	ret = __vas_seg_attach(aseg, vas, seg);
+	if (ret != 0)
+		goto out_free_aseg;
+
+	list_add(&aseg->vas_link, &vas->segments);
+	list_add(&aseg->seg_link, &seg->attaches);
+
+	ret = 0;
+
+out_unlock:
+	vas_seg_unlock(seg);
+	vas_seg_put(seg);
+
+	vas_unlock(vas);
+	vas_put(vas);
+
+	return ret;
+
+out_free_aseg:
+	__delete_att_vas_seg(aseg);
+	goto out_unlock;
+}
+EXPORT_SYMBOL(vas_seg_attach);
+
+int vas_seg_detach(int vid, int sid)
+{
+	struct vas *vas;
+	struct vas_seg *seg;
+	struct att_vas_seg *aseg;
+	bool is_attached;
+	int ret;
+
+	vas = vas_get(vid);
+	if (!vas)
+		return -EINVAL;
+
+	vas_lock(vas);
+
+	is_attached = false;
+	list_for_each_entry(aseg, &vas->segments, vas_link) {
+		if (aseg->seg->id == sid) {
+			is_attached = true;
+			break;
+		}
+	}
+	if (!is_attached) {
+		pr_vas_debug("VAS segment is not attached to the given VAS\n");
+		ret = -EINVAL;
+		goto out_unlock_vas;
+	}
+
+	seg = aseg->seg;
+
+	/*
+	 * Make sure that our reference to the VAS segment is not deleted while
+	 * we are working with it.
+	 */
+	__vas_seg_get(seg);
+
+	vas_seg_lock(seg);
+
+	pr_vas_debug("Detaching VAS segment - name: %s - from VAS - name: %s\n",
+		     seg->name, vas->name);
+
+	/*
+	 * Before we can detach the VAS segment from the VAS we have to do some
+	 * sanity checks.
+	 */
+
+	/*
+	 * 1: Check if the VAS is attached to a process. We do not support
+	 * changes to an attached VAS. A VAS must not be attached to a process
+	 * to be able to make changes to it. This ensures that the page tables
+	 * are always properly initialized.
+	 */
+	if (vas->refcount != 0) {
+		pr_vas_debug("VAS is attached to a process\n");
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	/*
+	 * All sanity checks are done. It is safe to detach the VAS segment from
+	 * the VAS now.
+	 */
+	ret = __vas_seg_detach(aseg, vas, seg);
+	if (ret != 0)
+		goto out_unlock;
+
+	list_del(&aseg->seg_link);
+	list_del(&aseg->vas_link);
+	__delete_att_vas_seg(aseg);
+
+	ret = 0;
+
+out_unlock:
+	vas_seg_unlock(seg);
+	__vas_seg_put(seg);
+
+out_unlock_vas:
+	vas_unlock(vas);
+	vas_put(vas);
+
+	return ret;
+}
+EXPORT_SYMBOL(vas_seg_detach);
+
+int vas_seg_getattr(int sid, struct vas_seg_attr *attr)
+{
+	struct vas_seg *seg;
+	struct user_namespace *ns = current_user_ns();
+
+	if (!attr)
+		return -EINVAL;
+
+	seg = vas_seg_get(sid);
+	if (!seg)
+		return -EINVAL;
+
+	pr_vas_debug("Getting attributes for VAS segment - name: %s\n",
+		     seg->name);
+
+	vas_seg_lock(seg);
+
+	memset(attr, 0, sizeof(struct vas_seg_attr));
+	attr->mode = seg->mode;
+	attr->user = from_kuid(ns, seg->uid);
+	attr->group = from_kgid(ns, seg->gid);
+
+	vas_seg_unlock(seg);
+	vas_seg_put(seg);
+
+	return 0;
+}
+EXPORT_SYMBOL(vas_seg_getattr);
+
+int vas_seg_setattr(int sid, struct vas_seg_attr *attr)
+{
+	struct vas_seg *seg;
+	struct user_namespace *ns = current_user_ns();
+	int ret;
+
+	if (!attr)
+		return -EINVAL;
+
+	seg = vas_seg_get(sid);
+	if (!seg)
+		return -EINVAL;
+
+	pr_vas_debug("Setting attributes for VAS segment - name: %s\n",
+		     seg->name);
+
+	vas_seg_lock(seg);
+
+	/*
+	 * The user needs write permission to change attributes for the
+	 * VAS segment.
+	 */
+	ret = __check_permission(seg->uid, seg->gid, seg->mode, MAY_WRITE);
+	if (ret != 0) {
+		pr_vas_debug("User doesn't have the appropriate permissions to set attributes for the VAS segment\n");
+		goto out_unlock;
+	}
+
+	seg->mode = attr->mode & 0666;
+	seg->uid = make_kuid(ns, attr->user);
+	seg->gid = make_kgid(ns, attr->group);
+
+	ret = 0;
+
+out_unlock:
+	vas_seg_unlock(seg);
+	vas_seg_put(seg);
+
+	return ret;
+}
+EXPORT_SYMBOL(vas_seg_setattr);
+
 void __init vas_init(void)
 {
 	/* Create the SLAB caches for our data structures. */
 	vas_cachep = KMEM_CACHE(vas, SLAB_PANIC|SLAB_NOTRACK);
 	att_vas_cachep = KMEM_CACHE(att_vas, SLAB_PANIC|SLAB_NOTRACK);
 	vas_context_cachep = KMEM_CACHE(vas_context, SLAB_PANIC|SLAB_NOTRACK);
+	seg_cachep = KMEM_CACHE(vas_seg, SLAB_PANIC|SLAB_NOTRACK);
+	att_seg_cachep = KMEM_CACHE(att_vas_seg, SLAB_PANIC|SLAB_NOTRACK);
 
 	/* Initialize the internal management data structures. */
 	idr_init(&vases);
 	spin_lock_init(&vases_lock);
 
+	idr_init(&vas_segs);
+	spin_lock_init(&vas_segs_lock);
+
 	/* Initialize the place holder variables. */
 	INVALID_VAS = __new_vas();
+	INVALID_VAS_SEG = __new_vas_seg();
 
 	/* Initialize the VAS context of the init task. */
 	vas_clone(0, &init_task);
@@ -1941,6 +2981,12 @@ static int __init vas_sysfs_init(void)
 		return -ENOMEM;
 	}
 
+	vas_segs_kset = kset_create_and_add("vas_segs", NULL, kernel_kobj);
+	if (!vas_segs_kset) {
+		pr_err("Failed to initialize the VAS segment sysfs directory\n");
+		return -ENOMEM;
+	}
+
 	return 0;
 }
 postcore_initcall(vas_sysfs_init);
@@ -2186,3 +3232,105 @@ SYSCALL_DEFINE2(vas_setattr, int, vid, struct vas_attr __user *, uattr)
 
 	return vas_setattr(vid, &attr);
 }
+
+SYSCALL_DEFINE4(vas_seg_create, const char __user *, name, unsigned long, begin,
+		unsigned long, end, umode_t, mode)
+{
+	char seg_name[VAS_MAX_NAME_LENGTH];
+	int len;
+
+	if (!name)
+		return -EINVAL;
+
+	len = strncpy_from_user(seg_name, name, VAS_MAX_NAME_LENGTH);
+	if (len < 0)
+		return -EFAULT;
+	if (len == VAS_MAX_NAME_LENGTH)
+		return -EINVAL;
+
+	return vas_seg_create(seg_name, begin, end, mode);
+}
+
+SYSCALL_DEFINE1(vas_seg_delete, int, id)
+{
+	if (id < 0)
+		return -EINVAL;
+
+	return vas_seg_delete(id);
+}
+
+SYSCALL_DEFINE1(vas_seg_find, const char __user *, name)
+{
+	char seg_name[VAS_MAX_NAME_LENGTH];
+	int len;
+
+	if (!name)
+		return -EINVAL;
+
+	len = strncpy_from_user(seg_name, name, VAS_MAX_NAME_LENGTH);
+	if (len < 0)
+		return -EFAULT;
+	if (len == VAS_MAX_NAME_LENGTH)
+		return -EINVAL;
+
+	return vas_seg_find(seg_name);
+}
+
+SYSCALL_DEFINE3(vas_seg_attach, int, vid, int, sid, int, type)
+{
+	int vas_acc_type;
+
+	if (vid < 0 || sid < 0)
+		return -EINVAL;
+
+	vas_acc_type = __build_vas_access_type(type);
+	if (vas_acc_type == -1)
+		return -EINVAL;
+
+	return vas_seg_attach(vid, sid, vas_acc_type);
+}
+
+SYSCALL_DEFINE2(vas_seg_detach, int, vid, int, sid)
+{
+	if (vid < 0 || sid < 0)
+		return -EINVAL;
+
+	return vas_seg_detach(vid, sid);
+}
+
+SYSCALL_DEFINE2(vas_seg_getattr, int, sid, struct vas_seg_attr __user *, uattr)
+{
+	struct vas_seg_attr attr;
+	int ret;
+
+	if (sid < 0 || !uattr)
+		return -EINVAL;
+
+	ret = vas_seg_getattr(sid, &attr);
+	if (ret != 0)
+		return ret;
+
+	if (copy_to_user(uattr, &attr, sizeof(struct vas_seg_attr)) != 0)
+		return -EFAULT;
+
+	return 0;
+}
+
+SYSCALL_DEFINE2(vas_seg_setattr, int, sid, struct vas_seg_attr __user *, uattr)
+{
+	struct vas_seg_attr attr;
+
+	if (sid < 0 || !uattr)
+		return -EINVAL;
+
+	if (copy_from_user(&attr, uattr, sizeof(struct vas_seg_attr)) != 0)
+		return -EFAULT;
+
+	return vas_seg_setattr(sid, &attr);
+}
-- 
2.12.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 12/13] mm/vas: Add lazy-attach support for first class virtual address spaces
  2017-03-13 22:14 [RFC PATCH 00/13] Introduce first class virtual address spaces Till Smejkal
                   ` (10 preceding siblings ...)
  2017-03-13 22:14 ` [RFC PATCH 11/13] mm/vas: Introduce VAS segments - shareable address space regions Till Smejkal
@ 2017-03-13 22:14 ` Till Smejkal
  2017-03-13 22:14 ` [RFC PATCH 13/13] fs/proc: Add procfs " Till Smejkal
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-13 22:14 UTC (permalink / raw)
  To: Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai
  Cc: linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

Until now, whenever a task attaches a first class virtual address space,
all the memory regions currently present in the task are replicated into
the first class virtual address space so that the task can continue
executing as if nothing had changed. However, this technique makes the
attach and detach operations very costly, since the whole memory map of
the task has to be duplicated.

Lazy-attaching, on the other hand, uses a technique similar to the one
used to copy page tables during fork. Instead of completely duplicating
the memory map of the task together with its page tables, only a skeleton
memory map is created; it is filled in later, when page faults are
triggered by the process actually accessing the memory regions. The big
advantage is that unused memory regions are not duplicated at all, only
those that the process actually touches while executing inside the first
class virtual address space. The only memory region that is always
duplicated during the attach operation is the code section, because it is
needed for execution in any case, and copying it eagerly saves one page
fault later during process execution.
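
To make the behaviour visible from user space, the following minimal
sketch (not part of this patch) shows the intended usage pattern. The
vas_create(), vas_attach() and vas_switch() wrappers are assumed
stand-ins for the corresponding system calls introduced earlier in this
series; their names and signatures here are for illustration only.

/*
 * Hypothetical user-space sketch -- the vas_* wrappers below are assumed
 * stand-ins for the syscalls added earlier in this series and are not
 * defined here.
 */
#include <stdlib.h>
#include <unistd.h>

extern int vas_create(const char *name, int mode);   /* assumed wrapper */
extern int vas_attach(pid_t pid, int vid, int type); /* assumed wrapper */
extern int vas_switch(int vid);                      /* assumed wrapper */

int main(void)
{
	char *p = malloc(4096);
	int vid;

	p[0] = 'x';	/* the heap region exists before the switch */

	vid = vas_create("lazy-example", 0600);
	vas_attach(getpid(), vid, 0 /* access type, see earlier patches */);

	/*
	 * With CONFIG_VAS_LAZY_ATTACH=y only a skeleton memory map (plus
	 * the code region) is built here, so the switch stays cheap.
	 */
	vas_switch(vid);

	/*
	 * First access after the switch: the heap VMA was only lazily
	 * attached, so this faults and vas_lazy_attach_vma() copies the
	 * page tables for exactly this region before the fault is handled
	 * as usual.
	 */
	p[1] = 'y';

	return 0;
}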

Signed-off-by: Till Smejkal <till.smejkal@gmail.com>
---
 include/linux/mm_types.h |   1 +
 include/linux/vas.h      |  26 ++++++++
 mm/Kconfig               |  18 ++++++
 mm/memory.c              |   5 ++
 mm/vas.c                 | 164 ++++++++++++++++++++++++++++++++++++++++++-----
 5 files changed, 197 insertions(+), 17 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 82bf78ea83ee..65e04f14225d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -362,6 +362,7 @@ struct vm_area_struct {
 #ifdef CONFIG_VAS
 	struct mm_struct *vas_reference;
 	ktime_t vas_last_update;
+	bool vas_attached;
 #endif
 };
 
diff --git a/include/linux/vas.h b/include/linux/vas.h
index 376b9fa1ee27..8682bfc86568 100644
--- a/include/linux/vas.h
+++ b/include/linux/vas.h
@@ -2,6 +2,7 @@
 #define _LINUX_VAS_H
 
 
+#include <linux/mm_types.h>
 #include <linux/sched.h>
 #include <linux/vas_types.h>
 
@@ -293,4 +294,29 @@ static inline int vas_exit(struct task_struct *tsk) { return 0; }
 
 #endif /* CONFIG_VAS */
 
+
+/***
+ * Management of the VAS lazy attaching
+ ***/
+
+#ifdef CONFIG_VAS_LAZY_ATTACH
+
+/**
+ * Lazily update the page tables of a vm_area which was not completely set up
+ * during the VAS attaching.
+ *
+ * @param[in] vma:		The vm_area for which the page tables should be
+ *				set up before continuing the page fault handling.
+ *
+ * @returns:			0 if the lazy-attach was successful or not
+ *				necessary, or 1 if something went wrong.
+ */
+extern int vas_lazy_attach_vma(struct vm_area_struct *vma);
+
+#else /* CONFIG_VAS_LAZY_ATTACH */
+
+static inline int vas_lazy_attach_vma(struct vm_area_struct *vma) { return 0; }
+
+#endif /* CONFIG_VAS_LAZY_ATTACH */
+
 #endif
diff --git a/mm/Kconfig b/mm/Kconfig
index 9a80877f3536..934c56bcdbf4 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -720,6 +720,24 @@ config VAS
 
 	  If not sure, then say N.
 
+config VAS_LAZY_ATTACH
+	bool "Use lazy-attach for First Class Virtual Address Spaces"
+	depends on VAS
+	default y
+	help
+	  When this option is enabled, memory regions of First Class Virtual
+	  Address Spaces will be mapped into the task's address space lazily
+	  after the switch has happened. That means the actual mapping is only
+	  established when a page fault occurs for the particular memory
+	  region. While this technique makes the switch operation itself less
+	  costly, it can add significant cost to the page fault handling.
+
+	  Hence, if the program touches a lot of different memory regions,
+	  this lazy-attaching technique can be more costly than doing the
+	  mapping eagerly during the switch.
+
+	  If not sure, then say Y.
+
 config VAS_DEBUG
 	bool "Debugging output for First Class Virtual Address Spaces"
 	depends on VAS
diff --git a/mm/memory.c b/mm/memory.c
index e4747b3fd5b9..cdefc99a50ac 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -64,6 +64,7 @@
 #include <linux/debugfs.h>
 #include <linux/userfaultfd_k.h>
 #include <linux/dax.h>
+#include <linux/vas.h>
 
 #include <asm/io.h>
 #include <asm/mmu_context.h>
@@ -4000,6 +4001,10 @@ int handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 	/* do counter updates before entering really critical section. */
 	check_sync_rss_stat(current);
 
+	/* Check if this VMA belongs to a VAS and needs to be lazily attached. */
+	if (unlikely(vas_lazy_attach_vma(vma)))
+		return VM_FAULT_SIGSEGV;
+
 	/*
 	 * Enable the memcg OOM handling for faults triggered in user
 	 * space.  Kernel faults are handled more gracefully.
diff --git a/mm/vas.c b/mm/vas.c
index 345b023c21aa..953ba8d6e603 100644
--- a/mm/vas.c
+++ b/mm/vas.c
@@ -138,12 +138,13 @@ static void __dump_memory_map(const char *title, struct mm_struct *mm)
 		else
 			pr_cont(" OTHER ");
 
-		pr_cont("%c%c%c%c [%c]",
+		pr_cont("%c%c%c%c [%c:%c]",
 			vma->vm_flags & VM_READ ? 'r' : '-',
 			vma->vm_flags & VM_WRITE ? 'w' : '-',
 			vma->vm_flags & VM_EXEC ? 'x' : '-',
 			vma->vm_flags & VM_MAYSHARE ? 's' : 'p',
-			vma->vas_reference ? 'v' : '-');
+			vma->vas_reference ? 'v' : '-',
+			vma->vas_attached ? 'a' : '-');
 
 		if (vma->vm_file) {
 			struct file *f = vma->vm_file;
@@ -883,6 +884,43 @@ static inline void vas_seg_put_share(int type, struct vas_seg *seg)
 }
 
 /**
+ * Identifying special regions of a memory map.
+ **/
+static inline unsigned long round_up_to_page(unsigned long addr)
+{
+	return PAGE_ALIGNED(addr) ? addr : ((addr & PAGE_MASK) + PAGE_SIZE);
+}
+
+static inline unsigned long round_down_to_page(unsigned long addr)
+{
+	return (addr & PAGE_MASK);
+}
+
+static inline bool is_code_region(struct vm_area_struct *vma)
+{
+	struct mm_struct *mm = vma->vm_mm;
+
+	return ((vma->vm_start >= round_down_to_page(mm->start_code)) &&
+		(vma->vm_end <= round_up_to_page(mm->end_code)));
+}
+
+static inline bool is_data_region(struct vm_area_struct *vma)
+{
+	struct mm_struct *mm = vma->vm_mm;
+
+	return ((vma->vm_start >= round_down_to_page(mm->start_data)) &&
+		(vma->vm_end <= round_up_to_page(mm->end_data)));
+}
+
+static inline bool is_heap_region(struct vm_area_struct *vma)
+{
+	struct mm_struct *mm = vma->vm_mm;
+
+	return ((vma->vm_start >= round_down_to_page(mm->start_brk)) &&
+		(vma->vm_end <= round_up_to_page(mm->brk)));
+}
+
+/**
  * Management of the memory maps.
  **/
 static int init_vas_mm(struct vas *vas)
@@ -1070,7 +1108,8 @@ static inline
 struct vm_area_struct *__copy_vm_area(struct mm_struct *src_mm,
 				      struct vm_area_struct *src_vma,
 				      struct mm_struct *dst_mm,
-				      unsigned long vm_flags)
+				      unsigned long vm_flags,
+				      bool dup_pages)
 {
 	struct vm_area_struct *vma, *prev;
 	struct rb_node **rb_link, *rb_parent;
@@ -1105,11 +1144,13 @@ struct vm_area_struct *__copy_vm_area(struct mm_struct *src_mm,
 	if (vma->vm_ops && vma->vm_ops->open)
 		vma->vm_ops->open(vma);
 	vma->vas_last_update = src_vma->vas_last_update;
+	vma->vas_attached = dup_pages;
 
 	vma_link(dst_mm, vma, prev, rb_link, rb_parent);
 
 	vm_stat_account(dst_mm, vma->vm_flags, vma_pages(vma));
-	if (unlikely(dup_page_range(dst_mm, vma, src_mm, src_vma)))
+	if (dup_pages &&
+	    unlikely(dup_page_range(dst_mm, vma, src_mm, src_vma)))
 		pr_vas_debug("Failed to copy page table for VMA %p from %p\n",
 			     vma, src_vma);
 
@@ -1199,7 +1240,7 @@ struct vm_area_struct *__update_vm_area(struct mm_struct *src_mm,
 		}
 
 		dst_vma = __copy_vm_area(src_mm, src_vma, dst_mm,
-					 orig_vm_flags);
+					 orig_vm_flags, true);
 		if (!dst_vma)
 			goto out;
 
@@ -1264,7 +1305,7 @@ static int vas_merge(struct att_vas *avas, struct vas *vas, int type)
 			merged_vm_flags &= ~(VM_WRITE | VM_MAYWRITE);
 
 		new_vma = __copy_vm_area(vas_mm, vma, avas_mm,
-					 merged_vm_flags);
+					 merged_vm_flags, true);
 		if (!new_vma) {
 			pr_vas_debug("Failed to merge a VAS memory region (%#lx - %#lx)\n",
 				     vma->vm_start, vma->vm_end);
@@ -1337,7 +1378,7 @@ static int vas_unmerge(struct att_vas *avas, struct vas *vas)
 				     vma->vm_start, vma->vm_end);
 
 			new_vma = __copy_vm_area(avas_mm, vma, vas_mm,
-						 vma->vm_flags);
+						 vma->vm_flags, true);
 			if (!new_vma) {
 				pr_vas_debug("Failed to unmerge a new VAS memory region (%#lx - %#lx)\n",
 					     vma->vm_start, vma->vm_end);
@@ -1346,7 +1387,8 @@ static int vas_unmerge(struct att_vas *avas, struct vas *vas)
 			}
 
 			new_vma->vas_reference = NULL;
-		} else {
+			new_vma->vas_attached = false;
+		} else if (vma->vas_attached) {
 			struct vm_area_struct *upd_vma;
 
 			/*
@@ -1365,6 +1407,9 @@ static int vas_unmerge(struct att_vas *avas, struct vas *vas)
 				ret = -EFAULT;
 				goto out_unlock;
 			}
+		} else {
+			pr_vas_debug("Skip not-attached memory region (%#lx - %#lx) during VAS unmerging\n",
+				     vma->vm_start, vma->vm_end);
 		}
 
 		/* Remove the current VMA from the attached-VAS memory map. */
@@ -1389,10 +1434,16 @@ static int vas_unmerge(struct att_vas *avas, struct vas *vas)
  *			contains all the information for this attachment.
  * @param[in] tsk:	The pointer to the task of which the memory map
  *			should be merged.
+ * @param[in] default_copy_eagerly:
+ *			Controls how all memory regions except the code
+ *			region are handled: if true, their page tables are
+ *			duplicated as well; if false, they are not.
  *
  * @returns:		0 on success, -ERRNO otherwise.
  **/
-static int task_merge(struct att_vas *avas, struct task_struct *tsk)
+static int _task_merge(struct att_vas *avas, struct task_struct *tsk,
+		       bool default_copy_eagerly)
 {
 	struct vm_area_struct *vma, *new_vma;
 	struct mm_struct *avas_mm, *tsk_mm;
@@ -1413,10 +1464,23 @@ static int task_merge(struct att_vas *avas, struct task_struct *tsk)
 	 * map to the attached-VAS memory map.
 	 */
 	for (vma = tsk_mm->mmap; vma; vma = vma->vm_next) {
-		pr_vas_debug("Merging a task memory region (%#lx - %#lx)\n",
-			     vma->vm_start, vma->vm_end);
+		bool copy_eagerly = default_copy_eagerly;
+
+		/*
+		 * The code region of the task will *always* be copied eagerly.
+		 * We need this region in any case to continue execution. All
+		 * the other memory regions are copied according to the
+		 * 'default_copy_eagerly' variable.
+		 */
+		if (is_code_region(vma))
+			copy_eagerly = true;
 
-		new_vma = __copy_vm_area(tsk_mm, vma, avas_mm, vma->vm_flags);
+		pr_vas_debug("Merging a task memory region (%#lx - %#lx) %s\n",
+			     vma->vm_start, vma->vm_end,
+			     copy_eagerly ? "eagerly" : "lazily");
+
+		new_vma = __copy_vm_area(tsk_mm, vma, avas_mm, vma->vm_flags,
+					 copy_eagerly);
 		if (!new_vma) {
 			pr_vas_debug("Failed to merge a task memory region (%#lx - %#lx)\n",
 				     vma->vm_start, vma->vm_end);
@@ -1443,6 +1507,16 @@ static int task_merge(struct att_vas *avas, struct task_struct *tsk)
 	return ret;
 }
 
+/*
+ * Decide based on the kernel configuration setting if we copy task memory
+ * regions eagerly or lazily.
+ */
+#ifdef CONFIG_VAS_LAZY_ATTACH
+#define task_merge(avas, tsk) _task_merge(avas, tsk, false)
+#else
+#define task_merge(avas, tsk) _task_merge(avas, tsk, true)
+#endif
+
 /**
  * Unmerge task-related parts of an attached-VAS memory map back into the
  * task's memory map.
@@ -1541,7 +1615,8 @@ static int vas_seg_merge(struct vas *vas, struct vas_seg *seg, int type)
 		if (!(type & MAY_WRITE))
 			merged_vm_flags &= ~(VM_WRITE | VM_MAYWRITE);
 
-		new_vma = __copy_vm_area(seg_mm, vma, vas_mm, merged_vm_flags);
+		new_vma = __copy_vm_area(seg_mm, vma, vas_mm, merged_vm_flags,
+					 true);
 		if (!new_vma) {
 			pr_vas_debug("Failed to merge a VAS segment memory region (%#lx - %#lx)\n",
 				     vma->vm_start, vma->vm_end);
@@ -1606,7 +1681,7 @@ static int vas_seg_unmerge(struct vas *vas, struct vas_seg *seg)
 			pr_vas_debug("Skipping memory region (%#lx - %#lx) during VAS segment unmerging\n",
 				     vma->vm_start, vma->vm_end);
 			continue;
-		} else {
+		} else if (vma->vas_attached) {
 			struct vm_area_struct *upd_vma;
 
 			pr_vas_debug("Unmerging a VAS segment memory region (%#lx - %#lx)\n",
@@ -1619,6 +1694,9 @@ static int vas_seg_unmerge(struct vas *vas, struct vas_seg *seg)
 				ret = -EFAULT;
 				goto out_unlock;
 			}
+		} else {
+			pr_vas_debug("Skip not-attached memory region (%#lx - %#lx) during segment unmerging\n",
+				     vma->vm_start, vma->vm_end);
 		}
 
 		/* Remove the current VMA from the VAS memory map. */
@@ -1809,8 +1887,13 @@ static int __sync_from_task(struct mm_struct *avas_mm, struct mm_struct *tsk_mm)
 
 		ref = vas_find_reference(avas_mm, vma);
 		if (!ref) {
+#ifdef CONFIG_VAS_LAZY_ATTACH
 			ref = __copy_vm_area(tsk_mm, vma, avas_mm,
-					     vma->vm_flags);
+					     vma->vm_flags, false);
+#else
+			ref = __copy_vm_area(tsk_mm, vma, avas_mm,
+					     vma->vm_flags, true);
+#endif
 
 			if (!ref) {
 				pr_vas_debug("Failed to copy memory region (%#lx - %#lx) during task sync\n",
@@ -1824,7 +1907,7 @@ static int __sync_from_task(struct mm_struct *avas_mm, struct mm_struct *tsk_mm)
 			 * copied it from.
 			 */
 			ref->vas_reference = tsk_mm;
-		} else {
+		} else if (ref->vas_attached) {
 			ref = __update_vm_area(tsk_mm, vma, avas_mm, ref);
 			if (!ref) {
 				pr_vas_debug("Failed to update memory region (%#lx - %#lx) during task sync\n",
@@ -1832,6 +1915,9 @@ static int __sync_from_task(struct mm_struct *avas_mm, struct mm_struct *tsk_mm)
 				ret = -EFAULT;
 				break;
 			}
+		} else {
+			pr_vas_debug("Skip not-attached memory region (%#lx - %#lx) during task sync\n",
+				     vma->vm_start, vma->vm_end);
 		}
 	}
 
@@ -1848,7 +1934,7 @@ static int __sync_to_task(struct mm_struct *avas_mm, struct mm_struct *tsk_mm)
 		if (vma->vas_reference != tsk_mm) {
 			pr_vas_debug("Skip unrelated memory region (%#lx - %#lx) during task resync\n",
 				     vma->vm_start, vma->vm_end);
-		} else {
+		} else if (vma->vas_attached) {
 			struct vm_area_struct *ref;
 
 			ref = __update_vm_area(avas_mm, vma, tsk_mm, NULL);
@@ -1858,6 +1944,9 @@ static int __sync_to_task(struct mm_struct *avas_mm, struct mm_struct *tsk_mm)
 				ret = -EFAULT;
 				break;
 			}
+		} else {
+			pr_vas_debug("Skip not-attached memory region (%#lx - %#lx) during task resync\n",
+				     vma->vm_start, vma->vm_end);
 		}
 	}
 
@@ -3100,6 +3189,47 @@ void vas_exit(struct task_struct *tsk)
 	}
 }
 
+#ifdef CONFIG_VAS_LAZY_ATTACH
+
+int vas_lazy_attach_vma(struct vm_area_struct *vma)
+{
+	struct mm_struct *ref_mm, *mm;
+	struct vm_area_struct *ref_vma;
+
+	if (likely(!vma->vas_reference))
+		return 0;
+	if (vma->vas_attached)
+		return 0;
+
+	ref_mm = vma->vas_reference;
+	mm = vma->vm_mm;
+
+	down_read_nested(&ref_mm->mmap_sem, SINGLE_DEPTH_NESTING);
+	ref_vma = vas_find_reference(ref_mm, vma);
+	up_read(&ref_mm->mmap_sem);
+	if (!ref_vma) {
+		pr_vas_debug("Couldn't find VAS reference\n");
+		return 1;
+	}
+
+	pr_vas_debug("Lazy-attach memory region (%#lx - %#lx)\n",
+		     ref_vma->vm_start, ref_vma->vm_end);
+
+	if (unlikely(dup_page_range(mm, vma, ref_mm, ref_vma))) {
+		pr_vas_debug("Failed to copy page tables for VMA %p from %p\n",
+			     vma, ref_vma);
+		return 1;
+	}
+
+	vma->vas_last_update = ref_vma->vas_last_update;
+	vma->vas_attached = true;
+
+	return 0;
+}
+
+#endif /* CONFIG_VAS_LAZY_ATTACH */
+
+
 /***
  * System Calls
  ***/
-- 
2.12.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 13/13] fs/proc: Add procfs support for first class virtual address spaces
  2017-03-13 22:14 [RFC PATCH 00/13] Introduce first class virtual address spaces Till Smejkal
                   ` (11 preceding siblings ...)
  2017-03-13 22:14 ` [RFC PATCH 12/13] mm/vas: Add lazy-attach support for first class virtual address spaces Till Smejkal
@ 2017-03-13 22:14 ` Till Smejkal
  2017-03-14  0:18 ` [RFC PATCH 00/13] Introduce " Richard Henderson
  2017-03-14  0:58 ` Andy Lutomirski
  14 siblings, 0 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-13 22:14 UTC (permalink / raw)
  To: Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai
  Cc: linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

Add new files and directories to the procfs file system that contain
various information about the first class virtual address spaces attached
to the processes in the system.

A new directory named 'vas' is added to the procfs directory of each
process in the system (/proc/$PID). It contains one entry per VAS that is
attached to the process. Each of these entries provides a file with
status information about the attachment, a file with the current memory
map of the attached VAS, and a link to the sysfs directory of the
underlying VAS.
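
As a quick illustration (not part of this patch), a small user-space
program could walk these entries. The sketch below assumes that each
attached VAS shows up under /proc/<pid>/vas/ as a directory named after
its VAS ID, containing the 'status' and 'maps' files and the 'vas'
symlink described above, and that the kernel was built with
CONFIG_VAS_PROCFS.

#include <dirent.h>
#include <stdio.h>

int main(void)
{
	DIR *dir = opendir("/proc/self/vas");
	struct dirent *de;
	char path[288], line[256];

	if (!dir) {
		perror("opendir");	/* e.g. kernel without CONFIG_VAS_PROCFS */
		return 1;
	}

	while ((de = readdir(dir)) != NULL) {
		FILE *f;

		if (de->d_name[0] == '.')
			continue;

		/*
		 * One directory per attached VAS (assumed to be named after
		 * the VAS ID); dump its 'status' file, which contains the
		 * pid, vid and access type of the attachment.
		 */
		snprintf(path, sizeof(path), "/proc/self/vas/%s/status",
			 de->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;

		printf("--- %s ---\n", path);
		while (fgets(line, sizeof(line), f))
			fputs(line, stdout);
		fclose(f);
	}

	closedir(dir);
	return 0;
}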

Signed-off-by: Till Smejkal <till.smejkal@gmail.com>
---
 fs/proc/base.c     | 528 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/proc/inode.c    |   1 +
 fs/proc/internal.h |   1 +
 mm/Kconfig         |   9 +
 4 files changed, 539 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 87c9a9aacda3..e60c13dd087c 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -45,6 +45,9 @@
  *
  *  Paul Mundt <paul.mundt@nokia.com>:
  *  Overall revision about smaps.
+ *
+ *  Till Smejkal <till.smejkal@gmail.com>:
+ *  Add entries for first class virtual address spaces.
  */
 
 #include <linux/uaccess.h>
@@ -87,6 +90,7 @@
 #include <linux/slab.h>
 #include <linux/flex_array.h>
 #include <linux/posix-timers.h>
+#include <linux/vas.h>
 #ifdef CONFIG_HARDWALL
 #include <asm/hardwall.h>
 #endif
@@ -2841,6 +2845,527 @@ static int proc_pid_personality(struct seq_file *m, struct pid_namespace *ns,
 	return err;
 }
 
+#ifdef CONFIG_VAS_PROCFS
+
+/**
+ * Get a string representation of the access type to a VAS.
+ **/
+#define vas_access_type_str(type) ((type) & MAY_WRITE ?			\
+				   ((type) & MAY_READ ? "rw" : "wo") : "ro")
+
+static int att_vas_show_status(struct seq_file *sf, void *unused)
+{
+	struct inode *inode = sf->private;
+	struct proc_inode *pi = PROC_I(inode);
+	struct task_struct *tsk;
+	struct vas_context *vas_ctx;
+	struct att_vas *avas;
+	int vid = pi->vas_id;
+
+	tsk = get_proc_task(inode);
+	if (!tsk)
+		return -ENOENT;
+
+	vas_ctx = tsk->vas_ctx;
+
+	vas_context_lock(vas_ctx);
+
+	list_for_each_entry(avas, &vas_ctx->vases, tsk_link) {
+		if (vid == avas->vas->id)
+			goto good_att_vas;
+	}
+
+	vas_context_unlock(vas_ctx);
+	put_task_struct(tsk);
+
+	return -ENOENT;
+
+good_att_vas:
+	seq_printf(sf,
+		   "pid:  %d\n"
+		   "vid:  %d\n"
+		   "type: %s\n",
+		   avas->tsk->pid, avas->vas->id,
+		   vas_access_type_str(avas->type));
+
+	vas_context_unlock(vas_ctx);
+	put_task_struct(tsk);
+
+	return 0;
+}
+
+static int att_vas_show_status_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, att_vas_show_status, inode);
+}
+
+static const struct file_operations att_vas_show_status_fops = {
+	.open		= att_vas_show_status_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+static int att_vas_show_mappings(struct seq_file *sf, void *unused)
+{
+	struct inode *inode = sf->private;
+	struct proc_inode *pi = PROC_I(inode);
+	struct task_struct *tsk;
+	struct vas_context *vas_ctx;
+	struct att_vas *avas;
+	struct mm_struct *mm;
+	struct vm_area_struct *vma;
+	int vid = pi->vas_id;
+
+	tsk = get_proc_task(inode);
+	if (!tsk)
+		return -ENOENT;
+
+	vas_ctx = tsk->vas_ctx;
+
+	vas_context_lock(vas_ctx);
+
+	list_for_each_entry(avas, &vas_ctx->vases, tsk_link) {
+		if (avas->vas->id == vid)
+			goto good_att_vas;
+	}
+
+	vas_context_unlock(vas_ctx);
+	put_task_struct(tsk);
+
+	return -ENOENT;
+
+good_att_vas:
+	mm = avas->mm;
+
+	down_read(&mm->mmap_sem);
+
+	if (!mm->mmap) {
+		seq_puts(sf, "EMPTY\n");
+		goto out_unlock;
+	}
+
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		vm_flags_t flags = vma->vm_flags;
+		struct file *file = vma->vm_file;
+		unsigned long long pgoff = 0;
+
+		if (file)
+			pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT;
+
+		seq_printf(sf, "%08lx-%08lx %c%c%c%c [%c:%c] %08llx",
+			   vma->vm_start, vma->vm_end,
+			   flags & VM_READ ? 'r' : '-',
+			   flags & VM_WRITE ? 'w' : '-',
+			   flags & VM_EXEC ? 'x' : '-',
+			   flags & VM_MAYSHARE ? 's' : 'p',
+			   vma->vas_reference ? 'v' : '-',
+			   vma->vas_attached ? 'a' : '-',
+			   pgoff);
+
+		seq_putc(sf, ' ');
+
+		if (file) {
+			seq_file_path(sf, file, "\n");
+		} else if (vma->vm_ops && vma->vm_ops->name) {
+			seq_puts(sf, vma->vm_ops->name(vma));
+		} else {
+			if (!vma->vm_mm)
+				seq_puts(sf, "[vdso]");
+			else if (vma->vm_start <= mm->brk &&
+				 vma->vm_end >= mm->start_brk)
+				seq_puts(sf, "[heap]");
+			else if (vma->vm_start <= mm->start_stack &&
+				 vma->vm_end >= mm->start_stack)
+				seq_puts(sf, "[stack]");
+		}
+
+		seq_putc(sf, '\n');
+	}
+
+out_unlock:
+	up_read(&mm->mmap_sem);
+
+	vas_context_unlock(vas_ctx);
+	put_task_struct(tsk);
+
+	return 0;
+}
+
+static int att_vas_show_mappings_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, att_vas_show_mappings, inode);
+}
+
+static const struct file_operations att_vas_show_mappings_fops = {
+	.open		= att_vas_show_mappings_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+static int att_vas_vas_link(char *name, int *buflen, int vid)
+{
+	int len;
+
+	len = scnprintf(name, *buflen, "/sys/kernel/vas/%d", vid);
+	if (len >= *buflen)
+		return -E2BIG;
+
+	*buflen = len;
+	return 0;
+}
+
+static int att_vas_vas_link_readlink(struct dentry *dentry, char __user *buffer,
+				     int buflen)
+{
+	char *name;
+	int len, ret;
+	int vid;
+
+	len = PATH_MAX;
+	name = kmalloc(len, GFP_TEMPORARY);
+	if (!name)
+		return -ENOMEM;
+
+	vid = PROC_I(d_inode(dentry))->vas_id;
+
+	ret = att_vas_vas_link(name, &len, vid);
+	if (ret != 0)
+		goto out_free;
+
+	if (len > buflen)
+		len = buflen;
+	if (copy_to_user(buffer, name, len) != 0) {
+		ret = -EFAULT;
+		goto out_free;
+	}
+
+	ret = len;
+
+out_free:
+	kfree(name);
+
+	return ret;
+}
+
+static const char *att_vas_vas_link_get_link(struct dentry *dentry,
+					     struct inode *inode,
+					     struct delayed_call *done)
+{
+	char *name;
+	int len, ret;
+	int vid;
+
+	if (!dentry)
+		return ERR_PTR(-ECHILD);
+
+	len = PATH_MAX;
+	name = kmalloc(len, GFP_TEMPORARY);
+	if (!name)
+		return NULL;
+
+	vid = PROC_I(inode)->vas_id;
+
+	ret = att_vas_vas_link(name, &len, vid);
+	if (ret != 0) {
+		kfree(name);
+		return ERR_PTR(ret);
+	}
+
+	set_delayed_call(done, kfree_link, name);
+	return name;
+}
+
+static const struct inode_operations att_vas_vas_link_iops = {
+	.readlink	= att_vas_vas_link_readlink,
+	.get_link	= att_vas_vas_link_get_link,
+	.setattr	= proc_setattr
+};
+
+static const struct pid_entry att_vas_stuff[] = {
+	REG("status", 0444, att_vas_show_status_fops),
+	REG("maps", 0440, att_vas_show_mappings_fops),
+	NOD("vas", (S_IFLNK | 0777), &att_vas_vas_link_iops, NULL, {}),
+};
+
+static int att_vas_revalidate(struct dentry *dentry, unsigned int flags)
+{
+	struct inode *inode;
+	struct task_struct *tsk;
+	struct vas_context *vas_ctx;
+	struct att_vas *avas;
+	int vid;
+	int ret;
+
+	if (flags & LOOKUP_RCU)
+		return -ECHILD;
+
+	inode = d_inode(dentry);
+	tsk = get_proc_task(inode);
+	if (!tsk)
+		return 0;
+
+	vid = PROC_I(inode)->vas_id;
+	vas_ctx = tsk->vas_ctx;
+
+	vas_context_lock(vas_ctx);
+
+	ret = 0;
+	list_for_each_entry(avas, &vas_ctx->vases, tsk_link) {
+		if (avas->vas->id == vid) {
+			ret = 1;
+			break;
+		}
+	}
+
+	vas_context_unlock(vas_ctx);
+	put_task_struct(tsk);
+
+	return ret;
+}
+
+static const struct dentry_operations att_vas_dops = {
+	.d_revalidate	= att_vas_revalidate,
+};
+
+static int att_vas_pident_instantiate(struct inode *dir,
+				      struct dentry *dentry,
+				      struct task_struct *task,
+				      const void *ptr)
+{
+	const struct pid_entry *p = ptr;
+	struct inode *inode;
+	struct proc_inode *ei;
+
+	inode = proc_pid_make_inode(dir->i_sb, task, p->mode);
+	if (!inode)
+		goto out;
+
+	ei = PROC_I(inode);
+	if (S_ISDIR(inode->i_mode))
+		set_nlink(inode, 2);
+	if (p->iop)
+		inode->i_op = p->iop;
+	if (p->fop)
+		inode->i_fop = p->fop;
+	ei->op = p->op;
+
+	/* Copy the VAS ID from the parent inode */
+	ei->vas_id = PROC_I(dir)->vas_id;
+
+	d_set_d_op(dentry, &att_vas_dops);
+	d_add(dentry, inode);
+
+	if (att_vas_revalidate(dentry, 0))
+		return 0;
+out:
+	return -ENOENT;
+}
+
+static struct dentry *att_vas_pident_lookup(struct inode *dir,
+					    struct dentry *dentry,
+					    unsigned int flags)
+{
+	int error;
+	struct task_struct *task = get_proc_task(dir);
+	const struct pid_entry *p, *last;
+	const struct pid_entry *entries = att_vas_stuff;
+	int nents = ARRAY_SIZE(att_vas_stuff);
+
+	error = -ENOENT;
+
+	if (!task)
+		goto out_no_task;
+
+	last = &entries[nents];
+	for (p = entries; p < last; p++) {
+		if (p->len != dentry->d_name.len)
+			continue;
+		if (!memcmp(dentry->d_name.name, p->name, p->len))
+			break;
+	}
+	if (p >= last)
+		goto out;
+
+	error = att_vas_pident_instantiate(dir, dentry, task, p);
+out:
+	put_task_struct(task);
+out_no_task:
+	return ERR_PTR(error);
+}
+
+static int att_vas_pident_readdir(struct file *file, struct dir_context *ctx)
+{
+	struct task_struct *task = get_proc_task(file_inode(file));
+	const struct pid_entry *p, *last;
+	const struct pid_entry *entries = att_vas_stuff;
+	int nents = ARRAY_SIZE(att_vas_stuff);
+
+	if (!task)
+		return -ENOENT;
+
+	if (!dir_emit_dots(file, ctx))
+		goto out;
+
+	if (ctx->pos >= nents + 2)
+		goto out;
+
+	last = &entries[nents];
+	for (p = entries + (ctx->pos - 2); p < last; p++) {
+		if (!proc_fill_cache(file, ctx, p->name, p->len,
+				     att_vas_pident_instantiate, task, p))
+			break;
+		ctx->pos++;
+	}
+out:
+	put_task_struct(task);
+	return 0;
+}
+
+static const struct inode_operations proc_att_vas_iops = {
+	.lookup		= att_vas_pident_lookup,
+	.setattr	= proc_setattr,
+	.permission	= generic_permission,
+};
+
+static const struct file_operations proc_att_vas_fops = {
+	.read		= generic_read_dir,
+	.llseek		= generic_file_llseek,
+	.iterate_shared	= att_vas_pident_readdir,
+};
+
+static int proc_att_vas_dir_instantiate(struct inode *dir,
+					struct dentry *dentry,
+					struct task_struct *tsk,
+					const void *data)
+{
+	struct inode *inode;
+	struct proc_inode *pi;
+	const struct att_vas *avas = data;
+
+	inode = proc_pid_make_inode(dir->i_sb, tsk, S_IFDIR | 0555);
+
+	if (!inode)
+		return -ENOENT;
+
+	pi = PROC_I(inode);
+	pi->vas_id = avas->vas->id;
+
+	set_nlink(inode, 2);
+	inode->i_op = &proc_att_vas_iops;
+	inode->i_fop = &proc_att_vas_fops;
+
+	d_set_d_op(dentry, &att_vas_dops);
+	d_add(dentry, inode);
+
+	/*
+	 * No need to revalidate the dentry at this point, because we are still
+	 * holding the lock for the VAS context. Hence this VAS cannot be
+	 * detached from the task and hence the dentry is still valid.
+	 */
+	return 0;
+}
+
+static struct dentry *proc_att_vas_dir_lookup(struct inode *dir,
+					      struct dentry *dentry,
+					      unsigned int flags)
+{
+	struct task_struct *tsk;
+	struct vas_context *vas_ctx;
+	struct att_vas *avas;
+	int vid;
+	int ret;
+
+	tsk = get_proc_task(dir);
+	if (!tsk)
+		return ERR_PTR(-ENOENT);
+
+	if (kstrtoint(dentry->d_name.name, 10, &vid)) {
+		ret = -EINVAL;
+		goto out_put;
+	}
+
+	vas_ctx = tsk->vas_ctx;
+
+	vas_context_lock(vas_ctx);
+
+	ret = -ENOENT;
+	list_for_each_entry(avas, &vas_ctx->vases, tsk_link) {
+		if (vid == avas->vas->id) {
+			ret = proc_att_vas_dir_instantiate(dir, dentry,
+							   tsk, avas);
+			break;
+		}
+	}
+
+	vas_context_unlock(vas_ctx);
+
+out_put:
+	put_task_struct(tsk);
+
+	return ERR_PTR(ret);
+}
+
+static int proc_att_vas_dir_readdir(struct file *file, struct dir_context *ctx)
+{
+	struct inode *inode = file_inode(file);
+	struct task_struct *tsk;
+	struct vas_context *vas_ctx;
+	struct att_vas *avas;
+	int pos = 2;
+
+	tsk = get_proc_task(inode);
+	if (!tsk)
+		return -ENOENT;
+
+	if (!dir_emit_dots(file, ctx))
+		goto out_put;
+
+	vas_ctx = tsk->vas_ctx;
+
+	vas_context_lock(vas_ctx);
+
+	list_for_each_entry(avas, &vas_ctx->vases, tsk_link) {
+		char name[PROC_NUMBUF];
+		int len;
+
+		if (++pos <= ctx->pos)
+			continue;
+
+		snprintf(name, sizeof(name), "%d", avas->vas->id);
+		len = strnlen(name, PROC_NUMBUF);
+
+		if (!proc_fill_cache(file, ctx, name, len,
+				     proc_att_vas_dir_instantiate,
+				     tsk, avas))
+			break;
+
+		ctx->pos++;
+	}
+
+	vas_context_unlock(vas_ctx);
+
+out_put:
+	put_task_struct(tsk);
+
+	return 0;
+}
+
+const struct inode_operations proc_att_vas_dir_iops = {
+	.lookup		= proc_att_vas_dir_lookup,
+	.setattr	= proc_setattr,
+	.permission	= generic_permission,
+};
+
+const struct file_operations proc_att_vas_dir_fops = {
+	.read		= generic_read_dir,
+	.llseek		= generic_file_llseek,
+	.iterate_shared	= proc_att_vas_dir_readdir,
+};
+
+#endif
+
 /*
  * Thread groups
  */
@@ -2856,6 +3381,9 @@ static const struct pid_entry tgid_base_stuff[] = {
 #ifdef CONFIG_NET
 	DIR("net",        S_IRUGO|S_IXUGO, proc_net_inode_operations, proc_net_operations),
 #endif
+#ifdef CONFIG_VAS_PROCFS
+	DIR("vas",        S_IRUGO|S_IXUGO, proc_att_vas_dir_iops, proc_att_vas_dir_fops),
+#endif
 	REG("environ",    S_IRUSR, proc_environ_operations),
 	REG("auxv",       S_IRUSR, proc_auxv_operations),
 	ONE("status",     S_IRUGO, proc_pid_status),
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index cb2d5702bdce..cc8937d348df 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -63,6 +63,7 @@ static struct inode *proc_alloc_inode(struct super_block *sb)
 	ei->pid = NULL;
 	ei->fd = 0;
 	ei->op.proc_get_link = NULL;
+	ei->vas_id = 0;
 	ei->pde = NULL;
 	ei->sysctl = NULL;
 	ei->sysctl_entry = NULL;
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 2de5194ba378..0cb6bb39d61d 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -61,6 +61,7 @@ union proc_op {
 struct proc_inode {
 	struct pid *pid;
 	unsigned int fd;
+	int vas_id;
 	union proc_op op;
 	struct proc_dir_entry *pde;
 	struct ctl_table_header *sysctl;
diff --git a/mm/Kconfig b/mm/Kconfig
index 934c56bcdbf4..9ef3efc16bed 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -745,3 +745,12 @@ config VAS_DEBUG
 	help
 	  Enable extensive debugging output for the First Class Virtual Address
 	  Spaces feature.
+
+config VAS_PROCFS
+	bool "procfs entries for First Class Virtual Address Spaces"
+	depends on VAS && PROC_FS
+	default y
+	help
+	  Provide information in /proc/$PID about all First Class 
+	  Virtual Address Spaces that are currently attached to the
+	  corresponding process.
-- 
2.12.0

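As a rough user-space sketch of how the entries added above could be consumed
(assuming the layout introduced by this patch: a /proc/<pid>/vas directory with
one subdirectory per attached VAS ID, each containing a "status" file, a "maps"
file and a "vas" symlink pointing into /sys/kernel/vas/<id>):

#include <dirent.h>
#include <limits.h>
#include <stdio.h>

int main(void)
{
	char path[PATH_MAX], buf[256];
	struct dirent *de;
	DIR *dir;

	dir = opendir("/proc/self/vas");
	if (!dir) {
		/* CONFIG_VAS_PROCFS disabled or no VAS attached */
		perror("opendir");
		return 1;
	}

	while ((de = readdir(dir))) {
		FILE *f;

		if (de->d_name[0] == '.')
			continue;	/* skip "." and ".." */

		/* Each subdirectory is named after the attached VAS ID. */
		snprintf(path, sizeof(path), "/proc/self/vas/%s/status",
			 de->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;

		printf("--- VAS %s ---\n", de->d_name);
		while (fgets(buf, sizeof(buf), f))
			fputs(buf, stdout);	/* pid, vid and access type */
		fclose(f);
	}

	closedir(dir);
	return 0;
}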

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 11/13] mm/vas: Introduce VAS segments - shareable address space regions
  2017-03-13 22:14 ` [RFC PATCH 11/13] mm/vas: Introduce VAS segments - shareable address space regions Till Smejkal
@ 2017-03-13 22:27   ` Matthew Wilcox
  2017-03-13 22:45     ` Till Smejkal
  0 siblings, 1 reply; 45+ messages in thread
From: Matthew Wilcox @ 2017-03-13 22:27 UTC (permalink / raw)
  To: Till Smejkal
  Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

On Mon, Mar 13, 2017 at 03:14:13PM -0700, Till Smejkal wrote:
> +/**
> + * Create a new VAS segment.
> + *
> + * @param[in] name:		The name of the new VAS segment.
> + * @param[in] start:		The address where the VAS segment begins.
> + * @param[in] end:		The address where the VAS segment ends.
> + * @param[in] mode:		The access rights for the VAS segment.
> + *
> + * @returns:			The VAS segment ID on success, -ERRNO otherwise.
> + **/

Please follow the kernel-doc conventions, as described in
Documentation/doc-guide/kernel-doc.rst.  Also, function documentation
goes with the implementation, not the declaration.
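
For the comment quoted above, the kernel-doc form placed directly above the
definition in the .c file would look roughly like this sketch (the function
name vas_seg_create is only a guess here, since the declaration itself is not
part of the quoted hunk):

/**
 * vas_seg_create() - Create a new VAS segment.
 * @name:  The name of the new VAS segment.
 * @start: The address where the VAS segment begins.
 * @end:   The address where the VAS segment ends.
 * @mode:  The access rights for the VAS segment.
 *
 * Return: The VAS segment ID on success, negative error code otherwise.
 */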

> +/**
> + * Get ID of the VAS segment belonging to a given name.
> + *
> + * @param[in] name:		The name of the VAS segment for which the ID
> + *				should be returned.
> + *
> + * @returns:			The VAS segment ID on success, -ERRNO
> + *				otherwise.
> + **/
> +extern int vas_seg_find(const char *name);

So ... segments have names, and IDs ... and access permissions ...
Why isn't this a special purpose filesystem?


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 11/13] mm/vas: Introduce VAS segments - shareable address space regions
  2017-03-13 22:27   ` Matthew Wilcox
@ 2017-03-13 22:45     ` Till Smejkal
  0 siblings, 0 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-13 22:45 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Till Smejkal, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Vineet Gupta, Russell King, Catalin Marinas, Will Deacon,
	Steven Miao, Richard Kuo, Tony Luck, Fenghua Yu, James Hogan,
	Ralf Baechle, James E.J. Bottomley, Helge Deller,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Martin Schwidefsky, Heiko Carstens, Yoshinori Sato, Rich Felker,
	David S. Miller, Chris Metcalf, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Andy Lutomirski, Chris Zankel, Max Filippov,
	Arnd Bergmann, Greg Kroah-Hartman, Laurent Pinchart,
	Mauro Carvalho Chehab, Pawel Osciak, Marek Szyprowski,
	Kyungmin Park, David Woodhouse, Brian Norris, Boris Brezillon,
	Marek Vasut, Richard Weinberger, Cyrille Pitchen, Felipe Balbi,
	Alexander Viro, Benjamin LaHaise, Nadia Yvette Chambers,
	Jeff Layton, J. Bruce Fields, Peter Zijlstra, Hugh Dickins,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jaroslav Kysela,
	Takashi Iwai, linux-kernel, linux-alpha, linux-snps-arc,
	linux-arm-kernel, adi-buildroot-devel, linux-hexagon, linux-ia64,
	linux-metag, linux-mips, linux-parisc, linuxppc-dev, linux-s390,
	linux-sh, sparclinux, linux-xtensa, linux-media, linux-mtd,
	linux-usb, linux-fsdevel, linux-aio, linux-mm, linux-api,
	linux-arch, alsa-devel

Hi Matthew,

On Mon, 13 Mar 2017, Matthew Wilcox wrote:
> On Mon, Mar 13, 2017 at 03:14:13PM -0700, Till Smejkal wrote:
> > +/**
> > + * Create a new VAS segment.
> > + *
> > + * @param[in] name:		The name of the new VAS segment.
> > + * @param[in] start:		The address where the VAS segment begins.
> > + * @param[in] end:		The address where the VAS segment ends.
> > + * @param[in] mode:		The access rights for the VAS segment.
> > + *
> > + * @returns:			The VAS segment ID on success, -ERRNO otherwise.
> > + **/
> 
> Please follow the kernel-doc conventions, as described in
> Documentation/doc-guide/kernel-doc.rst.  Also, function documentation
> goes with the implementation, not the declaration.

Thank you for this pointer. I wasn't aware of this convention. I will change the
patches accordingly.

> > +/**
> > + * Get ID of the VAS segment belonging to a given name.
> > + *
> > + * @param[in] name:		The name of the VAS segment for which the ID
> > + *				should be returned.
> > + *
> > + * @returns:			The VAS segment ID on success, -ERRNO
> > + *				otherwise.
> > + **/
> > +extern int vas_seg_find(const char *name);
> 
> So ... segments have names, and IDs ... and access permissions ...
> Why isn't this a special purpose filesystem?

We also thought about this, but decided against implementing them as a special purpose
filesystem, mainly because we could not think of a good way to represent a VAS/VAS
segment in such a filesystem (should it rather be a file or a directory?) and we
weren't sure what a hierarchy in the filesystem would mean for the underlying address
spaces. Hence we went with a combination of IDR and sysfs instead. However, I don't
have any strong feelings about this and would also reimplement them as a special
purpose filesystem if people would prefer that.

Till


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 10/13] mm: Introduce first class virtual address spaces
  2017-03-13 22:14 ` [RFC PATCH 10/13] mm: Introduce first class virtual address spaces Till Smejkal
@ 2017-03-13 23:52   ` Greg Kroah-Hartman
  2017-03-14  0:24     ` Till Smejkal
  2017-03-14  1:35   ` Vineet Gupta
  1 sibling, 1 reply; 45+ messages in thread
From: Greg Kroah-Hartman @ 2017-03-13 23:52 UTC (permalink / raw)
  To: Till Smejkal
  Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Laurent Pinchart, Mauro Carvalho Chehab, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, David Woodhouse, Brian Norris,
	Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

On Mon, Mar 13, 2017 at 03:14:12PM -0700, Till Smejkal wrote:

There's no way with that many cc: lists and people that this is really
making it through very many people's filters and actually on a mailing
list.  Please trim them down.

Minor sysfs questions/issues:

> +struct vas {
> +	struct kobject kobj;		/* < the internal kobject that we use *
> +					 *   for reference counting and sysfs *
> +					 *   handling.                        */
> +
> +	int id;				/* < ID                               */
> +	char name[VAS_MAX_NAME_LENGTH];	/* < name                             */

The kobject has a name, why not use that?

> +
> +	struct mutex mtx;		/* < lock for parallel access.        */
> +
> +	struct mm_struct *mm;		/* < a partial memory map containing  *
> +					 *   all mappings of this VAS.        */
> +
> +	struct list_head link;		/* < the link in the global VAS list. */
> +	struct rcu_head rcu;		/* < the RCU helper used for          *
> +					 *   asynchronous VAS deletion.       */
> +
> +	u16 refcount;			/* < how often is the VAS attached.   */

The kobject has a refcount, use that?  Don't have 2 refcounts in the
same structure, that way lies madness.  And bugs, lots of bugs...

And if this really is a refcount (hint, I don't think it is), you should
use the refcount_t type.
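
A rough sketch of what that looks like (the struct and helper names below are
purely illustrative):

#include <linux/refcount.h>
#include <linux/slab.h>

struct foo {
	refcount_t attach_count;
};

static void foo_init(struct foo *f)
{
	refcount_set(&f->attach_count, 1);	/* creator holds one reference */
}

static bool foo_get(struct foo *f)
{
	/* returns false once the count has already dropped to zero */
	return refcount_inc_not_zero(&f->attach_count);
}

static void foo_put(struct foo *f)
{
	if (refcount_dec_and_test(&f->attach_count))
		kfree(f);			/* last reference is gone */
}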

> +/**
> + * The sysfs structure we need to handle attributes of a VAS.
> + **/
> +struct vas_sysfs_attr {
> +	struct attribute attr;
> +	ssize_t (*show)(struct vas *vas, struct vas_sysfs_attr *vsattr,
> +			char *buf);
> +	ssize_t (*store)(struct vas *vas, struct vas_sysfs_attr *vsattr,
> +			 const char *buf, size_t count);
> +};
> +
> +#define VAS_SYSFS_ATTR(NAME, MODE, SHOW, STORE)				\
> +static struct vas_sysfs_attr vas_sysfs_attr_##NAME =			\
> +	__ATTR(NAME, MODE, SHOW, STORE)

__ATTR_RO and __ATTR_RW should work better for you.  If you really need
this.

Oh, and where is the Documentation/ABI/ updates to try to describe the
sysfs structure and files?  Did I miss that in the series?

> +static ssize_t __show_vas_name(struct vas *vas, struct vas_sysfs_attr *vsattr,
> +			       char *buf)
> +{
> +	return scnprintf(buf, PAGE_SIZE, "%s", vas->name);

It's a page size, just use sprintf() and be done with it.  No need to
ever check, you "know" it will be correct.

Also, what about a trailing '\n' for these attributes?
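
A rough sketch combining the three points above (the attribute name is just an
example):

static ssize_t name_show(struct vas *vas, struct vas_sysfs_attr *vsattr,
			 char *buf)
{
	return sprintf(buf, "%s\n", vas->name);	/* trailing newline included */
}

/* __ATTR_RO() fills in mode 0444 and wires up name_show() automatically. */
static struct vas_sysfs_attr vas_sysfs_attr_name = __ATTR_RO(name);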

Oh wait, why have a name when the kobject name is already there in the
directory itself?  Do you really need this?

> +/**
> + * The ktype data structure representing a VAS.
> + **/
> +static struct kobj_type vas_ktype = {
> +	.sysfs_ops = &vas_sysfs_ops,
> +	.release = __vas_release,

Why the odd __vas* naming?  What's wrong with vas_release?


> +	.default_attrs = vas_default_attr,
> +};
> +
> +
> +/***
> + * Internally visible functions
> + ***/
> +
> +/**
> + * Working with the global VAS list.
> + **/
> +static inline void vas_remove(struct vas *vas)

<snip>

You have a ton of inline functions, for no good reason.  Make them all
"real" functions please.  Unless you can measure the size/speed
differences?  If so, please say so.


thanks,

greg k-h


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-13 22:14 [RFC PATCH 00/13] Introduce first class virtual address spaces Till Smejkal
                   ` (12 preceding siblings ...)
  2017-03-13 22:14 ` [RFC PATCH 13/13] fs/proc: Add procfs " Till Smejkal
@ 2017-03-14  0:18 ` Richard Henderson
  2017-03-14  0:39   ` Till Smejkal
  2017-03-14  0:58 ` Andy Lutomirski
  14 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2017-03-14  0:18 UTC (permalink / raw)
  To: Till Smejkal, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai
  Cc: linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

On 03/14/2017 08:14 AM, Till Smejkal wrote:
> At the current state of the development, first class virtual address spaces
> have one limitation, that we haven't been able to solve so far. The feature
> allows, that different threads of the same process can execute in different
> AS at the same time. This is possible, because the VAS-switch operation
> only changes the active mm_struct for the task_struct of the calling
> thread. However, when a thread switches into a first class virtual address
> space, some parts of its original AS are duplicated into the new one to
> allow the thread to continue its execution at its current state.
> Accordingly, parts of the processes AS (e.g. the code section, data
> section, heap section and stack sections) exist in multiple AS if the
> process has a VAS attached to it. Changes to these shared memory regions
> are synchronized between the address spaces whenever a thread switches
> between two of them. Unfortunately, in some scenarios the kernel is not
> able to properly synchronize all these shared memory regions because of
> conflicting changes. One such example happens if there are two threads, one
> executing in an attached first class virtual address space, the other in
> the tasks original address space. If both threads make changes to the heap
> section that cause expansion of the underlying vm_area_struct, the kernel
> cannot correctly synchronize these changes, because that would cause parts
> of the virtual address space to be overwritten with unrelated data. In the
> current implementation such conflicts are only detected but not resolved
> and result in an error code being returned by the kernel during the VAS
> switch operation. Unfortunately, that means for the particular thread that
> tried to make the switch, that it cannot do this anymore in the future and
> accordingly has to be killed.

This sounds like a fairly fundamental problem to me.

Is this an indication that full virtual address spaces are useless?  It would 
seem like if you only use virtual address segments then you avoid all of the 
problems with executing code, active stacks, and brk.


r~


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 10/13] mm: Introduce first class virtual address spaces
  2017-03-13 23:52   ` Greg Kroah-Hartman
@ 2017-03-14  0:24     ` Till Smejkal
  0 siblings, 0 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-14  0:24 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Till Smejkal, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Vineet Gupta, Russell King, Catalin Marinas, Will Deacon,
	Steven Miao, Richard Kuo, Tony Luck, Fenghua Yu, James Hogan,
	Ralf Baechle, James E.J. Bottomley, Helge Deller,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Martin Schwidefsky, Heiko Carstens, Yoshinori Sato, Rich Felker,
	David S. Miller, Chris Metcalf, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Andy Lutomirski, Chris Zankel, Max Filippov,
	Arnd Bergmann, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

Hi Greg,

First of all thanks for your reply.

On Tue, 14 Mar 2017, Greg Kroah-Hartman wrote:
> On Mon, Mar 13, 2017 at 03:14:12PM -0700, Till Smejkal wrote:
> 
> There's no way with that many cc: lists and people that this is really
> making it through very many people's filters and actually on a mailing
> list.  Please trim them down.

I am sorry that the patch's cc-list is too big. This is the list of people that the
get_maintainers.pl script produced. I realized that it was a huge number of people, but
I didn't want to remove anyone from the list because I wasn't sure who would be
interested in this patch set. Do you have any suggestion whom to remove from the list?
I don't want to annoy anyone with useless emails.

> Minor sysfs questions/issues:
> 
> > +struct vas {
> > +	struct kobject kobj;		/* < the internal kobject that we use *
> > +					 *   for reference counting and sysfs *
> > +					 *   handling.                        */
> > +
> > +	int id;				/* < ID                               */
> > +	char name[VAS_MAX_NAME_LENGTH];	/* < name                             */
> 
> The kobject has a name, why not use that?

The reason why I don't use the kobject's name is that I don't restrict the names that
can be used for VAS/VAS segments. Accordingly, a name like "foo/bar/xyz" would be a
valid VAS name. However, I am not sure what would happen in sysfs if I used such a name
for the kobject, especially since there could be another VAS named "foo/bar" whose name
would conflict with the first one although it does not necessarily have any connection
to it.

> > +
> > +	struct mutex mtx;		/* < lock for parallel access.        */
> > +
> > +	struct mm_struct *mm;		/* < a partial memory map containing  *
> > +					 *   all mappings of this VAS.        */
> > +
> > +	struct list_head link;		/* < the link in the global VAS list. */
> > +	struct rcu_head rcu;		/* < the RCU helper used for          *
> > +					 *   asynchronous VAS deletion.       */
> > +
> > +	u16 refcount;			/* < how often is the VAS attached.   */
> 
> The kobject has a refcount, use that?  Don't have 2 refcounts in the
> same structure, that way lies madness.  And bugs, lots of bugs...
> 
> And if this really is a refcount (hint, I don't think it is), you should
> use the refcount_t type.

I actually use both the internal kobject refcount to keep track of how often a VAS/VAS
segment is referenced and this 'refcount' variable to keep track of how often the VAS
is actually attached to a task. The two are not necessarily related to each other. I
can rename this variable to attach_count. Or, if preferred, I can alternatively use
only the kobject reference counter and remove this variable completely, though I would
then lose the information about how often the VAS is attached to a task, because the
kobject reference counter also tracks other references to the VAS.

> > +/**
> > + * The sysfs structure we need to handle attributes of a VAS.
> > + **/
> > +struct vas_sysfs_attr {
> > +	struct attribute attr;
> > +	ssize_t (*show)(struct vas *vas, struct vas_sysfs_attr *vsattr,
> > +			char *buf);
> > +	ssize_t (*store)(struct vas *vas, struct vas_sysfs_attr *vsattr,
> > +			 const char *buf, size_t count);
> > +};
> > +
> > +#define VAS_SYSFS_ATTR(NAME, MODE, SHOW, STORE)				\
> > +static struct vas_sysfs_attr vas_sysfs_attr_##NAME =			\
> > +	__ATTR(NAME, MODE, SHOW, STORE)
> 
> __ATTR_RO and __ATTR_RW should work better for you.  If you really need
> this.

Thank you. I will have a look at these functions.

> Oh, and where is the Documentation/ABI/ updates to try to describe the
> sysfs structure and files?  Did I miss that in the series?

Oh sorry, I forgot to add this file. I will add the ABI descriptions for future
submissions.

> > +static ssize_t __show_vas_name(struct vas *vas, struct vas_sysfs_attr *vsattr,
> > +			       char *buf)
> > +{
> > +	return scnprintf(buf, PAGE_SIZE, "%s", vas->name);
> 
> It's a page size, just use sprintf() and be done with it.  No need to
> ever check, you "know" it will be correct.

OK. I was following the sysfs example in the documentation that used scnprintf, but
if sprintf is preferred, I can change this.

> Also, what about a trailing '\n' for these attributes?

I will change this.

> Oh wait, why have a name when the kobject name is already there in the
> directory itself?  Do you really need this?

See above.

> > +/**
> > + * The ktype data structure representing a VAS.
> > + **/
> > +static struct kobj_type vas_ktype = {
> > +	.sysfs_ops = &vas_sysfs_ops,
> > +	.release = __vas_release,
> 
> Why the odd __vas* naming?  What's wrong with vas_release?

I was using the __* naming scheme for functions that are not used outside of my source
file. But I can change this if people don't like it. I have no strong feelings about
the function names.

> > +	.default_attrs = vas_default_attr,
> > +};
> > +
> > +
> > +/***
> > + * Internally visible functions
> > + ***/
> > +
> > +/**
> > + * Working with the global VAS list.
> > + **/
> > +static inline void vas_remove(struct vas *vas)
> 
> <snip>
> 
> You have a ton of inline functions, for no good reason.  Make them all
> "real" functions please.  Unless you can measure the size/speed
> differences?  If so, please say so.

There was no specific reason why I declared the functions as inline except the hope to
avoid the function-call overhead for some of my very small functions. I can look more
closely at this and check whether there is a real benefit in inlining them and, if not,
remove the inline keywords.

Thank you very much.

Till


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-14  0:18 ` [RFC PATCH 00/13] Introduce " Richard Henderson
@ 2017-03-14  0:39   ` Till Smejkal
  2017-03-14  1:02     ` Richard Henderson
  0 siblings, 1 reply; 45+ messages in thread
From: Till Smejkal @ 2017-03-14  0:39 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Till Smejkal, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

On Tue, 14 Mar 2017, Richard Henderson wrote:
> On 03/14/2017 08:14 AM, Till Smejkal wrote:
> > At the current state of the development, first class virtual address spaces
> > have one limitation, that we haven't been able to solve so far. The feature
> > allows, that different threads of the same process can execute in different
> > AS at the same time. This is possible, because the VAS-switch operation
> > only changes the active mm_struct for the task_struct of the calling
> > thread. However, when a thread switches into a first class virtual address
> > space, some parts of its original AS are duplicated into the new one to
> > allow the thread to continue its execution at its current state.
> > Accordingly, parts of the processes AS (e.g. the code section, data
> > section, heap section and stack sections) exist in multiple AS if the
> > process has a VAS attached to it. Changes to these shared memory regions
> > are synchronized between the address spaces whenever a thread switches
> > between two of them. Unfortunately, in some scenarios the kernel is not
> > able to properly synchronize all these shared memory regions because of
> > conflicting changes. One such example happens if there are two threads, one
> > executing in an attached first class virtual address space, the other in
> > the tasks original address space. If both threads make changes to the heap
> > section that cause expansion of the underlying vm_area_struct, the kernel
> > cannot correctly synchronize these changes, because that would cause parts
> > of the virtual address space to be overwritten with unrelated data. In the
> > current implementation such conflicts are only detected but not resolved
> > and result in an error code being returned by the kernel during the VAS
> > switch operation. Unfortunately, that means for the particular thread that
> > tried to make the switch, that it cannot do this anymore in the future and
> > accordingly has to be killed.
> 
> This sounds like a fairly fundamental problem to me.

Yes, I agree. This is a significant limitation of first class virtual address spaces.
However, conflicts like this can be mitigated by being careful in the application that
uses multiple first class virtual address spaces. If all threads make sure that they
never resize shared memory regions while executing inside a VAS, such conflicts do not
occur. Another possibility that I investigated but have not yet finished is to
synchronize such resizes of shared memory regions more frequently than just at every
switch between VASes. If one, for example, "forwards" memory region resizes to all AS
that share the particular memory region during the resize operation, the problem can be
eliminated completely. Unfortunately, this introduces a significant cost as well as a
race condition that is difficult to handle.

> Is this an indication that full virtual address spaces are useless?  It
> would seem like if you only use virtual address segments then you avoid all
> of the problems with executing code, active stacks, and brk.

What do you mean with *virtual address segments*? The nice part of first class
virtual address spaces is that one can share/reuse collections of address space
segments easily.

Till


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-13 22:14 [RFC PATCH 00/13] Introduce first class virtual address spaces Till Smejkal
                   ` (13 preceding siblings ...)
  2017-03-14  0:18 ` [RFC PATCH 00/13] Introduce " Richard Henderson
@ 2017-03-14  0:58 ` Andy Lutomirski
  2017-03-14  2:07   ` Till Smejkal
  14 siblings, 1 reply; 45+ messages in thread
From: Andy Lutomirski @ 2017-03-14  0:58 UTC (permalink / raw)
  To: Till Smejkal
  Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	X86 ML, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, arcml, linux-arm-kernel, adi-buildroot-devel,
	linux-hexagon, linux-ia64, linux-metag, Linux MIPS Mailing List,
	linux-parisc, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	linux-xtensa, Linux Media Mailing List, linux-mtd, USB list,
	Linux FS Devel, linux-aio, linux-mm, Linux API, linux-arch,
	ALSA development

On Mon, Mar 13, 2017 at 3:14 PM, Till Smejkal
<till.smejkal@googlemail.com> wrote:
> This patchset extends the kernel memory management subsystem with a new
> type of address spaces (called VAS) which can be created and destroyed
> independently of processes by a user in the system. During its lifetime
> such a VAS can be attached to processes by the user which allows a process
> to have multiple address spaces and thereby multiple, potentially
> different, views on the system's main memory. During its execution the
> threads belonging to the process are able to switch freely between the
> different attached VAS and the process' original AS enabling them to
> utilize the different available views on the memory.

Sounds like the old SKAS feature for UML.

> In addition to the concept of first class virtual address spaces, this
> patchset introduces yet another feature called VAS segments. VAS segments
> are memory regions which have a fixed size and position in the virtual
> address space and can be shared between multiple first class virtual
> address spaces. Such shareable memory regions are especially useful for
> in-memory pointer-based data structures or other pure in-memory data.

This sounds rather complicated.  Getting TLB flushing right seems
tricky.  Why not just map the same thing into multiple mms?

>
>             |     VAS     |  processes  |
>     -------------------------------------
>     switch  |       468ns |      1944ns |

The solution here is IMO to fix the scheduler.

Also, FWIW, I have patches (that need a little work) that will make
switch_mm() waaaay faster on x86.

> At the current state of the development, first class virtual address spaces
> have one limitation, that we haven't been able to solve so far. The feature
> allows, that different threads of the same process can execute in different
> AS at the same time. This is possible, because the VAS-switch operation
> only changes the active mm_struct for the task_struct of the calling
> thread. However, when a thread switches into a first class virtual address
> space, some parts of its original AS are duplicated into the new one to
> allow the thread to continue its execution at its current state.

Ick.  Please don't do this.  Can we please keep an mm as just an mm
and not make it look magically different depending on which process
maps it?  If you need a trampoline (which you do, of course), just
write a trampoline in regular user code and map it manually.

--Andy


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-14  0:39   ` Till Smejkal
@ 2017-03-14  1:02     ` Richard Henderson
  2017-03-14  1:31       ` Till Smejkal
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2017-03-14  1:02 UTC (permalink / raw)
  To: Till Smejkal, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

On 03/14/2017 10:39 AM, Till Smejkal wrote:
>> Is this an indication that full virtual address spaces are useless?  It
>> would seem like if you only use virtual address segments then you avoid all
>> of the problems with executing code, active stacks, and brk.
>
> What do you mean with *virtual address segments*? The nice part of first class
> virtual address spaces is that one can share/reuse collections of address space
> segments easily.

What do *I* mean?  You introduced the term, didn't you?
Rereading your original I see you called them "VAS segments".

Anyway, whatever they are called, it would seem that these segments do not 
require any of the syncing mechanisms that are causing you problems.


r~


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-14  1:02     ` Richard Henderson
@ 2017-03-14  1:31       ` Till Smejkal
  0 siblings, 0 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-14  1:31 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Till Smejkal, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel

On Tue, 14 Mar 2017, Richard Henderson wrote:
> On 03/14/2017 10:39 AM, Till Smejkal wrote:
> > > Is this an indication that full virtual address spaces are useless?  It
> > > would seem like if you only use virtual address segments then you avoid all
> > > of the problems with executing code, active stacks, and brk.
> > 
> > What do you mean with *virtual address segments*? The nice part of first class
> > virtual address spaces is that one can share/reuse collections of address space
> > segments easily.
> 
> What do *I* mean?  You introduced the term, didn't you?
> Rereading your original I see you called them "VAS segments".

Oh, I am sorry. I thought that you were referring to some other feature that I didn't
know about.

> Anyway, whatever they are called, it would seem that these segments do not
> require any of the syncing mechanisms that are causing you problems.

Yes, VAS segments provide a way to share memory regions between multiple address spaces
without the need to synchronize heap, stack, etc. Unfortunately, the VAS segment
feature on its own, without the whole concept of first class virtual address spaces, is
not as powerful. With some additional work it can probably be represented with the
existing shmem functionality.

The first class virtual address space feature, on the other hand, provides in our
opinion a real benefit for applications, namely that an application can switch between
different views on its memory, which enables various interesting programming paradigms
as mentioned in the cover letter.

Till


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 10/13] mm: Introduce first class virtual address spaces
  2017-03-13 22:14 ` [RFC PATCH 10/13] mm: Introduce first class virtual address spaces Till Smejkal
  2017-03-13 23:52   ` Greg Kroah-Hartman
@ 2017-03-14  1:35   ` Vineet Gupta
  2017-03-14  2:34     ` Till Smejkal
  1 sibling, 1 reply; 45+ messages in thread
From: Vineet Gupta @ 2017-03-14  1:35 UTC (permalink / raw)
  To: Till Smejkal, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J.  Bottomley, Helge Deller, Benjamin  Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller
  Cc: linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel, Ingo Molnar, Thomas Gleixner, Andy Lutomirski

+CC Ingo, tglx

Hi Till,

On 03/13/2017 03:14 PM, Till Smejkal wrote:
> Introduce a different type of address spaces which are first class citizens
> in the OS. That means that the kernel now handles two types of AS, those
> which are closely coupled with a process and those which aren't. While the
> former ones are created and destroyed together with the process by the
> kernel and are the default type of AS in the Linux kernel, the latter ones
> have to be managed explicitly by the user and are the newly introduced
> type.
> 
> Accordingly, a first class AS (also called VAS == virtual address space)
> can exist in the OS independently from any process. A user has to
> explicitly create and destroy them in the system. Processes and VAS can be
> combined by attaching a previously created VAS to a process which basically
> adds an additional AS to the process that the process' threads are able to
> execute in. Hence, VAS allow a process to have different views onto the
> main memory of the system (its original AS and the attached VAS) between
> which its threads can switch arbitrarily during their lifetime.
> 
> The functionality made available through first class virtual address spaces
> can be used in various different ways. One possible way to utilize VAS is
> to compartmentalize a process for security reasons. Another possible usage
> is to improve the performance of data-centric applications by being able to
> manage different sets of data in memory without the need to map or unmap
> them.
> 
> Furthermore, first class virtual address spaces can be attached to
> different processes at the same time if the underlying memory is only
> readable. This mechanism allows sharing of whole address spaces between
> multiple processes that can both execute in them using the contained
> memory.

I've not looked at the patches closely (or read the referenced paper fully yet),
but at first glance it seems that on the ARC architecture we could potentially
use/leverage this mechanism to implement shared TLB entries. Before anyone shouts:
these are not the same as the IA64/x86 protection keys, which allow TLB entries
with different protection bits across processes etc. These TLB entries are
actually *shared* by processes.

Conceptually there are shared address spaces, independent of processes. E.g. ld.so
code is shared address space #1, libc (code) #2 .... The system can support a limited
number of shared addr spaces (say 64, enough for a typical embedded sys).

While normal TLB entries are tagged with an ASID (Addr Space ID) to keep them unique
across processes, shared TLB entries are tagged with a Shared Address Space ID.

A process MMU context consists of an ASID (a single number) and a SASID bitmap (to
allow "subscription" to multiple shared spaces). The subscriptions are set up by
userspace ld.so, which knows about the libs the process wants to map.

The restriction of course is that the spaces are mapped at the *same* vaddr in all
participating processes. I know this goes against the whole security / address space
randomization story - but it gives much better real-time performance. Why does each
process need to take an MMU exception for libc code...

So long story short - it seems there can be multiple uses of this infrastructure !

-Vineet


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-14  0:58 ` Andy Lutomirski
@ 2017-03-14  2:07   ` Till Smejkal
  2017-03-14  5:37     ` Andy Lutomirski
  0 siblings, 1 reply; 45+ messages in thread
From: Till Smejkal @ 2017-03-14  2:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Till Smejkal, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Vineet Gupta, Russell King, Catalin Marinas, Will Deacon,
	Steven Miao, Richard Kuo, Tony Luck, Fenghua Yu, James Hogan,
	Ralf Baechle, James E.J. Bottomley, Helge Deller,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Martin Schwidefsky, Heiko Carstens, Yoshinori Sato, Rich Felker,
	David S. Miller, Chris Metcalf, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Chris Zankel, Max Filippov,
	Arnd Bergmann, Greg Kroah-Hartman, Laurent Pinchart,
	Mauro Carvalho Chehab, Pawel Osciak, Marek Szyprowski,
	Kyungmin Park, David Woodhouse, Brian Norris, Boris Brezillon,
	Marek Vasut, Richard Weinberger, Cyrille Pitchen, Felipe Balbi,
	Alexander Viro, Benjamin LaHaise, Nadia Yvette Chambers,
	Jeff Layton, J. Bruce Fields, Peter Zijlstra, Hugh Dickins,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jaroslav Kysela,
	Takashi Iwai, linux-kernel, linux-alpha, arcml, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	Linux MIPS Mailing List, linux-parisc, linuxppc-dev, linux-s390,
	linux-sh, sparclinux, linux-xtensa, Linux Media Mailing List,
	linux-mtd, USB list, Linux FS Devel, linux-aio, linux-mm,
	Linux API, linux-arch, ALSA development

On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> On Mon, Mar 13, 2017 at 3:14 PM, Till Smejkal
> <till.smejkal@googlemail.com> wrote:
> > This patchset extends the kernel memory management subsystem with a new
> > type of address spaces (called VAS) which can be created and destroyed
> > independently of processes by a user in the system. During its lifetime
> > such a VAS can be attached to processes by the user which allows a process
> > to have multiple address spaces and thereby multiple, potentially
> > different, views on the system's main memory. During its execution the
> > threads belonging to the process are able to switch freely between the
> > different attached VAS and the process' original AS enabling them to
> > utilize the different available views on the memory.
> 
> Sounds like the old SKAS feature for UML.

I hadn't heard of this feature before, but after briefly looking at the description
on the UML website it actually has some similarities with what I am proposing. But as
far as I can see this was not merged into the mainline kernel, was it? In addition, I
think that first class virtual address spaces go even one step further by allowing
AS to live independently of processes.

> > In addition to the concept of first class virtual address spaces, this
> > patchset introduces yet another feature called VAS segments. VAS segments
> > are memory regions which have a fixed size and position in the virtual
> > address space and can be shared between multiple first class virtual
> > address spaces. Such shareable memory regions are especially useful for
> > in-memory pointer-based data structures or other pure in-memory data.
> 
> This sounds rather complicated.  Getting TLB flushing right seems
> tricky.  Why not just map the same thing into multiple mms?

This is exactly what happens in the end. The memory region that is described by the
VAS segment will be mapped into the ASes that use the segment.

> >
> >             |     VAS     |  processes  |
> >     -------------------------------------
> >     switch  |       468ns |      1944ns |
> 
> The solution here is IMO to fix the scheduler.

IMHO it will be very difficult for the scheduler code to reach the same switching time
as the pure VAS switch, because switching between VASes does not involve saving any
registers or FPU state and does not require selecting the next runnable task. A VAS
switch is basically a system call that just changes the AS of the current thread,
which makes it a very lightweight operation.
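
To illustrate how simple this is from the application's point of view, here is a minimal
sketch. The wrapper names vas_create(), vas_attach() and vas_switch() are placeholders
and not necessarily the ABI of this patchset; error handling is omitted.

    #include <sys/types.h>

    /* hypothetical thin wrappers around the VAS system calls */
    extern int vas_create(const char *name, mode_t mode);  /* create an empty VAS  */
    extern int vas_attach(pid_t pid, int vid, int flags);  /* attach it to a task  */
    extern int vas_switch(int vid);                        /* run in the given VAS */

    void example(void)
    {
            int vid = vas_create("scratch", 0600);
            vas_attach(0, vid, 0);      /* 0 is assumed to mean the calling process */
            vas_switch(vid);            /* only mm/active_mm change; no FPU save,   */
                                        /* no scheduler involvement                 */
            /* ... work on the alternate view of memory ... */
            vas_switch(0);              /* assumed: 0 switches back to the original AS */
    }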

> Also, FWIW, I have patches (that need a little work) that will make
> switch_mm() waaaay faster on x86.

These patches will also improve the speed of the VAS switch operation. We also use the
switch_mm function in the background to perform the actual hardware switch between the
two ASes. The main reason why the VAS switch is faster than a task switch is that it
simply has to do fewer things.

> > At the current state of the development, first class virtual address spaces
> > have one limitation, that we haven't been able to solve so far. The feature
> > allows, that different threads of the same process can execute in different
> > AS at the same time. This is possible, because the VAS-switch operation
> > only changes the active mm_struct for the task_struct of the calling
> > thread. However, when a thread switches into a first class virtual address
> > space, some parts of its original AS are duplicated into the new one to
> > allow the thread to continue its execution at its current state.
> 
> Ick.  Please don't do this.  Can we please keep an mm as just an mm
> and not make it look magically different depending on which process
> maps it?  If you need a trampoline (which you do, of course), just
> write a trampoline in regular user code and map it manually.

Did I understand you correctly that you are proposing that the switching thread should
make sure by itself that its code, stack, … memory regions are properly set up in the
new AS before/after switching into it? I think this would make using first class
virtual address spaces much more difficult for user applications, to the extent that I
am not even sure whether they could be used at all. At the moment, switching into a VAS
is a very simple operation for an application because the kernel will simply do the
right thing.

Till


* Re: [RFC PATCH 10/13] mm: Introduce first class virtual address spaces
  2017-03-14  1:35   ` Vineet Gupta
@ 2017-03-14  2:34     ` Till Smejkal
  0 siblings, 0 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-14  2:34 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Till Smejkal, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J.  Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	linux-kernel, linux-alpha, linux-snps-arc, linux-arm-kernel,
	adi-buildroot-devel, linux-hexagon, linux-ia64, linux-metag,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-media, linux-mtd, linux-usb,
	linux-fsdevel, linux-aio, linux-mm, linux-api, linux-arch,
	alsa-devel, Ingo Molnar, Thomas Gleixner, Andy Lutomirski

Hi Vineet,

On Mon, 13 Mar 2017, Vineet Gupta wrote:
> I've not looked at the patches closely (or read the referenced paper fully yet),
> but at first glance it seems that on the ARC architecture we could potentially
> use/leverage this mechanism to implement shared TLB entries. Before anyone shouts:
> these are not the same as the IA64/x86 protection keys which allow TLB entries
> with different protection bits across processes etc. These TLB entries are
> actually *shared* by processes.
> 
> Conceptually there are shared address spaces, independent of processes, e.g. ldso
> code is shared address space #1, libc (code) #2 ... The system can support a limited
> number of shared addr spaces (say 64, enough for a typical embedded sys).
> 
> While normal TLB entries are tagged with an ASID (Addr Space ID) to keep them unique
> across processes, shared TLB entries are tagged with a Shared Address Space ID.
> 
> A process MMU context consists of an ASID (a single number) and a SASID bitmap (to
> allow "subscription" to multiple shared spaces). The subscriptions are set up by
> userspace ld.so which knows about the libs the process wants to map.
> 
> The restriction of course is that the spaces are mapped at the *same* vaddr in all
> participating processes. I know this goes against the whole security, address space
> randomization story - but it gives much better real time performance. Why should each
> process need to take an MMU exception for libc code...
> 
> So long story short - it seems there can be multiple uses of this infrastructure!

During the development of this code, we also looked at shared TLB entries, but the
other way around. We wanted to use them to prevent flushing of TLB entries of shared
memory regions when switching between multiple ASes. Unfortunately, we never finished
this part of the code.

However, we also investigated a different use-case for first class virtual address
spaces that is related to what you propose, if I understood you correctly. The idea is
to move shared libraries into their own first class virtual address space and only load
some small trampoline code in the application's AS. This trampoline code performs the
VAS switch into the library's AS and executes the requested function there. If we
combine this architecture with tagged TLB entries to prevent TLB flushes during the
switch operation, it can also reach an acceptable performance. A side effect of moving
the shared library into its own AS is that it cannot be used for ROP attacks because it
is not accessible in the application's AS.
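
As a rough sketch of that trampoline (vas_switch() and the two VAS ids are placeholder
names, and the stack has to be shared between both ASes for this to work):

    static int lib_vas;     /* VAS that holds the shared library                  */
    static int app_vas;     /* the application's original AS, saved during attach */

    /* trampoline code mapped in the application's AS */
    long lib_call(long (*fn)(long), long arg)
    {
            long ret;

            vas_switch(lib_vas);    /* enter the AS that contains the library  */
            ret = fn(arg);          /* fn is only a valid pointer in lib_vas   */
            vas_switch(app_vas);    /* return to the application's AS          */
            return ret;
    }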

Till


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-14  2:07   ` Till Smejkal
@ 2017-03-14  5:37     ` Andy Lutomirski
  2017-03-14 16:12       ` Till Smejkal
  0 siblings, 1 reply; 45+ messages in thread
From: Andy Lutomirski @ 2017-03-14  5:37 UTC (permalink / raw)
  To: Andy Lutomirski, Till Smejkal, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Vineet Gupta, Russell King,
	Catalin Marinas, Will Deacon, Steven Miao, Richard Kuo,
	Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	X86 ML, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, arcml, linux-arm-kernel, adi-buildroot-devel,
	linux-hexagon, linux-ia64, linux-metag, Linux MIPS Mailing List,
	linux-parisc, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	linux-xtensa, Linux Media Mailing List, linux-mtd, USB list,
	Linux FS Devel, linux-aio, linux-mm, Linux API, linux-arch,
	ALSA development

On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal
<till.smejkal@googlemail.com> wrote:
> On Mon, 13 Mar 2017, Andy Lutomirski wrote:
>> This sounds rather complicated.  Getting TLB flushing right seems
>> tricky.  Why not just map the same thing into multiple mms?
>
> This is exactly what happens at the end. The memory region that is described by the
> VAS segment will be mapped in the ASes that use the segment.

So why is this kernel feature better than just doing MAP_SHARED
manually in userspace?


>> Ick.  Please don't do this.  Can we please keep an mm as just an mm
>> and not make it look magically different depending on which process
>> maps it?  If you need a trampoline (which you do, of course), just
>> write a trampoline in regular user code and map it manually.
>
> Did I understand you correctly that you are proposing that the switching thread
> should make sure by itself that its code, stack, … memory regions are properly setup
> in the new AS before/after switching into it? I think, this would make using first
> class virtual address spaces much more difficult for user applications to the extend
> that I am not even sure if they can be used at all. At the moment, switching into a
> VAS is a very simple operation for an application because the kernel will just simply
> do the right thing.

Yes.  I think that having the same mm_struct look different from
different tasks is problematic.  Getting it right in the arch code is
going to be nasty.  The heuristics of what to share are also tough --
why would text + data + stack or whatever you're doing be adequate?
What if you're in a thread?  What if two tasks have their stacks in
the same place?

I could imagine that something like a sigaltstack() mode that lets you set
a signal up to also switch mm could be useful.


* RE: [RFC PATCH 07/13] kernel/fork: Split and export 'mm_alloc' and 'mm_init'
  2017-03-13 22:14 ` [RFC PATCH 07/13] kernel/fork: Split and export 'mm_alloc' and 'mm_init' Till Smejkal
@ 2017-03-14 10:18   ` David Laight
  2017-03-14 16:18     ` Till Smejkal
  0 siblings, 1 reply; 45+ messages in thread
From: David Laight @ 2017-03-14 10:18 UTC (permalink / raw)
  To: 'Till Smejkal',
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker
  Cc: linux-mips, alsa-devel, linux-ia64, linux-aio, linux-mm,
	linux-mtd, sparclinux, linux-arch, linux-s390, linux-hexagon,
	linux-sh, linux-snps-arc, linux-media, linux-xtensa,
	adi-buildroot-devel@lists.sourceforge.net

From: Linuxppc-dev Till Smejkal
> Sent: 13 March 2017 22:14
> The only way until now to create a new memory map was via the exported
> function 'mm_alloc'. Unfortunately, this function not only allocates a new
> memory map, but also completely initializes it. However, with the
> introduction of first class virtual address spaces, some initialization
> steps done in 'mm_alloc' are not applicable to the memory maps needed for
> this feature and hence would lead to errors in the kernel code.
> 
> Instead of introducing a new function that can allocate and initialize
> memory maps for first class virtual address spaces and potentially
> duplicate some code, I decided to split the mm_alloc function as well as
> the 'mm_init' function that it uses.
> 
> Now there are four functions exported instead of only one. The new
> 'mm_alloc' function only allocates a new mm_struct and zeros it out. If one
> want to have the old behavior of mm_alloc one can use the newly introduced
> function 'mm_alloc_and_setup' which not only allocates a new mm_struct but
> also fully initializes it.
...

That looks like bugs waiting to happen.
You need unchanged code to fail to compile.

	David



* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-14  5:37     ` Andy Lutomirski
@ 2017-03-14 16:12       ` Till Smejkal
  2017-03-14 19:53         ` Chris Metcalf
  2017-03-15 16:51         ` Andy Lutomirski
  0 siblings, 2 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-14 16:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Till Smejkal, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Vineet Gupta, Russell King,
	Catalin Marinas, Will Deacon, Steven Miao, Richard Kuo,
	Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	X86 ML, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, arcml, linux-arm-kernel, adi-buildroot-devel,
	linux-hexagon, linux-ia64, linux-metag, Linux MIPS Mailing List,
	linux-parisc, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	linux-xtensa, Linux Media Mailing List, linux-mtd, USB list,
	Linux FS Devel, linux-aio, linux-mm, Linux API, linux-arch,
	ALSA development

On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal
> <till.smejkal@googlemail.com> wrote:
> > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> >> This sounds rather complicated.  Getting TLB flushing right seems
> >> tricky.  Why not just map the same thing into multiple mms?
> >
> > This is exactly what happens at the end. The memory region that is described by the
> > VAS segment will be mapped in the ASes that use the segment.
> 
> So why is this kernel feature better than just doing MAP_SHARED
> manually in userspace?

One advantage of VAS segments is that they can be globally queried by user programs,
which means that VAS segments can be shared by applications that do not necessarily
have to be related. If I am not mistaken, MAP_SHARED of pure in-memory data will only
work if the tasks that share the memory region are related (i.e. have a common parent
that initialized the shared mapping). Otherwise, the shared mapping has to be backed by
a file. VAS segments on the other hand allow sharing of pure in-memory data by
arbitrary, not necessarily related, tasks without the need for a file. This becomes
especially interesting if one combines VAS segments with non-volatile memory, since one
can keep data structures in the NVM and still be able to share them between multiple
tasks.
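
For comparison, the classic anonymous MAP_SHARED pattern only works across fork(), i.e.
between related tasks. A minimal sketch (error handling omitted):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
            /* visible to the child only because it is inherited across fork();
             * an unrelated process has no way to attach to this mapping */
            char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);

            if (fork() == 0) {
                    strcpy(buf, "hello from the child");
                    _exit(0);
            }
            wait(NULL);
            printf("parent sees: %s\n", buf);
            return 0;
    }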

> >> Ick.  Please don't do this.  Can we please keep an mm as just an mm
> >> and not make it look magically different depending on which process
> >> maps it?  If you need a trampoline (which you do, of course), just
> >> write a trampoline in regular user code and map it manually.
> >
> > Did I understand you correctly that you are proposing that the switching thread
> > should make sure by itself that its code, stack, … memory regions are properly setup
> > in the new AS before/after switching into it? I think, this would make using first
> > class virtual address spaces much more difficult for user applications to the extend
> > that I am not even sure if they can be used at all. At the moment, switching into a
> > VAS is a very simple operation for an application because the kernel will just simply
> > do the right thing.
> 
> Yes.  I think that having the same mm_struct look different from
> different tasks is problematic.  Getting it right in the arch code is
> going to be nasty.  The heuristics of what to share are also tough --
> why would text + data + stack or whatever you're doing be adequate?
> What if you're in a thread?  What if two tasks have their stacks in
> the same place?

The different ASes that a task can now have when it uses first class virtual address
spaces are not realized in the kernel by using only one mm_struct per task that just
looks different, but by using multiple mm_structs - one for each AS that the task can
execute in. When a task attaches a first class virtual address space to itself to be
able to use another AS, the kernel adds a temporary mm_struct to this task that
contains the mappings of the first class virtual address space and the ones shared
with the task's original AS. If a thread now wants to switch into this attached first
class virtual address space, the kernel only changes the 'mm' and 'active_mm' pointers
in the task_struct of the thread to the temporary mm_struct and performs the
corresponding mm_switch operation. The original mm_struct of the thread will not be
changed.

Accordingly, I do not magically make mm_structs look different depending on the task
that uses them, but create temporary mm_structs that only contain mappings to the same
memory regions.
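
In pseudo-kernel-code the switch itself is roughly the following (a greatly simplified
sketch loosely modelled on use_mm(); the function name is made up and locking, TLB and
mm refcounting details are omitted):

    static void vas_switch_mm(struct task_struct *tsk, struct mm_struct *next)
    {
            struct mm_struct *prev = tsk->active_mm;

            task_lock(tsk);
            tsk->mm = next;                 /* temporary mm built for the VAS    */
            tsk->active_mm = next;
            switch_mm(prev, next, tsk);     /* program the hardware MMU context  */
            task_unlock(tsk);
            /* the task's original mm_struct is left untouched */
    }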

I agree that finding a good heuristic of what to share is difficult. At the moment, all
memory regions that are available in the task's original AS will also be available
(i.e. are shared) when a thread switches into an attached first class virtual address
space. That means that, in the current state of the implementation, VAS can mainly be
used to extend the AS of a task. The reason why I implemented the sharing in this way
is that I didn't want to break shared libraries. If I only shared code+heap+stack,
shared libraries would not work anymore after switching into a VAS.

> I could imagine something like a sigaltstack() mode that lets you set
> a signal up to also switch mm could be useful.

This is a very interesting idea. I will keep it in mind for future use cases of
multiple virtual address spaces per task.

Thanks
Till


* Re: [RFC PATCH 07/13] kernel/fork: Split and export 'mm_alloc' and 'mm_init'
  2017-03-14 10:18   ` David Laight
@ 2017-03-14 16:18     ` Till Smejkal
  0 siblings, 0 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-14 16:18 UTC (permalink / raw)
  To: David Laight
  Cc: 'Till Smejkal',
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-mips,
	alsa-devel, linux-ia64, linux-aio, linux-mm, linux-mtd,
	sparclinux, linux-arch, linux-s390, linux-hexagon, linux-sh,
	linux-snps-arc, linux-media, linux-xtensa, adi-buildroot-devel,
	linux-metag, linux-arm-kernel, linux-parisc, linux-api,
	linux-usb, linux-kernel, linux-alpha, linux-fsdevel,
	linuxppc-dev

On Tue, 14 Mar 2017, David Laight wrote:
> From: Linuxppc-dev Till Smejkal
> > Sent: 13 March 2017 22:14
> > The only way until now to create a new memory map was via the exported
> > function 'mm_alloc'. Unfortunately, this function not only allocates a new
> > memory map, but also completely initializes it. However, with the
> > introduction of first class virtual address spaces, some initialization
> > steps done in 'mm_alloc' are not applicable to the memory maps needed for
> > this feature and hence would lead to errors in the kernel code.
> > 
> > Instead of introducing a new function that can allocate and initialize
> > memory maps for first class virtual address spaces and potentially
> > duplicate some code, I decided to split the mm_alloc function as well as
> > the 'mm_init' function that it uses.
> > 
> > Now there are four functions exported instead of only one. The new
> > 'mm_alloc' function only allocates a new mm_struct and zeros it out. If one
> > want to have the old behavior of mm_alloc one can use the newly introduced
> > function 'mm_alloc_and_setup' which not only allocates a new mm_struct but
> > also fully initializes it.
> ...
> 
> That looks like bugs waiting to happen.
> You need unchanged code to fail to compile.

Thank you for this hint. I can give the new mm_alloc function a different name so that
code using the *old* mm_alloc function will fail to compile. I just reused the old name
when I wrote the code, because mm_alloc was only used in very few locations in the
kernel (2 times in the whole kernel source), which made identifying and changing them
very easy. I also don't think that there will be many users of mm_alloc in the kernel
in the future, because it operates on a relatively low-level data structure. But if it
is better to use a different name for the new function, I am very happy to change this.
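
For example, the split could then look something like this (only mm_alloc and
mm_alloc_and_setup are named in the commit message; the name of the bare allocator is
purely illustrative):

    /* bare allocator: allocate a new mm_struct and zero it, nothing more */
    struct mm_struct *mm_alloc_bare(void);

    /* old mm_alloc() behaviour: allocate and fully initialize the mm_struct */
    struct mm_struct *mm_alloc_and_setup(void);

    /* 'mm_alloc' itself goes away, so any unconverted caller fails to compile */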

Till


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-14 16:12       ` Till Smejkal
@ 2017-03-14 19:53         ` Chris Metcalf
  2017-03-14 21:14           ` Till Smejkal
  2017-03-15 16:51         ` Andy Lutomirski
  1 sibling, 1 reply; 45+ messages in thread
From: Chris Metcalf @ 2017-03-14 19:53 UTC (permalink / raw)
  To: Andy Lutomirski, Andy Lutomirski, Till Smejkal,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Chris Zankel, Max Filippov, Arnd Bergmann, Greg Kroah-Hartman,
	Laurent Pinchart, Mauro Carvalho Chehab, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, David Woodhouse, Brian Norris,
	Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, arcml, linux-arm-kernel, adi-buildroot-devel,
	linux-hexagon, linux-ia64, linux-metag, Linux MIPS Mailing List,
	linux-parisc, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	linux-xtensa, Linux Media Mailing List, linux-mtd, USB list,
	Linux FS Devel, linux-aio, linux-mm, Linux API, linux-arch,
	ALSA development

On 3/14/2017 12:12 PM, Till Smejkal wrote:
> On Mon, 13 Mar 2017, Andy Lutomirski wrote:
>> On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal
>> <till.smejkal@googlemail.com> wrote:
>>> On Mon, 13 Mar 2017, Andy Lutomirski wrote:
>>>> This sounds rather complicated.  Getting TLB flushing right seems
>>>> tricky.  Why not just map the same thing into multiple mms?
>>> This is exactly what happens at the end. The memory region that is described by the
>>> VAS segment will be mapped in the ASes that use the segment.
>> So why is this kernel feature better than just doing MAP_SHARED
>> manually in userspace?
> One advantage of VAS segments is that they can be globally queried by user programs
> which means that VAS segments can be shared by applications that not necessarily have
> to be related. If I am not mistaken, MAP_SHARED of pure in memory data will only work
> if the tasks that share the memory region are related (aka. have a common parent that
> initialized the shared mapping). Otherwise, the shared mapping have to be backed by a
> file.

True, but why is this bad?  The shared mapping will be memory resident
regardless, even if backed by a file (unless swapped out under heavy
memory pressure, but arguably that's a feature anyway).  More importantly,
having a file name is a simple and consistent way of identifying such
shared memory segments.

With a little work, you can also arrange to map such files into memory
at a fixed address in all participating processes, thus making internal
pointers work correctly.
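
Something along these lines (the path and the agreed-upon address are arbitrary; note
that the result is checked instead of forcing it with MAP_FIXED):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define SEG_ADDR ((void *)0x700000000000UL)  /* agreed on by all participants */
    #define SEG_SIZE (1UL << 20)

    int main(void)
    {
            int fd = open("/dev/shm/segment", O_RDWR | O_CREAT, 0600);
            ftruncate(fd, SEG_SIZE);

            void *p = mmap(SEG_ADDR, SEG_SIZE, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
            if (p != SEG_ADDR) {            /* hint was not honoured: bail out */
                    fprintf(stderr, "segment address unavailable\n");
                    return 1;
            }
            /* pointers stored inside the segment are now valid in every
             * process that mapped the file at the same address */
            return 0;
    }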

> VAS segments on the other side allow sharing of pure in memory data by
> arbitrary related tasks without the need of a file. This becomes especially
> interesting if one combines VAS segments with non-volatile memory since one can keep
> data structures in the NVM and still be able to share them between multiple tasks.

I am not fully up to speed on NV/pmem stuff, but isn't that exactly what
the DAX mode is supposed to allow you to do?  If so, isn't sharing a
mapped file on a DAX filesystem on top of pmem equivalent to what
you're proposing?

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-14 19:53         ` Chris Metcalf
@ 2017-03-14 21:14           ` Till Smejkal
  0 siblings, 0 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-14 21:14 UTC (permalink / raw)
  To: Chris Metcalf
  Cc: Andy Lutomirski, Andy Lutomirski, Till Smejkal,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Chris Zankel, Max Filippov, Arnd Bergmann, Greg Kroah-Hartman,
	Laurent Pinchart, Mauro Carvalho Chehab, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, David Woodhouse, Brian Norris,
	Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, arcml, linux-arm-kernel, adi-buildroot-devel,
	linux-hexagon, linux-ia64, linux-metag, Linux MIPS Mailing List,
	linux-parisc, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	linux-xtensa, Linux Media Mailing List, linux-mtd, USB list,
	Linux FS Devel, linux-aio, linux-mm, Linux API, linux-arch,
	ALSA development

On Tue, 14 Mar 2017, Chris Metcalf wrote:
> On 3/14/2017 12:12 PM, Till Smejkal wrote:
> > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> > > On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal
> > > <till.smejkal@googlemail.com> wrote:
> > > > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
> > > > > This sounds rather complicated.  Getting TLB flushing right seems
> > > > > tricky.  Why not just map the same thing into multiple mms?
> > > > This is exactly what happens at the end. The memory region that is described by the
> > > > VAS segment will be mapped in the ASes that use the segment.
> > > So why is this kernel feature better than just doing MAP_SHARED
> > > manually in userspace?
> > One advantage of VAS segments is that they can be globally queried by user programs
> > which means that VAS segments can be shared by applications that not necessarily have
> > to be related. If I am not mistaken, MAP_SHARED of pure in memory data will only work
> > if the tasks that share the memory region are related (aka. have a common parent that
> > initialized the shared mapping). Otherwise, the shared mapping have to be backed by a
> > file.
> 
> True, but why is this bad?  The shared mapping will be memory resident
> regardless, even if backed by a file (unless swapped out under heavy
> memory pressure, but arguably that's a feature anyway).  More importantly,
> having a file name is a simple and consistent way of identifying such
> shared memory segments.
> 
> With a little work, you can also arrange to map such files into memory
> at a fixed address in all participating processes, thus making internal
> pointers work correctly.

I don't want to say that the interface provided by MAP_SHARED is bad. I am only arguing
that VAS segments and the interface that they provide have, in my opinion, an advantage
over the existing ones. However, Matthew Wilcox also suggested in an earlier mail that
VAS segments could be exported to user space via a special purpose filesystem. This
would enable users of VAS segments to just use special files to set up the shared
memory regions. But since the VAS segment itself already knows where it has to be
mapped in the virtual address space of the process, establishing the shared memory
region would be very easy for the user.

> > VAS segments on the other side allow sharing of pure in memory data by
> > arbitrary related tasks without the need of a file. This becomes especially
> > interesting if one combines VAS segments with non-volatile memory since one can keep
> > data structures in the NVM and still be able to share them between multiple tasks.
> 
> I am not fully up to speed on NV/pmem stuff, but isn't that exactly what
> the DAX mode is supposed to allow you to do?  If so, isn't sharing a
> mapped file on a DAX filesystem on top of pmem equivalent to what
> you're proposing?

If I read the documentation of DAX filesystems correctly, it is indeed possible to use
them to create files that live purely in NVM. I wasn't fully aware of this feature.
Thanks for the pointer.

However, the main contribution of this patchset is actually the idea of first class
virtual address spaces and that they can be used to allow processes to have multiple
different views on the system's main memory. For us, VAS segments were another logical
step in the same direction (from first class virtual address spaces to first class
address space segments). However, if there is already functionality in the Linux kernel
to achieve the exact same behavior, there is no real need to add VAS segments. I will
continue thinking about them and either find a different situation where the currently
available interface is not sufficient/too complicated or drop VAS segments from a
future version of the patch set.

Till


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-14 16:12       ` Till Smejkal
  2017-03-14 19:53         ` Chris Metcalf
@ 2017-03-15 16:51         ` Andy Lutomirski
  2017-03-15 16:57           ` Matthew Wilcox
  2017-03-15 19:44           ` Till Smejkal
  1 sibling, 2 replies; 45+ messages in thread
From: Andy Lutomirski @ 2017-03-15 16:51 UTC (permalink / raw)
  To: Andy Lutomirski, Andy Lutomirski, Till Smejkal,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	X86 ML, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, arcml, linux-arm-kernel, adi-buildroot-devel,
	linux-hexagon, linux-ia64, linux-metag, Linux MIPS Mailing List,
	linux-parisc, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	linux-xtensa, Linux Media Mailing List, linux-mtd, USB list,
	Linux FS Devel, linux-aio, linux-mm, Linux API, linux-arch,
	ALSA development

On Tue, Mar 14, 2017 at 9:12 AM, Till Smejkal
<till.smejkal@googlemail.com> wrote:
> On Mon, 13 Mar 2017, Andy Lutomirski wrote:
>> On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal
>> <till.smejkal@googlemail.com> wrote:
>> > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
>> >> This sounds rather complicated.  Getting TLB flushing right seems
>> >> tricky.  Why not just map the same thing into multiple mms?
>> >
>> > This is exactly what happens at the end. The memory region that is described by the
>> > VAS segment will be mapped in the ASes that use the segment.
>>
>> So why is this kernel feature better than just doing MAP_SHARED
>> manually in userspace?
>
> One advantage of VAS segments is that they can be globally queried by user programs
> which means that VAS segments can be shared by applications that not necessarily have
> to be related. If I am not mistaken, MAP_SHARED of pure in memory data will only work
> if the tasks that share the memory region are related (aka. have a common parent that
> initialized the shared mapping). Otherwise, the shared mapping have to be backed by a
> file.

What's wrong with memfd_create()?
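
(For reference, a minimal memfd sketch; the fd can afterwards be handed to an unrelated
process, e.g. over a UNIX-domain socket with SCM_RIGHTS. This assumes a libc that
provides the memfd_create() wrapper, otherwise call it via syscall(2); error handling
omitted.)

    #define _GNU_SOURCE
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
            /* anonymous, file-like memory with no presence in any filesystem */
            int fd = memfd_create("segment", MFD_CLOEXEC);

            ftruncate(fd, 4096);
            char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            strcpy(p, "shared without a file on disk");
            /* pass fd to another process via fork() or SCM_RIGHTS */
            return 0;
    }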

> VAS segments on the other side allow sharing of pure in memory data by
> arbitrary related tasks without the need of a file. This becomes especially
> interesting if one combines VAS segments with non-volatile memory since one can keep
> data structures in the NVM and still be able to share them between multiple tasks.

What's wrong with regular mmap?

>
>> >> Ick.  Please don't do this.  Can we please keep an mm as just an mm
>> >> and not make it look magically different depending on which process
>> >> maps it?  If you need a trampoline (which you do, of course), just
>> >> write a trampoline in regular user code and map it manually.
>> >
>> > Did I understand you correctly that you are proposing that the switching thread
>> > should make sure by itself that its code, stack, … memory regions are properly setup
>> > in the new AS before/after switching into it? I think, this would make using first
>> > class virtual address spaces much more difficult for user applications to the extend
>> > that I am not even sure if they can be used at all. At the moment, switching into a
>> > VAS is a very simple operation for an application because the kernel will just simply
>> > do the right thing.
>>
>> Yes.  I think that having the same mm_struct look different from
>> different tasks is problematic.  Getting it right in the arch code is
>> going to be nasty.  The heuristics of what to share are also tough --
>> why would text + data + stack or whatever you're doing be adequate?
>> What if you're in a thread?  What if two tasks have their stacks in
>> the same place?
>
> The different ASes that a task now can have when it uses first class virtual address
> spaces are not realized in the kernel by using only one mm_struct per task that just
> looks differently but by using multiple mm_structs - one for each AS that the task
> can execute in. When a task attaches a first class virtual address space to itself to
> be able to use another AS, the kernel adds a temporary mm_struct to this task that
> contains the mappings of the first class virtual address space and the one shared
> with the task's original AS. If a thread now wants to switch into this attached first
> class virtual address space the kernel only changes the 'mm' and 'active_mm' pointers
> in the task_struct of the thread to the temporary mm_struct and performs the
> corresponding mm_switch operation. The original mm_struct of the thread will not be
> changed.
>
> Accordingly, I do not magically make mm_structs look differently depending on the
> task that uses it, but create temporary mm_structs that only contain mappings to the
> same memory regions.

This sounds complicated and fragile.  What happens if a heuristically
shared region coincides with a region in the "first class address
space" being selected?

I think the right solution is "you're a user program playing virtual
address games -- make sure you do it right".

--Andy


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-15 16:51         ` Andy Lutomirski
@ 2017-03-15 16:57           ` Matthew Wilcox
  2017-03-15 19:44           ` Till Smejkal
  1 sibling, 0 replies; 45+ messages in thread
From: Matthew Wilcox @ 2017-03-15 16:57 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Till Smejkal, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Vineet Gupta, Russell King,
	Catalin Marinas, Will Deacon, Steven Miao, Richard Kuo,
	Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	X86 ML, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, arcml, linux-arm-kernel, adi-buildroot-devel,
	linux-hexagon, linux-ia64, linux-metag, Linux MIPS Mailing List,
	linux-parisc, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	linux-xtensa, Linux Media Mailing List, linux-mtd, USB list,
	Linux FS Devel, linux-aio, linux-mm, Linux API, linux-arch,
	ALSA development

On Wed, Mar 15, 2017 at 09:51:31AM -0700, Andy Lutomirski wrote:
> > VAS segments on the other side allow sharing of pure in memory data by
> > arbitrary related tasks without the need of a file. This becomes especially
> > interesting if one combines VAS segments with non-volatile memory since one can keep
> > data structures in the NVM and still be able to share them between multiple tasks.
> 
> What's wrong with regular mmap?

I think it's the usual misunderstandings about how to use mmap.
From the paper:

   Memory-centric computing demands careful organization of the
   virtual address space, but interfaces such as mmap only give limited
   control. Some systems do not support creation of address regions at
   specific offsets. In Linux, for example, mmap does not safely abort if
   a request is made to open a region of memory over an existing region;
   it simply writes over it.

The correct answer of course, is "Don't specify MAP_FIXED".  Specify the
'hint' address, and if you don't get it, either fix up your data structure
pointers, or just abort and complain noisily.
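
I.e. something along these lines; a plain hint never clobbers an existing mapping, the
caller just has to check what it actually got:

    #include <stddef.h>
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
            size_t len = 1UL << 20;
            /* grab some region first */
            void *a = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            /* ask for the same region again, but only as a hint */
            void *b = mmap(a, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            if (b != a)     /* expected: the kernel picked a different spot */
                    printf("wanted %p, got %p - fix up pointers or abort\n", a, b);
            return 0;
    }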


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-15 16:51         ` Andy Lutomirski
  2017-03-15 16:57           ` Matthew Wilcox
@ 2017-03-15 19:44           ` Till Smejkal
  2017-03-15 19:47             ` Rich Felker
  2017-03-15 20:06             ` Andy Lutomirski
  1 sibling, 2 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-15 19:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Till Smejkal, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Vineet Gupta, Russell King,
	Catalin Marinas, Will Deacon, Steven Miao, Richard Kuo,
	Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	X86 ML, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, arcml, linux-arm-kernel, adi-buildroot-devel,
	linux-hexagon, linux-ia64, linux-metag, Linux MIPS Mailing List,
	linux-parisc, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	linux-xtensa, Linux Media Mailing List, linux-mtd, USB list,
	Linux FS Devel, linux-aio, linux-mm, Linux API, linux-arch,
	ALSA development

On Wed, 15 Mar 2017, Andy Lutomirski wrote:
> > One advantage of VAS segments is that they can be globally queried by user programs
> > which means that VAS segments can be shared by applications that not necessarily have
> > to be related. If I am not mistaken, MAP_SHARED of pure in memory data will only work
> > if the tasks that share the memory region are related (aka. have a common parent that
> > initialized the shared mapping). Otherwise, the shared mapping have to be backed by a
> > file.
> 
> What's wrong with memfd_create()?
> 
> > VAS segments on the other side allow sharing of pure in memory data by
> > arbitrary related tasks without the need of a file. This becomes especially
> > interesting if one combines VAS segments with non-volatile memory since one can keep
> > data structures in the NVM and still be able to share them between multiple tasks.
> 
> What's wrong with regular mmap?

I never wanted to say that there is something wrong with regular mmap. We just figured
that with VAS segments you could remove the need to mmap your shared data and instead
keep everything purely in memory.

Unfortunately, I am not fully up to speed with memfds. Is my understanding correct that
if the last user of such a file descriptor closes it, the corresponding memory is
freed? Accordingly, memfd cannot be used to keep data in memory while no program is
currently using it, can it? To be able to do this you again need some representation of
the data in a file? Yes, you can use a tmpfs to keep the file content in memory as
well, or some DAX filesystem to keep the file content in NVM, but this always requires
that such filesystems are mounted on the system that the application is currently
running on. VAS segments on the other hand would provide a way to achieve the same
without the need for any mounted filesystem. However, I agree that this is just a small
advantage compared to what can already be achieved with the existing functionality
provided by the Linux kernel. I probably need to revisit the whole idea of first class
virtual address space segments before continuing with this patchset. Thank you very
much for the great feedback.

> >> >> Ick.  Please don't do this.  Can we please keep an mm as just an mm
> >> >> and not make it look magically different depending on which process
> >> >> maps it?  If you need a trampoline (which you do, of course), just
> >> >> write a trampoline in regular user code and map it manually.
> >> >
> >> > Did I understand you correctly that you are proposing that the switching thread
> >> > should make sure by itself that its code, stack, … memory regions are properly setup
> >> > in the new AS before/after switching into it? I think, this would make using first
> >> > class virtual address spaces much more difficult for user applications to the extend
> >> > that I am not even sure if they can be used at all. At the moment, switching into a
> >> > VAS is a very simple operation for an application because the kernel will just simply
> >> > do the right thing.
> >>
> >> Yes.  I think that having the same mm_struct look different from
> >> different tasks is problematic.  Getting it right in the arch code is
> >> going to be nasty.  The heuristics of what to share are also tough --
> >> why would text + data + stack or whatever you're doing be adequate?
> >> What if you're in a thread?  What if two tasks have their stacks in
> >> the same place?
> >
> > The different ASes that a task now can have when it uses first class virtual address
> > spaces are not realized in the kernel by using only one mm_struct per task that just
> > looks differently but by using multiple mm_structs - one for each AS that the task
> > can execute in. When a task attaches a first class virtual address space to itself to
> > be able to use another AS, the kernel adds a temporary mm_struct to this task that
> > contains the mappings of the first class virtual address space and the one shared
> > with the task's original AS. If a thread now wants to switch into this attached first
> > class virtual address space the kernel only changes the 'mm' and 'active_mm' pointers
> > in the task_struct of the thread to the temporary mm_struct and performs the
> > corresponding mm_switch operation. The original mm_struct of the thread will not be
> > changed.
> >
> > Accordingly, I do not magically make mm_structs look differently depending on the
> > task that uses it, but create temporary mm_structs that only contain mappings to the
> > same memory regions.
> 
> This sounds complicated and fragile.  What happens if a heuristically
> shared region coincides with a region in the "first class address
> space" being selected?

If such a conflict happens, the task cannot use the first class address space and the
corresponding system call will return an error. However, with the virtual address space
size available to programs today, such conflicts are probably rare. I could also
imagine some additional functionality that allows a user to mark parts of its AS as to
be shared or not to be shared when switching into a VAS. With this functionality in
place, there would be no need for a heuristic in the kernel; the user decides what to
share. The kernel would by default only share code, data, and stack, and the
application/libraries would have to mark all other memory regions as shared if they
also need to be available in the VAS.

> I think the right solution is "you're a user program playing virtual
> address games -- make sure you do it right".

Hm, in general I agree that the easier and more robust solution from the kernel
perspective is to let the user do the AS setup and only provide the functionality to
create new empty ASes. Though, I think that such an interface would be much more
difficult to use than my current design. Letting the user program set up the AS also
has another implication that I currently don't have. Since I share the code and stack
regions between all ASes that are available to a process, I don't need to save/restore
stack pointers or instruction pointers when threads switch between ASes. However, when
the user sets up the AS, the kernel cannot be sure that the code and stack are mapped
at the same virtual address and hence has to save and restore these registers (and
potentially others as well, since we can now basically jump between different execution
contexts).

When we first designed first class virtual address spaces, we had one special use-case
in mind, namely that one application wants to use different data sets that it does not
want to or cannot keep in the same AS. Hence, sharing code and stack between the
different ASes that the application uses was a logical step for us, because the code
memory region, for example, has to be available in all ASes anyway since all of them
execute the same application. Sharing the stack memory region enables the application
to keep volatile information that might be needed in the new AS on the stack, which
allows easy information flow between the different ASes.

For this patch, I extended the initial sharing of stack and code memory regions to all
memory regions that are available in the task's original AS, to also allow dynamically
linked applications and multi-threaded applications to flawlessly use first class
virtual address spaces.

To put it in a nutshell, we envisioned first class virtual address spaces to be used as
shareable/reusable data containers, which made sharing the various memory regions that
are crucial for the execution of the application a feasible implementation decision.

Thank you all very much for the feedback. I really appreciate it.

Till


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-15 19:44           ` Till Smejkal
@ 2017-03-15 19:47             ` Rich Felker
  2017-03-15 21:30               ` Till Smejkal
  2017-03-15 20:06             ` Andy Lutomirski
  1 sibling, 1 reply; 45+ messages in thread
From: Rich Felker @ 2017-03-15 19:47 UTC (permalink / raw)
  To: Andy Lutomirski, Andy Lutomirski, Till Smejkal,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, David S. Miller, Chris Metcalf,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Chris Zankel, Max Filippov, Arnd Bergmann, Greg Kroah-Hartman,
	Laurent Pinchart, Mauro Carvalho Chehab, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, David Woodhouse, Brian Norris,
	Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, arcml, linux-arm-kernel, adi-buildroot-devel,
	linux-hexagon, linux-ia64, linux-metag, Linux MIPS Mailing List,
	linux-parisc, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	linux-xtensa, Linux Media Mailing List, linux-mtd, USB list,
	Linux FS Devel, linux-aio, linux-mm, Linux API, linux-arch,
	ALSA development

On Wed, Mar 15, 2017 at 12:44:47PM -0700, Till Smejkal wrote:
> On Wed, 15 Mar 2017, Andy Lutomirski wrote:
> > > One advantage of VAS segments is that they can be globally queried by user programs
> > > which means that VAS segments can be shared by applications that not necessarily have
> > > to be related. If I am not mistaken, MAP_SHARED of pure in memory data will only work
> > > if the tasks that share the memory region are related (aka. have a common parent that
> > > initialized the shared mapping). Otherwise, the shared mapping have to be backed by a
> > > file.
> > 
> > What's wrong with memfd_create()?
> > 
> > > VAS segments on the other side allow sharing of pure in memory data by
> > > arbitrary related tasks without the need of a file. This becomes especially
> > > interesting if one combines VAS segments with non-volatile memory since one can keep
> > > data structures in the NVM and still be able to share them between multiple tasks.
> > 
> > What's wrong with regular mmap?
> 
> I never wanted to say that there is something wrong with regular mmap. We just
> figured that with VAS segments you could remove the need to mmap your shared data but
> instead can keep everything purely in memory.
> 
> Unfortunately, I am not at full speed with memfds. Is my understanding correct that
> if the last user of such a file descriptor closes it, the corresponding memory is
> freed? Accordingly, memfd cannot be used to keep data in memory while no program is
> currently using it, can it? To be able to do this you need again some representation

I have a name for application-allocated kernel resources that persist
without a process holding a reference to them or a node in the
filesystem: a bug. See: sysvipc.

Rich


* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-15 19:44           ` Till Smejkal
  2017-03-15 19:47             ` Rich Felker
@ 2017-03-15 20:06             ` Andy Lutomirski
  2017-03-15 22:02               ` Till Smejkal
  1 sibling, 1 reply; 45+ messages in thread
From: Andy Lutomirski @ 2017-03-15 20:06 UTC (permalink / raw)
  To: Andy Lutomirski, Andy Lutomirski, Till Smejkal,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	X86 ML, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, arcml, linux-arm-kernel, adi-buildroot-devel,
	linux-hexagon, linux-ia64, linux-metag, Linux MIPS Mailing List,
	linux-parisc, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	linux-xtensa, Linux Media Mailing List, linux-mtd, USB list,
	Linux FS Devel, linux-aio, linux-mm, Linux API, linux-arch,
	ALSA development

On Wed, Mar 15, 2017 at 12:44 PM, Till Smejkal
<till.smejkal@googlemail.com> wrote:
> On Wed, 15 Mar 2017, Andy Lutomirski wrote:
>> > One advantage of VAS segments is that they can be globally queried by user programs
>> > which means that VAS segments can be shared by applications that not necessarily have
>> > to be related. If I am not mistaken, MAP_SHARED of pure in memory data will only work
>> > if the tasks that share the memory region are related (aka. have a common parent that
>> > initialized the shared mapping). Otherwise, the shared mapping have to be backed by a
>> > file.
>>
>> What's wrong with memfd_create()?
>>
>> > VAS segments, on the other hand, allow sharing of pure in-memory data by
>> > arbitrary, even unrelated, tasks without the need of a file. This becomes especially
>> > interesting if one combines VAS segments with non-volatile memory, since one can keep
>> > data structures in the NVM and still be able to share them between multiple tasks.
>>
>> What's wrong with regular mmap?
>
> I never wanted to say that there is something wrong with regular mmap. We just
> figured that with VAS segments you could remove the need to mmap your shared data and
> instead keep everything purely in memory.

memfd does that.
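
For illustration only, a minimal sketch of keeping shared data purely in memory with
memfd_create(2); the name and size below are arbitrary, and calling the glibc wrapper
directly assumes glibc >= 2.27 (older systems need syscall(SYS_memfd_create, ...)):

#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
	/* Anonymous, memory-only "file"; it lives only as long as at least
	 * one file descriptor or mapping still references it. */
	int fd = memfd_create("shared-data", MFD_CLOEXEC);
	if (fd < 0) { perror("memfd_create"); return 1; }
	if (ftruncate(fd, 1 << 20) < 0) { perror("ftruncate"); return 1; }

	/* The fd can be handed to other tasks via fork(), SCM_RIGHTS or
	 * /proc/<pid>/fd and mapped with MAP_SHARED on both sides. */
	void *p = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
		       MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) { perror("mmap"); return 1; }
	return 0;
}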

>
> Unfortunately, I am not fully up to speed on memfds. Is my understanding correct that
> if the last user of such a file descriptor closes it, the corresponding memory is
> freed? Accordingly, memfd cannot be used to keep data in memory while no program is
> currently using it, can it?

No, stop right here.  If you want to have a bunch of memory that
outlives the program that allocates it, use a filesystem (tmpfs,
hugetlbfs, ext4, whatever).  Don't create new persistent kernel
things.
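
As a sketch of that suggestion (the path and size are arbitrary examples), memory
that should outlive its creator can simply be a file on a tmpfs mount such as
/dev/shm:

#include <fcntl.h>
#include <sys/mman.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* The file, and therefore the data, stays in RAM until it is
	 * unlinked, whether or not any process currently has it open. */
	int fd = open("/dev/shm/my-dataset", O_RDWR | O_CREAT, 0600);
	if (fd < 0) { perror("open"); return 1; }
	if (ftruncate(fd, 1 << 20) < 0) { perror("ftruncate"); return 1; }

	void *p = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
		       MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) { perror("mmap"); return 1; }
	return 0;
}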

> VAS segments, on the other hand, would provide a way to
> achieve the same without the need for any mounted filesystem. However, I agree that
> this is just a small advantage compared to what can already be achieved with the
> existing functionality provided by the Linux kernel.

I see this "small advantage" as "resource leak and security problem".

>> This sounds complicated and fragile.  What happens if a heuristically
>> shared region coincides with a region in the "first class address
>> space" being selected?
>
> If such a conflict happens, the task cannot use the first class address space and the
> corresponding system call will return an error. However, with the currently available
> virtual address space size that programs can use, such conflicts are probably rare.

A bug that hits 1% of the time is often worse than one that hits 100%
of the time because debugging it is miserable.

--Andy


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-15 19:47             ` Rich Felker
@ 2017-03-15 21:30               ` Till Smejkal
  0 siblings, 0 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-15 21:30 UTC (permalink / raw)
  To: Rich Felker
  Cc: Andy Lutomirski, Andy Lutomirski, Till Smejkal,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, David S. Miller, Chris Metcalf,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Chris Zankel, Max Filippov, Arnd Bergmann, Greg Kroah-Hartman,
	Laurent Pinchart, Mauro Carvalho Chehab, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, David Woodhouse, Brian Norris,
	Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, arcml, linux-arm-kernel, adi-buildroot-devel,
	linux-hexagon, linux-ia64, linux-metag, Linux MIPS Mailing List,
	linux-parisc, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	linux-xtensa, Linux Media Mailing List, linux-mtd, USB list,
	Linux FS Devel, linux-aio, linux-mm, Linux API, linux-arch,
	ALSA development

On Wed, 15 Mar 2017, Rich Felker wrote:
> On Wed, Mar 15, 2017 at 12:44:47PM -0700, Till Smejkal wrote:
> > On Wed, 15 Mar 2017, Andy Lutomirski wrote:
> > > > One advantage of VAS segments is that they can be globally queried by user programs
> > > > which means that VAS segments can be shared by applications that do not necessarily
> > > > have to be related. If I am not mistaken, MAP_SHARED of pure in-memory data will only
> > > > work if the tasks that share the memory region are related (i.e. have a common parent
> > > > that initialized the shared mapping). Otherwise, the shared mapping has to be backed
> > > > by a file.
> > > 
> > > What's wrong with memfd_create()?
> > > 
> > > > VAS segments, on the other hand, allow sharing of pure in-memory data by
> > > > arbitrary, even unrelated, tasks without the need of a file. This becomes especially
> > > > interesting if one combines VAS segments with non-volatile memory, since one can keep
> > > > data structures in the NVM and still be able to share them between multiple tasks.
> > > 
> > > What's wrong with regular mmap?
> > 
> > I never wanted to say that there is something wrong with regular mmap. We just
> > figured that with VAS segments you could remove the need to mmap your shared data and
> > instead keep everything purely in memory.
> > 
> > Unfortunately, I am not fully up to speed on memfds. Is my understanding correct that
> > if the last user of such a file descriptor closes it, the corresponding memory is
> > freed? Accordingly, memfd cannot be used to keep data in memory while no program is
> > currently using it, can it? To be able to do this you again need some representation
> 
> I have a name for application-allocated kernel resources that persist
> without a process holding a reference to them or a node in the
> filesystem: a bug. See: sysvipc.

VAS segments are first class citizens of the OS, similar to processes. Accordingly, I
would not see this behavior as a bug. A VAS segment is a kernel handle to "persistent"
memory (in the sense that it is independent of the lifetime of the application that
created it). That means the memory described by a VAS segment can be reused by other
applications even if the segment was not used by any application in between. It is
very much like a pure in-memory file. An application creates a VAS segment, fills it
with content and, if it does not delete it again, can reopen and reuse it later. This
also means that if you know you will never want to use this memory again, you have to
remove it explicitly, just as you have to remove a file that you no longer want to
keep.
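
To make the analogy concrete, here is a sketch of the same lifecycle with POSIX
shared memory, which already behaves this way (the name and size are arbitrary
examples; error handling omitted):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	/* Create, size and fill the "in-memory file"; it persists after
	 * this process exits (link with -lrt on older glibc). */
	int fd = shm_open("/my-segment", O_CREAT | O_RDWR, 0600);
	ftruncate(fd, 1 << 20);
	/* ... mmap(fd, ..., MAP_SHARED, ...) and fill it ... */

	/* Any later, even unrelated, process can reopen it by name: */
	int fd2 = shm_open("/my-segment", O_RDWR, 0);
	(void)fd2;

	/* It stays around until someone removes it explicitly: */
	shm_unlink("/my-segment");
	return 0;
}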

I think it really might be better to implement VAS segments (if I keep this feature
at all) with a special-purpose filesystem. The way I've designed them seems to be
very misleading.

Till


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-15 20:06             ` Andy Lutomirski
@ 2017-03-15 22:02               ` Till Smejkal
  2017-03-15 22:09                 ` Luck, Tony
  2017-03-16  8:21                 ` Thomas Gleixner
  0 siblings, 2 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-15 22:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Till Smejkal, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Vineet Gupta, Russell King,
	Catalin Marinas, Will Deacon, Steven Miao, Richard Kuo,
	Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	X86 ML, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, arcml, linux-arm-kernel, adi-buildroot-devel,
	linux-hexagon, linux-ia64, linux-metag, Linux MIPS Mailing List,
	linux-parisc, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	linux-xtensa, Linux Media Mailing List, linux-mtd, USB list,
	Linux FS Devel, linux-aio, linux-mm, Linux API, linux-arch,
	ALSA development

On Wed, 15 Mar 2017, Andy Lutomirski wrote:
> On Wed, Mar 15, 2017 at 12:44 PM, Till Smejkal
> <till.smejkal@googlemail.com> wrote:
> > On Wed, 15 Mar 2017, Andy Lutomirski wrote:
> >> > One advantage of VAS segments is that they can be globally queried by user programs
> >> > which means that VAS segments can be shared by applications that do not necessarily
> >> > have to be related. If I am not mistaken, MAP_SHARED of pure in-memory data will only
> >> > work if the tasks that share the memory region are related (i.e. have a common parent
> >> > that initialized the shared mapping). Otherwise, the shared mapping has to be backed
> >> > by a file.
> >>
> >> What's wrong with memfd_create()?
> >>
> >> > VAS segments, on the other hand, allow sharing of pure in-memory data by
> >> > arbitrary, even unrelated, tasks without the need of a file. This becomes especially
> >> > interesting if one combines VAS segments with non-volatile memory, since one can keep
> >> > data structures in the NVM and still be able to share them between multiple tasks.
> >>
> >> What's wrong with regular mmap?
> >
> > I never wanted to say that there is something wrong with regular mmap. We just
> > figured that with VAS segments you could remove the need to mmap your shared data and
> > instead keep everything purely in memory.
> 
> memfd does that.

Yes, that's right. Thanks for giving me the pointer to this. I should have researched
more carefully before starting to work on VAS segments.

> > VAS segments, on the other hand, would provide a way to
> > achieve the same without the need for any mounted filesystem. However, I agree that
> > this is just a small advantage compared to what can already be achieved with the
> > existing functionality provided by the Linux kernel.
> 
> I see this "small advantage" as "resource leak and security problem".

I don't agree here. VAS segments are basically in-memory files that are handled by
the kernel directly without using a file system. Hence, if an application uses a VAS
segment to store data, the same rules apply as if it used a file. Everything that it
saves in the VAS segment might be accessible by other applications. An application
using VAS segments should be aware of this fact. In addition, the resources that are
represented by a VAS segment are not leaked. As I said, VAS segments are much like
files. Hence, if you don't want to use them anymore, delete them. But as with files,
the kernel will not delete them for you (although something like this can be added).

> >> This sounds complicated and fragile.  What happens if a heuristically
> >> shared region coincides with a region in the "first class address
> >> space" being selected?
> >
> > If such a conflict happens, the task cannot use the first class address space and the
> > corresponding system call will return an error. However, with the currently available
> > virtual address space size that programs can use, such conflicts are probably rare.
> 
> A bug that hits 1% of the time is often worse than one that hits 100%
> of the time because debugging it is miserable.

I don't agree that this is a bug at all. If there is a conflict in the memory layout
of the ASes, the application simply cannot use this first class virtual address space.
Every application that wants to use first class virtual address spaces should check
for error return values and handle them.

This situation is similar to mapping a file at some specific address in memory because
the file contains pointer-based data structures and the application wants to use
them, but the kernel cannot map the file at this particular position in the
application's AS because there is already a different, conflicting mapping. If an
application wants to do such things, it should also handle all the errors that can
occur.
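
A sketch of that analogy (assuming some file descriptor fd and length len; the target
address is an arbitrary example): without MAP_FIXED the requested address is only a
hint, so the caller has to verify it and handle the conflict itself (newer kernels
also offer MAP_FIXED_NOREPLACE for exactly this case):

#include <stddef.h>
#include <sys/mman.h>

/* Map a file at the address its pointer-based data structures expect, or
 * fail so the caller can handle the conflict. */
static void *map_at(void *want, int fd, size_t len)
{
	void *got = mmap(want, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (got == MAP_FAILED)
		return NULL;
	if (got != want) {		/* kernel chose a different spot */
		munmap(got, len);
		return NULL;
	}
	return got;
}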

Till


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-15 22:02               ` Till Smejkal
@ 2017-03-15 22:09                 ` Luck, Tony
  2017-03-15 23:18                   ` Till Smejkal
  2017-03-16  8:21                 ` Thomas Gleixner
  1 sibling, 1 reply; 45+ messages in thread
From: Luck, Tony @ 2017-03-15 22:09 UTC (permalink / raw)
  To: Andy Lutomirski, Andy Lutomirski, Till Smejkal,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	X86 ML, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, arcml, linux-arm-kernel, adi-buildroot-devel,
	linux-hexagon, linux-ia64, linux-metag, Linux MIPS Mailing List,
	linux-parisc, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	linux-xtensa, Linux Media Mailing List, linux-mtd, USB list,
	Linux FS Devel, linux-aio, linux-mm, Linux API, linux-arch,
	ALSA development

On Wed, Mar 15, 2017 at 03:02:34PM -0700, Till Smejkal wrote:
> I don't agree here. VAS segments are basically in-memory files that are handled by
> the kernel directly without using a file system. Hence, if an application uses a VAS
> segment to store data, the same rules apply as if it used a file. Everything that it
> saves in the VAS segment might be accessible by other applications. An application
> using VAS segments should be aware of this fact. In addition, the resources that are
> represented by a VAS segment are not leaked. As I said, VAS segments are much like
> files. Hence, if you don't want to use them anymore, delete them. But as with files,
> the kernel will not delete them for you (although something like this can be added).

So how do they differ from shmget(2), shmat(2), shmdt(2), shmctl(2)?

Apart from VAS having better names, instead of silly "key_t key" ones.
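
For reference, the usual shape of those calls (the path, project id and size are
arbitrary examples):

#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
	/* A key names a kernel-persistent in-memory segment that outlives
	 * its creator until it is removed explicitly with IPC_RMID. */
	key_t key = ftok("/some/existing/path", 42);
	int id = shmget(key, 1 << 20, IPC_CREAT | 0600);
	void *p = shmat(id, NULL, 0);	/* attach into this address space */
	/* ... use p ... */
	shmdt(p);			/* detach; the segment persists */
	shmctl(id, IPC_RMID, NULL);	/* explicit removal */
	return 0;
}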

-Tony


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-15 22:09                 ` Luck, Tony
@ 2017-03-15 23:18                   ` Till Smejkal
  0 siblings, 0 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-15 23:18 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Andy Lutomirski, Andy Lutomirski, Till Smejkal,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	X86 ML, Chris Zankel, Max Filippov, Arnd Bergmann,
	Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
	Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, arcml, linux-arm-kernel, adi-buildroot-devel,
	linux-hexagon, linux-ia64, linux-metag, Linux MIPS Mailing List,
	linux-parisc, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	linux-xtensa, Linux Media Mailing List, linux-mtd, USB list,
	Linux FS Devel, linux-aio, linux-mm, Linux API, linux-arch,
	ALSA development

On Wed, 15 Mar 2017, Luck, Tony wrote:
> On Wed, Mar 15, 2017 at 03:02:34PM -0700, Till Smejkal wrote:
> > I don't agree here. VAS segments are basically in-memory files that are handled by
> > the kernel directly without using a file system. Hence, if an application uses a VAS
> > segment to store data, the same rules apply as if it used a file. Everything that it
> > saves in the VAS segment might be accessible by other applications. An application
> > using VAS segments should be aware of this fact. In addition, the resources that are
> > represented by a VAS segment are not leaked. As I said, VAS segments are much like
> > files. Hence, if you don't want to use them anymore, delete them. But as with files,
> > the kernel will not delete them for you (although something like this can be added).
> 
> So how do they differ from shmget(2), shmat(2), shmdt(2), shmctl(2)?
> 
> Apart from VAS having better names, instead of silly "key_t key" ones.

Unfortunately, I have to admit that VAS segments don't differ from shm* a lot.
The implementation is different, but the functionality that you can achieve with it
is very similar. I am sorry. We should have looked more closely at the whole
functionality that is provided by the shmem subsystem before working on VAS segments.

However, VAS segments are not the key part of this patch set. The more interesting
functionality, in our opinion, is the introduction of first class virtual address
spaces and what they can be used for. VAS segments were just another logical step for
us (from first class virtual address spaces to first class virtual address space
segments), but since their functionality can be achieved with other already existing
features of the Linux kernel, I will probably drop them in future versions of the
patchset.

Thanks
Till


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-15 22:02               ` Till Smejkal
  2017-03-15 22:09                 ` Luck, Tony
@ 2017-03-16  8:21                 ` Thomas Gleixner
  2017-03-16 17:29                   ` Till Smejkal
  1 sibling, 1 reply; 45+ messages in thread
From: Thomas Gleixner @ 2017-03-16  8:21 UTC (permalink / raw)
  To: Till Smejkal
  Cc: Andy Lutomirski, Andy Lutomirski, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Vineet Gupta, Russell King,
	Catalin Marinas, Will Deacon, Steven Miao, Richard Kuo,
	Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Ingo Molnar, H. Peter Anvin, X86 ML, Chris Zankel,
	Max Filippov, Arnd Bergmann, Greg Kroah-Hartman,
	Laurent Pinchart, Mauro Carvalho Chehab, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, David Woodhouse, Brian Norris,
	Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, arcml, linux-arm-kernel, adi-buildroot-devel,
	linux-hexagon, linux-ia64, linux-metag, Linux MIPS Mailing List,
	linux-parisc, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	linux-xtensa, Linux Media Mailing List, linux-mtd, USB list,
	Linux FS Devel, linux-aio, linux-mm, Linux API, linux-arch,
	ALSA development

On Wed, 15 Mar 2017, Till Smejkal wrote:
> On Wed, 15 Mar 2017, Andy Lutomirski wrote:

> > > VAS segments, on the other hand, would provide a way to
> > > achieve the same without the need for any mounted filesystem. However,
> > > I agree that this is just a small advantage compared to what can
> > > already be achieved with the existing functionality provided by the
> > > Linux kernel.
> > 
> > I see this "small advantage" as "resource leak and security problem".
> 
> I don't agree here. VAS segments are basically in-memory files that are
> handled by the kernel directly without using a file system. Hence, if an

Why do we need yet another mechanism to represent something which looks
like a file instead of simply using existing mechanisms and extending them?

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-16  8:21                 ` Thomas Gleixner
@ 2017-03-16 17:29                   ` Till Smejkal
  2017-03-16 17:42                     ` Thomas Gleixner
  0 siblings, 1 reply; 45+ messages in thread
From: Till Smejkal @ 2017-03-16 17:29 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Till Smejkal, Andy Lutomirski, Andy Lutomirski,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Ingo Molnar, H. Peter Anvin, X86 ML, Chris Zankel,
	Max Filippov, Arnd Bergmann, Greg Kroah-Hartman,
	Laurent Pinchart, Mauro Carvalho Chehab, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, David Woodhouse, Brian Norris,
	Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, arcml, linux-arm-kernel, adi-buildroot-devel,
	linux-hexagon, linux-ia64, linux-metag, Linux MIPS Mailing List,
	linux-parisc, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	linux-xtensa, Linux Media Mailing List, linux-mtd, USB list,
	Linux FS Devel, linux-aio, linux-mm, Linux API, linux-arch,
	ALSA development

On Thu, 16 Mar 2017, Thomas Gleixner wrote:
> Why do we need yet another mechanism to represent something which looks
> like a file instead of simply using existing mechanisms and extending them?

You are right. I also realized during the discussion with Andy, Chris, Matthew,
Luck, Rich and the others that there are already other techniques in the Linux kernel
that, when combined, can achieve the same functionality. As I also said to the others,
I will drop the VAS segments in future versions. The first class virtual address
space feature was the more interesting part of the patchset anyway.

Thanks
Till


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-16 17:29                   ` Till Smejkal
@ 2017-03-16 17:42                     ` Thomas Gleixner
  2017-03-16 17:50                       ` Till Smejkal
  0 siblings, 1 reply; 45+ messages in thread
From: Thomas Gleixner @ 2017-03-16 17:42 UTC (permalink / raw)
  To: Till Smejkal
  Cc: Andy Lutomirski, Andy Lutomirski, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Vineet Gupta, Russell King,
	Catalin Marinas, Will Deacon, Steven Miao, Richard Kuo,
	Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Ingo Molnar, H. Peter Anvin, X86 ML, Chris Zankel,
	Max Filippov, Arnd Bergmann, Greg Kroah-Hartman,
	Laurent Pinchart, Mauro Carvalho Chehab, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, David Woodhouse, Brian Norris,
	Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, arcml, linux-arm-kernel, adi-buildroot-devel,
	linux-hexagon, linux-ia64, linux-metag, Linux MIPS Mailing List,
	linux-parisc, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	linux-xtensa, Linux Media Mailing List, linux-mtd, USB list,
	Linux FS Devel, linux-aio, linux-mm, Linux API, linux-arch,
	ALSA development

On Thu, 16 Mar 2017, Till Smejkal wrote:
> On Thu, 16 Mar 2017, Thomas Gleixner wrote:
> > Why do we need yet another mechanism to represent something which looks
> > like a file instead of simply using existing mechanisms and extending them?
> 
> You are right. I also realized during the discussion with Andy, Chris,
> Matthew, Luck, Rich and the others that there are already other
> techniques in the Linux kernel that, when combined, can achieve the same
> functionality. As I also said to the others, I will drop the VAS segments
> in future versions. The first class virtual address space feature was
> the more interesting part of the patchset anyway.

While you are at it, could you please drop this 'first class' marketing as
well? It has zero technical value, really.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
  2017-03-16 17:42                     ` Thomas Gleixner
@ 2017-03-16 17:50                       ` Till Smejkal
  0 siblings, 0 replies; 45+ messages in thread
From: Till Smejkal @ 2017-03-16 17:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Till Smejkal, Andy Lutomirski, Andy Lutomirski,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Vineet Gupta,
	Russell King, Catalin Marinas, Will Deacon, Steven Miao,
	Richard Kuo, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Ingo Molnar, H. Peter Anvin, X86 ML, Chris Zankel,
	Max Filippov, Arnd Bergmann, Greg Kroah-Hartman,
	Laurent Pinchart, Mauro Carvalho Chehab, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, David Woodhouse, Brian Norris,
	Boris Brezillon, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
	Nadia Yvette Chambers, Jeff Layton, J. Bruce Fields,
	Peter Zijlstra, Hugh Dickins, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jaroslav Kysela, Takashi Iwai, linux-kernel,
	linux-alpha, arcml, linux-arm-kernel, adi-buildroot-devel,
	linux-hexagon, linux-ia64, linux-metag, Linux MIPS Mailing List,
	linux-parisc, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	linux-xtensa, Linux Media Mailing List, linux-mtd, USB list,
	Linux FS Devel, linux-aio, linux-mm, Linux API, linux-arch,
	ALSA development

On Thu, 16 Mar 2017, Thomas Gleixner wrote:
> On Thu, 16 Mar 2017, Till Smejkal wrote:
> > On Thu, 16 Mar 2017, Thomas Gleixner wrote:
> > > Why do we need yet another mechanism to represent something which looks
> > > like a file instead of simply using existing mechanisms and extending them?
> > 
> > You are right. I also realized during the discussion with Andy, Chris,
> > Matthew, Luck, Rich and the others that there are already other
> > techniques in the Linux kernel that, when combined, can achieve the same
> > functionality. As I also said to the others, I will drop the VAS segments
> > in future versions. The first class virtual address space feature was
> > the more interesting part of the patchset anyway.
> 
> While you are at it, could you please drop this 'first class' marketing as
> well? It has zero technical value, really.

Yes, of course. I am sorry for the trouble that I have already caused.

Thanks
Till


^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2017-03-16 17:50 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-13 22:14 [RFC PATCH 00/13] Introduce first class virtual address spaces Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 01/13] mm: Add mm_struct argument to 'mmap_region' Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 02/13] mm: Add mm_struct argument to 'do_mmap' and 'do_mmap_pgoff' Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 03/13] mm: Rename 'unmap_region' and add mm_struct argument Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 04/13] mm: Add mm_struct argument to 'get_unmapped_area' and 'vm_unmapped_area' Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 05/13] mm: Add mm_struct argument to 'mm_populate' and '__mm_populate' Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 06/13] mm/mmap: Export 'vma_link' and 'find_vma_links' to mm subsystem Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 07/13] kernel/fork: Split and export 'mm_alloc' and 'mm_init' Till Smejkal
2017-03-14 10:18   ` David Laight
2017-03-14 16:18     ` Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 08/13] kernel/fork: Define explicitly which mm_struct to duplicate during fork Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 09/13] mm/memory: Add function to one-to-one duplicate page ranges Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 10/13] mm: Introduce first class virtual address spaces Till Smejkal
2017-03-13 23:52   ` Greg Kroah-Hartman
2017-03-14  0:24     ` Till Smejkal
2017-03-14  1:35   ` Vineet Gupta
2017-03-14  2:34     ` Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 11/13] mm/vas: Introduce VAS segments - shareable address space regions Till Smejkal
2017-03-13 22:27   ` Matthew Wilcox
2017-03-13 22:45     ` Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 12/13] mm/vas: Add lazy-attach support for first class virtual address spaces Till Smejkal
2017-03-13 22:14 ` [RFC PATCH 13/13] fs/proc: Add procfs " Till Smejkal
2017-03-14  0:18 ` [RFC PATCH 00/13] Introduce " Richard Henderson
2017-03-14  0:39   ` Till Smejkal
2017-03-14  1:02     ` Richard Henderson
2017-03-14  1:31       ` Till Smejkal
2017-03-14  0:58 ` Andy Lutomirski
2017-03-14  2:07   ` Till Smejkal
2017-03-14  5:37     ` Andy Lutomirski
2017-03-14 16:12       ` Till Smejkal
2017-03-14 19:53         ` Chris Metcalf
2017-03-14 21:14           ` Till Smejkal
2017-03-15 16:51         ` Andy Lutomirski
2017-03-15 16:57           ` Matthew Wilcox
2017-03-15 19:44           ` Till Smejkal
2017-03-15 19:47             ` Rich Felker
2017-03-15 21:30               ` Till Smejkal
2017-03-15 20:06             ` Andy Lutomirski
2017-03-15 22:02               ` Till Smejkal
2017-03-15 22:09                 ` Luck, Tony
2017-03-15 23:18                   ` Till Smejkal
2017-03-16  8:21                 ` Thomas Gleixner
2017-03-16 17:29                   ` Till Smejkal
2017-03-16 17:42                     ` Thomas Gleixner
2017-03-16 17:50                       ` Till Smejkal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).