linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/4] vm: add a syscall to map a process memory into a pipe
@ 2017-11-27  7:19 Mike Rapoport
  2017-11-27  7:19 ` [PATCH v4 1/4] fs/splice: introduce pages_to_pipe helper Mike Rapoport
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Mike Rapoport @ 2017-11-27  7:19 UTC (permalink / raw)
  To: Andrew Morton, Alexander Viro
  Cc: linux-mm, linux-fsdevel, linux-kernel, linux-api, criu,
	Arnd Bergmann, Pavel Emelyanov, Michael Kerrisk, Thomas Gleixner,
	Josh Triplett, Jann Horn, Greg KH, Andrei Vagin, Mike Rapoport

Hi,

This patch set introduces a new process_vmsplice system call that combines
the functionality of process_vm_readv and vmsplice.

It allows mapping the memory of another process into a pipe, similar to
what vmsplice does for the caller's own address space.

Patch 2/4 ("vm: add a syscall to map a process memory into a pipe")
adds the new system call and describes it in detail.

The patchset is against the -mm tree.

v4: skip test when process_vmsplice syscall is not available
v3: minor refactoring to reduce code duplication
v2: move this syscall under CONFIG_CROSS_MEMORY_ATTACH
    give correct flags to get_user_pages_remote()

Andrei Vagin (3):
  vm: add a syscall to map a process memory into a pipe
  x86: wire up the process_vmsplice syscall
  test: add a test for the process_vmsplice syscall

Mike Rapoport (1):
  fs/splice: introduce pages_to_pipe helper

 arch/x86/entry/syscalls/syscall_32.tbl             |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl             |   2 +
 fs/splice.c                                        | 262 +++++++++++++++++++--
 include/linux/compat.h                             |   3 +
 include/linux/syscalls.h                           |   4 +
 include/uapi/asm-generic/unistd.h                  |   5 +-
 kernel/sys_ni.c                                    |   2 +
 tools/testing/selftests/process_vmsplice/Makefile  |   5 +
 .../process_vmsplice/process_vmsplice_test.c       | 196 +++++++++++++++
 9 files changed, 458 insertions(+), 22 deletions(-)
 create mode 100644 tools/testing/selftests/process_vmsplice/Makefile
 create mode 100644 tools/testing/selftests/process_vmsplice/process_vmsplice_test.c

-- 
2.7.4


* [PATCH v4 1/4] fs/splice: introduce pages_to_pipe helper
  2017-11-27  7:19 [PATCH v4 0/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport
@ 2017-11-27  7:19 ` Mike Rapoport
  2017-11-27  7:19 ` [PATCH v4 2/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Mike Rapoport @ 2017-11-27  7:19 UTC (permalink / raw)
  To: Andrew Morton, Alexander Viro
  Cc: linux-mm, linux-fsdevel, linux-kernel, linux-api, criu,
	Arnd Bergmann, Pavel Emelyanov, Michael Kerrisk, Thomas Gleixner,
	Josh Triplett, Jann Horn, Greg KH, Andrei Vagin, Mike Rapoport

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 fs/splice.c | 57 ++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 21 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 39e2dc0..7f1ffc5 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1185,6 +1185,36 @@ static long do_splice(struct file *in, loff_t __user *off_in,
 	return -EINVAL;
 }
 
+static int pages_to_pipe(struct page **pages, struct pipe_inode_info *pipe,
+			 struct pipe_buffer *buf, size_t *total,
+			 ssize_t copied, size_t start)
+{
+	bool failed = false;
+	size_t len = 0;
+	int ret = 0;
+	int n;
+
+	for (n = 0; copied; n++, start = 0) {
+		int size = min_t(int, copied, PAGE_SIZE - start);
+		if (!failed) {
+			buf->page = pages[n];
+			buf->offset = start;
+			buf->len = size;
+			ret = add_to_pipe(pipe, buf);
+			if (unlikely(ret < 0))
+				failed = true;
+			else
+				len += ret;
+		} else {
+			put_page(pages[n]);
+		}
+		copied -= size;
+	}
+
+	*total += len;
+	return failed ? ret : len;
+}
+
 static int iter_to_pipe(struct iov_iter *from,
 			struct pipe_inode_info *pipe,
 			unsigned flags)
@@ -1195,13 +1225,11 @@ static int iter_to_pipe(struct iov_iter *from,
 	};
 	size_t total = 0;
 	int ret = 0;
-	bool failed = false;
 
-	while (iov_iter_count(from) && !failed) {
+	while (iov_iter_count(from)) {
 		struct page *pages[16];
 		ssize_t copied;
 		size_t start;
-		int n;
 
 		copied = iov_iter_get_pages(from, pages, ~0UL, 16, &start);
 		if (copied <= 0) {
@@ -1209,24 +1237,11 @@ static int iter_to_pipe(struct iov_iter *from,
 			break;
 		}
 
-		for (n = 0; copied; n++, start = 0) {
-			int size = min_t(int, copied, PAGE_SIZE - start);
-			if (!failed) {
-				buf.page = pages[n];
-				buf.offset = start;
-				buf.len = size;
-				ret = add_to_pipe(pipe, &buf);
-				if (unlikely(ret < 0)) {
-					failed = true;
-				} else {
-					iov_iter_advance(from, ret);
-					total += ret;
-				}
-			} else {
-				put_page(pages[n]);
-			}
-			copied -= size;
-		}
+		ret = pages_to_pipe(pages, pipe, &buf, &total, copied, start);
+		if (unlikely(ret < 0))
+			break;
+
+		iov_iter_advance(from, ret);
 	}
 	return total ? total : ret;
 }
-- 
2.7.4


* [PATCH v4 2/4] vm: add a syscall to map a process memory into a pipe
  2017-11-27  7:19 [PATCH v4 0/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport
  2017-11-27  7:19 ` [PATCH v4 1/4] fs/splice: introduce pages_to_pipe helper Mike Rapoport
@ 2017-11-27  7:19 ` Mike Rapoport
  2017-11-27 23:42   ` Andrew Morton
  2017-11-27  7:19 ` [PATCH v4 3/4] x86: wire up the process_vmsplice syscall Mike Rapoport
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Mike Rapoport @ 2017-11-27  7:19 UTC (permalink / raw)
  To: Andrew Morton, Alexander Viro
  Cc: linux-mm, linux-fsdevel, linux-kernel, linux-api, criu,
	Arnd Bergmann, Pavel Emelyanov, Michael Kerrisk, Thomas Gleixner,
	Josh Triplett, Jann Horn, Greg KH, Andrei Vagin, Mike Rapoport,
	Andrei Vagin

From: Andrei Vagin <avagin@virtuozzo.com>

It is a hybrid of process_vm_readv() and vmsplice().

vmsplice() can map memory from the current address space into a pipe.
process_vm_readv() can read the memory of another process.

The new system call can map the memory of another process into a pipe.

ssize_t process_vmsplice(pid_t pid, int fd, const struct iovec *iov,
                        unsigned long nr_segs, unsigned int flags)

All arguments are identical to those of vmsplice() except pid, which
specifies the target process.

Currently, if we want to dump a process's memory to a file or to a socket,
we can use process_vm_readv() + write(), but this is slow because the data
is copied into a temporary user-space buffer.
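
For reference, the process_vm_readv() + write() approach can be sketched
roughly as below. This is only an illustration: dump_via_readv() and its
4096-byte bounce buffer are made-up names and sizes, not code taken from
CRIU or from this series. The bounce buffer is the temporary user-space
copy mentioned above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy len bytes starting at remote_addr in process
 * pid to out_fd. Returns the number of bytes written, or -1 on error.
 */
static ssize_t dump_via_readv(pid_t pid, void *remote_addr, size_t len,
			      int out_fd)
{
	char bounce[4096];	/* temporary user-space buffer */
	size_t done = 0;

	assert(remote_addr != NULL);

	while (done < len) {
		size_t chunk = len - done;
		struct iovec local, remote;
		ssize_t n, off, w;

		if (chunk > sizeof(bounce))
			chunk = sizeof(bounce);

		local.iov_base = bounce;
		local.iov_len = chunk;
		remote.iov_base = (char *)remote_addr + done;
		remote.iov_len = chunk;

		/* First copy: target process -> bounce buffer. */
		n = process_vm_readv(pid, &local, 1, &remote, 1, 0);
		if (n <= 0)
			return -1;

		/* Second copy: bounce buffer -> destination. */
		for (off = 0; off < n; off += w) {
			w = write(out_fd, bounce + off, n - off);
			if (w <= 0)
				return -1;
		}
		done += n;
	}
	return done;
}
```

Every byte crosses the user/kernel boundary twice here, which is the
overhead the pipe-based approaches try to avoid.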

A second way is to use vmsplice() + splice(). It is more efficient,
because the data is not copied into a temporary buffer, but it has another
problem: vmsplice() works with the current address space, so it can be
used only if we inject our code into the target process.

The second way suffers from a few other issues:
* the process has to be stopped to run the parasite code
* the number of pipes is limited, so it may be impossible to dump all
  memory in one iteration, and we have to stop the process and inject our
  code several times.
* pages in pipes are unreclaimable, so it isn't good to hold a lot of
  memory in pipes.
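
The vmsplice() + splice() path, which only works on the caller's own
address space, might look roughly like the sketch below. It is an
illustration under assumptions: dump_via_vmsplice() is a hypothetical
helper, the caller is assumed to supply an empty pipe in pfd[], and
out_fd is assumed to be something splice() can write to (a file, socket
or another pipe).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: send len bytes at addr (in the caller's own
 * address space) to out_fd through the pipe pfd[], with no user-space
 * bounce buffer. Returns the number of bytes sent, or -1 on error.
 */
static ssize_t dump_via_vmsplice(void *addr, size_t len, int pfd[2],
				 int out_fd)
{
	size_t done = 0;

	assert(addr != NULL);

	while (done < len) {
		struct iovec iov;
		ssize_t mapped, left, n;

		iov.iov_base = (char *)addr + done;
		iov.iov_len = len - done;

		/* Map as many pages as currently fit into the pipe. */
		mapped = vmsplice(pfd[1], &iov, 1, 0);
		if (mapped <= 0)
			return -1;

		/* Drain the pipe into the destination before refilling. */
		for (left = mapped; left > 0; left -= n) {
			n = splice(pfd[0], NULL, out_fd, NULL, left, 0);
			if (n <= 0)
				return -1;
		}
		done += mapped;
	}
	return done;
}
```

Because the pages are referenced rather than copied, the source memory
should not be modified until the pipe has been drained. A function like
this would have to run inside the target process, which is why the
parasite code is needed today.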

The new syscall allows using the second way without injecting any
code into the target process.

My experiments show that process_vmsplice() + splice() is twice as
fast as process_vm_readv() + write().

It is particularly useful at the pre-dump stage. At this stage we enable a
memory tracker and then dump the process's memory while the process
continues to run. On the first iteration we dump all memory, and on each
later iteration we dump only the memory modified since the previous
iteration. After a few pre-dump operations, the process is stopped and
dumped for the final time. The pre-dump operations significantly decrease
the process's downtime when it is migrated to another host.

v2: move this syscall under CONFIG_CROSS_MEMORY_ATTACH
    give correct flags to get_user_pages_remote()

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 fs/splice.c                       | 205 ++++++++++++++++++++++++++++++++++++++
 include/linux/compat.h            |   3 +
 include/linux/syscalls.h          |   4 +
 include/uapi/asm-generic/unistd.h |   5 +-
 kernel/sys_ni.c                   |   2 +
 5 files changed, 218 insertions(+), 1 deletion(-)

diff --git a/fs/splice.c b/fs/splice.c
index 7f1ffc5..72397d2 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -34,6 +34,7 @@
 #include <linux/socket.h>
 #include <linux/compat.h>
 #include <linux/sched/signal.h>
+#include <linux/sched/mm.h>
 
 #include "internal.h"
 
@@ -1373,6 +1374,210 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, iov,
 	return error;
 }
 
+#ifdef CONFIG_CROSS_MEMORY_ATTACH
+/*
+ * Map pages from a specified task into a pipe
+ */
+static int remote_single_vec_to_pipe(struct task_struct *task,
+			struct mm_struct *mm,
+			const struct iovec *rvec,
+			struct pipe_inode_info *pipe,
+			unsigned int flags,
+			size_t *total)
+{
+	struct pipe_buffer buf = {
+		.ops = &user_page_pipe_buf_ops,
+		.flags = flags
+	};
+	unsigned long addr = (unsigned long) rvec->iov_base;
+	unsigned long pa = addr & PAGE_MASK;
+	unsigned long start_offset = addr - pa;
+	unsigned long nr_pages;
+	ssize_t len = rvec->iov_len;
+	struct page *process_pages[16];
+	bool failed = false;
+	int ret = 0;
+
+	nr_pages = (addr + len - 1) / PAGE_SIZE - addr / PAGE_SIZE + 1;
+	while (nr_pages) {
+		long pages = min(nr_pages, 16UL);
+		int locked = 1;
+		ssize_t copied;
+
+		/*
+		 * Get the pages we're interested in.  We must
+		 * access remotely because task/mm might not be
+		 * current/current->mm
+		 */
+		down_read(&mm->mmap_sem);
+		pages = get_user_pages_remote(task, mm, pa, pages, 0,
+					      process_pages, NULL, &locked);
+		if (locked)
+			up_read(&mm->mmap_sem);
+		if (pages <= 0) {
+			failed = true;
+			ret = -EFAULT;
+			break;
+		}
+
+		copied = pages * PAGE_SIZE - start_offset;
+		if (copied > len)
+			copied = len;
+		len -= copied;
+
+		ret = pages_to_pipe(process_pages, pipe, &buf, total, copied,
+				    start_offset);
+		if (unlikely(ret < 0))
+			break;
+
+		start_offset = 0;
+		nr_pages -= pages;
+		pa += pages * PAGE_SIZE;
+	}
+	return ret < 0 ? ret : 0;
+}
+
+static ssize_t remote_iovec_to_pipe(struct task_struct *task,
+			struct mm_struct *mm,
+			const struct iovec *rvec,
+			unsigned long riovcnt,
+			struct pipe_inode_info *pipe,
+			unsigned int flags)
+{
+	size_t total = 0;
+	int ret = 0, i;
+
+	for (i = 0; i < riovcnt; i++) {
+		/* Work out address and page range required */
+		if (rvec[i].iov_len == 0)
+			continue;
+
+		ret = remote_single_vec_to_pipe(
+				task, mm, &rvec[i], pipe, flags, &total);
+		if (ret < 0)
+			break;
+	}
+	return total ? total : ret;
+}
+
+static long process_vmsplice_to_pipe(struct task_struct *task,
+				struct mm_struct *mm, struct file *file,
+				const struct iovec __user *uiov,
+				unsigned long nr_segs, unsigned int flags)
+{
+	struct pipe_inode_info *pipe;
+	struct iovec iovstack[UIO_FASTIOV];
+	struct iovec *iov = iovstack;
+	unsigned int buf_flag = 0;
+	long ret;
+
+	if (flags & SPLICE_F_GIFT)
+		buf_flag = PIPE_BUF_FLAG_GIFT;
+
+	pipe = get_pipe_info(file);
+	if (!pipe)
+		return -EBADF;
+
+	ret = rw_copy_check_uvector(CHECK_IOVEC_ONLY, uiov, nr_segs,
+					UIO_FASTIOV, iovstack, &iov);
+	if (ret < 0)
+		return ret;
+
+	pipe_lock(pipe);
+	ret = wait_for_space(pipe, flags);
+	if (!ret)
+		ret = remote_iovec_to_pipe(task, mm, iov,
+						nr_segs, pipe, buf_flag);
+	pipe_unlock(pipe);
+	if (ret > 0)
+		wakeup_pipe_readers(pipe);
+
+	if (iov != iovstack)
+		kfree(iov);
+	return ret;
+}
+
+/* process_vmsplice splices a process address range into a pipe. */
+SYSCALL_DEFINE5(process_vmsplice, int, pid, int, fd,
+		const struct iovec __user *, iov,
+		unsigned long, nr_segs, unsigned int, flags)
+{
+	struct task_struct *task;
+	struct mm_struct *mm;
+	struct fd f;
+	long ret;
+
+	if (unlikely(flags & ~SPLICE_F_ALL))
+		return -EINVAL;
+	if (unlikely(nr_segs > UIO_MAXIOV))
+		return -EINVAL;
+	else if (unlikely(!nr_segs))
+		return 0;
+
+	f = fdget(fd);
+	if (!f.file)
+		return -EBADF;
+
+	/* Get process information */
+	task = find_get_task_by_vpid(pid);
+	if (!task) {
+		ret = -ESRCH;
+		goto out_fput;
+	}
+
+	mm = mm_access(task, PTRACE_MODE_ATTACH_REALCREDS);
+	if (!mm || IS_ERR(mm)) {
+		ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
+		/*
+		 * Explicitly map EACCES to EPERM as EPERM is a more
+		 * appropriate error code for process_vm_readv/writev
+		 */
+		if (ret == -EACCES)
+			ret = -EPERM;
+		goto put_task_struct;
+	}
+
+	ret = -EBADF;
+	if (f.file->f_mode & FMODE_WRITE)
+		ret = process_vmsplice_to_pipe(task, mm, f.file,
+						iov, nr_segs, flags);
+	mmput(mm);
+
+put_task_struct:
+	put_task_struct(task);
+
+out_fput:
+	fdput(f);
+
+	return ret;
+}
+
+#ifdef CONFIG_COMPAT
+COMPAT_SYSCALL_DEFINE5(process_vmsplice, pid_t, pid, int, fd,
+			const struct compat_iovec __user *, iov32,
+			unsigned int, nr_segs, unsigned int, flags)
+{
+	struct iovec __user *iov;
+	unsigned int i;
+
+	if (nr_segs > UIO_MAXIOV)
+		return -EINVAL;
+
+	iov = compat_alloc_user_space(nr_segs * sizeof(struct iovec));
+	for (i = 0; i < nr_segs; i++) {
+		struct compat_iovec v;
+
+		if (get_user(v.iov_base, &iov32[i].iov_base) ||
+		    get_user(v.iov_len, &iov32[i].iov_len) ||
+		    put_user(compat_ptr(v.iov_base), &iov[i].iov_base) ||
+		    put_user(v.iov_len, &iov[i].iov_len))
+			return -EFAULT;
+	}
+	return sys_process_vmsplice(pid, fd, iov, nr_segs, flags);
+}
+#endif
+#endif /* CONFIG_CROSS_MEMORY_ATTACH */
+
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE4(vmsplice, int, fd, const struct compat_iovec __user *, iov32,
 		    unsigned int, nr_segs, unsigned int, flags)
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 0fc3640..11b3753 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -550,6 +550,9 @@ asmlinkage long compat_sys_getdents(unsigned int fd,
 				    unsigned int count);
 asmlinkage long compat_sys_vmsplice(int fd, const struct compat_iovec __user *,
 				    unsigned int nr_segs, unsigned int flags);
+asmlinkage long compat_sys_process_vmsplice(pid_t pid, int fd,
+				    const struct compat_iovec __user *,
+				    unsigned int nr_segs, unsigned int flags);
 asmlinkage long compat_sys_open(const char __user *filename, int flags,
 				umode_t mode);
 asmlinkage long compat_sys_openat(int dfd, const char __user *filename,
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a78186d..4ba9333 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -941,4 +941,8 @@ asmlinkage long sys_pkey_free(int pkey);
 asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
 			  unsigned mask, struct statx __user *buffer);
 
+asmlinkage long sys_process_vmsplice(pid_t pid,
+			int fd, const struct iovec __user *iov,
+			unsigned long nr_segs, unsigned int flags);
+
 #endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 8b87de0..37f1832 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -732,9 +732,12 @@ __SYSCALL(__NR_pkey_alloc,    sys_pkey_alloc)
 __SYSCALL(__NR_pkey_free,     sys_pkey_free)
 #define __NR_statx 291
 __SYSCALL(__NR_statx,     sys_statx)
+#define __NR_process_vmsplice 292
+__SC_COMP(__NR_process_vmsplice, sys_process_vmsplice,
+	  compat_sys_process_vmsplice)
 
 #undef __NR_syscalls
-#define __NR_syscalls 292
+#define __NR_syscalls 293
 
 /*
  * All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index b518976..a939fbb 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -158,8 +158,10 @@ cond_syscall(sys_sysfs);
 cond_syscall(sys_syslog);
 cond_syscall(sys_process_vm_readv);
 cond_syscall(sys_process_vm_writev);
+cond_syscall(sys_process_vmsplice);
 cond_syscall(compat_sys_process_vm_readv);
 cond_syscall(compat_sys_process_vm_writev);
+cond_syscall(compat_sys_process_vmsplice);
 cond_syscall(sys_uselib);
 cond_syscall(sys_fadvise64);
 cond_syscall(sys_fadvise64_64);
-- 
2.7.4


* [PATCH v4 3/4] x86: wire up the process_vmsplice syscall
  2017-11-27  7:19 [PATCH v4 0/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport
  2017-11-27  7:19 ` [PATCH v4 1/4] fs/splice: introduce pages_to_pipe helper Mike Rapoport
  2017-11-27  7:19 ` [PATCH v4 2/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport
@ 2017-11-27  7:19 ` Mike Rapoport
  2017-11-28 12:35   ` kbuild test robot
  2017-11-27  7:19 ` [PATCH v4 4/4] test: add a test for " Mike Rapoport
  2017-11-27  7:20 ` [PATCH] process_vmsplice.2: New page describing process_vmsplice(2) system call Mike Rapoport
  4 siblings, 1 reply; 9+ messages in thread
From: Mike Rapoport @ 2017-11-27  7:19 UTC (permalink / raw)
  To: Andrew Morton, Alexander Viro
  Cc: linux-mm, linux-fsdevel, linux-kernel, linux-api, criu,
	Arnd Bergmann, Pavel Emelyanov, Michael Kerrisk, Thomas Gleixner,
	Josh Triplett, Jann Horn, Greg KH, Andrei Vagin, Mike Rapoport

From: Andrei Vagin <avagin@openvz.org>

Signed-off-by: Andrei Vagin <avagin@openvz.org>
---
 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 448ac21..dc64bf5 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -391,3 +391,4 @@
 382	i386	pkey_free		sys_pkey_free
 383	i386	statx			sys_statx
 384	i386	arch_prctl		sys_arch_prctl			compat_sys_arch_prctl
+385	i386	process_vmsplice	sys_process_vmsplice		compat_sys_process_vmsplice
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 5aef183..d2f916c 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -339,6 +339,7 @@
 330	common	pkey_alloc		sys_pkey_alloc
 331	common	pkey_free		sys_pkey_free
 332	common	statx			sys_statx
+333	64	process_vmsplice	sys_process_vmsplice
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
@@ -380,3 +381,4 @@
 545	x32	execveat		compat_sys_execveat/ptregs
 546	x32	preadv2			compat_sys_preadv64v2
 547	x32	pwritev2		compat_sys_pwritev64v2
+548	x32	process_vmsplice	compat_sys_process_vmsplice
-- 
2.7.4


* [PATCH v4 4/4] test: add a test for the process_vmsplice syscall
  2017-11-27  7:19 [PATCH v4 0/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport
                   ` (2 preceding siblings ...)
  2017-11-27  7:19 ` [PATCH v4 3/4] x86: wire up the process_vmsplice syscall Mike Rapoport
@ 2017-11-27  7:19 ` Mike Rapoport
  2017-11-27  7:20 ` [PATCH] process_vmsplice.2: New page describing process_vmsplice(2) system call Mike Rapoport
  4 siblings, 0 replies; 9+ messages in thread
From: Mike Rapoport @ 2017-11-27  7:19 UTC (permalink / raw)
  To: Andrew Morton, Alexander Viro
  Cc: linux-mm, linux-fsdevel, linux-kernel, linux-api, criu,
	Arnd Bergmann, Pavel Emelyanov, Michael Kerrisk, Thomas Gleixner,
	Josh Triplett, Jann Horn, Greg KH, Andrei Vagin, Mike Rapoport

From: Andrei Vagin <avagin@openvz.org>

This test checks that process_vmsplice() can splice pages from a remote
process and that it returns EFAULT when asked to splice pages at an
inaccessible address.

Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 tools/testing/selftests/process_vmsplice/Makefile  |   5 +
 .../process_vmsplice/process_vmsplice_test.c       | 196 +++++++++++++++++++++
 2 files changed, 201 insertions(+)
 create mode 100644 tools/testing/selftests/process_vmsplice/Makefile
 create mode 100644 tools/testing/selftests/process_vmsplice/process_vmsplice_test.c

diff --git a/tools/testing/selftests/process_vmsplice/Makefile b/tools/testing/selftests/process_vmsplice/Makefile
new file mode 100644
index 0000000..246d5a7
--- /dev/null
+++ b/tools/testing/selftests/process_vmsplice/Makefile
@@ -0,0 +1,5 @@
+CFLAGS += -I../../../../usr/include/
+
+TEST_GEN_PROGS := process_vmsplice_test
+
+include ../lib.mk
diff --git a/tools/testing/selftests/process_vmsplice/process_vmsplice_test.c b/tools/testing/selftests/process_vmsplice/process_vmsplice_test.c
new file mode 100644
index 0000000..1682bdb
--- /dev/null
+++ b/tools/testing/selftests/process_vmsplice/process_vmsplice_test.c
@@ -0,0 +1,196 @@
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <fcntl.h>
+#include <sys/uio.h>
+#include <errno.h>
+#include <signal.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+
+#include "../kselftest.h"
+
+#ifndef __NR_process_vmsplice
+#define __NR_process_vmsplice 333
+#endif
+
+#define pr_err(fmt, ...) \
+		({ \
+			fprintf(stderr, "%s:%d:" fmt, \
+				__func__, __LINE__, ##__VA_ARGS__); \
+			KSFT_FAIL; \
+		})
+#define pr_perror(fmt, ...) pr_err(fmt ": %m\n", ##__VA_ARGS__)
+#define fail(fmt, ...) pr_err("FAIL:" fmt, ##__VA_ARGS__)
+
+static ssize_t process_vmsplice(pid_t pid, int fd, const struct iovec *iov,
+			unsigned long nr_segs, unsigned int flags)
+{
+	return syscall(__NR_process_vmsplice, pid, fd, iov, nr_segs, flags);
+
+}
+
+#define MEM_SIZE (4096 * 100)
+#define MEM_WRONLY_SIZE (4096 * 10)
+
+int main(int argc, char **argv)
+{
+	char *addr, *addr_wronly;
+	int p[2];
+	struct iovec iov[2];
+	char buf[4096];
+	int status, ret;
+	pid_t pid;
+
+	ksft_print_header();
+
+	if (process_vmsplice(0, 0, 0, 0, 0)) {
+		if (errno == ENOSYS) {
+			ksft_exit_skip("process_vmsplice is not supported\n");
+			return 0;
+		}
+		return pr_perror("Zero-length process_vmsplice failed");
+	}
+
+	addr = mmap(0, MEM_SIZE, PROT_READ | PROT_WRITE,
+					MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (addr == MAP_FAILED)
+		return pr_perror("Unable to create a mapping");
+
+	addr_wronly = mmap(0, MEM_WRONLY_SIZE, PROT_WRITE,
+				MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (addr_wronly == MAP_FAILED)
+		return pr_perror("Unable to create a write-only mapping");
+
+	if (pipe(p))
+		return pr_perror("Unable to create a pipe");
+
+	pid = fork();
+	if (pid < 0)
+		return pr_perror("Unable to fork");
+
+	if (pid == 0) {
+		addr[0] = 'C';
+		addr[4096 + 128] = 'A';
+		addr[4096 + 128 + 4096 - 1] = 'B';
+
+		if (prctl(PR_SET_PDEATHSIG, SIGKILL))
+			return pr_perror("Unable to set PR_SET_PDEATHSIG");
+		if (write(p[1], "c", 1) != 1)
+			return pr_perror("Unable to write data into pipe");
+
+		while (1)
+			sleep(1);
+		return 1;
+	}
+	if (read(p[0], buf, 1) != 1) {
+		pr_perror("Unable to read data from pipe");
+		kill(pid, SIGKILL);
+		wait(&status);
+		return 1;
+	}
+
+	munmap(addr, MEM_SIZE);
+	munmap(addr_wronly, MEM_WRONLY_SIZE);
+
+	iov[0].iov_base = addr;
+	iov[0].iov_len = 1;
+
+	iov[1].iov_base = addr + 4096 + 128;
+	iov[1].iov_len = 4096;
+
+	/* check one iovec */
+	if (process_vmsplice(pid, p[1], iov, 1, SPLICE_F_GIFT) != 1)
+		return pr_perror("Unable to splice pages");
+
+	if (read(p[0], buf, 1) != 1)
+		return pr_perror("Unable to read from pipe");
+
+	if (buf[0] != 'C')
+		ksft_test_result_fail("Get wrong data\n");
+	else
+		ksft_test_result_pass("Check process_vmsplice with one vec\n");
+
+	/* check two iovec-s */
+	if (process_vmsplice(pid, p[1], iov, 2, SPLICE_F_GIFT) != 4097)
+		return pr_perror("Unable to splice pages");
+
+	if (read(p[0], buf, 1) != 1)
+		return pr_perror("Unable to read from pipe");
+
+	if (buf[0] != 'C')
+		ksft_test_result_fail("Get wrong data\n");
+
+	if (read(p[0], buf, 4096) != 4096)
+		return pr_perror("Unable to read from pipe");
+
+	if (buf[0] != 'A' || buf[4095] != 'B')
+		ksft_test_result_fail("Get wrong data\n");
+	else
+		ksft_test_result_pass("check process_vmsplice with two vecs\n");
+
+	/* check how an unreadable region in a second vec is handled */
+	iov[0].iov_base = addr;
+	iov[0].iov_len = 1;
+
+	iov[1].iov_base = addr_wronly + 5;
+	iov[1].iov_len = 1;
+
+	if (process_vmsplice(pid, p[1], iov, 2, SPLICE_F_GIFT) != 1)
+		return pr_perror("Unable to splice data");
+
+	if (read(p[0], buf, 1) != 1)
+		return pr_perror("Unable to read from pipe");
+
+	if (buf[0] != 'C')
+		ksft_test_result_fail("Get wrong data\n");
+	else
+		ksft_test_result_pass("unreadable region in a second vec\n");
+
+	/* check how an unreadable region in a first vec is handled */
+	errno = 0;
+	if (process_vmsplice(pid, p[1], iov + 1, 1, SPLICE_F_GIFT) != -1 ||
+	    errno != EFAULT)
+		ksft_test_result_fail("Got unexpected errno %d\n", errno);
+	else
+		ksft_test_result_pass("unreadable region in a first vec\n");
+
+	iov[0].iov_base = addr;
+	iov[0].iov_len = 1;
+
+	iov[1].iov_base = addr;
+	iov[1].iov_len = MEM_SIZE;
+
+	/* splice as much as possible */
+	ret = process_vmsplice(pid, p[1], iov, 2,
+				SPLICE_F_GIFT | SPLICE_F_NONBLOCK);
+	if (ret != 4096 * 15 + 1) /* by default a pipe can fit 16 pages */
+		return pr_perror("Unable to splice pages");
+
+	while (ret > 0) {
+		int len;
+
+		len = read(p[0], buf, 4096);
+		if (len < 0)
+			return pr_perror("Unable to read data");
+		if (len > ret)
+			return pr_err("Read more than expected\n");
+		ret -= len;
+	}
+	ksft_test_result_pass("splice as much as possible\n");
+
+	if (kill(pid, SIGTERM))
+		return pr_perror("Unable to kill a child process");
+	status = -1;
+	if (wait(&status) < 0)
+		return pr_perror("Unable to wait a child process");
+	if (!WIFSIGNALED(status) || WTERMSIG(status) != SIGTERM)
+		return pr_err("The child exited with an unexpected code %d\n",
+									status);
+
+	if (ksft_get_fail_cnt())
+		return ksft_exit_fail();
+	return ksft_exit_pass();
+}
-- 
2.7.4


* [PATCH] process_vmsplice.2: New page describing process_vmsplice(2) system call.
  2017-11-27  7:19 [PATCH v4 0/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport
                   ` (3 preceding siblings ...)
  2017-11-27  7:19 ` [PATCH v4 4/4] test: add a test for " Mike Rapoport
@ 2017-11-27  7:20 ` Mike Rapoport
  4 siblings, 0 replies; 9+ messages in thread
From: Mike Rapoport @ 2017-11-27  7:20 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Andrew Morton, Alexander Viro, linux-mm, linux-fsdevel,
	linux-kernel, linux-api, criu, Arnd Bergmann, Pavel Emelyanov,
	Thomas Gleixner, Josh Triplett, Jann Horn, Greg KH, Andrei Vagin,
	Mike Rapoport

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 man2/process_vmsplice.2 | 188 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 188 insertions(+)
 create mode 100644 man2/process_vmsplice.2

diff --git a/man2/process_vmsplice.2 b/man2/process_vmsplice.2
new file mode 100644
index 0000000..b99c06b
--- /dev/null
+++ b/man2/process_vmsplice.2
@@ -0,0 +1,188 @@
+.\" Copyright (c) 2017, IBM Corporation.
+.\" Written by Mike Rapoport <rppt@linux.vnet.ibm.com>
+.\" Based on vmsplice(2) by Jens Axboe and
+.\" process_vm_readv(2) by Christopher Yeoh, Mike Frysinger and Michael Kerrisk
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date.  The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein.  The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH PROCESS_VMSPLICE 2 2017-11-23 "Linux" "Linux Programmer's Manual"
+.SH NAME
+process_vmsplice \- splice user pages from a specific process
+address space into a pipe
+.SH SYNOPSIS
+.nf
+.BR "#define _GNU_SOURCE" "         /* See feature_test_macros(7) */"
+.B #include <unistd.h>
+.B #include <sys/uio.h>
+.PP
+.BI "ssize_t process_vmsplice(pid_t " pid ", int " fd ,
+.BI "                         const struct iovec *" iov ,
+.BI "                         unsigned long " nr_segs ,
+.BI "                         unsigned int " flags );
+.fi
+.PP
+.IR Note :
+There is no glibc wrapper for this system call; see NOTES.
+.SH DESCRIPTION
+The
+.BR process_vmsplice ()
+system call maps
+.I nr_segs
+ranges of user memory described by
+.I iov
+from address space of the process identified by
+.I pid
+into a pipe.
+The file descriptor
+.I fd
+must refer to a pipe.
+.PP
+The pointer
+.I iov
+points to an array of
+.I iovec
+structures as defined in
+.IR <sys/uio.h> :
+.PP
+.in +4n
+.EX
+struct iovec {
+    void  *iov_base;        /* Starting address */
+    size_t iov_len;         /* Number of bytes */
+};
+.EE
+.in
+.PP
+The
+.I flags
+argument is a bit mask that is composed by ORing together
+zero or more of the following values:
+.RS
+.TP 1.9i
+.B SPLICE_F_MOVE
+Unused for
+.BR process_vmsplice ();
+see
+.BR splice (2).
+.TP
+.B SPLICE_F_NONBLOCK
+Do not block on I/O; see
+.BR splice (2)
+for further details.
+.TP
+.B SPLICE_F_MORE
+Currently has no effect for
+.BR process_vmsplice ().
+.TP
+.B SPLICE_F_GIFT
+The user pages are a gift to the kernel;
+see
+.BR vmsplice (2)
+for further details.
+.RE
+.PP
+Buffers pointed to by the
+.I iov
+parameter are processed in array order.
+This means that
+.BR process_vmsplice ()
+completely processes
+.I iov[0]
+before proceeding to
+.IR iov[1] ,
+and so on.
+.PP
+The
+.BR process_vmsplice ()
+system call does not check the memory regions in the process
+until just before remapping those regions into the pipe.
+Consequently, a partial read may result if one of the
+.I iov
+elements points to an invalid memory region in the process.
+No further reads will be attempted beyond that point.
+.PP
+Permission to read from or write to another process
+is governed by a ptrace access mode
+.B PTRACE_MODE_ATTACH_REALCREDS
+check; see
+.BR ptrace (2).
+.SH RETURN VALUE
+Upon successful completion,
+.BR process_vmsplice ()
+returns the number of bytes transferred to the pipe.
+On error,
+.BR process_vmsplice ()
+returns \-1 and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.B EAGAIN
+.B SPLICE_F_NONBLOCK
+was specified in
+.IR flags ,
+and the operation would block.
+.TP
+.B EBADF
+.I fd
+is either not valid or does not refer to a pipe.
+.TP
+.B EINVAL
+.I nr_segs
+is greater than
+.BR IOV_MAX ;
+or the memory is not aligned and
+.B SPLICE_F_GIFT
+is set.
+.TP
+.B ENOMEM
+Out of memory.
+.TP
+.B ESRCH
+No process with ID
+.I pid
+exists.
+.SH VERSIONS
+The
+.BR process_vmsplice ()
+system call first appeared in Linux 4.15.
+.SH CONFORMING TO
+This system call is Linux-specific.
+.SH NOTES
+Glibc does not provide a wrapper for this system call; call it using
+.BR syscall (2).
+.BR process_vmsplice ()
+follows the other vectorized read/write type functions when it comes to
+limitations on the number of segments being passed in.
+This limit is
+.B IOV_MAX
+as defined in
+.IR <limits.h> .
+Currently,
+.\" UIO_MAXIOV in kernel source
+this limit is 1024.
+.SH SEE ALSO
+.BR process_vm_readv (2),
+.BR ptrace (2),
+.BR splice (2),
+.BR pipe (7)
-- 
2.7.4


* Re: [PATCH v4 2/4] vm: add a syscall to map a process memory into a pipe
  2017-11-27  7:19 ` [PATCH v4 2/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport
@ 2017-11-27 23:42   ` Andrew Morton
  2017-11-29  7:42     ` Andrei Vagin
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2017-11-27 23:42 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Alexander Viro, linux-mm, linux-fsdevel, linux-kernel, linux-api,
	criu, Arnd Bergmann, Pavel Emelyanov, Michael Kerrisk,
	Thomas Gleixner, Josh Triplett, Jann Horn, Greg KH, Andrei Vagin,
	Andrei Vagin

On Mon, 27 Nov 2017 09:19:39 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:

> From: Andrei Vagin <avagin@virtuozzo.com>
> 
> It is a hybrid of process_vm_readv() and vmsplice().
> 
> vmsplice can map memory from a current address space into a pipe.
> process_vm_readv can read memory of another process.
> 
> A new system call can map memory of another process into a pipe.
> 
> ssize_t process_vmsplice(pid_t pid, int fd, const struct iovec *iov,
>                         unsigned long nr_segs, unsigned int flags)
> 
> All arguments are identical with vmsplice except pid which specifies a
> target process.
> 
> Currently if we want to dump a process memory to a file or to a socket,
> we can use process_vm_readv() + write(), but it works slow, because data
> are copied into a temporary user-space buffer.
> 
> A second way is to use vmsplice() + splice(). It is more effective,
> because data are not copied into a temporary buffer, but here is another
> problem. vmsplice works with the current address space, so it can be
> used only if we inject our code into a target process.
> 
> The second way suffers from a few other issues:
> * a process has to be stopped to run a parasite code
> * a number of pipes is limited, so it may be impossible to dump all
>   memory in one iteration, and we have to stop process and inject our
>   code a few times.
> * pages in pipes are unreclaimable, so it isn't good to hold a lot of
>   memory in pipes.
> 
> The introduced syscall allows to use a second way without injecting any
> code into a target process.
> 
> My experiments show that process_vmsplice() + splice() works two times
> faster than process_vm_readv() + write().
>
> It is particularly useful on a pre-dump stage. On this stage we enable a
> memory tracker, and then we are dumping  a process memory while a
> process continues work. On the first iteration we are dumping all
> memory, and then we are dumping only modified memory from a previous
> iteration.  After a few pre-dump operations, a process is stopped and
> dumped finally. The pre-dump operations allow to significantly decrease
> a process downtime, when a process is migrated to another host.

What is the overall improvement in a typical dumping operation?

Does that improvement justify the addition of a new syscall, and all
that this entails?  If so, why?

Are there any other applications of this syscall?


* Re: [PATCH v4 3/4] x86: wire up the process_vmsplice syscall
  2017-11-27  7:19 ` [PATCH v4 3/4] x86: wire up the process_vmsplice syscall Mike Rapoport
@ 2017-11-28 12:35   ` kbuild test robot
  0 siblings, 0 replies; 9+ messages in thread
From: kbuild test robot @ 2017-11-28 12:35 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: kbuild-all, Andrew Morton, Alexander Viro, linux-mm,
	linux-fsdevel, linux-kernel, linux-api, criu, Arnd Bergmann,
	Pavel Emelyanov, Michael Kerrisk, Thomas Gleixner, Josh Triplett,
	Jann Horn, Greg KH, Andrei Vagin, Mike Rapoport

[-- Attachment #1: Type: text/plain, Size: 1012 bytes --]

Hi Andrei,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.15-rc1 next-20171128]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Mike-Rapoport/vm-add-a-syscall-to-map-a-process-memory-into-a-pipe/20171128-182837
config: xtensa-allmodconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 4.9.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=xtensa 

All warnings (new ones prefixed by >>):

>> <stdin>:1332:2: warning: #warning syscall process_vmsplice not implemented [-Wcpp]

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 52627 bytes --]


* Re: [PATCH v4 2/4] vm: add a syscall to map a process memory into a pipe
  2017-11-27 23:42   ` Andrew Morton
@ 2017-11-29  7:42     ` Andrei Vagin
  0 siblings, 0 replies; 9+ messages in thread
From: Andrei Vagin @ 2017-11-29  7:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Rapoport, Alexander Viro, linux-mm, linux-fsdevel,
	linux-kernel, linux-api, criu, Arnd Bergmann, Pavel Emelyanov,
	Michael Kerrisk, Thomas Gleixner, Josh Triplett, Jann Horn,
	Greg KH, Andrei Vagin

On Mon, Nov 27, 2017 at 03:42:49PM -0800, Andrew Morton wrote:
> On Mon, 27 Nov 2017 09:19:39 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
> 
> > From: Andrei Vagin <avagin@virtuozzo.com>
> > 
> > It is a hybrid of process_vm_readv() and vmsplice().
> > 
> > vmsplice can map memory from a current address space into a pipe.
> > process_vm_readv can read memory of another process.
> > 
> > A new system call can map memory of another process into a pipe.
> > 
> > ssize_t process_vmsplice(pid_t pid, int fd, const struct iovec *iov,
> >                         unsigned long nr_segs, unsigned int flags)
> > 
> > All arguments are identical with vmsplice except pid which specifies a
> > target process.
> > 
> > Currently if we want to dump a process memory to a file or to a socket,
> > we can use process_vm_readv() + write(), but it works slow, because data
> > are copied into a temporary user-space buffer.
> > 
> > A second way is to use vmsplice() + splice(). It is more effective,
> > because data are not copied into a temporary buffer, but here is another
> > problem. vmsplice works with the current address space, so it can be
> > used only if we inject our code into a target process.
> > 
> > The second way suffers from a few other issues:
> > * a process has to be stopped to run a parasite code
> > * a number of pipes is limited, so it may be impossible to dump all
> >   memory in one iteration, and we have to stop process and inject our
> >   code a few times.
> > * pages in pipes are unreclaimable, so it isn't good to hold a lot of
> >   memory in pipes.
> > 
> > The introduced syscall allows to use a second way without injecting any
> > code into a target process.
> > 
> > My experiments show that process_vmsplice() + splice() works two times
> > faster than process_vm_readv() + write().
> >
> > It is particularly useful on a pre-dump stage. On this stage we enable a
> > memory tracker, and then we are dumping  a process memory while a
> > process continues work. On the first iteration we are dumping all
> > memory, and then we are dumping only modified memory from a previous
> > iteration.  After a few pre-dump operations, a process is stopped and
> > dumped finally. The pre-dump operations allow to significantly decrease
> > a process downtime, when a process is migrated to another host.
> 
> What is the overall improvement in a typical dumping operation?
> 
> Does that improvement justify the addition of a new syscall, and all
> that this entails?  If so, why?

In criu, we have a pre-dump operation, which is used to reduce process
downtime during live migration. The pre-dump operation allows dumping
memory without stopping processes. On the first iteration, criu pre-dump
dumps the whole memory of the processes; on the second iteration it
saves only the pages changed since the first pre-dump, and so on.

The primary goal here is to do this operation without any downtime of
processes, or at least to keep this downtime as small as possible.

Currently, when we do a pre-dump, we perform the following steps:

1. stop all processes by ptrace
2. inject a parasite code into each process to call vmsplice
3. read /proc/pid/pagemap and splice all dirty pages into pipes
4. reset the soft-dirty memory tracker
5. resume processes
6. splice memory from pipe to sockets
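
Steps 3 and 4 above rely on the soft-dirty tracker. A minimal sketch of
that part, assuming the documented pagemap layout (bit 55 of each 64-bit
entry is the soft-dirty flag) and that writing "4" to
/proc/pid/clear_refs resets it. The helper names are hypothetical and
this is not criu's actual code:

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit 55 of a pagemap entry is the soft-dirty flag. */
#define PM_SOFT_DIRTY (1ULL << 55)

/* Pure helper: decode the soft-dirty flag from one 64-bit entry. */
int pagemap_entry_soft_dirty(uint64_t entry)
{
	return (entry & PM_SOFT_DIRTY) != 0;
}

/*
 * Step 3: look up the page containing vaddr in an already opened
 * /proc/<pid>/pagemap descriptor.
 * Returns 1 if the page is soft-dirty, 0 if clean, -1 on error.
 */
int page_is_soft_dirty(int pagemap_fd, uintptr_t vaddr, long page_size)
{
	uint64_t entry;
	off_t off;

	assert(page_size > 0);
	/* One 8-byte entry per virtual page, indexed by page frame. */
	off = (off_t)(vaddr / (uintptr_t)page_size) * (off_t)sizeof(entry);
	if (pread(pagemap_fd, &entry, sizeof(entry), off) !=
	    (ssize_t)sizeof(entry))
		return -1;
	return pagemap_entry_soft_dirty(entry);
}

/*
 * Step 4: reset the tracker through an already opened
 * /proc/<pid>/clear_refs descriptor. Returns 0 on success, -1 on error.
 */
int reset_soft_dirty(int clear_refs_fd)
{
	return write(clear_refs_fd, "4", 1) == 1 ? 0 : -1;
}
```

A caller would open /proc/<pid>/pagemap read-only and
/proc/<pid>/clear_refs write-only, walk the mapped ranges page by page,
and queue only the pages for which page_is_soft_dirty() returns 1.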

But this approach has a few limitations:

1. We need to inject parasite code into processes. This operation is
slow, and it requires stopping the processes, so we can't do this step
many times. As a result, we have to splice the whole memory into pipes
at once.

2. The number of pipes is limited, and the size of each pipe is limited.

The default limit on the number of file descriptors is 1024.  The
reliable maximum pipe size is 3354624 bytes.

        pipe->bufs = kcalloc(pipe_bufs, sizeof(struct pipe_buffer),
                             GFP_KERNEL_ACCOUNT);

so the maximum pipe size can be calculated by this formula:
((1 << PAGE_ALLOC_COSTLY_ORDER) * PAGE_SIZE / sizeof(struct pipe_buffer))
* PAGE_SIZE

This means that we can dump only 1.5 GB of memory.
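
The arithmetic behind these numbers can be checked with a short sketch.
The input values are assumptions typical of x86_64 and are not stated in
this thread: PAGE_ALLOC_COSTLY_ORDER of 3, a 4096-byte page, and a
40-byte struct pipe_buffer. The helper names are hypothetical, and the
total assumes each pipe consumes two file descriptors:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Largest pipe whose pipe_buffer array still fits in an allocation of
 * (1 << costly_order) pages; each pipe_buffer refers to one page.
 */
size_t max_reliable_pipe_size(unsigned int costly_order, size_t page_size,
			      size_t pipe_buffer_size)
{
	size_t max_alloc, max_bufs;

	assert(pipe_buffer_size > 0);
	max_alloc = ((size_t)1 << costly_order) * page_size;
	max_bufs = max_alloc / pipe_buffer_size;
	return max_bufs * page_size;
}

/* Total bytes that fit in pipes if every pipe uses two descriptors. */
unsigned long long max_dumpable_bytes(unsigned int nr_fds, size_t pipe_size)
{
	return (unsigned long long)(nr_fds / 2) * pipe_size;
}
```

With these assumed inputs, max_reliable_pipe_size(3, 4096, 40) gives
3354624, matching the figure above, and max_dumpable_bytes(1024, 3354624)
gives 1717567488 bytes (about 1.6 GiB), which is of the same order as the
1.5 GB estimate.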

The major issue with this approach is that we need to inject parasite
code and we can't do this many times, so we have to splice the whole
memory in one iteration.

With the introduced syscall, we are able to splice memory without
parasite code and even without stopping processes, so we can dump memory
in a few iterations.
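
For comparison, the two existing approaches can be sketched with
syscalls that are already available. This is only an illustration under
assumptions: the function names are hypothetical, error handling is
minimal, and the splice variant works on the caller's own address space,
which is exactly why it currently needs parasite code in the target:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Copying approach: process_vm_readv() copies len bytes at addr in
 * process pid into a local bounce buffer, then write() copies them
 * again to out_fd. Returns bytes written, or -1 on error.
 */
ssize_t dump_by_copy(pid_t pid, void *addr, size_t len, int out_fd,
		     void *bounce)
{
	struct iovec local = { .iov_base = bounce, .iov_len = len };
	struct iovec remote = { .iov_base = addr, .iov_len = len };
	ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);

	if (n <= 0)
		return n;
	return write(out_fd, bounce, (size_t)n);
}

/*
 * Splice approach for the caller's own memory: vmsplice() attaches the
 * pages to the pipe, splice() moves them from the pipe to out_fd.
 * vmsplice() may queue less than len if the pipe fills up, so the
 * return value is the number of bytes actually moved (or -1 on error).
 */
ssize_t dump_by_splice(void *addr, size_t len, int pipe_rd, int pipe_wr,
		       int out_fd)
{
	struct iovec iov = { .iov_base = addr, .iov_len = len };
	ssize_t queued = vmsplice(pipe_wr, &iov, 1, 0);
	ssize_t done = 0;

	assert(pipe_rd != pipe_wr);
	if (queued <= 0)
		return queued;
	while (done < queued) {
		ssize_t n = splice(pipe_rd, NULL, out_fd, NULL,
				   (size_t)(queued - done), SPLICE_F_MOVE);
		if (n <= 0)
			return n < 0 ? n : done;
		done += n;
	}
	return done;
}
```

The proposed process_vmsplice() would take the place of the vmsplice()
call in the second function, with an extra pid argument selecting the
target process.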

> 
> Are there any other applications of this syscall?
> 


For example, gdb can use it to generate a core file: it can splice the
memory of a process into a pipe and then splice it from the pipe to a
file. This method works much faster than using PTRACE_PEEK* commands.

This syscall can be interesting for users of process_vm_readv() if they
read memory in order to send it somewhere else.

process_vmsplice() may also be useful for debuggers in another way.
process_vmsplice() attaches a real process page to a pipe, so we can
splice it once and then observe many times how it is being changed.

Thanks,
Andrei


Thread overview: 9+ messages
2017-11-27  7:19 [PATCH v4 0/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport
2017-11-27  7:19 ` [PATCH v4 1/4] fs/splice: introduce pages_to_pipe helper Mike Rapoport
2017-11-27  7:19 ` [PATCH v4 2/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport
2017-11-27 23:42   ` Andrew Morton
2017-11-29  7:42     ` Andrei Vagin
2017-11-27  7:19 ` [PATCH v4 3/4] x86: wire up the process_vmsplice syscall Mike Rapoport
2017-11-28 12:35   ` kbuild test robot
2017-11-27  7:19 ` [PATCH v4 4/4] test: add a test for " Mike Rapoport
2017-11-27  7:20 ` [PATCH] process_vmsplice.2: New page describing process_vmsplice(2) system call Mike Rapoport