* [PATCH v5 0/4] vm: add a syscall to map a process memory into a pipe
@ 2018-01-09  6:30 Mike Rapoport
  2018-01-09  6:30 ` [PATCH v5 1/4] fs/splice: introduce pages_to_pipe helper Mike Rapoport
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Mike Rapoport @ 2018-01-09  6:30 UTC (permalink / raw)
  To: Andrew Morton, Alexander Viro
  Cc: linux-mm, linux-fsdevel, linux-kernel, linux-api, criu, gdb,
	devel, rr-dev, Arnd Bergmann, Pavel Emelyanov, Michael Kerrisk,
	Thomas Gleixner, Josh Triplett, Jann Horn, Greg KH, Andrei Vagin,
	Mike Rapoport

Hi,

This patch set introduces a new process_vmsplice system call that combines
the functionality of process_vm_readv() and vmsplice().

It allows mapping the memory of another process into a pipe, similarly to
what vmsplice() does for its own address space.

Patch 2/4 ("vm: add a syscall to map a process memory into a pipe")
adds the new system call and provides a detailed description of it.

The patch set is against the -mm tree.

v5: update changelog with more elaborate usecase description
v4: skip test when process_vmsplice syscall is not available
v3: minor refactoring to reduce code duplication
v2: move this syscall under CONFIG_CROSS_MEMORY_ATTACH
    give correct flags to get_user_pages_remote()


Andrei Vagin (3):
  vm: add a syscall to map a process memory into a pipe
  x86: wire up the process_vmsplice syscall
  test: add a test for the process_vmsplice syscall

Mike Rapoport (1):
  fs/splice: introduce pages_to_pipe helper

 arch/x86/entry/syscalls/syscall_32.tbl             |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl             |   2 +
 fs/splice.c                                        | 262 +++++++++++++++++++--
 include/linux/compat.h                             |   3 +
 include/linux/syscalls.h                           |   4 +
 include/uapi/asm-generic/unistd.h                  |   5 +-
 kernel/sys_ni.c                                    |   2 +
 tools/testing/selftests/process_vmsplice/Makefile  |   5 +
 .../process_vmsplice/process_vmsplice_test.c       | 196 +++++++++++++++
 9 files changed, 458 insertions(+), 22 deletions(-)
 create mode 100644 tools/testing/selftests/process_vmsplice/Makefile
 create mode 100644 tools/testing/selftests/process_vmsplice/process_vmsplice_test.c

-- 
2.7.4


* [PATCH v5 1/4] fs/splice: introduce pages_to_pipe helper
  2018-01-09  6:30 [PATCH v5 0/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport
@ 2018-01-09  6:30 ` Mike Rapoport
  2018-01-09  6:30 ` [PATCH v5 2/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Mike Rapoport @ 2018-01-09  6:30 UTC (permalink / raw)
  To: Andrew Morton, Alexander Viro
  Cc: linux-mm, linux-fsdevel, linux-kernel, linux-api, criu, gdb,
	devel, rr-dev, Arnd Bergmann, Pavel Emelyanov, Michael Kerrisk,
	Thomas Gleixner, Josh Triplett, Jann Horn, Greg KH, Andrei Vagin,
	Mike Rapoport

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 fs/splice.c | 57 ++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 21 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 39e2dc01ac12..7f1ffc50ff1d 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1185,6 +1185,36 @@ static long do_splice(struct file *in, loff_t __user *off_in,
 	return -EINVAL;
 }
 
+static int pages_to_pipe(struct page **pages, struct pipe_inode_info *pipe,
+			 struct pipe_buffer *buf, size_t *total,
+			 ssize_t copied, size_t start)
+{
+	bool failed = false;
+	size_t len = 0;
+	int ret = 0;
+	int n;
+
+	for (n = 0; copied; n++, start = 0) {
+		int size = min_t(int, copied, PAGE_SIZE - start);
+		if (!failed) {
+			buf->page = pages[n];
+			buf->offset = start;
+			buf->len = size;
+			ret = add_to_pipe(pipe, buf);
+			if (unlikely(ret < 0))
+				failed = true;
+			else
+				len += ret;
+		} else {
+			put_page(pages[n]);
+		}
+		copied -= size;
+	}
+
+	*total += len;
+	return failed ? ret : len;
+}
+
 static int iter_to_pipe(struct iov_iter *from,
 			struct pipe_inode_info *pipe,
 			unsigned flags)
@@ -1195,13 +1225,11 @@ static int iter_to_pipe(struct iov_iter *from,
 	};
 	size_t total = 0;
 	int ret = 0;
-	bool failed = false;
 
-	while (iov_iter_count(from) && !failed) {
+	while (iov_iter_count(from)) {
 		struct page *pages[16];
 		ssize_t copied;
 		size_t start;
-		int n;
 
 		copied = iov_iter_get_pages(from, pages, ~0UL, 16, &start);
 		if (copied <= 0) {
@@ -1209,24 +1237,11 @@ static int iter_to_pipe(struct iov_iter *from,
 			break;
 		}
 
-		for (n = 0; copied; n++, start = 0) {
-			int size = min_t(int, copied, PAGE_SIZE - start);
-			if (!failed) {
-				buf.page = pages[n];
-				buf.offset = start;
-				buf.len = size;
-				ret = add_to_pipe(pipe, &buf);
-				if (unlikely(ret < 0)) {
-					failed = true;
-				} else {
-					iov_iter_advance(from, ret);
-					total += ret;
-				}
-			} else {
-				put_page(pages[n]);
-			}
-			copied -= size;
-		}
+		ret = pages_to_pipe(pages, pipe, &buf, &total, copied, start);
+		if (unlikely(ret < 0))
+			break;
+
+		iov_iter_advance(from, ret);
 	}
 	return total ? total : ret;
 }
-- 
2.7.4


* [PATCH v5 2/4] vm: add a syscall to map a process memory into a pipe
  2018-01-09  6:30 [PATCH v5 0/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport
  2018-01-09  6:30 ` [PATCH v5 1/4] fs/splice: introduce pages_to_pipe helper Mike Rapoport
@ 2018-01-09  6:30 ` Mike Rapoport
  2018-01-09  6:30 ` [PATCH v5 3/4] x86: wire up the process_vmsplice syscall Mike Rapoport
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Mike Rapoport @ 2018-01-09  6:30 UTC (permalink / raw)
  To: Andrew Morton, Alexander Viro
  Cc: linux-mm, linux-fsdevel, linux-kernel, linux-api, criu, gdb,
	devel, rr-dev, Arnd Bergmann, Pavel Emelyanov, Michael Kerrisk,
	Thomas Gleixner, Josh Triplett, Jann Horn, Greg KH, Andrei Vagin,
	Mike Rapoport, Andrei Vagin

From: Andrei Vagin <avagin@virtuozzo.com>

It is a hybrid of process_vm_readv() and vmsplice().

vmsplice() can map memory from the current address space into a pipe.
process_vm_readv() can read the memory of another process.

The new system call can map the memory of another process into a pipe:

ssize_t process_vmsplice(pid_t pid, int fd, const struct iovec *iov,
                        unsigned long nr_segs, unsigned int flags)

All arguments are identical to those of vmsplice(), except for pid, which
specifies the target process.
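
There is no libc wrapper yet, so a caller invokes it via syscall(2). A
minimal sketch of such a wrapper (mirroring the one used by the selftest
in patch 4/4; 333 is the x86_64 syscall number from patch 3/4):

#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef __NR_process_vmsplice
#define __NR_process_vmsplice 333	/* x86_64 number from patch 3/4 */
#endif

static ssize_t process_vmsplice(pid_t pid, int fd, const struct iovec *iov,
				unsigned long nr_segs, unsigned int flags)
{
	/* fd must be the write end of a pipe; iov describes ranges in
	 * the target's address space, not ours */
	return syscall(__NR_process_vmsplice, pid, fd, iov, nr_segs, flags);
}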

Currently, if we want to dump a process's memory to a file or a socket, we
can use process_vm_readv() + write(), but this is slow because the data is
copied into a temporary user-space buffer.
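
For reference, a rough sketch of this double-copy path (pid, remote_addr
and out_fd are placeholders; error handling trimmed):

char buf[4096];
struct iovec liov = { .iov_base = buf, .iov_len = sizeof(buf) };
struct iovec riov = { .iov_base = remote_addr, .iov_len = sizeof(buf) };
ssize_t n;

/* copy #1: target's memory -> buf */
n = process_vm_readv(pid, &liov, 1, &riov, 1, 0);
/* copy #2: buf -> file or socket */
if (n > 0)
	write(out_fd, buf, n);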

A second way is to use vmsplice() + splice(). It is more efficient because
the data is not copied into a temporary buffer, but there is another
problem: vmsplice() works with the current address space, so it can only
be used if we inject our code into the target process.

The second way suffers from a few other issues:
* the process has to be stopped to run the parasite code
* the number of pipes is limited, so it may be impossible to dump all
  memory in one iteration, and we have to stop the process and inject our
  code several times
* pages in pipes are unreclaimable, so it is not good to hold a lot of
  memory in pipes

The introduced syscall allows using the second approach without injecting
any code into the target process.

My experiments show that process_vmsplice() + splice() works about twice
as fast as process_vm_readv() + write().

It is particularly useful for iterative migration. In that case we enable
a memory tracker and then dump the process's memory while the process
continues to run. On the first iteration we dump all the memory, and on
each following iteration only the memory that was modified since the
previous one. After a few pre-dump operations, the process is stopped and
dumped one final time. The pre-dump operations significantly decrease the
downtime of a process when it is migrated to another host.

The primary disadvantage of the vmsplice() + splice() approach for the
iterative migration use case is the need to stop the processes with, e.g.,
ptrace, and inject the parasite code. And, since the number of pipes and
their size are limited, we either have to limit the dump size to about
1.5GB or keep the target processes stopped. The process_vmsplice system
call allows splicing memory without the parasite code and even without
stopping target processes.

This system call may be useful for debuggers. For example, a debugger can
use it to generate a core file: it can splice the memory of a process into
a pipe and then splice it from the pipe to a file. This method works much
faster than using PTRACE_PEEK* commands.
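
A sketch of that zero-copy dump loop, assuming the syscall wrapper shown
above, a pipe p[2], an open output file out_fd, and a remote region
described by remote_addr/region_len (error handling trimmed):

struct iovec iov = {
	.iov_base = remote_addr,
	.iov_len  = region_len,
};

while (iov.iov_len > 0) {
	/* attach up to a pipe-full of the target's pages to p[1] */
	ssize_t n = process_vmsplice(pid, p[1], &iov, 1, 0);

	if (n <= 0)
		break;
	iov.iov_base = (char *)iov.iov_base + n;
	iov.iov_len -= n;

	/* move the pages from the pipe to the file without a copy
	 * through user space */
	while (n > 0) {
		ssize_t m = splice(p[0], NULL, out_fd, NULL, n,
				   SPLICE_F_MOVE);
		if (m <= 0)
			break;
		n -= m;
	}
}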

Debuggers may also use process_vmsplice() to observe memory changes in a
process page. process_vmsplice() attaches the real process page to a pipe,
so we can splice it once and observe how it changes over time.

The process_vmsplice syscall might also be of interest to users of
process_vm_readv() who read memory in order to send it somewhere else.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 fs/splice.c                       | 205 ++++++++++++++++++++++++++++++++++++++
 include/linux/compat.h            |   3 +
 include/linux/syscalls.h          |   4 +
 include/uapi/asm-generic/unistd.h |   5 +-
 kernel/sys_ni.c                   |   2 +
 5 files changed, 218 insertions(+), 1 deletion(-)

diff --git a/fs/splice.c b/fs/splice.c
index 7f1ffc50ff1d..72397d2a59b9 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -34,6 +34,7 @@
 #include <linux/socket.h>
 #include <linux/compat.h>
 #include <linux/sched/signal.h>
+#include <linux/sched/mm.h>
 
 #include "internal.h"
 
@@ -1373,6 +1374,210 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, iov,
 	return error;
 }
 
+#ifdef CONFIG_CROSS_MEMORY_ATTACH
+/*
+ * Map pages from a specified task into a pipe
+ */
+static int remote_single_vec_to_pipe(struct task_struct *task,
+			struct mm_struct *mm,
+			const struct iovec *rvec,
+			struct pipe_inode_info *pipe,
+			unsigned int flags,
+			size_t *total)
+{
+	struct pipe_buffer buf = {
+		.ops = &user_page_pipe_buf_ops,
+		.flags = flags
+	};
+	unsigned long addr = (unsigned long) rvec->iov_base;
+	unsigned long pa = addr & PAGE_MASK;
+	unsigned long start_offset = addr - pa;
+	unsigned long nr_pages;
+	ssize_t len = rvec->iov_len;
+	struct page *process_pages[16];
+	bool failed = false;
+	int ret = 0;
+
+	nr_pages = (addr + len - 1) / PAGE_SIZE - addr / PAGE_SIZE + 1;
+	while (nr_pages) {
+		long pages = min(nr_pages, 16UL);
+		int locked = 1;
+		ssize_t copied;
+
+		/*
+		 * Get the pages we're interested in.  We must
+		 * access them remotely because task/mm might not be
+		 * current/current->mm
+		 */
+		down_read(&mm->mmap_sem);
+		pages = get_user_pages_remote(task, mm, pa, pages, 0,
+					      process_pages, NULL, &locked);
+		if (locked)
+			up_read(&mm->mmap_sem);
+		if (pages <= 0) {
+			failed = true;
+			ret = -EFAULT;
+			break;
+		}
+
+		copied = pages * PAGE_SIZE - start_offset;
+		if (copied > len)
+			copied = len;
+		len -= copied;
+
+		ret = pages_to_pipe(process_pages, pipe, &buf, total, copied,
+				    start_offset);
+		if (unlikely(ret < 0))
+			break;
+
+		start_offset = 0;
+		nr_pages -= pages;
+		pa += pages * PAGE_SIZE;
+	}
+	return ret < 0 ? ret : 0;
+}
+
+static ssize_t remote_iovec_to_pipe(struct task_struct *task,
+			struct mm_struct *mm,
+			const struct iovec *rvec,
+			unsigned long riovcnt,
+			struct pipe_inode_info *pipe,
+			unsigned int flags)
+{
+	size_t total = 0;
+	int ret = 0, i;
+
+	for (i = 0; i < riovcnt; i++) {
+		/* Work out address and page range required */
+		if (rvec[i].iov_len == 0)
+			continue;
+
+		ret = remote_single_vec_to_pipe(
+				task, mm, &rvec[i], pipe, flags, &total);
+		if (ret < 0)
+			break;
+	}
+	return total ? total : ret;
+}
+
+static long process_vmsplice_to_pipe(struct task_struct *task,
+				struct mm_struct *mm, struct file *file,
+				const struct iovec __user *uiov,
+				unsigned long nr_segs, unsigned int flags)
+{
+	struct pipe_inode_info *pipe;
+	struct iovec iovstack[UIO_FASTIOV];
+	struct iovec *iov = iovstack;
+	unsigned int buf_flag = 0;
+	long ret;
+
+	if (flags & SPLICE_F_GIFT)
+		buf_flag = PIPE_BUF_FLAG_GIFT;
+
+	pipe = get_pipe_info(file);
+	if (!pipe)
+		return -EBADF;
+
+	ret = rw_copy_check_uvector(CHECK_IOVEC_ONLY, uiov, nr_segs,
+					UIO_FASTIOV, iovstack, &iov);
+	if (ret < 0)
+		return ret;
+
+	pipe_lock(pipe);
+	ret = wait_for_space(pipe, flags);
+	if (!ret)
+		ret = remote_iovec_to_pipe(task, mm, iov,
+						nr_segs, pipe, buf_flag);
+	pipe_unlock(pipe);
+	if (ret > 0)
+		wakeup_pipe_readers(pipe);
+
+	if (iov != iovstack)
+		kfree(iov);
+	return ret;
+}
+
+/* process_vmsplice splices a process address range into a pipe. */
+SYSCALL_DEFINE5(process_vmsplice, int, pid, int, fd,
+		const struct iovec __user *, iov,
+		unsigned long, nr_segs, unsigned int, flags)
+{
+	struct task_struct *task;
+	struct mm_struct *mm;
+	struct fd f;
+	long ret;
+
+	if (unlikely(flags & ~SPLICE_F_ALL))
+		return -EINVAL;
+	if (unlikely(nr_segs > UIO_MAXIOV))
+		return -EINVAL;
+	else if (unlikely(!nr_segs))
+		return 0;
+
+	f = fdget(fd);
+	if (!f.file)
+		return -EBADF;
+
+	/* Get process information */
+	task = find_get_task_by_vpid(pid);
+	if (!task) {
+		ret = -ESRCH;
+		goto out_fput;
+	}
+
+	mm = mm_access(task, PTRACE_MODE_ATTACH_REALCREDS);
+	if (!mm || IS_ERR(mm)) {
+		ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
+		/*
+		 * Explicitly map EACCES to EPERM as EPERM is a more
+		 * appropriate error code for process_vm_readv/writev
+		 */
+		if (ret == -EACCES)
+			ret = -EPERM;
+		goto put_task_struct;
+	}
+
+	ret = -EBADF;
+	if (f.file->f_mode & FMODE_WRITE)
+		ret = process_vmsplice_to_pipe(task, mm, f.file,
+						iov, nr_segs, flags);
+	mmput(mm);
+
+put_task_struct:
+	put_task_struct(task);
+
+out_fput:
+	fdput(f);
+
+	return ret;
+}
+
+#ifdef CONFIG_COMPAT
+COMPAT_SYSCALL_DEFINE5(process_vmsplice, pid_t, pid, int, fd,
+			const struct compat_iovec __user *, iov32,
+			unsigned int, nr_segs, unsigned int, flags)
+{
+	struct iovec __user *iov;
+	unsigned int i;
+
+	if (nr_segs > UIO_MAXIOV)
+		return -EINVAL;
+
+	iov = compat_alloc_user_space(nr_segs * sizeof(struct iovec));
+	for (i = 0; i < nr_segs; i++) {
+		struct compat_iovec v;
+
+		if (get_user(v.iov_base, &iov32[i].iov_base) ||
+		    get_user(v.iov_len, &iov32[i].iov_len) ||
+		    put_user(compat_ptr(v.iov_base), &iov[i].iov_base) ||
+		    put_user(v.iov_len, &iov[i].iov_len))
+			return -EFAULT;
+	}
+	return sys_process_vmsplice(pid, fd, iov, nr_segs, flags);
+}
+#endif
+#endif /* CONFIG_CROSS_MEMORY_ATTACH */
+
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE4(vmsplice, int, fd, const struct compat_iovec __user *, iov32,
 		    unsigned int, nr_segs, unsigned int, flags)
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 0fc36406f32c..11b375319289 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -550,6 +550,9 @@ asmlinkage long compat_sys_getdents(unsigned int fd,
 				    unsigned int count);
 asmlinkage long compat_sys_vmsplice(int fd, const struct compat_iovec __user *,
 				    unsigned int nr_segs, unsigned int flags);
+asmlinkage long compat_sys_process_vmsplice(pid_t pid, int fd,
+				    const struct compat_iovec __user *,
+				    unsigned int nr_segs, unsigned int flags);
 asmlinkage long compat_sys_open(const char __user *filename, int flags,
 				umode_t mode);
 asmlinkage long compat_sys_openat(int dfd, const char __user *filename,
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a78186d826d7..4ba93336bf05 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -941,4 +941,8 @@ asmlinkage long sys_pkey_free(int pkey);
 asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
 			  unsigned mask, struct statx __user *buffer);
 
+asmlinkage long sys_process_vmsplice(pid_t pid,
+			int fd, const struct iovec __user *iov,
+			unsigned long nr_segs, unsigned int flags);
+
 #endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 8b87de067bc7..37f18320ae94 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -732,9 +732,12 @@ __SYSCALL(__NR_pkey_alloc,    sys_pkey_alloc)
 __SYSCALL(__NR_pkey_free,     sys_pkey_free)
 #define __NR_statx 291
 __SYSCALL(__NR_statx,     sys_statx)
+#define __NR_process_vmsplice 292
+__SC_COMP(__NR_process_vmsplice, sys_process_vmsplice,
+	  compat_sys_process_vmsplice)
 
 #undef __NR_syscalls
-#define __NR_syscalls 292
+#define __NR_syscalls 293
 
 /*
  * All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index b5189762d275..a939fbb92d9e 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -158,8 +158,10 @@ cond_syscall(sys_sysfs);
 cond_syscall(sys_syslog);
 cond_syscall(sys_process_vm_readv);
 cond_syscall(sys_process_vm_writev);
+cond_syscall(sys_process_vmsplice);
 cond_syscall(compat_sys_process_vm_readv);
 cond_syscall(compat_sys_process_vm_writev);
+cond_syscall(compat_sys_process_vmsplice);
 cond_syscall(sys_uselib);
 cond_syscall(sys_fadvise64);
 cond_syscall(sys_fadvise64_64);
-- 
2.7.4


* [PATCH v5 3/4] x86: wire up the process_vmsplice syscall
  2018-01-09  6:30 [PATCH v5 0/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport
  2018-01-09  6:30 ` [PATCH v5 1/4] fs/splice: introduce pages_to_pipe helper Mike Rapoport
  2018-01-09  6:30 ` [PATCH v5 2/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport
@ 2018-01-09  6:30 ` Mike Rapoport
  2018-01-11 17:10   ` kbuild test robot
  2018-01-09  6:30 ` [PATCH v5 4/4] test: add a test for " Mike Rapoport
  2018-02-21  0:44 ` [PATCH v5 0/4] vm: add a syscall to map a process memory into a pipe Andrew Morton
  4 siblings, 1 reply; 15+ messages in thread
From: Mike Rapoport @ 2018-01-09  6:30 UTC (permalink / raw)
  To: Andrew Morton, Alexander Viro
  Cc: linux-mm, linux-fsdevel, linux-kernel, linux-api, criu, gdb,
	devel, rr-dev, Arnd Bergmann, Pavel Emelyanov, Michael Kerrisk,
	Thomas Gleixner, Josh Triplett, Jann Horn, Greg KH, Andrei Vagin,
	Mike Rapoport

From: Andrei Vagin <avagin@openvz.org>

Signed-off-by: Andrei Vagin <avagin@openvz.org>
---
 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 448ac2161112..dc64bf577b17 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -391,3 +391,4 @@
 382	i386	pkey_free		sys_pkey_free
 383	i386	statx			sys_statx
 384	i386	arch_prctl		sys_arch_prctl			compat_sys_arch_prctl
+385	i386	process_vmsplice	sys_process_vmsplice		compat_sys_process_vmsplice
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 5aef183e2f85..d2f916c0309a 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -339,6 +339,7 @@
 330	common	pkey_alloc		sys_pkey_alloc
 331	common	pkey_free		sys_pkey_free
 332	common	statx			sys_statx
+333	64	process_vmsplice	sys_process_vmsplice
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
@@ -380,3 +381,4 @@
 545	x32	execveat		compat_sys_execveat/ptregs
 546	x32	preadv2			compat_sys_preadv64v2
 547	x32	pwritev2		compat_sys_pwritev64v2
+548	x32	process_vmsplice	compat_sys_process_vmsplice
-- 
2.7.4


* [PATCH v5 4/4] test: add a test for the process_vmsplice syscall
  2018-01-09  6:30 [PATCH v5 0/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport
                   ` (2 preceding siblings ...)
  2018-01-09  6:30 ` [PATCH v5 3/4] x86: wire up the process_vmsplice syscall Mike Rapoport
@ 2018-01-09  6:30 ` Mike Rapoport
  2018-02-21  0:44 ` [PATCH v5 0/4] vm: add a syscall to map a process memory into a pipe Andrew Morton
  4 siblings, 0 replies; 15+ messages in thread
From: Mike Rapoport @ 2018-01-09  6:30 UTC (permalink / raw)
  To: Andrew Morton, Alexander Viro
  Cc: linux-mm, linux-fsdevel, linux-kernel, linux-api, criu, gdb,
	devel, rr-dev, Arnd Bergmann, Pavel Emelyanov, Michael Kerrisk,
	Thomas Gleixner, Josh Triplett, Jann Horn, Greg KH, Andrei Vagin,
	Mike Rapoport

From: Andrei Vagin <avagin@openvz.org>

This test checks that process_vmsplice() can splice pages from a remote
process, and that it returns EFAULT if it tries to splice pages at an
inaccessible address.

Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 tools/testing/selftests/process_vmsplice/Makefile  |   5 +
 .../process_vmsplice/process_vmsplice_test.c       | 196 +++++++++++++++++++++
 2 files changed, 201 insertions(+)
 create mode 100644 tools/testing/selftests/process_vmsplice/Makefile
 create mode 100644 tools/testing/selftests/process_vmsplice/process_vmsplice_test.c

diff --git a/tools/testing/selftests/process_vmsplice/Makefile b/tools/testing/selftests/process_vmsplice/Makefile
new file mode 100644
index 000000000000..246d5a7dfed6
--- /dev/null
+++ b/tools/testing/selftests/process_vmsplice/Makefile
@@ -0,0 +1,5 @@
+CFLAGS += -I../../../../usr/include/
+
+TEST_GEN_PROGS := process_vmsplice_test
+
+include ../lib.mk
diff --git a/tools/testing/selftests/process_vmsplice/process_vmsplice_test.c b/tools/testing/selftests/process_vmsplice/process_vmsplice_test.c
new file mode 100644
index 000000000000..1682bdb32de3
--- /dev/null
+++ b/tools/testing/selftests/process_vmsplice/process_vmsplice_test.c
@@ -0,0 +1,196 @@
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <fcntl.h>
+#include <sys/uio.h>
+#include <errno.h>
+#include <signal.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+
+#include "../kselftest.h"
+
+#ifndef __NR_process_vmsplice
+#define __NR_process_vmsplice 333
+#endif
+
+#define pr_err(fmt, ...) \
+		({ \
+			fprintf(stderr, "%s:%d:" fmt, \
+				__func__, __LINE__, ##__VA_ARGS__); \
+			KSFT_FAIL; \
+		})
+#define pr_perror(fmt, ...) pr_err(fmt ": %m\n", ##__VA_ARGS__)
+#define fail(fmt, ...) pr_err("FAIL:" fmt, ##__VA_ARGS__)
+
+static ssize_t process_vmsplice(pid_t pid, int fd, const struct iovec *iov,
+			unsigned long nr_segs, unsigned int flags)
+{
+	return syscall(__NR_process_vmsplice, pid, fd, iov, nr_segs, flags);
+
+}
+
+#define MEM_SIZE (4096 * 100)
+#define MEM_WRONLY_SIZE (4096 * 10)
+
+int main(int argc, char **argv)
+{
+	char *addr, *addr_wronly;
+	int p[2];
+	struct iovec iov[2];
+	char buf[4096];
+	int status, ret;
+	pid_t pid;
+
+	ksft_print_header();
+
+	if (process_vmsplice(0, 0, 0, 0, 0)) {
+		if (errno == ENOSYS) {
+			ksft_exit_skip("process_vmsplice is not supported\n");
+			return 0;
+		}
+		return pr_perror("Zero-length process_vmsplice failed");
+	}
+
+	addr = mmap(0, MEM_SIZE, PROT_READ | PROT_WRITE,
+					MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (addr == MAP_FAILED)
+		return pr_perror("Unable to create a mapping");
+
+	addr_wronly = mmap(0, MEM_WRONLY_SIZE, PROT_WRITE,
+				MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (addr_wronly == MAP_FAILED)
+		return pr_perror("Unable to create a write-only mapping");
+
+	if (pipe(p))
+		return pr_perror("Unable to create a pipe");
+
+	pid = fork();
+	if (pid < 0)
+		return pr_perror("Unable to fork");
+
+	if (pid == 0) {
+		addr[0] = 'C';
+		addr[4096 + 128] = 'A';
+		addr[4096 + 128 + 4096 - 1] = 'B';
+
+		if (prctl(PR_SET_PDEATHSIG, SIGKILL))
+			return pr_perror("Unable to set PR_SET_PDEATHSIG");
+		if (write(p[1], "c", 1) != 1)
+			return pr_perror("Unable to write data into pipe");
+
+		while (1)
+			sleep(1);
+		return 1;
+	}
+	if (read(p[0], buf, 1) != 1) {
+		pr_perror("Unable to read data from pipe");
+		kill(pid, SIGKILL);
+		wait(&status);
+		return 1;
+	}
+
+	munmap(addr, MEM_SIZE);
+	munmap(addr_wronly, MEM_WRONLY_SIZE);
+
+	iov[0].iov_base = addr;
+	iov[0].iov_len = 1;
+
+	iov[1].iov_base = addr + 4096 + 128;
+	iov[1].iov_len = 4096;
+
+	/* check one iovec */
+	if (process_vmsplice(pid, p[1], iov, 1, SPLICE_F_GIFT) != 1)
+		return pr_perror("Unable to splice pages");
+
+	if (read(p[0], buf, 1) != 1)
+		return pr_perror("Unable to read from pipe");
+
+	if (buf[0] != 'C')
+		ksft_test_result_fail("Got wrong data\n");
+	else
+		ksft_test_result_pass("Check process_vmsplice with one vec\n");
+
+	/* check two iovec-s */
+	if (process_vmsplice(pid, p[1], iov, 2, SPLICE_F_GIFT) != 4097)
+		return pr_perror("Unable to splice pages");
+
+	if (read(p[0], buf, 1) != 1)
+		return pr_perror("Unable to read from pipe");
+
+	if (buf[0] != 'C')
+		ksft_test_result_fail("Got wrong data\n");
+
+	if (read(p[0], buf, 4096) != 4096)
+		return pr_perror("Unable to read from pipe");
+
+	if (buf[0] != 'A' || buf[4095] != 'B')
+		ksft_test_result_fail("Got wrong data\n");
+	else
+		ksft_test_result_pass("check process_vmsplice with two vecs\n");
+
+	/* check how an unreadable region in a second vec is handled */
+	iov[0].iov_base = addr;
+	iov[0].iov_len = 1;
+
+	iov[1].iov_base = addr_wronly + 5;
+	iov[1].iov_len = 1;
+
+	if (process_vmsplice(pid, p[1], iov, 2, SPLICE_F_GIFT) != 1)
+		return pr_perror("Unable to splice data");
+
+	if (read(p[0], buf, 1) != 1)
+		return pr_perror("Unable to read from pipe");
+
+	if (buf[0] != 'C')
+		ksft_test_result_fail("Got wrong data\n");
+	else
+		ksft_test_result_pass("unreadable region in a second vec\n");
+
+	/* check how an unreadable region in a first vec is handled */
+	errno = 0;
+	if (process_vmsplice(pid, p[1], iov + 1, 1, SPLICE_F_GIFT) != -1 ||
+	    errno != EFAULT)
+		ksft_test_result_fail("Got an unexpected errno %d\n", errno);
+	else
+		ksft_test_result_pass("unreadable region in a first vec\n");
+
+	iov[0].iov_base = addr;
+	iov[0].iov_len = 1;
+
+	iov[1].iov_base = addr;
+	iov[1].iov_len = MEM_SIZE;
+
+	/* splice as much as possible */
+	ret = process_vmsplice(pid, p[1], iov, 2,
+				SPLICE_F_GIFT | SPLICE_F_NONBLOCK);
+	if (ret != 4096 * 15 + 1) /* by default a pipe can fit 16 pages */
+		return pr_perror("Unable to splice pages");
+
+	while (ret > 0) {
+		int len;
+
+		len = read(p[0], buf, 4096);
+		if (len < 0)
+			return pr_perror("Unable to read data");
+		if (len > ret)
+			return pr_err("Read more than expected\n");
+		ret -= len;
+	}
+	ksft_test_result_pass("splice as much as possible\n");
+
+	if (kill(pid, SIGTERM))
+		return pr_perror("Unable to kill a child process");
+	status = -1;
+	if (wait(&status) < 0)
+		return pr_perror("Unable to wait for a child process");
+	if (!WIFSIGNALED(status) || WTERMSIG(status) != SIGTERM)
+		return pr_err("The child exited with an unexpected code %d\n",
+									status);
+
+	if (ksft_get_fail_cnt())
+		return ksft_exit_fail();
+	return ksft_exit_pass();
+}
-- 
2.7.4


* Re: [PATCH v5 3/4] x86: wire up the process_vmsplice syscall
  2018-01-09  6:30 ` [PATCH v5 3/4] x86: wire up the process_vmsplice syscall Mike Rapoport
@ 2018-01-11 17:10   ` kbuild test robot
  0 siblings, 0 replies; 15+ messages in thread
From: kbuild test robot @ 2018-01-11 17:10 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: kbuild-all, Andrew Morton, Alexander Viro, linux-mm,
	linux-fsdevel, linux-kernel, linux-api, criu, gdb, devel, rr-dev,
	Arnd Bergmann, Pavel Emelyanov, Michael Kerrisk, Thomas Gleixner,
	Josh Triplett, Jann Horn, Greg KH, Andrei Vagin, Mike Rapoport

[-- Attachment #1: Type: text/plain, Size: 1012 bytes --]

Hi Andrei,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.15-rc7 next-20180111]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Mike-Rapoport/vm-add-a-syscall-to-map-a-process-memory-into-a-pipe/20180111-220440
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=xtensa 

All warnings (new ones prefixed by >>):

>> <stdin>:1332:2: warning: #warning syscall process_vmsplice not implemented [-Wcpp]

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 52667 bytes --]


* Re: [PATCH v5 0/4] vm: add a syscall to map a process memory into a pipe
  2018-01-09  6:30 [PATCH v5 0/4] vm: add a syscall to map a process memory into a pipe Mike Rapoport
                   ` (3 preceding siblings ...)
  2018-01-09  6:30 ` [PATCH v5 4/4] test: add a test for " Mike Rapoport
@ 2018-02-21  0:44 ` Andrew Morton
  2018-02-26  9:02   ` Pavel Emelyanov
  4 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2018-02-21  0:44 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Alexander Viro, linux-mm, linux-fsdevel, linux-kernel, linux-api,
	criu, gdb, devel, rr-dev, Arnd Bergmann, Pavel Emelyanov,
	Michael Kerrisk, Thomas Gleixner, Josh Triplett, Jann Horn,
	Greg KH, Andrei Vagin

On Tue,  9 Jan 2018 08:30:49 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:

> This patch set introduces a new process_vmsplice system call that combines
> the functionality of process_vm_readv() and vmsplice().

All seems fairly straightforward.  The big question is: do we know that
people will actually use this, and get sufficient value from it to
justify its addition?




* Re: [PATCH v5 0/4] vm: add a syscall to map a process memory into a pipe
  2018-02-21  0:44 ` [PATCH v5 0/4] vm: add a syscall to map a process memory into a pipe Andrew Morton
@ 2018-02-26  9:02   ` Pavel Emelyanov
  2018-02-26 16:38     ` [OMPI devel] " Nathan Hjelm
  2018-02-27  2:18     ` Dmitry V. Levin
  0 siblings, 2 replies; 15+ messages in thread
From: Pavel Emelyanov @ 2018-02-26  9:02 UTC (permalink / raw)
  To: Andrew Morton, Mike Rapoport
  Cc: Alexander Viro, linux-mm, linux-fsdevel, linux-kernel, linux-api,
	criu, gdb, devel, rr-dev, Arnd Bergmann, Michael Kerrisk,
	Thomas Gleixner, Josh Triplett, Jann Horn, Greg KH, Andrei Vagin

On 02/21/2018 03:44 AM, Andrew Morton wrote:
> On Tue,  9 Jan 2018 08:30:49 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
> 
>> This patch set introduces a new process_vmsplice system call that combines
>> the functionality of process_vm_readv() and vmsplice().
> 
> All seems fairly straightforward.  The big question is: do we know that
> people will actually use this, and get sufficient value from it to
> justify its addition?

Yes, that's what bothers us a lot too :) I've tried to find out if anyone actually
uses the process_vm_readv/writev() calls, but failed :( Does anybody know how popular
these syscalls are? If their users operate on large amounts of memory, they could
benefit from the proposed splice extension.

-- Pavel


* Re: [OMPI devel] [PATCH v5 0/4] vm: add a syscall to map a process memory into a pipe
  2018-02-26  9:02   ` Pavel Emelyanov
@ 2018-02-26 16:38     ` Nathan Hjelm
  2018-02-27  7:10       ` Mike Rapoport
  2018-02-27  2:18     ` Dmitry V. Levin
  1 sibling, 1 reply; 15+ messages in thread
From: Nathan Hjelm @ 2018-02-26 16:38 UTC (permalink / raw)
  To: Open MPI Developers
  Cc: Andrew Morton, Mike Rapoport, Andrei Vagin, Arnd Bergmann,
	Jann Horn, rr-dev, linux-api, linux-kernel, Josh Triplett, criu,
	linux-mm, Greg KH, Alexander Viro, gdb, linux-fsdevel,
	Thomas Gleixner, Michael Kerrisk

[-- Attachment #1: Type: text/plain, Size: 1364 bytes --]

All MPI implementations have support for using CMA to transfer data between local processes. The performance is fairly good (not as good as XPMEM) but the interface limits what we can do with remote process memory (no atomics). I have not heard about this new proposal. What is the benefit of the proposed calls over the existing calls?

-Nathan

> On Feb 26, 2018, at 2:02 AM, Pavel Emelyanov <xemul@virtuozzo.com> wrote:
> 
> On 02/21/2018 03:44 AM, Andrew Morton wrote:
>> On Tue,  9 Jan 2018 08:30:49 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
>> 
>>> This patch set introduces a new process_vmsplice system call that combines
>>> the functionality of process_vm_readv() and vmsplice().
>> 
>> All seems fairly straightforward.  The big question is: do we know that
>> people will actually use this, and get sufficient value from it to
>> justify its addition?
> 
> Yes, that's what bothers us a lot too :) I've tried to find out if anyone actually
> uses the process_vm_readv/writev() calls, but failed :( Does anybody know how popular
> these syscalls are? If their users operate on large amounts of memory, they could
> benefit from the proposed splice extension.
> 
> -- Pavel
> _______________________________________________
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH v5 0/4] vm: add a syscall to map a process memory into a pipe
  2018-02-26  9:02   ` Pavel Emelyanov
  2018-02-26 16:38     ` [OMPI devel] " Nathan Hjelm
@ 2018-02-27  2:18     ` Dmitry V. Levin
  2018-02-28  6:11       ` Andrei Vagin
  2018-02-28  7:12       ` Pavel Emelyanov
  1 sibling, 2 replies; 15+ messages in thread
From: Dmitry V. Levin @ 2018-02-27  2:18 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Andrew Morton, Mike Rapoport, Alexander Viro, linux-mm,
	linux-fsdevel, linux-kernel, linux-api, criu, gdb, devel, rr-dev,
	Arnd Bergmann, Michael Kerrisk, Thomas Gleixner, Josh Triplett,
	Jann Horn, Greg KH, Andrei Vagin

[-- Attachment #1: Type: text/plain, Size: 920 bytes --]

On Mon, Feb 26, 2018 at 12:02:25PM +0300, Pavel Emelyanov wrote:
> On 02/21/2018 03:44 AM, Andrew Morton wrote:
> > On Tue,  9 Jan 2018 08:30:49 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
> > 
> >> This patch set introduces a new process_vmsplice system call that combines
> >> the functionality of process_vm_readv() and vmsplice().
> > 
> > All seems fairly straightforward.  The big question is: do we know that
> > people will actually use this, and get sufficient value from it to
> > justify its addition?
> 
> > Yes, that's what bothers us a lot too :) I've tried to find out if anyone actually
> > uses the process_vm_readv/writev() calls, but failed :( Does anybody know how popular
> > these syscalls are?

Well, process_vm_readv itself is quite popular, it's used by debuggers nowadays,
see e.g.
$ strace -qq -esignal=none -eprocess_vm_readv strace -qq -o/dev/null cat /dev/null


-- 
ldv

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]


* Re: [OMPI devel] [PATCH v5 0/4] vm: add a syscall to map a process memory into a pipe
  2018-02-26 16:38     ` [OMPI devel] " Nathan Hjelm
@ 2018-02-27  7:10       ` Mike Rapoport
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Rapoport @ 2018-02-27  7:10 UTC (permalink / raw)
  To: Nathan Hjelm
  Cc: Open MPI Developers, Andrei Vagin, Arnd Bergmann, Jann Horn,
	rr-dev, linux-api, linux-kernel, Josh Triplett, criu, linux-mm,
	gdb, Alexander Viro, Greg KH, linux-fsdevel, Andrew Morton,
	Thomas Gleixner, Michael Kerrisk

On Mon, Feb 26, 2018 at 09:38:19AM -0700, Nathan Hjelm wrote:
> All MPI implementations have support for using CMA to transfer data
> between local processes. The performance is fairly good (not as good as
> XPMEM) but the interface limits what we can do with remote process
> memory (no atomics). I have not heard about this new proposal. What is
> the benefit of the proposed calls over the existing calls?

The proposed system call combines the functionality of process_vm_readv()
and vmsplice() [1], and it is particularly useful when one needs to read
the remote process memory and then write it to a file descriptor. In this
case a sequence of process_vm_readv() + write() calls that involves two
copies of the data can be replaced with process_vmsplice() + splice(),
which does not involve any copying at all.

[1] https://lkml.org/lkml/2018/1/9/32
 
> -Nathan
> 
> > On Feb 26, 2018, at 2:02 AM, Pavel Emelyanov <xemul@virtuozzo.com> wrote:
> > 
> > On 02/21/2018 03:44 AM, Andrew Morton wrote:
> >> On Tue,  9 Jan 2018 08:30:49 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
> >> 
> >>> This patch set introduces a new process_vmsplice system call that combines
> >>> the functionality of process_vm_readv() and vmsplice().
> >> 
> >> All seems fairly straightforward.  The big question is: do we know that
> >> people will actually use this, and get sufficient value from it to
> >> justify its addition?
> > 
> > Yes, that's what bothers us a lot too :) I've tried to find out if anyone actually
> > uses the process_vm_readv/writev() calls, but failed :( Does anybody know how popular
> > these syscalls are? If their users operate on large amounts of memory, they could
> > benefit from the proposed splice extension.
> > 
> > -- Pavel

-- 
Sincerely yours,
Mike.



* Re: [PATCH v5 0/4] vm: add a syscall to map a process memory into a pipe
  2018-02-27  2:18     ` Dmitry V. Levin
@ 2018-02-28  6:11       ` Andrei Vagin
  2018-02-28  7:12       ` Pavel Emelyanov
  1 sibling, 0 replies; 15+ messages in thread
From: Andrei Vagin @ 2018-02-28  6:11 UTC (permalink / raw)
  To: Dmitry V. Levin
  Cc: Pavel Emelyanov, Andrew Morton, Mike Rapoport, Alexander Viro,
	linux-mm, linux-fsdevel, linux-kernel, linux-api, criu, gdb,
	devel, rr-dev, Arnd Bergmann, Michael Kerrisk, Thomas Gleixner,
	Josh Triplett, Jann Horn, Greg KH, Andrei Vagin

On Tue, Feb 27, 2018 at 05:18:18AM +0300, Dmitry V. Levin wrote:
> On Mon, Feb 26, 2018 at 12:02:25PM +0300, Pavel Emelyanov wrote:
> > On 02/21/2018 03:44 AM, Andrew Morton wrote:
> > > On Tue,  9 Jan 2018 08:30:49 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
> > > 
> > >> This patch set introduces a new process_vmsplice system call that combines
> > >> the functionality of process_vm_readv() and vmsplice().
> > > 
> > > All seems fairly straightforward.  The big question is: do we know that
> > > people will actually use this, and get sufficient value from it to
> > > justify its addition?
> > 
> > Yes, that's what bothers us a lot too :) I've tried to find out if anyone actually
> > uses the process_vm_readv/writev() calls, but failed :( Does anybody know how popular
> > these syscalls are?
> 
> Well, process_vm_readv itself is quite popular, it's used by debuggers nowadays,
> see e.g.
> $ strace -qq -esignal=none -eprocess_vm_readv strace -qq -o/dev/null cat /dev/null

For this case, there is no advantage from process_vmsplice().

But it can significantly optimize the process of generating a core file.
In this case, we need to read a process's memory and save the content into
a file. process_vmsplice() allows doing this more optimally than
process_vm_readv(), because it doesn't copy data into userspace.

Here is a part of strace how gdb saves memory content into a core file:

10593 open("/proc/10193/mem", O_RDONLY|O_CLOEXEC) = 17
10593 pread64(17, "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"..., 1048576, 140009356111872) = 1048576
10593 close(17)                         = 0
10593 write(16, "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"..., 4096) = 4096
10593 write(16, "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"..., 1044480) = 1044480
10593 open("/proc/10193/mem", O_RDONLY|O_CLOEXEC) = 17
10593 pread64(17, "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"..., 1048576, 140009357160448) = 1048576
10593 close(17)                         = 0
10593 write(16, "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"..., 4096) = 4096
10593 write(16, "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"..., 1044480) = 1044480
10593 open("/proc/10193/mem", O_RDONLY|O_CLOEXEC) = 17
10593 pread64(17, "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"..., 1048576, 140009358209024) = 1048576
10593 close(17)                         = 0
10593 write(16, "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"..., 4096) = 4096
10593 write(16, "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"..., 1044480) = 1044480
10593 open("/proc/10193/mem", O_RDONLY|O_CLOEXEC) = 17
10593 pread64(17, "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"..., 1048576, 140009359257600) = 1048576
10593 close(17)

It is strange that process_vm_readv() isn't used and that
/proc/10193/mem is opened many times.

BTW: "strace -fo strace-gdb.log gdb -p PID" doesn't work properly.

Thanks,
Andrei

> 
> 
> -- 
> ldv




* Re: [PATCH v5 0/4] vm: add a syscall to map a process memory into a pipe
  2018-02-27  2:18     ` Dmitry V. Levin
  2018-02-28  6:11       ` Andrei Vagin
@ 2018-02-28  7:12       ` Pavel Emelyanov
  2018-02-28 17:50         ` Andrei Vagin
  2018-02-28 23:12         ` [OMPI devel] " Atchley, Scott
  1 sibling, 2 replies; 15+ messages in thread
From: Pavel Emelyanov @ 2018-02-28  7:12 UTC (permalink / raw)
  To: Andrew Morton, Mike Rapoport, Alexander Viro, linux-mm,
	linux-fsdevel, linux-kernel, linux-api, criu, gdb, devel, rr-dev,
	Arnd Bergmann, Michael Kerrisk, Thomas Gleixner, Josh Triplett,
	Jann Horn, Greg KH, Andrei Vagin

On 02/27/2018 05:18 AM, Dmitry V. Levin wrote:
> On Mon, Feb 26, 2018 at 12:02:25PM +0300, Pavel Emelyanov wrote:
>> On 02/21/2018 03:44 AM, Andrew Morton wrote:
>>> On Tue,  9 Jan 2018 08:30:49 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
>>>
>>>> This patch set introduces a new process_vmsplice system call that combines
>>>> the functionality of process_vm_readv() and vmsplice().
>>>
>>> All seems fairly straightforward.  The big question is: do we know that
>>> people will actually use this, and get sufficient value from it to
>>> justify its addition?
>>
>> Yes, that's what bothers us a lot too :) I've tried to find out if anyone actually
>> uses the process_vm_readv/writev() calls, but failed :( Does anybody know how popular
>> these syscalls are?
> 
> Well, process_vm_readv itself is quite popular, it's used by debuggers nowadays,
> see e.g.
> $ strace -qq -esignal=none -eprocess_vm_readv strace -qq -o/dev/null cat /dev/null

I see. Well, yes, this use case will not benefit much from remote splice. How about more
interactive debugging by, say, gdb? It may attach, then splice all the memory, then analyze
the victim's code/data w/o copying it into its own address space?

-- Pavel


* Re: [PATCH v5 0/4] vm: add a syscall to map a process memory into a pipe
  2018-02-28  7:12       ` Pavel Emelyanov
@ 2018-02-28 17:50         ` Andrei Vagin
  2018-02-28 23:12         ` [OMPI devel] " Atchley, Scott
  1 sibling, 0 replies; 15+ messages in thread
From: Andrei Vagin @ 2018-02-28 17:50 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Andrew Morton, Mike Rapoport, Alexander Viro, linux-mm,
	linux-fsdevel, linux-kernel, linux-api, criu, gdb, devel, rr-dev,
	Arnd Bergmann, Michael Kerrisk, Thomas Gleixner, Josh Triplett,
	Jann Horn, Greg KH, Andrei Vagin

On Wed, Feb 28, 2018 at 10:12:55AM +0300, Pavel Emelyanov wrote:
> On 02/27/2018 05:18 AM, Dmitry V. Levin wrote:
> > On Mon, Feb 26, 2018 at 12:02:25PM +0300, Pavel Emelyanov wrote:
> >> On 02/21/2018 03:44 AM, Andrew Morton wrote:
> >>> On Tue,  9 Jan 2018 08:30:49 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
> >>>
> >>>> This patch set introduces a new process_vmsplice system call that combines
> >>>> the functionality of process_vm_readv() and vmsplice().
> >>>
> >>> All seems fairly straightforward.  The big question is: do we know that
> >>> people will actually use this, and get sufficient value from it to
> >>> justify its addition?
> >>
> >> Yes, that's what bothers us a lot too :) I've tried to find out if anyone actually
> >> uses the process_vm_readv/writev() calls, but failed :( Does anybody know how popular
> >> these syscalls are?
> > 
> > Well, process_vm_readv itself is quite popular, it's used by debuggers nowadays,
> > see e.g.
> > $ strace -qq -esignal=none -eprocess_vm_readv strace -qq -o/dev/null cat /dev/null
> 
> I see. Well, yes, this use case will not benefit much from remote splice. How about more
> interactive debugging by, say, gdb? It may attach, then splice all the memory, then analyze
> the victim's code/data w/o copying it into its own address space?

Hmm, in this case, you probably will want to be able to map pipe pages
into memory.

> 
> -- Pavel


* Re: [OMPI devel] [PATCH v5 0/4] vm: add a syscall to map a process memory into a pipe
  2018-02-28  7:12       ` Pavel Emelyanov
  2018-02-28 17:50         ` Andrei Vagin
@ 2018-02-28 23:12         ` Atchley, Scott
  1 sibling, 0 replies; 15+ messages in thread
From: Atchley, Scott @ 2018-02-28 23:12 UTC (permalink / raw)
  To: Open MPI Developers
  Cc: Andrew Morton, Mike Rapoport, Alexander Viro, linux-mm,
	linux-fsdevel, linux-kernel, linux-api, criu, gdb, rr-dev,
	Arnd Bergmann, Michael Kerrisk, Thomas Gleixner, Josh Triplett,
	Jann Horn, Greg KH, Andrei Vagin

> On Feb 28, 2018, at 2:12 AM, Pavel Emelyanov <xemul@virtuozzo.com> wrote:
> 
> On 02/27/2018 05:18 AM, Dmitry V. Levin wrote:
>> On Mon, Feb 26, 2018 at 12:02:25PM +0300, Pavel Emelyanov wrote:
>>> On 02/21/2018 03:44 AM, Andrew Morton wrote:
>>>> On Tue,  9 Jan 2018 08:30:49 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
>>>> 
>>>>> This patch set introduces a new process_vmsplice system call that combines
>>>>> the functionality of process_vm_readv() and vmsplice().
>>>> 
>>>> All seems fairly straightforward.  The big question is: do we know that
>>>> people will actually use this, and get sufficient value from it to
>>>> justify its addition?
>>> 
>>> Yes, that's what bothers us a lot too :) I've tried to find out if anyone actually
>>> uses the process_vm_readv/writev() calls, but failed :( Does anybody know how popular
>>> these syscalls are?
>> 
>> Well, process_vm_readv itself is quite popular, it's used by debuggers nowadays,
>> see e.g.
>> $ strace -qq -esignal=none -eprocess_vm_readv strace -qq -o/dev/null cat /dev/null
> 
> I see. Well, yes, this use case will not benefit much from remote splice. How about more
> interactive debugging by, say, gdb? It may attach, then splice all the memory, then analyze
> the victim's code/data w/o copying it into its own address space?
> 
> -- Pavel

I may be completely off base, but could a FUSE daemon use this to read memory from the client and dump it to a file descriptor without copying the data into the kernel? 

