[v5,0/4] vm: add a syscall to map a process memory into a pipe

Message ID 1515479453-14672-1-git-send-email-rppt@linux.vnet.ibm.com

Message

Mike Rapoport Jan. 9, 2018, 6:30 a.m. UTC
Hi,

This patch set introduces a new process_vmsplice system call that combines
the functionality of process_vm_read and vmsplice.

It allows mapping the memory of another process into a pipe, similarly to
what vmsplice does for the calling process's own address space.

Patch 2/4 ("vm: add a syscall to map a process memory into a pipe") adds the
new system call itself and provides a detailed description of it.
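
To illustrate the intended usage, here is a minimal userspace sketch. It
assumes a vmsplice-like prototype with an added pid argument and a raw
syscall() wrapper; the authoritative prototype, syscall numbers and
semantics are the ones defined in patches 2/4 and 3/4.

/*
 * Minimal usage sketch only.  The prototype below is assumed from
 * vmsplice() plus a pid argument; the real definition and syscall
 * number come from patches 2/4 and 3/4.
 */
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef __NR_process_vmsplice
#define __NR_process_vmsplice -1	/* placeholder until the number is wired up */
#endif

static ssize_t process_vmsplice(pid_t pid, int fd, const struct iovec *iov,
				unsigned long nr_segs, unsigned int flags)
{
	return syscall(__NR_process_vmsplice, pid, fd, iov, nr_segs, flags);
}

int main(void)
{
	static char buf[4096];
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
	int p[2];

	if (pipe(p))
		return 1;

	/* map one page of the target (here: ourselves) into the pipe */
	if (process_vmsplice(getpid(), p[1], &iov, 1, 0) < 0)
		perror("process_vmsplice");

	return 0;
}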

The patchset is against the -mm tree.

v5: update changelog with more elaborate usecase description
v4: skip test when process_vmsplice syscall is not available
v3: minor refactoring to reduce code duplication
v2: move this syscall under CONFIG_CROSS_MEMORY_ATTACH
    give correct flags to get_user_pages_remote()


Andrei Vagin (3):
  vm: add a syscall to map a process memory into a pipe
  x86: wire up the process_vmsplice syscall
  test: add a test for the process_vmsplice syscall

Mike Rapoport (1):
  fs/splice: introduce pages_to_pipe helper

 arch/x86/entry/syscalls/syscall_32.tbl             |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl             |   2 +
 fs/splice.c                                        | 262 +++++++++++++++++++--
 include/linux/compat.h                             |   3 +
 include/linux/syscalls.h                           |   4 +
 include/uapi/asm-generic/unistd.h                  |   5 +-
 kernel/sys_ni.c                                    |   2 +
 tools/testing/selftests/process_vmsplice/Makefile  |   5 +
 .../process_vmsplice/process_vmsplice_test.c       | 196 +++++++++++++++
 9 files changed, 458 insertions(+), 22 deletions(-)
 create mode 100644 tools/testing/selftests/process_vmsplice/Makefile
 create mode 100644 tools/testing/selftests/process_vmsplice/process_vmsplice_test.c

Comments

Andrew Morton Feb. 21, 2018, 12:44 a.m. UTC | #1
On Tue,  9 Jan 2018 08:30:49 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:

> This patches introduces new process_vmsplice system call that combines
> functionality of process_vm_read and vmsplice.

All seems fairly straightforward.  The big question is: do we know that
people will actually use this, and get sufficient value from it to
justify its addition?


Pavel Emelyanov Feb. 26, 2018, 9:02 a.m. UTC | #2
On 02/21/2018 03:44 AM, Andrew Morton wrote:
> On Tue,  9 Jan 2018 08:30:49 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
> 
>> This patches introduces new process_vmsplice system call that combines
>> functionality of process_vm_read and vmsplice.
> 
> All seems fairly strightforward.  The big question is: do we know that
> people will actually use this, and get sufficient value from it to
> justify its addition?

Yes, that's what bothers us a lot too :) I've tried to find out whether anyone actually
uses the sys_read/write_process_vm() calls, but failed :( Does anybody know how popular
these syscalls are? If their users operate on large amounts of memory, they could benefit
from the proposed splice extension.

-- Pavel
Nathan Hjelm Feb. 26, 2018, 4:38 p.m. UTC | #3
All MPI implementations have support for using CMA to transfer data between local processes. The performance is fairly good (not as good as XPMEM) but the interface limits what we can do with remote process memory (no atomics). I have not heard about this new proposal before. What is the benefit of the proposed calls over the existing ones?

-Nathan

> On Feb 26, 2018, at 2:02 AM, Pavel Emelyanov <xemul@virtuozzo.com> wrote:
> 
> On 02/21/2018 03:44 AM, Andrew Morton wrote:
>> On Tue,  9 Jan 2018 08:30:49 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
>> 
>>> This patches introduces new process_vmsplice system call that combines
>>> functionality of process_vm_read and vmsplice.
>> 
>> All seems fairly strightforward.  The big question is: do we know that
>> people will actually use this, and get sufficient value from it to
>> justify its addition?
> 
> Yes, that's what bothers us a lot too :) I've tried to start with finding out if anyone
> used the sys_read/write_process_vm() calls, but failed :( Does anybody know how popular
> these syscalls are? If its users operate on big amount of memory, they could benefit from
> the proposed splice extension.
> 
> -- Pavel
Dmitry V. Levin Feb. 27, 2018, 2:18 a.m. UTC | #4
On Mon, Feb 26, 2018 at 12:02:25PM +0300, Pavel Emelyanov wrote:
> On 02/21/2018 03:44 AM, Andrew Morton wrote:
> > On Tue,  9 Jan 2018 08:30:49 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
> > 
> >> This patches introduces new process_vmsplice system call that combines
> >> functionality of process_vm_read and vmsplice.
> > 
> > All seems fairly strightforward.  The big question is: do we know that
> > people will actually use this, and get sufficient value from it to
> > justify its addition?
> 
> Yes, that's what bothers us a lot too :) I've tried to start with finding out if anyone 
> used the sys_read/write_process_vm() calls, but failed :( Does anybody know how popular
> these syscalls are?

Well, process_vm_readv itself is quite popular, it's used by debuggers nowadays,
see e.g.
$ strace -qq -esignal=none -eprocess_vm_readv strace -qq -o/dev/null cat /dev/null
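
For reference, a debugger's read path today looks roughly like the sketch
below (illustrative only; pid and remote_addr are placeholders for whatever
the debugger is inspecting):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Copy a chunk of the tracee's memory into a local buffer. */
static void peek(pid_t pid, void *remote_addr, size_t len)
{
	char *buf = malloc(len);
	struct iovec local  = { .iov_base = buf,         .iov_len = len };
	struct iovec remote = { .iov_base = remote_addr, .iov_len = len };

	if (process_vm_readv(pid, &local, 1, &remote, 1, 0) < 0)
		perror("process_vm_readv");
	free(buf);
}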
Mike Rapoport Feb. 27, 2018, 7:10 a.m. UTC | #5
On Mon, Feb 26, 2018 at 09:38:19AM -0700, Nathan Hjelm wrote:
> All MPI implementations have support for using CMA to transfer data
> between local processes. The performance is fairly good (not as good as
> XPMEM) but the interface limits what we can do with to remote process
> memory (no atomics). I have not heard about this new proposal. What is
> the benefit of the proposed calls over the existing calls?

The proposed system call combines the functionality of process_vm_read and
vmsplice [1], and it is particularly useful when one needs to read remote
process memory and then write it to a file descriptor. In that case a
sequence of process_vm_read() + write() calls, which involves two copies of
the data, can be replaced with process_vmsplice() + splice(), which involves
no copy at all.

[1] https://lkml.org/lkml/2018/1/9/32
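
As an illustration, the two sequences look roughly like this (a sketch, not
the final API: the process_vmsplice() prototype is assumed from [1] and
error handling is omitted):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* assumed prototype, modeled on vmsplice() plus a pid; see [1] */
ssize_t process_vmsplice(pid_t pid, int fd, const struct iovec *iov,
			 unsigned long nr_segs, unsigned int flags);

/* today: two copies, remote memory -> local buffer -> out_fd */
static ssize_t copy_twice(pid_t pid, void *raddr, size_t len,
			  void *buf, int out_fd)
{
	struct iovec local  = { .iov_base = buf,   .iov_len = len };
	struct iovec remote = { .iov_base = raddr, .iov_len = len };

	process_vm_readv(pid, &local, 1, &remote, 1, 0);
	return write(out_fd, buf, len);
}

/* proposed: no copy, remote pages -> pipe -> out_fd */
static ssize_t copy_none(pid_t pid, void *raddr, size_t len,
			 int pipe_r, int pipe_w, int out_fd)
{
	struct iovec remote = { .iov_base = raddr, .iov_len = len };

	process_vmsplice(pid, pipe_w, &remote, 1, 0);
	return splice(pipe_r, NULL, out_fd, NULL, len, SPLICE_F_MOVE);
}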
 
> -Nathan
> 
> > On Feb 26, 2018, at 2:02 AM, Pavel Emelyanov <xemul@virtuozzo.com> wrote:
> > 
> > On 02/21/2018 03:44 AM, Andrew Morton wrote:
> >> On Tue,  9 Jan 2018 08:30:49 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
> >> 
> >>> This patches introduces new process_vmsplice system call that combines
> >>> functionality of process_vm_read and vmsplice.
> >> 
> >> All seems fairly strightforward.  The big question is: do we know that
> >> people will actually use this, and get sufficient value from it to
> >> justify its addition?
> > 
> > Yes, that's what bothers us a lot too :) I've tried to start with finding out if anyone
> > used the sys_read/write_process_vm() calls, but failed :( Does anybody know how popular
> > these syscalls are? If its users operate on big amount of memory, they could benefit from
> > the proposed splice extension.
> > 
> > -- Pavel
Andrey Vagin Feb. 28, 2018, 6:11 a.m. UTC | #6
On Tue, Feb 27, 2018 at 05:18:18AM +0300, Dmitry V. Levin wrote:
> On Mon, Feb 26, 2018 at 12:02:25PM +0300, Pavel Emelyanov wrote:
> > On 02/21/2018 03:44 AM, Andrew Morton wrote:
> > > On Tue,  9 Jan 2018 08:30:49 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
> > > 
> > >> This patches introduces new process_vmsplice system call that combines
> > >> functionality of process_vm_read and vmsplice.
> > > 
> > > All seems fairly strightforward.  The big question is: do we know that
> > > people will actually use this, and get sufficient value from it to
> > > justify its addition?
> > 
> > Yes, that's what bothers us a lot too :) I've tried to start with finding out if anyone 
> > used the sys_read/write_process_vm() calls, but failed :( Does anybody know how popular
> > these syscalls are?
> 
> Well, process_vm_readv itself is quite popular, it's used by debuggers nowadays,
> see e.g.
> $ strace -qq -esignal=none -eprocess_vm_readv strace -qq -o/dev/null cat /dev/null

For this case, there is no advantage from process_vmsplice().

But it can significantly optimize the process of generating a core file.
In that case, we need to read a process's memory and save the contents into
a file. process_vmsplice() allows doing this more efficiently than
process_vm_readv(), because it doesn't copy the data into userspace.

Here is part of an strace log showing how gdb saves memory contents into a core file:

10593 open("/proc/10193/mem", O_RDONLY|O_CLOEXEC) = 17
10593 pread64(17, "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"..., 1048576, 140009356111872) = 1048576
10593 close(17)                         = 0
10593 write(16, "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"..., 4096) = 4096
10593 write(16, "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"..., 1044480) = 1044480
10593 open("/proc/10193/mem", O_RDONLY|O_CLOEXEC) = 17
10593 pread64(17, "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"..., 1048576, 140009357160448) = 1048576
10593 close(17)                         = 0
10593 write(16, "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"..., 4096) = 4096
10593 write(16, "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"..., 1044480) = 1044480
10593 open("/proc/10193/mem", O_RDONLY|O_CLOEXEC) = 17
10593 pread64(17, "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"..., 1048576, 140009358209024) = 1048576
10593 close(17)                         = 0
10593 write(16, "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"..., 4096) = 4096
10593 write(16, "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"..., 1044480) = 1044480
10593 open("/proc/10193/mem", O_RDONLY|O_CLOEXEC) = 17
10593 pread64(17, "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"..., 1048576, 140009359257600) = 1048576
10593 close(17)

It is strange that process_vm_readv() isn't used and that
/proc/10193/mem is opened many times.
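
For comparison, the same data movement with the proposed call could be
driven by a loop roughly like the one below (a sketch: the
process_vmsplice() prototype is assumed from the patch, and short-transfer
and error handling are simplified):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* assumed prototype from the proposed patch */
ssize_t process_vmsplice(pid_t pid, int fd, const struct iovec *iov,
			 unsigned long nr_segs, unsigned int flags);

/* Dump one region of the target into core_fd without copying the data
 * through userspace: pages go remote memory -> pipe -> core file. */
static int dump_region(pid_t pid, void *start, size_t len, int core_fd)
{
	struct iovec iov = { .iov_base = start, .iov_len = len };
	int p[2];

	if (pipe(p))
		return -1;

	while (iov.iov_len > 0) {
		ssize_t n = process_vmsplice(pid, p[1], &iov, 1, 0);

		if (n <= 0 || splice(p[0], NULL, core_fd, NULL, n, SPLICE_F_MOVE) != n)
			break;
		iov.iov_base = (char *)iov.iov_base + n;
		iov.iov_len -= n;
	}

	close(p[0]);
	close(p[1]);
	return iov.iov_len ? -1 : 0;
}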

BTW: "strace -fo strace-gdb.log gdb -p PID" doesn't work properly.

Thanks,
Andrei

> 
> 
> -- 
> ldv


Pavel Emelyanov Feb. 28, 2018, 7:12 a.m. UTC | #7
On 02/27/2018 05:18 AM, Dmitry V. Levin wrote:
> On Mon, Feb 26, 2018 at 12:02:25PM +0300, Pavel Emelyanov wrote:
>> On 02/21/2018 03:44 AM, Andrew Morton wrote:
>>> On Tue,  9 Jan 2018 08:30:49 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
>>>
>>>> This patches introduces new process_vmsplice system call that combines
>>>> functionality of process_vm_read and vmsplice.
>>>
>>> All seems fairly strightforward.  The big question is: do we know that
>>> people will actually use this, and get sufficient value from it to
>>> justify its addition?
>>
>> Yes, that's what bothers us a lot too :) I've tried to start with finding out if anyone 
>> used the sys_read/write_process_vm() calls, but failed :( Does anybody know how popular
>> these syscalls are?
> 
> Well, process_vm_readv itself is quite popular, it's used by debuggers nowadays,
> see e.g.
> $ strace -qq -esignal=none -eprocess_vm_readv strace -qq -o/dev/null cat /dev/null

I see. Well, yes, this use-case will not benefit much from remote splice. How about more
interactive debugging by, say, gdb? It could attach, splice all the memory, and then analyze
the victim's code/data without copying it into its own address space?

-- Pavel
Andrey Vagin Feb. 28, 2018, 5:50 p.m. UTC | #8
On Wed, Feb 28, 2018 at 10:12:55AM +0300, Pavel Emelyanov wrote:
> On 02/27/2018 05:18 AM, Dmitry V. Levin wrote:
> > On Mon, Feb 26, 2018 at 12:02:25PM +0300, Pavel Emelyanov wrote:
> >> On 02/21/2018 03:44 AM, Andrew Morton wrote:
> >>> On Tue,  9 Jan 2018 08:30:49 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
> >>>
> >>>> This patches introduces new process_vmsplice system call that combines
> >>>> functionality of process_vm_read and vmsplice.
> >>>
> >>> All seems fairly strightforward.  The big question is: do we know that
> >>> people will actually use this, and get sufficient value from it to
> >>> justify its addition?
> >>
> >> Yes, that's what bothers us a lot too :) I've tried to start with finding out if anyone 
> >> used the sys_read/write_process_vm() calls, but failed :( Does anybody know how popular
> >> these syscalls are?
> > 
> > Well, process_vm_readv itself is quite popular, it's used by debuggers nowadays,
> > see e.g.
> > $ strace -qq -esignal=none -eprocess_vm_readv strace -qq -o/dev/null cat /dev/null
> 
> I see. Well, yes, this use-case will not benefit much from remote splice. How about more
> interactive debug by, say, gdb? It may attach, then splice all the memory, then analyze
> the victim code/data w/o copying it to its address space?

Hmm, in this case you will probably want to be able to map pipe pages
into memory.

> 
> -- Pavel
Atchley, Scott Feb. 28, 2018, 11:12 p.m. UTC | #9
> On Feb 28, 2018, at 2:12 AM, Pavel Emelyanov <xemul@virtuozzo.com> wrote:
> 
> On 02/27/2018 05:18 AM, Dmitry V. Levin wrote:
>> On Mon, Feb 26, 2018 at 12:02:25PM +0300, Pavel Emelyanov wrote:
>>> On 02/21/2018 03:44 AM, Andrew Morton wrote:
>>>> On Tue,  9 Jan 2018 08:30:49 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
>>>> 
>>>>> This patches introduces new process_vmsplice system call that combines
>>>>> functionality of process_vm_read and vmsplice.
>>>> 
>>>> All seems fairly strightforward.  The big question is: do we know that
>>>> people will actually use this, and get sufficient value from it to
>>>> justify its addition?
>>> 
>>> Yes, that's what bothers us a lot too :) I've tried to start with finding out if anyone 
>>> used the sys_read/write_process_vm() calls, but failed :( Does anybody know how popular
>>> these syscalls are?
>> 
>> Well, process_vm_readv itself is quite popular, it's used by debuggers nowadays,
>> see e.g.
>> $ strace -qq -esignal=none -eprocess_vm_readv strace -qq -o/dev/null cat /dev/null
> 
> I see. Well, yes, this use-case will not benefit much from remote splice. How about more
> interactive debug by, say, gdb? It may attach, then splice all the memory, then analyze
> the victim code/data w/o copying it to its address space?
> 
> -- Pavel

I may be completely off base, but could a FUSE daemon use this to read memory from the client and dump it to a file descriptor without copying the data into the kernel?