All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] New page describing userfaultfd(2) system call.
@ 2016-12-21  8:08 Mike Rapoport
       [not found] ` <1482307713-21853-1-git-send-email-rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Mike Rapoport @ 2016-12-21  8:08 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Andrea Arcangeli, linux-man-u79uwXL29TY76Z2rM5mHXA, Mike Rapoport

Signed-off-by: Mike Rapoport <rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 man2/userfaultfd.2 | 314 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 314 insertions(+)
 create mode 100644 man2/userfaultfd.2

diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
new file mode 100644
index 0000000..d2196cd
--- /dev/null
+++ b/man2/userfaultfd.2
@@ -0,0 +1,314 @@
+.\" Copyright (c) 2016, IBM Corporation.
+.\" Written by Mike Rapoport <rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date.  The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein.  The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH USERFAULTFD 2 1016-12-12 "Linux" "Linux Programmer's Manual"
+.SH NAME
+userfaultfd \- create a file descriptor for handling page faults in user
+space
+.SH SYNOPSIS
+.nf
+.B #include <sys/types.h>
+.sp
+.BI "int userfaultfd(int " flags );
+.fi
+.PP
+.IR Note :
+There is no glibc wrapper for this system call; see NOTES.
+.SH DESCRIPTION
+.BR userfaultfd (2)
+creates a userfaultfd object that can be used for delegation of page fault
+handling to a user space application.
+The userfaultfd should be configured using
+.BR ioctl (2).
+Once the userfaultfd is configured, the application can use
+.BR read (2)
+to receive userfaultfd notifications.
+The reads from userfaultfd may be blocking or non-blocking, depending on
+the value of
+.I flags
+used for the creation of the userfaultfd or subsequent calls to
+.BR fcntl (2) .
+
+The following values may be bitwise ORed in
+.IR flags
+to change the behavior of
+.BR userfaultfd ():
+.TP
+.BR O_CLOEXEC
+Enable the close-on-exec flag for the new userfaultfd object.
+See the description of the
+.B O_CLOEXEC
+flag in
+.BR open (2)
+.TP
+.BR O_NONBLOCK
+Enables non-blocking operation for the userfaultfd
+.BR O_NONBLOCK
+See the description of the
+.BR O_NONBLOCK
+flag in
+.BR open (2).
+.\"
+.SS Userfaultfd operation
+After the userfaultfd object is created with
+.BR userfaultfd (2)
+system call, the application have to enable it using
+.I UFFDIO_API
+ioctl to perform API version and supported features handshake between the
+kernel and the user space.
+If the
+.I UFFDIO_API
+is successful, the application should register memory ranges using
+.I UFFDIO_REGISTER
+ioctl. After successful completion of
+.I UFFDIO_REGISTER
+ioctl, a page fault occurring in the requested memory range, and satisfying
+the mode defined at the register time, will be forwarded by the kernel to
+the user space application.
+The application then can use
+.I UFFDIO_COPY
+or
+.I UFFDIO_ZERO
+ioctls to resolve the page fault.
+.PP
+Currently, userfaultfd can only be used with anonymous private memory
+mappings.
+.\"
+.SS API Ioctls
+The API ioctls are used to configure userfaultfd behavior.
+They allow to choose what features will be enabled and what kinds of events
+will be delivered to the application.
+.TP
+.BR "UFFDIO_API	struct uffdio_api *" api
+Enable userfaultfd and perform API handshake.
+The
+.I uffdio_api
+structure is defined as:
+.in +4n
+.nf
+
+struct uffdio_api {
+	__u64 api;
+	__u64 features;
+	__u64 ioctls;
+};
+
+.fi
+.in
+The
+.I api
+field denotes the API version requested by the application.
+The kernel verifies that it can support the required API, and sets the
+.I features
+and
+.I ioctls
+fields to bit masks representing all the available features and the generic
+ioctls available.
+.\"
+.TP
+.BI "UFFDIO_REGISTER	struct uffdio_register *" arg
+Register a memory range with userfaultfd.
+The
+.I uffdio_register
+structure is defined as:
+.in +4n
+.nf
+
+struct uffdio_range {
+	__u64 start;
+	__u64 end;
+};
+
+struct uffdio_register {
+	struct uffdio_range range;
+	__u64 mode;
+	__u64 ioctls;
+};
+
+.fi
+.in
+
+The
+.I range
+field defines a memory range starting at
+.I start
+and ending at
+.I end
+that should be handled by the userfaultfd.
+The
+.I mode
+defines mode of operation desired for this memory region.
+The following values may be bitwise ORed to set the userfaultfd mode for
+particular range:
+.RS
+.sp
+.PD 0
+.TP 12
+.B UFFDIO_REGISTER_MODE_MISSING
+Track page faults on missing pages
+.TP 12
+.B UFFDIO_REGISTER_MODE_WP
+Track page faults on write protected pages.
+Currently the only supported mode is
+.I UFFDIO_REGISTER_MODE_MISSING
+.PD
+.RE
+.IP
+The kernel answers which ioctl commands are available for the requested
+range in the
+.I ioctls
+field.
+.\"
+.TP
+.BI "UFFDIO_UNREGISTER	struct uffdio_register *" arg
+Unregister a memory range from userfaultfd.
+.\"
+.SS Range Ioctls
+The range ioctls enable the calling application to resolve page fault
+events in consistent way.
+.TP
+.BI "UFFDIO_COPY struct uffdio_copy *" arg
+Atomically copy a continuous memory chunk into the userfault registered
+range and optionally wake up the blocked thread.
+The source and destination addresses and the amount of bytes to copy are
+specified by
+.IR src ", " dst ", and " len
+fields of
+.I "struct uffdio_copy"
+respectively:
+
+.in +4n
+.nf
+struct uffdio_copy {
+	__u64 dst;
+	__u64 src;
+	__u64 len;
+	__u64 mode;
+	__s64 copy;
+};
+.nf
+.fi
+
+The following values may be bitwise ORed in
+.IR mode
+to change the behavior of
+.I UFFDIO_COPY
+ioctl:
+.RS
+.sp
+.PD 0
+.TP 12
+.B UFFDIO_COPY_MODE_DONTWAKE
+Do not wake up the thread that waits for page fault resolution
+.PD
+.RE
+.IP
+The
+.I copy
+field of the
+.I uffdio_copy
+structure is used by the kernel to return amount of bytes that was actually
+copied.
+.\"
+.TP
+.BI "UFFDIO_ZERO struct uffdio_zero *" arg
+Zero out a part of memory range registered with userfaultfd.
+The requested range is specified by
+.I range
+field of
+.I uffdio_zeropage
+structure:
+
+.in +4n
+.nf
+struct uffdio_zeropage {
+	struct uffdio_range range;
+	__u64 mode;
+	__s64 zeropage;
+};
+.nf
+.fi
+
+The following values may be bitwise ORed in
+.IR mode
+to change the behavior of
+.I UFFDIO_ZERO
+ioctl:
+.RS
+.sp
+.PD 0
+.TP 12
+.B UFFDIO_ZEROPAGE_MODE_DONTWAKE
+Do not wake up the thread that waits for page fault resolution
+.PD
+.RE
+.IP
+The
+.I zeropage
+field of the
+.I uffdio_zero
+structure is used by the kernel to return amount of bytes that was actually
+zeroed.
+.\"
+.TP
+.BI "UFFDIO_WAKE struct uffdio_range *" arg
+Wake up the thread waiting for the page fault resolution.
+.SH RETURN VALUE
+For a successful call, the 
+.BR userfaultfd (2)
+system call returns the new file descriptor for the userfaultfd object.
+On error, \-1 is returned, and
+.I errno
+is set appropriately.
+.SH ERRORS
+.TP
+.B EINVAL
+An unsupported value was specified in
+.IR flags .
+.TP
+.BR EMFILE
+The per-process limit on the number of open file descriptors has been
+reached
+.TP
+.B ENFILE
+The system-wide limit on the total number of open files has been
+reached.
+.TP
+.B ENOMEM
+Insufficient kernel memory was available.
+.SH CONFORMING TO
+.BR userfaultfd ()
+is Linux-specific and should not be used in programs intended to be
+portable.
+.SH NOTES
+Glibc does not provide a wrapper for this system call; call it using
+.BR syscall (2).
+.SH SEE ALSO
+.BR fcntl (2),
+.BR ioctl (2)
+
+.IR Documentation/vm/userfaultfd.txt
+in the Linux kernel source tree
+
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] New page describing userfaultfd(2) system call.
       [not found] ` <1482307713-21853-1-git-send-email-rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2016-12-27  8:14   ` Michael Kerrisk (man-pages)
  2016-12-27  8:16   ` Michael Kerrisk (man-pages)
  2016-12-27 11:16   ` Andrea Arcangeli
  2 siblings, 0 replies; 6+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-12-27  8:14 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-man, Mike Rapoport

Hi Andrea,

Do you have any comment/input for this page?

Cheers,

Michael


On 21 December 2016 at 09:08, Mike Rapoport <rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> wrote:
> Signed-off-by: Mike Rapoport <rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> ---
>  man2/userfaultfd.2 | 314 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 314 insertions(+)
>  create mode 100644 man2/userfaultfd.2
>
> diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
> new file mode 100644
> index 0000000..d2196cd
> --- /dev/null
> +++ b/man2/userfaultfd.2
> @@ -0,0 +1,314 @@
> +.\" Copyright (c) 2016, IBM Corporation.
> +.\" Written by Mike Rapoport <rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> +.\"
> +.\" %%%LICENSE_START(VERBATIM)
> +.\" Permission is granted to make and distribute verbatim copies of this
> +.\" manual provided the copyright notice and this permission notice are
> +.\" preserved on all copies.
> +.\"
> +.\" Permission is granted to copy and distribute modified versions of this
> +.\" manual under the conditions for verbatim copying, provided that the
> +.\" entire resulting derived work is distributed under the terms of a
> +.\" permission notice identical to this one.
> +.\"
> +.\" Since the Linux kernel and libraries are constantly changing, this
> +.\" manual page may be incorrect or out-of-date.  The author(s) assume no
> +.\" responsibility for errors or omissions, or for damages resulting from
> +.\" the use of the information contained herein.  The author(s) may not
> +.\" have taken the same level of care in the production of this manual,
> +.\" which is licensed free of charge, as they might when working
> +.\" professionally.
> +.\"
> +.\" Formatted or processed versions of this manual, if unaccompanied by
> +.\" the source, must acknowledge the copyright and authors of this work.
> +.\" %%%LICENSE_END
> +.\"
> +.TH USERFAULTFD 2 1016-12-12 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +userfaultfd \- create a file descriptor for handling page faults in user
> +space
> +.SH SYNOPSIS
> +.nf
> +.B #include <sys/types.h>
> +.sp
> +.BI "int userfaultfd(int " flags );
> +.fi
> +.PP
> +.IR Note :
> +There is no glibc wrapper for this system call; see NOTES.
> +.SH DESCRIPTION
> +.BR userfaultfd (2)
> +creates a userfaultfd object that can be used for delegation of page fault
> +handling to a user space application.
> +The userfaultfd should be configured using
> +.BR ioctl (2).
> +Once the userfaultfd is configured, the application can use
> +.BR read (2)
> +to receive userfaultfd notifications.
> +The reads from userfaultfd may be blocking or non-blocking, depending on
> +the value of
> +.I flags
> +used for the creation of the userfaultfd or subsequent calls to
> +.BR fcntl (2) .
> +
> +The following values may be bitwise ORed in
> +.IR flags
> +to change the behavior of
> +.BR userfaultfd ():
> +.TP
> +.BR O_CLOEXEC
> +Enable the close-on-exec flag for the new userfaultfd object.
> +See the description of the
> +.B O_CLOEXEC
> +flag in
> +.BR open (2)
> +.TP
> +.BR O_NONBLOCK
> +Enables non-blocking operation for the userfaultfd
> +.BR O_NONBLOCK
> +See the description of the
> +.BR O_NONBLOCK
> +flag in
> +.BR open (2).
> +.\"
> +.SS Userfaultfd operation
> +After the userfaultfd object is created with
> +.BR userfaultfd (2)
> +system call, the application have to enable it using
> +.I UFFDIO_API
> +ioctl to perform API version and supported features handshake between the
> +kernel and the user space.
> +If the
> +.I UFFDIO_API
> +is successful, the application should register memory ranges using
> +.I UFFDIO_REGISTER
> +ioctl. After successful completion of
> +.I UFFDIO_REGISTER
> +ioctl, a page fault occurring in the requested memory range, and satisfying
> +the mode defined at the register time, will be forwarded by the kernel to
> +the user space application.
> +The application then can use
> +.I UFFDIO_COPY
> +or
> +.I UFFDIO_ZERO
> +ioctls to resolve the page fault.
> +.PP
> +Currently, userfaultfd can only be used with anonymous private memory
> +mappings.
> +.\"
> +.SS API Ioctls
> +The API ioctls are used to configure userfaultfd behavior.
> +They allow to choose what features will be enabled and what kinds of events
> +will be delivered to the application.
> +.TP
> +.BR "UFFDIO_API        struct uffdio_api *" api
> +Enable userfaultfd and perform API handshake.
> +The
> +.I uffdio_api
> +structure is defined as:
> +.in +4n
> +.nf
> +
> +struct uffdio_api {
> +       __u64 api;
> +       __u64 features;
> +       __u64 ioctls;
> +};
> +
> +.fi
> +.in
> +The
> +.I api
> +field denotes the API version requested by the application.
> +The kernel verifies that it can support the required API, and sets the
> +.I features
> +and
> +.I ioctls
> +fields to bit masks representing all the available features and the generic
> +ioctls available.
> +.\"
> +.TP
> +.BI "UFFDIO_REGISTER   struct uffdio_register *" arg
> +Register a memory range with userfaultfd.
> +The
> +.I uffdio_register
> +structure is defined as:
> +.in +4n
> +.nf
> +
> +struct uffdio_range {
> +       __u64 start;
> +       __u64 end;
> +};
> +
> +struct uffdio_register {
> +       struct uffdio_range range;
> +       __u64 mode;
> +       __u64 ioctls;
> +};
> +
> +.fi
> +.in
> +
> +The
> +.I range
> +field defines a memory range starting at
> +.I start
> +and ending at
> +.I end
> +that should be handled by the userfaultfd.
> +The
> +.I mode
> +defines mode of operation desired for this memory region.
> +The following values may be bitwise ORed to set the userfaultfd mode for
> +particular range:
> +.RS
> +.sp
> +.PD 0
> +.TP 12
> +.B UFFDIO_REGISTER_MODE_MISSING
> +Track page faults on missing pages
> +.TP 12
> +.B UFFDIO_REGISTER_MODE_WP
> +Track page faults on write protected pages.
> +Currently the only supported mode is
> +.I UFFDIO_REGISTER_MODE_MISSING
> +.PD
> +.RE
> +.IP
> +The kernel answers which ioctl commands are available for the requested
> +range in the
> +.I ioctls
> +field.
> +.\"
> +.TP
> +.BI "UFFDIO_UNREGISTER struct uffdio_register *" arg
> +Unregister a memory range from userfaultfd.
> +.\"
> +.SS Range Ioctls
> +The range ioctls enable the calling application to resolve page fault
> +events in consistent way.
> +.TP
> +.BI "UFFDIO_COPY struct uffdio_copy *" arg
> +Atomically copy a continuous memory chunk into the userfault registered
> +range and optionally wake up the blocked thread.
> +The source and destination addresses and the amount of bytes to copy are
> +specified by
> +.IR src ", " dst ", and " len
> +fields of
> +.I "struct uffdio_copy"
> +respectively:
> +
> +.in +4n
> +.nf
> +struct uffdio_copy {
> +       __u64 dst;
> +       __u64 src;
> +       __u64 len;
> +       __u64 mode;
> +       __s64 copy;
> +};
> +.nf
> +.fi
> +
> +The following values may be bitwise ORed in
> +.IR mode
> +to change the behavior of
> +.I UFFDIO_COPY
> +ioctl:
> +.RS
> +.sp
> +.PD 0
> +.TP 12
> +.B UFFDIO_COPY_MODE_DONTWAKE
> +Do not wake up the thread that waits for page fault resolution
> +.PD
> +.RE
> +.IP
> +The
> +.I copy
> +field of the
> +.I uffdio_copy
> +structure is used by the kernel to return amount of bytes that was actually
> +copied.
> +.\"
> +.TP
> +.BI "UFFDIO_ZERO struct uffdio_zero *" arg
> +Zero out a part of memory range registered with userfaultfd.
> +The requested range is specified by
> +.I range
> +field of
> +.I uffdio_zeropage
> +structure:
> +
> +.in +4n
> +.nf
> +struct uffdio_zeropage {
> +       struct uffdio_range range;
> +       __u64 mode;
> +       __s64 zeropage;
> +};
> +.nf
> +.fi
> +
> +The following values may be bitwise ORed in
> +.IR mode
> +to change the behavior of
> +.I UFFDIO_ZERO
> +ioctl:
> +.RS
> +.sp
> +.PD 0
> +.TP 12
> +.B UFFDIO_ZEROPAGE_MODE_DONTWAKE
> +Do not wake up the thread that waits for page fault resolution
> +.PD
> +.RE
> +.IP
> +The
> +.I zeropage
> +field of the
> +.I uffdio_zero
> +structure is used by the kernel to return amount of bytes that was actually
> +zeroed.
> +.\"
> +.TP
> +.BI "UFFDIO_WAKE struct uffdio_range *" arg
> +Wake up the thread waiting for the page fault resolution.
> +.SH RETURN VALUE
> +For a successful call, the
> +.BR userfaultfd (2)
> +system call returns the new file descriptor for the userfaultfd object.
> +On error, \-1 is returned, and
> +.I errno
> +is set appropriately.
> +.SH ERRORS
> +.TP
> +.B EINVAL
> +An unsupported value was specified in
> +.IR flags .
> +.TP
> +.BR EMFILE
> +The per-process limit on the number of open file descriptors has been
> +reached
> +.TP
> +.B ENFILE
> +The system-wide limit on the total number of open files has been
> +reached.
> +.TP
> +.B ENOMEM
> +Insufficient kernel memory was available.
> +.SH CONFORMING TO
> +.BR userfaultfd ()
> +is Linux-specific and should not be used in programs intended to be
> +portable.
> +.SH NOTES
> +Glibc does not provide a wrapper for this system call; call it using
> +.BR syscall (2).
> +.SH SEE ALSO
> +.BR fcntl (2),
> +.BR ioctl (2)
> +
> +.IR Documentation/vm/userfaultfd.txt
> +in the Linux kernel source tree
> +
> --
> 1.9.1
>



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] New page describing userfaultfd(2) system call.
       [not found] ` <1482307713-21853-1-git-send-email-rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2016-12-27  8:14   ` Michael Kerrisk (man-pages)
@ 2016-12-27  8:16   ` Michael Kerrisk (man-pages)
       [not found]     ` <CAKgNAkgb=qfTZA4fzUGVmLis-bK5kFU0PfYwHs4dhz+1RRa2TQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2016-12-27 11:16   ` Andrea Arcangeli
  2 siblings, 1 reply; 6+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-12-27  8:16 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: Andrea Arcangeli, linux-man

Hi Mike,

Thanks for working on this page. Just for background (since it helps
me fore review), how did you get the info that is documented in the
page?

Cheers,

Michael



On 21 December 2016 at 09:08, Mike Rapoport <rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> wrote:
> Signed-off-by: Mike Rapoport <rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> ---
>  man2/userfaultfd.2 | 314 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 314 insertions(+)
>  create mode 100644 man2/userfaultfd.2
>
> diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
> new file mode 100644
> index 0000000..d2196cd
> --- /dev/null
> +++ b/man2/userfaultfd.2
> @@ -0,0 +1,314 @@
> +.\" Copyright (c) 2016, IBM Corporation.
> +.\" Written by Mike Rapoport <rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> +.\"
> +.\" %%%LICENSE_START(VERBATIM)
> +.\" Permission is granted to make and distribute verbatim copies of this
> +.\" manual provided the copyright notice and this permission notice are
> +.\" preserved on all copies.
> +.\"
> +.\" Permission is granted to copy and distribute modified versions of this
> +.\" manual under the conditions for verbatim copying, provided that the
> +.\" entire resulting derived work is distributed under the terms of a
> +.\" permission notice identical to this one.
> +.\"
> +.\" Since the Linux kernel and libraries are constantly changing, this
> +.\" manual page may be incorrect or out-of-date.  The author(s) assume no
> +.\" responsibility for errors or omissions, or for damages resulting from
> +.\" the use of the information contained herein.  The author(s) may not
> +.\" have taken the same level of care in the production of this manual,
> +.\" which is licensed free of charge, as they might when working
> +.\" professionally.
> +.\"
> +.\" Formatted or processed versions of this manual, if unaccompanied by
> +.\" the source, must acknowledge the copyright and authors of this work.
> +.\" %%%LICENSE_END
> +.\"
> +.TH USERFAULTFD 2 1016-12-12 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +userfaultfd \- create a file descriptor for handling page faults in user
> +space
> +.SH SYNOPSIS
> +.nf
> +.B #include <sys/types.h>
> +.sp
> +.BI "int userfaultfd(int " flags );
> +.fi
> +.PP
> +.IR Note :
> +There is no glibc wrapper for this system call; see NOTES.
> +.SH DESCRIPTION
> +.BR userfaultfd (2)
> +creates a userfaultfd object that can be used for delegation of page fault
> +handling to a user space application.
> +The userfaultfd should be configured using
> +.BR ioctl (2).
> +Once the userfaultfd is configured, the application can use
> +.BR read (2)
> +to receive userfaultfd notifications.
> +The reads from userfaultfd may be blocking or non-blocking, depending on
> +the value of
> +.I flags
> +used for the creation of the userfaultfd or subsequent calls to
> +.BR fcntl (2) .
> +
> +The following values may be bitwise ORed in
> +.IR flags
> +to change the behavior of
> +.BR userfaultfd ():
> +.TP
> +.BR O_CLOEXEC
> +Enable the close-on-exec flag for the new userfaultfd object.
> +See the description of the
> +.B O_CLOEXEC
> +flag in
> +.BR open (2)
> +.TP
> +.BR O_NONBLOCK
> +Enables non-blocking operation for the userfaultfd
> +.BR O_NONBLOCK
> +See the description of the
> +.BR O_NONBLOCK
> +flag in
> +.BR open (2).
> +.\"
> +.SS Userfaultfd operation
> +After the userfaultfd object is created with
> +.BR userfaultfd (2)
> +system call, the application have to enable it using
> +.I UFFDIO_API
> +ioctl to perform API version and supported features handshake between the
> +kernel and the user space.
> +If the
> +.I UFFDIO_API
> +is successful, the application should register memory ranges using
> +.I UFFDIO_REGISTER
> +ioctl. After successful completion of
> +.I UFFDIO_REGISTER
> +ioctl, a page fault occurring in the requested memory range, and satisfying
> +the mode defined at the register time, will be forwarded by the kernel to
> +the user space application.
> +The application then can use
> +.I UFFDIO_COPY
> +or
> +.I UFFDIO_ZERO
> +ioctls to resolve the page fault.
> +.PP
> +Currently, userfaultfd can only be used with anonymous private memory
> +mappings.
> +.\"
> +.SS API Ioctls
> +The API ioctls are used to configure userfaultfd behavior.
> +They allow to choose what features will be enabled and what kinds of events
> +will be delivered to the application.
> +.TP
> +.BR "UFFDIO_API        struct uffdio_api *" api
> +Enable userfaultfd and perform API handshake.
> +The
> +.I uffdio_api
> +structure is defined as:
> +.in +4n
> +.nf
> +
> +struct uffdio_api {
> +       __u64 api;
> +       __u64 features;
> +       __u64 ioctls;
> +};
> +
> +.fi
> +.in
> +The
> +.I api
> +field denotes the API version requested by the application.
> +The kernel verifies that it can support the required API, and sets the
> +.I features
> +and
> +.I ioctls
> +fields to bit masks representing all the available features and the generic
> +ioctls available.
> +.\"
> +.TP
> +.BI "UFFDIO_REGISTER   struct uffdio_register *" arg
> +Register a memory range with userfaultfd.
> +The
> +.I uffdio_register
> +structure is defined as:
> +.in +4n
> +.nf
> +
> +struct uffdio_range {
> +       __u64 start;
> +       __u64 end;
> +};
> +
> +struct uffdio_register {
> +       struct uffdio_range range;
> +       __u64 mode;
> +       __u64 ioctls;
> +};
> +
> +.fi
> +.in
> +
> +The
> +.I range
> +field defines a memory range starting at
> +.I start
> +and ending at
> +.I end
> +that should be handled by the userfaultfd.
> +The
> +.I mode
> +defines mode of operation desired for this memory region.
> +The following values may be bitwise ORed to set the userfaultfd mode for
> +particular range:
> +.RS
> +.sp
> +.PD 0
> +.TP 12
> +.B UFFDIO_REGISTER_MODE_MISSING
> +Track page faults on missing pages
> +.TP 12
> +.B UFFDIO_REGISTER_MODE_WP
> +Track page faults on write protected pages.
> +Currently the only supported mode is
> +.I UFFDIO_REGISTER_MODE_MISSING
> +.PD
> +.RE
> +.IP
> +The kernel answers which ioctl commands are available for the requested
> +range in the
> +.I ioctls
> +field.
> +.\"
> +.TP
> +.BI "UFFDIO_UNREGISTER struct uffdio_register *" arg
> +Unregister a memory range from userfaultfd.
> +.\"
> +.SS Range Ioctls
> +The range ioctls enable the calling application to resolve page fault
> +events in consistent way.
> +.TP
> +.BI "UFFDIO_COPY struct uffdio_copy *" arg
> +Atomically copy a continuous memory chunk into the userfault registered
> +range and optionally wake up the blocked thread.
> +The source and destination addresses and the amount of bytes to copy are
> +specified by
> +.IR src ", " dst ", and " len
> +fields of
> +.I "struct uffdio_copy"
> +respectively:
> +
> +.in +4n
> +.nf
> +struct uffdio_copy {
> +       __u64 dst;
> +       __u64 src;
> +       __u64 len;
> +       __u64 mode;
> +       __s64 copy;
> +};
> +.nf
> +.fi
> +
> +The following values may be bitwise ORed in
> +.IR mode
> +to change the behavior of
> +.I UFFDIO_COPY
> +ioctl:
> +.RS
> +.sp
> +.PD 0
> +.TP 12
> +.B UFFDIO_COPY_MODE_DONTWAKE
> +Do not wake up the thread that waits for page fault resolution
> +.PD
> +.RE
> +.IP
> +The
> +.I copy
> +field of the
> +.I uffdio_copy
> +structure is used by the kernel to return amount of bytes that was actually
> +copied.
> +.\"
> +.TP
> +.BI "UFFDIO_ZERO struct uffdio_zero *" arg
> +Zero out a part of memory range registered with userfaultfd.
> +The requested range is specified by
> +.I range
> +field of
> +.I uffdio_zeropage
> +structure:
> +
> +.in +4n
> +.nf
> +struct uffdio_zeropage {
> +       struct uffdio_range range;
> +       __u64 mode;
> +       __s64 zeropage;
> +};
> +.nf
> +.fi
> +
> +The following values may be bitwise ORed in
> +.IR mode
> +to change the behavior of
> +.I UFFDIO_ZERO
> +ioctl:
> +.RS
> +.sp
> +.PD 0
> +.TP 12
> +.B UFFDIO_ZEROPAGE_MODE_DONTWAKE
> +Do not wake up the thread that waits for page fault resolution
> +.PD
> +.RE
> +.IP
> +The
> +.I zeropage
> +field of the
> +.I uffdio_zero
> +structure is used by the kernel to return amount of bytes that was actually
> +zeroed.
> +.\"
> +.TP
> +.BI "UFFDIO_WAKE struct uffdio_range *" arg
> +Wake up the thread waiting for the page fault resolution.
> +.SH RETURN VALUE
> +For a successful call, the
> +.BR userfaultfd (2)
> +system call returns the new file descriptor for the userfaultfd object.
> +On error, \-1 is returned, and
> +.I errno
> +is set appropriately.
> +.SH ERRORS
> +.TP
> +.B EINVAL
> +An unsupported value was specified in
> +.IR flags .
> +.TP
> +.BR EMFILE
> +The per-process limit on the number of open file descriptors has been
> +reached
> +.TP
> +.B ENFILE
> +The system-wide limit on the total number of open files has been
> +reached.
> +.TP
> +.B ENOMEM
> +Insufficient kernel memory was available.
> +.SH CONFORMING TO
> +.BR userfaultfd ()
> +is Linux-specific and should not be used in programs intended to be
> +portable.
> +.SH NOTES
> +Glibc does not provide a wrapper for this system call; call it using
> +.BR syscall (2).
> +.SH SEE ALSO
> +.BR fcntl (2),
> +.BR ioctl (2)
> +
> +.IR Documentation/vm/userfaultfd.txt
> +in the Linux kernel source tree
> +
> --
> 1.9.1
>



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] New page describing userfaultfd(2) system call.
       [not found] ` <1482307713-21853-1-git-send-email-rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2016-12-27  8:14   ` Michael Kerrisk (man-pages)
  2016-12-27  8:16   ` Michael Kerrisk (man-pages)
@ 2016-12-27 11:16   ` Andrea Arcangeli
  2 siblings, 0 replies; 6+ messages in thread
From: Andrea Arcangeli @ 2016-12-27 11:16 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: Michael Kerrisk, linux-man-u79uwXL29TY76Z2rM5mHXA

Hello,

On Wed, Dec 21, 2016 at 10:08:33AM +0200, Mike Rapoport wrote:
> +.TH USERFAULTFD 2 1016-12-12 "Linux" "Linux Programmer's Manual"

Typo: 2016

> +Currently, userfaultfd can only be used with anonymous private memory
> +mappings.

This will get obsoleted soon enough, as
shmem/hugetlbfs/non-cooperative patches are complete and are just in
queue for merging on -mm on linux-mm. Although the above is correct
current upstream.

hugetlbfs/shmem won't complicate things much, aside from
UFFDIO_ZEROCOPY not being supported for those areas (and it'll be
reported in the UFFDIO_REGISTER uffdio_register.ioctl structure so it
may not require additional docs).

The feature flags will probably require more documentation:

#define UFFD_API_FEATURES (UFFD_FEATURE_PAGEFAULT_FLAG_WP |	\
			   UFFD_FEATURE_EVENT_FORK |		\
			   UFFD_FEATURE_EVENT_REMAP |		\
			   UFFD_FEATURE_EVENT_MADVDONTNEED |	\
			   UFFD_FEATURE_MISSING_HUGETLBFS |	\
			   UFFD_FEATURE_MISSING_SHMEM)

UFFD_FEATURE_PAGEFAULT_FLAG_WP isn't queued for upstream merging yet
so that will be the last one to document.

If the kernel supports registering userfaultfd ranges on hugetlbfs
virtual memory areas, UFFD_FEATURE_MISSING_HUGETLBFS will be set in
uffdio_api.features (no matter what is given in input to
UFFDIO_API). If UFFD_FEATURE_MISSING_HUGETLBFS is given in input to
UFFDIO_API by setting it in uffdio_api.features and the kernel doesn't
support registering userfaultfd ranges on hugetlbfs areas, UFFDIO_API
ioctl will return -EINVAL. So there are effectively two ways to probe
if the userfaultfd syscall supports hugetlbfs. Only MAP_PRIVATE
hugetlbfs mappings are supported. UFFD_FEATURE_MISSING_HUGETLBFS being
set in uffdio_api.features however doesn't mean the kernel was built
with hugetlbfs support, it only means if a MAP_PRIVATE hugetlbfs
virtual memory area exists, UFFDIO_REGISTER will succeed on it. It's
up to userland to prepare a hugetlbfs mapping (a fstatfs
HUGETLBFS_MAGIC check can provide such guarantee before creating the
mapping on the hugetlbfs file descriptor).

UFFD_FEATURE_MISSING_SHMEM works the same for shmem (covering all
tmpfs, IPCSHM /dev/zero MAP_SHARED shmem APIs, the only difference is
that there's no MAP_PRIVATE constraint for shmem). The API used to
create shmem doesn't matter as userfaultfd only cares about virtual
memory areas so it doesn't matter how those shmem backed areas have
been created beforehand.

UFFD_FEATURE_EVENT_FORK|UFFD_FEATURE_EVENT_REMAP|UFFD_FEATURE_EVENT_MADVDONTNEED
will require further documentation and they're a bit more complicated.

> +The following values may be bitwise ORed in
> +.IR mode
> +to change the behavior of
> +.I UFFDIO_COPY
> +ioctl:
> +.RS
> +.sp
> +.PD 0
> +.TP 12
> +.B UFFDIO_COPY_MODE_DONTWAKE
> +Do not wake up the thread that waits for page fault resolution
> +.PD
> +.RE
> +.IP
> +The
> +.I copy
> +field of the
> +.I uffdio_copy
> +structure is used by the kernel to return amount of bytes that was actually
> +copied.

... or an error (-EINVAL/-EFAULT/-ENOMEM/-EEXIST). If uffdio_copy.copy
doesn't match the uffdio_copy.len passed in input to UFFDIO_COPY, the
ioctl will return -EAGAIN. If the ioctl returns zero it means it
succeeded, no error was reported and the entire area was copied. If a
an invalid fault happens while writing to the uffdio_copy.copy field
the syscall will return -EFAULT. uffdio_copy.copy is an output-only
field so it is not being read by the UFFDIO_COPY ioctl.

> +The
> +.I zeropage
> +field of the
> +.I uffdio_zero
> +structure is used by the kernel to return amount of bytes that was actually
> +zeroed.

.. or an error.. same as UFFDIO_COPY.

Great start of the manpage, thanks!

Andrea
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] New page describing userfaultfd(2) system call.
       [not found]     ` <CAKgNAkgb=qfTZA4fzUGVmLis-bK5kFU0PfYwHs4dhz+1RRa2TQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-12-27 11:20       ` Andrea Arcangeli
       [not found]         ` <20161227112038.GD16976-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Andrea Arcangeli @ 2016-12-27 11:20 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages); +Cc: Mike Rapoport, linux-man

Hello,

On Tue, Dec 27, 2016 at 09:16:25AM +0100, Michael Kerrisk (man-pages) wrote:
> Hi Mike,
> 
> Thanks for working on this page. Just for background (since it helps
> me fore review), how did you get the info that is documented in the
> page?

Mike maintained and contributed to the non-cooperative support of
userfaultfd (not documented yet in the manpage because not upstream
yet) and he also wrote from scratch the shmem support of userfaultfd
(also still queued for upstream merging). So I assume all information
in the manpage has been obtained by reading the kernel code of
userfaultfd.

Thanks,
Andrea
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] New page describing userfaultfd(2) system call.
       [not found]         ` <20161227112038.GD16976-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-12-29  7:02           ` Mike Rapoport
  0 siblings, 0 replies; 6+ messages in thread
From: Mike Rapoport @ 2016-12-29  7:02 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Michael Kerrisk (man-pages), linux-man

Hello,

On Tue, Dec 27, 2016 at 12:20:38PM +0100, Andrea Arcangeli wrote:
> Hello,
> 
> On Tue, Dec 27, 2016 at 09:16:25AM +0100, Michael Kerrisk (man-pages) wrote:
> > Hi Mike,
> > 
> > Thanks for working on this page. Just for background (since it helps
> > me fore review), how did you get the info that is documented in the
> > page?
> 
> Mike maintained and contributed to the non-cooperative support of
> userfaultfd (not documented yet in the manpage because not upstream
> yet) and he also wrote from scratch the shmem support of userfaultfd
> (also still queued for upstream merging). So I assume all information
> in the manpage has been obtained by reading the kernel code of
> userfaultfd.

Andrea, thanks for the nice summary :)
I'm working on post-copy migration in CRIU, so except reading the kernel
code, a had quite a few chances to experiment with userfaultfd :)
 
> Thanks,
> Andrea
>

--
Sincerely yours,
Mike. 

--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-12-29  7:02 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-21  8:08 [PATCH] New page describing userfaultfd(2) system call Mike Rapoport
     [not found] ` <1482307713-21853-1-git-send-email-rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2016-12-27  8:14   ` Michael Kerrisk (man-pages)
2016-12-27  8:16   ` Michael Kerrisk (man-pages)
     [not found]     ` <CAKgNAkgb=qfTZA4fzUGVmLis-bK5kFU0PfYwHs4dhz+1RRa2TQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-12-27 11:20       ` Andrea Arcangeli
     [not found]         ` <20161227112038.GD16976-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-12-29  7:02           ` Mike Rapoport
2016-12-27 11:16   ` Andrea Arcangeli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.