linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1
@ 2015-03-10  1:49 Fam Zheng
  2015-03-10  1:49 ` [PATCH v4 1/9] epoll: Extract epoll_wait_do and epoll_pwait_do Fam Zheng
                   ` (9 more replies)
  0 siblings, 10 replies; 16+ messages in thread
From: Fam Zheng @ 2015-03-10  1:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Alexander Viro, Andrew Morton, Kees Cook, Andy Lutomirski,
	David Herrmann, Alexei Starovoitov, Miklos Szeredi,
	David Drysdale, Oleg Nesterov, David S. Miller, Vivek Goyal,
	Mike Frysinger, Theodore Ts'o, Heiko Carstens,
	Rasmus Villemoes, Rashika Kheria, Hugh Dickins,
	Mathieu Desnoyers, Fam Zheng, Peter Zijlstra, linux-fsdevel,
	linux-api, Josh Triplett, Michael Kerrisk (man-pages),
	Paolo Bonzini, Omar Sandoval, Jonathan Corbet, shane.seymour,
	dan.j.rosenberg

Changes from v3:

  - Add "size" field in epoll_wait_params. [Jon, Ingo, Seymour]
  - Input validation for ncmds in epoll_ctl_batch. [Dan]
  - Return -EFAULT if copy_to_user failed in epoll_ctl_batch. [Omar, Michael]
  - Change "timeout" in epoll_wait_params to pointer, to get the same
    convention of 'no wait', 'wait indefinitely' and 'wait for specified time'
    with epoll_pwait. [Seymour]
  - Add compat implementation of epoll_pwait1.

Justification
=============

QEMU, like many select/poll based applications, considers epoll as an
alternative when its event loop needs to handle a large number of FDs. However,
two concerns with epoll currently prevent the switch:

The major one is the timeout precision. For example in QEMU, the main loop
takes care of calling callbacks at a specific timeout - the QEMU timer API. The
timeout value in ppoll depends on the next firing timer. epoll_pwait's
millisecond timeout is so coarse that rounding up the timeout will hurt
performance badly.

The minor one is the number of system calls needed to update the fd set. While
epoll can handle a large number of fds quickly, it still requires one epoll_ctl
per fd update, compared to the one-shot call to select/poll with an fd array.
This may well make epoll inferior to ppoll in cases where a small but
frequently changing set of fds is polled by the event loop.

This series introduces two new epoll system calls to address them respectively.
The idea of epoll_ctl_batch was suggested by Andy Lutomirski in [1], who also
suggested clockid as a parameter in epoll_pwait1.

[1]: http://lists.openwall.net/linux-kernel/2015/01/08/542

Benchmark for epoll_pwait1
==========================

By running fio tests inside a VM with both the original and the modified QEMU,
we can compare their performance.

With a small VM setup [t1], the original QEMU (ppoll based) has a 4k read
latency overhead of around 37 us. In this setup, the main loop polls 10~20 fds.

With a slightly larger VM instance [t2] - attached a virtio-serial device so
that there are 80~90 fds in the main loop - the original QEMU has a latency
overhead around 49 us. By adding more such devices [t3], we can see the latency
go even higher - 83 us with ~200 FDs.

After modifying QEMU to use epoll_pwait1 and testing again, the latency numbers
are respectively 36 us, 37 us and 47 us for t1, t2 and t3.

Previous Changelogs
===================

Changes from v2 (https://lkml.org/lkml/2015/2/4/105)
----------------------------------------------------

  - Rename epoll_ctl_cmd.error_hint to "result". [Michael]

  - Add background introduction in cover letter. [Michael]

  - Expand the last struct of epoll_pwait1, add clockid and timespec.
  
  - Update man page in cover letter accordingly:

    * "error_hint" -> "result".
    * The result field's caveat in "RETURN VALUE" section of epoll_ctl_batch.

  Please review!

Changes from v1 (https://lkml.org/lkml/2015/1/20/189)
-----------------------------------------------------

  - As discussed in previous thread [1], split the call to epoll_ctl_batch and
    epoll_pwait. [Michael]

  - Fix memory leaks. [Omar]

  - Add a short comment about the ignored copy_to_user failure. [Omar]

  - Cover letter rewritten.

Documentation of the new system calls
=====================================

1) epoll_ctl_batch
------------------

NAME
       epoll_ctl_batch - batch control interface for an epoll descriptor

SYNOPSIS

       #include <sys/epoll.h>

       int epoll_ctl_batch(int epfd, int flags,
                           int ncmds, struct epoll_ctl_cmd *cmds);

DESCRIPTION

       This system call is an extension of epoll_ctl(). The primary difference
       is that this system call allows you to batch multiple operations in
       a single system call. This provides a more efficient interface for
       updating events on the epoll file descriptor epfd.

       The flags argument is reserved and must be 0.

       The argument ncmds is the number of cmds entries being passed in.
       This number must be greater than 0.

       Each operation is specified as an element in the cmds array, defined as:

           struct epoll_ctl_cmd {

                  /* Reserved flags for future extension, must be 0. */
                  int flags;

                  /* The same as epoll_ctl() op parameter. */
                  int op;

                  /* The same as epoll_ctl() fd parameter. */
                  int fd;

                  /* The same as the "events" field in struct epoll_event. */
                  uint32_t events;

                  /* The same as the "data" field in struct epoll_event. */
                  uint64_t data;

                  /* Output field, will be set to the return code after this
                   * command is executed by kernel */
                  int result;
           };

       This system call is not atomic when updating the epoll descriptor.  All
       entries in cmds are executed in the provided order. If any cmds entry
       fails to be processed, no further entries are processed and the number
       of successfully processed entries is returned.

       Each single operation defined by a struct epoll_ctl_cmd has the same
       semantics as an epoll_ctl(2) call. See the epoll_ctl() manual page for
       more information about how to correctly set up the members of a struct
       epoll_ctl_cmd.

       Upon completion of the call, the result member of each struct
       epoll_ctl_cmd may be set to 0 (successfully completed) or an error code
       depending on the result of the command. If the kernel fails to change
       the result (for example, the location of the cmds argument is fully or
       partly read only), the result member of each struct epoll_ctl_cmd may be
       unchanged.

RETURN VALUE

       epoll_ctl_batch() returns a number greater than 0 to indicate the number
       of cmds entries processed. If all entries have been processed this will
       equal the ncmds parameter passed in.

       If one or more parameters are incorrect the value returned is -1 with
       errno set appropriately - no cmds entries have been processed when this
       happens.

       If processing any entry in the cmds argument results in an error, the
       number returned is the index of the failing entry - this number will be
       less than ncmds. Since ncmds must be greater than 0, a return value of 0
       indicates an error associated with the very first cmds entry. A return
       value of 0 does not indicate a successful system call.

       To correctly test the return value from epoll_ctl_batch() use code
       similar to the following:

		ret = epoll_ctl_batch(epfd, flags, ncmds, cmds);
		if (ret < ncmds) {
			if (ret == -1) {
				/* The call itself failed; see errno. */
			} else {
				/*
				 * ret is the number of entries processed
				 * successfully, so it is also the index of
				 * the failing entry. cmds[ret].result may
				 * contain the negative errno value associated
				 * with that entry.
				 */
			}
		} else {
			/* Success */
		}

ERRORS

       EINVAL flags is non-zero; ncmds is less than or equal to zero, or
              greater than INT_MAX / sizeof(struct epoll_ctl_cmd); or cmds is
              NULL.

       ENOMEM There was insufficient memory to handle the requested op control
              operation.

       EFAULT The memory area pointed to by cmds is not accessible.

       In the event that the return value is not the same as the ncmds
       parameter, the result member of the failing struct epoll_ctl_cmd will
       contain a negative errno value related to the error, unless the memory
       area is not writable (EFAULT returned). The errno values that can be set
       are those documented on the epoll_ctl(2) manual page.
       

CONFORMING TO

       epoll_ctl_batch() is Linux-specific.

SEE ALSO

       epoll_create(2), epoll_ctl(2), epoll_wait(2), epoll_pwait(2), epoll(7)


2) epoll_pwait1
---------------

NAME
       epoll_pwait1 - wait for an I/O event on an epoll file descriptor

SYNOPSIS

       #include <sys/epoll.h>

       int epoll_pwait1(int epfd, int flags,
                        struct epoll_event *events,
                        int maxevents,
                        struct epoll_wait_params *params);

DESCRIPTION

       The epoll_pwait1() syscall has more elaborate parameters compared to
       epoll_pwait(), in order to allow fine control of the wait.

       The epfd, events and maxevents parameters are the same
       as in epoll_wait() and epoll_pwait(). The flags and params are new.

       The flags argument is reserved and must be zero.

       The params argument is a pointer to a struct epoll_wait_params, which
       is defined as:

           struct epoll_wait_params {
               int clockid;
               struct timespec *timeout;
               sigset_t *sigmask;
               size_t sigsetsize;
           };

       The clockid member must be either CLOCK_REALTIME or CLOCK_MONOTONIC.
       This selects the clock used for timeout. This differs from
       epoll_pwait(2), which has an implicit clock type of CLOCK_MONOTONIC.

       The timeout member specifies the minimum time that epoll_pwait1(2) will
       block. The time spent waiting will be rounded up to the clock
       granularity. Kernel scheduling delays mean that the blocking
       interval may overrun by a small amount. Specifying NULL causes
       epoll_pwait1(2) to block indefinitely. Specifying a timeout
       equal to zero (both tv_sec and tv_nsec are zero) causes epoll_pwait1(2)
       to return immediately, even if no events are available.

       Both sigmask and sigsetsize have the same semantics as epoll_pwait(2).
       The sigmask field may be specified as NULL, in which case
       epoll_pwait1(2) will behave like epoll_wait(2).

   User visibility of sigsetsize

       In epoll_pwait(2) and other syscalls, sigsetsize is not visible to
       an application developer, as glibc has a wrapper around epoll_pwait(2).
       Now we pack several parameters in epoll_wait_params. In
       order to hide sigsetsize from application code, this system call also
       needs to be wrapped, either by expanding the parameters and building the
       structure in the wrapper function, or by asking the application to
       provide only this part of the structure:

           struct epoll_wait_params_user {
               int clockid;
               struct timespec *timeout;
               sigset_t *sigmask;
           };

       In the wrapper function it would be copied to a full structure with
       sigsetsize filled in.

RETURN VALUE

       When successful, epoll_pwait1() returns the number of file descriptors
       ready for the requested I/O, or zero if no file descriptor became ready
       before the requested timeout expired. When an error occurs,
       epoll_pwait1() returns -1 and errno is set appropriately.

ERRORS

       This system call can set errno to the same values as epoll_pwait(2), 
       as well as the following additional reasons:

       EINVAL flags is not zero, or clockid is not one of CLOCK_REALTIME or
              CLOCK_MONOTONIC, or the timespec data pointed to by timeout is
              not valid.

       EFAULT The memory area pointed to by params, params.sigmask or
              params.timeout is not accessible.

CONFORMING TO

       epoll_pwait1() is Linux-specific.

SEE ALSO

       epoll_create(2), epoll_ctl(2), epoll_wait(2), epoll_pwait(2), epoll(7)

Fam Zheng (9):
  epoll: Extract epoll_wait_do and epoll_pwait_do
  epoll: Specify clockid explicitly
  epoll: Extract ep_ctl_do
  epoll: Add implementation for epoll_ctl_batch
  x86: Hook up epoll_ctl_batch syscall
  epoll: Add implementation for epoll_pwait1
  x86: Hook up epoll_pwait1 syscall
  epoll: Add compat version implementation of epoll_pwait1
  x86: Hook up 32 bit compat epoll_pwait1 syscall

 arch/x86/syscalls/syscall_32.tbl |   2 +
 arch/x86/syscalls/syscall_64.tbl |   2 +
 fs/eventpoll.c                   | 308 ++++++++++++++++++++++++++++-----------
 include/linux/compat.h           |   6 +
 include/linux/syscalls.h         |   9 ++
 include/uapi/linux/eventpoll.h   |  19 +++
 6 files changed, 262 insertions(+), 84 deletions(-)

-- 
1.9.3



* [PATCH v4 1/9] epoll: Extract epoll_wait_do and epoll_pwait_do
  2015-03-10  1:49 [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1 Fam Zheng
@ 2015-03-10  1:49 ` Fam Zheng
  2015-03-10  1:49 ` [PATCH v4 2/9] epoll: Specify clockid explicitly Fam Zheng
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Fam Zheng @ 2015-03-10  1:49 UTC (permalink / raw)
  To: linux-kernel

In preparation for the new epoll syscalls, this patch allows reusing the code
from the epoll_pwait implementation. The new functions use ktime_t for more
accuracy.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 fs/eventpoll.c | 154 ++++++++++++++++++++++++++-------------------------------
 1 file changed, 71 insertions(+), 83 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 1e009ca..7dfabeb 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1554,17 +1554,6 @@ static int ep_send_events(struct eventpoll *ep,
 	return ep_scan_ready_list(ep, ep_send_events_proc, &esed, 0, false);
 }
 
-static inline struct timespec ep_set_mstimeout(long ms)
-{
-	struct timespec now, ts = {
-		.tv_sec = ms / MSEC_PER_SEC,
-		.tv_nsec = NSEC_PER_MSEC * (ms % MSEC_PER_SEC),
-	};
-
-	ktime_get_ts(&now);
-	return timespec_add_safe(now, ts);
-}
-
 /**
  * ep_poll - Retrieves ready events, and delivers them to the caller supplied
  *           event buffer.
@@ -1573,17 +1562,15 @@ static inline struct timespec ep_set_mstimeout(long ms)
  * @events: Pointer to the userspace buffer where the ready events should be
  *          stored.
  * @maxevents: Size (in terms of number of events) of the caller event buffer.
- * @timeout: Maximum timeout for the ready events fetch operation, in
- *           milliseconds. If the @timeout is zero, the function will not block,
- *           while if the @timeout is less than zero, the function will block
- *           until at least one event has been retrieved (or an error
- *           occurred).
+ * @timeout: Maximum timeout for the ready events fetch operation.  If 0, the
+ *           function will not block. If negative, the function will block until
+ *           at least one event has been retrieved (or an error occurred).
  *
  * Returns: Returns the number of ready events which have been fetched, or an
  *          error code, in case of error.
  */
 static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
-		   int maxevents, long timeout)
+		   int maxevents, const ktime_t timeout)
 {
 	int res = 0, eavail, timed_out = 0;
 	unsigned long flags;
@@ -1591,13 +1578,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 	wait_queue_t wait;
 	ktime_t expires, *to = NULL;
 
-	if (timeout > 0) {
-		struct timespec end_time = ep_set_mstimeout(timeout);
-
-		slack = select_estimate_accuracy(&end_time);
-		to = &expires;
-		*to = timespec_to_ktime(end_time);
-	} else if (timeout == 0) {
+	if (!ktime_to_ns(timeout)) {
 		/*
 		 * Avoid the unnecessary trip to the wait queue loop, if the
 		 * caller specified a non blocking operation.
@@ -1605,6 +1586,15 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		timed_out = 1;
 		spin_lock_irqsave(&ep->lock, flags);
 		goto check_events;
+	} else if (ktime_to_ns(timeout) > 0) {
+		struct timespec now, end_time;
+
+		ktime_get_ts(&now);
+		end_time = timespec_add_safe(now, ktime_to_timespec(timeout));
+
+		slack = select_estimate_accuracy(&end_time);
+		to = &expires;
+		*to = timespec_to_ktime(end_time);
 	}
 
 fetch_events:
@@ -1954,12 +1944,8 @@ error_return:
 	return error;
 }
 
-/*
- * Implement the event wait interface for the eventpoll file. It is the kernel
- * part of the user space epoll_wait(2).
- */
-SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
-		int, maxevents, int, timeout)
+static inline int epoll_wait_do(int epfd, struct epoll_event __user *events,
+				int maxevents, const ktime_t timeout)
 {
 	int error;
 	struct fd f;
@@ -2002,46 +1988,70 @@ error_fput:
 
 /*
  * Implement the event wait interface for the eventpoll file. It is the kernel
+ * part of the user space epoll_wait(2).
+ */
+SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
+		int, maxevents, int, timeout)
+{
+	ktime_t kt = ms_to_ktime(timeout);
+	return epoll_wait_do(epfd, events, maxevents, kt);
+}
+
+static inline int epoll_pwait_do(int epfd, struct epoll_event __user *events,
+				 int maxevents, ktime_t timeout,
+				 sigset_t *sigmask, size_t sigsetsize)
+{
+	int error;
+	sigset_t sigsaved;
+
+	/*
+	 * If the caller wants a certain signal mask to be set during the wait,
+	 * we apply it here.
+	 */
+	if (sigmask) {
+		sigsaved = current->blocked;
+		set_current_blocked(sigmask);
+	}
+
+	error = epoll_wait_do(epfd, events, maxevents, timeout);
+
+	/*
+	 * If we changed the signal mask, we need to restore the original one.
+	 * In case we've got a signal while waiting, we do not restore the
+	 * signal mask yet, and we allow do_signal() to deliver the signal on
+	 * the way back to userspace, before the signal mask is restored.
+	 */
+	if (sigmask) {
+		if (error == -EINTR) {
+			memcpy(&current->saved_sigmask, &sigsaved,
+			       sizeof(sigsaved));
+			set_restore_sigmask();
+		} else
+			set_current_blocked(&sigsaved);
+	}
+
+	return error;
+}
+
+/*
+ * Implement the event wait interface for the eventpoll file. It is the kernel
  * part of the user space epoll_pwait(2).
  */
 SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
 		int, maxevents, int, timeout, const sigset_t __user *, sigmask,
 		size_t, sigsetsize)
 {
-	int error;
-	sigset_t ksigmask, sigsaved;
+	ktime_t kt = ms_to_ktime(timeout);
+	sigset_t ksigmask;
 
-	/*
-	 * If the caller wants a certain signal mask to be set during the wait,
-	 * we apply it here.
-	 */
 	if (sigmask) {
 		if (sigsetsize != sizeof(sigset_t))
 			return -EINVAL;
 		if (copy_from_user(&ksigmask, sigmask, sizeof(ksigmask)))
 			return -EFAULT;
-		sigsaved = current->blocked;
-		set_current_blocked(&ksigmask);
 	}
-
-	error = sys_epoll_wait(epfd, events, maxevents, timeout);
-
-	/*
-	 * If we changed the signal mask, we need to restore the original one.
-	 * In case we've got a signal while waiting, we do not restore the
-	 * signal mask yet, and we allow do_signal() to deliver the signal on
-	 * the way back to userspace, before the signal mask is restored.
-	 */
-	if (sigmask) {
-		if (error == -EINTR) {
-			memcpy(&current->saved_sigmask, &sigsaved,
-			       sizeof(sigsaved));
-			set_restore_sigmask();
-		} else
-			set_current_blocked(&sigsaved);
-	}
-
-	return error;
+	return epoll_pwait_do(epfd, events, maxevents, kt,
+			      sigmask ? &ksigmask : NULL, sigsetsize);
 }
 
 #ifdef CONFIG_COMPAT
@@ -2051,42 +2061,20 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
 			const compat_sigset_t __user *, sigmask,
 			compat_size_t, sigsetsize)
 {
-	long err;
 	compat_sigset_t csigmask;
-	sigset_t ksigmask, sigsaved;
+	sigset_t ksigmask;
+	ktime_t kt = ms_to_ktime(timeout);
 
-	/*
-	 * If the caller wants a certain signal mask to be set during the wait,
-	 * we apply it here.
-	 */
 	if (sigmask) {
 		if (sigsetsize != sizeof(compat_sigset_t))
 			return -EINVAL;
 		if (copy_from_user(&csigmask, sigmask, sizeof(csigmask)))
 			return -EFAULT;
 		sigset_from_compat(&ksigmask, &csigmask);
-		sigsaved = current->blocked;
-		set_current_blocked(&ksigmask);
 	}
 
-	err = sys_epoll_wait(epfd, events, maxevents, timeout);
-
-	/*
-	 * If we changed the signal mask, we need to restore the original one.
-	 * In case we've got a signal while waiting, we do not restore the
-	 * signal mask yet, and we allow do_signal() to deliver the signal on
-	 * the way back to userspace, before the signal mask is restored.
-	 */
-	if (sigmask) {
-		if (err == -EINTR) {
-			memcpy(&current->saved_sigmask, &sigsaved,
-			       sizeof(sigsaved));
-			set_restore_sigmask();
-		} else
-			set_current_blocked(&sigsaved);
-	}
-
-	return err;
+	return epoll_pwait_do(epfd, events, maxevents, kt,
+			      sigmask ? &ksigmask : NULL, sigsetsize);
 }
 #endif
 
-- 
1.9.3



* [PATCH v4 2/9] epoll: Specify clockid explicitly
  2015-03-10  1:49 [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1 Fam Zheng
  2015-03-10  1:49 ` [PATCH v4 1/9] epoll: Extract epoll_wait_do and epoll_pwait_do Fam Zheng
@ 2015-03-10  1:49 ` Fam Zheng
  2015-03-10  1:49 ` [PATCH v4 3/9] epoll: Extract ep_ctl_do Fam Zheng
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Fam Zheng @ 2015-03-10  1:49 UTC (permalink / raw)
  To: linux-kernel

Later we will add clockid in the interface, so let's start using explicit
clockid internally. Now we specify CLOCK_MONOTONIC, which is the same as before.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 fs/eventpoll.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 7dfabeb..957d1d0 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1570,7 +1570,7 @@ static int ep_send_events(struct eventpoll *ep,
  *          error code, in case of error.
  */
 static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
-		   int maxevents, const ktime_t timeout)
+		   int maxevents, int clockid, const ktime_t timeout)
 {
 	int res = 0, eavail, timed_out = 0;
 	unsigned long flags;
@@ -1578,6 +1578,8 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 	wait_queue_t wait;
 	ktime_t expires, *to = NULL;
 
+	if (clockid != CLOCK_MONOTONIC && clockid != CLOCK_REALTIME)
+		return -EINVAL;
 	if (!ktime_to_ns(timeout)) {
 		/*
 		 * Avoid the unnecessary trip to the wait queue loop, if the
@@ -1624,7 +1626,8 @@ fetch_events:
 			}
 
 			spin_unlock_irqrestore(&ep->lock, flags);
-			if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS))
+			if (!schedule_hrtimeout_range_clock(to, slack,
+						HRTIMER_MODE_ABS, clockid))
 				timed_out = 1;
 
 			spin_lock_irqsave(&ep->lock, flags);
@@ -1945,7 +1948,8 @@ error_return:
 }
 
 static inline int epoll_wait_do(int epfd, struct epoll_event __user *events,
-				int maxevents, const ktime_t timeout)
+				int maxevents, int clockid,
+				const ktime_t timeout)
 {
 	int error;
 	struct fd f;
@@ -1979,7 +1983,7 @@ static inline int epoll_wait_do(int epfd, struct epoll_event __user *events,
 	ep = f.file->private_data;
 
 	/* Time to fish for events ... */
-	error = ep_poll(ep, events, maxevents, timeout);
+	error = ep_poll(ep, events, maxevents, clockid, timeout);
 
 error_fput:
 	fdput(f);
@@ -1994,12 +1998,13 @@ SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
 		int, maxevents, int, timeout)
 {
 	ktime_t kt = ms_to_ktime(timeout);
-	return epoll_wait_do(epfd, events, maxevents, kt);
+	return epoll_wait_do(epfd, events, maxevents, CLOCK_MONOTONIC, kt);
 }
 
 static inline int epoll_pwait_do(int epfd, struct epoll_event __user *events,
-				 int maxevents, ktime_t timeout,
-				 sigset_t *sigmask, size_t sigsetsize)
+				 int maxevents,
+				 int clockid, ktime_t timeout,
+				 sigset_t *sigmask)
 {
 	int error;
 	sigset_t sigsaved;
@@ -2013,7 +2018,7 @@ static inline int epoll_pwait_do(int epfd, struct epoll_event __user *events,
 		set_current_blocked(sigmask);
 	}
 
-	error = epoll_wait_do(epfd, events, maxevents, timeout);
+	error = epoll_wait_do(epfd, events, maxevents, clockid, timeout);
 
 	/*
 	 * If we changed the signal mask, we need to restore the original one.
@@ -2050,8 +2055,8 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
 		if (copy_from_user(&ksigmask, sigmask, sizeof(ksigmask)))
 			return -EFAULT;
 	}
-	return epoll_pwait_do(epfd, events, maxevents, kt,
-			      sigmask ? &ksigmask : NULL, sigsetsize);
+	return epoll_pwait_do(epfd, events, maxevents, CLOCK_MONOTONIC, kt,
+			      sigmask ? &ksigmask : NULL);
 }
 
 #ifdef CONFIG_COMPAT
@@ -2073,8 +2078,8 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
 		sigset_from_compat(&ksigmask, &csigmask);
 	}
 
-	return epoll_pwait_do(epfd, events, maxevents, kt,
-			      sigmask ? &ksigmask : NULL, sigsetsize);
+	return epoll_pwait_do(epfd, events, maxevents, CLOCK_MONOTONIC, kt,
+			      sigmask ? &ksigmask : NULL);
 }
 #endif
 
-- 
1.9.3



* [PATCH v4 3/9] epoll: Extract ep_ctl_do
  2015-03-10  1:49 [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1 Fam Zheng
  2015-03-10  1:49 ` [PATCH v4 1/9] epoll: Extract epoll_wait_do and epoll_pwait_do Fam Zheng
  2015-03-10  1:49 ` [PATCH v4 2/9] epoll: Specify clockid explicitly Fam Zheng
@ 2015-03-10  1:49 ` Fam Zheng
  2015-03-10  1:49 ` [PATCH v4 4/9] epoll: Add implementation for epoll_ctl_batch Fam Zheng
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Fam Zheng @ 2015-03-10  1:49 UTC (permalink / raw)
  To: linux-kernel

This is the common part from epoll_ctl implementation which will be
shared with the new syscall.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 fs/eventpoll.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 957d1d0..7909c88 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1810,22 +1810,15 @@ SYSCALL_DEFINE1(epoll_create, int, size)
  * the eventpoll file that enables the insertion/removal/change of
  * file descriptors inside the interest set.
  */
-SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
-		struct epoll_event __user *, event)
+int ep_ctl_do(int epfd, int op, int fd, struct epoll_event epds)
 {
 	int error;
 	int full_check = 0;
 	struct fd f, tf;
 	struct eventpoll *ep;
 	struct epitem *epi;
-	struct epoll_event epds;
 	struct eventpoll *tep = NULL;
 
-	error = -EFAULT;
-	if (ep_op_has_event(op) &&
-	    copy_from_user(&epds, event, sizeof(struct epoll_event)))
-		goto error_return;
-
 	error = -EBADF;
 	f = fdget(epfd);
 	if (!f.file)
@@ -1947,6 +1940,23 @@ error_return:
 	return error;
 }
 
+/*
+ * The following function implements the controller interface for
+ * the eventpoll file that enables the insertion/removal/change of
+ * file descriptors inside the interest set.
+ */
+SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
+		struct epoll_event __user *, event)
+{
+	struct epoll_event epds;
+
+	if (ep_op_has_event(op) &&
+	    copy_from_user(&epds, event, sizeof(struct epoll_event)))
+		return -EFAULT;
+
+	return ep_ctl_do(epfd, op, fd, epds);
+}
+
 static inline int epoll_wait_do(int epfd, struct epoll_event __user *events,
 				int maxevents, int clockid,
 				const ktime_t timeout)
-- 
1.9.3



* [PATCH v4 4/9] epoll: Add implementation for epoll_ctl_batch
  2015-03-10  1:49 [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1 Fam Zheng
                   ` (2 preceding siblings ...)
  2015-03-10  1:49 ` [PATCH v4 3/9] epoll: Extract ep_ctl_do Fam Zheng
@ 2015-03-10  1:49 ` Fam Zheng
  2015-03-10 13:59   ` Dan Rosenberg
  2015-03-10  1:49 ` [PATCH v4 5/9] x86: Hook up epoll_ctl_batch syscall Fam Zheng
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Fam Zheng @ 2015-03-10  1:49 UTC (permalink / raw)
  To: linux-kernel

This new syscall is a batched version of epoll_ctl. It will execute each
command as specified in cmds in given order, and stop at first failure
or upon completion of all commands.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 fs/eventpoll.c                 | 50 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/syscalls.h       |  4 ++++
 include/uapi/linux/eventpoll.h | 11 ++++++++++
 3 files changed, 65 insertions(+)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 7909c88..54dc63f 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -99,6 +99,8 @@
 
 #define EP_MAX_EVENTS (INT_MAX / sizeof(struct epoll_event))
 
+#define EP_MAX_BATCH (INT_MAX / sizeof(struct epoll_ctl_cmd))
+
 #define EP_UNACTIVE_PTR ((void *) -1L)
 
 #define EP_ITEM_COST (sizeof(struct epitem) + sizeof(struct eppoll_entry))
@@ -2069,6 +2071,54 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
 			      sigmask ? &ksigmask : NULL);
 }
 
+SYSCALL_DEFINE4(epoll_ctl_batch, int, epfd, int, flags,
+		int, ncmds, struct epoll_ctl_cmd __user *, cmds)
+{
+	struct epoll_ctl_cmd *kcmds = NULL;
+	int i, ret = 0;
+	size_t cmd_size;
+
+	if (flags)
+		return -EINVAL;
+	if (!cmds || ncmds <= 0 || ncmds > EP_MAX_BATCH)
+		return -EINVAL;
+	cmd_size = sizeof(struct epoll_ctl_cmd) * ncmds;
+	/* TODO: optimize for small arguments like select/poll with a stack
+	 * allocated buffer */
+
+	kcmds = kmalloc(cmd_size, GFP_KERNEL);
+	if (!kcmds)
+		return -ENOMEM;
+	if (copy_from_user(kcmds, cmds, cmd_size)) {
+		ret = -EFAULT;
+		goto out;
+	}
+	for (i = 0; i < ncmds; i++) {
+		struct epoll_event ev = (struct epoll_event) {
+			.events = kcmds[i].events,
+			.data = kcmds[i].data,
+		};
+		if (kcmds[i].flags) {
+			kcmds[i].result = -EINVAL;
+			goto copy;
+		}
+		kcmds[i].result = ep_ctl_do(epfd, kcmds[i].op,
+					    kcmds[i].fd, ev);
+		if (kcmds[i].result)
+			goto copy;
+		ret++;
+	}
+copy:
+	/* We lose the number of succeeded commands in favor of returning
+	 * -EFAULT, but in this case the application will want to fix the
+	 *  memory bug first. */
+	if (copy_to_user(cmds, kcmds, cmd_size))
+		ret = -EFAULT;
+out:
+	kfree(kcmds);
+	return ret;
+}
+
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
 			struct epoll_event __user *, events,
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 76d1e38..7d784e3 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -12,6 +12,7 @@
 #define _LINUX_SYSCALLS_H
 
 struct epoll_event;
+struct epoll_ctl_cmd;
 struct iattr;
 struct inode;
 struct iocb;
@@ -634,6 +635,9 @@ asmlinkage long sys_epoll_pwait(int epfd, struct epoll_event __user *events,
 				int maxevents, int timeout,
 				const sigset_t __user *sigmask,
 				size_t sigsetsize);
+asmlinkage long sys_epoll_ctl_batch(int epfd, int flags,
+				    int ncmds,
+				    struct epoll_ctl_cmd __user *cmds);
 asmlinkage long sys_gethostname(char __user *name, int len);
 asmlinkage long sys_sethostname(char __user *name, int len);
 asmlinkage long sys_setdomainname(char __user *name, int len);
diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
index bc81fb2..4e18b17 100644
--- a/include/uapi/linux/eventpoll.h
+++ b/include/uapi/linux/eventpoll.h
@@ -18,6 +18,8 @@
 #include <linux/fcntl.h>
 #include <linux/types.h>
 
+#include <linux/signal.h>
+
 /* Flags for epoll_create1.  */
 #define EPOLL_CLOEXEC O_CLOEXEC
 
@@ -61,6 +63,15 @@ struct epoll_event {
 	__u64 data;
 } EPOLL_PACKED;
 
+struct epoll_ctl_cmd {
+	int flags;
+	int op;
+	int fd;
+	__u32 events;
+	__u64 data;
+	int result;
+} EPOLL_PACKED;
+
 #ifdef CONFIG_PM_SLEEP
 static inline void ep_take_care_of_epollwakeup(struct epoll_event *epev)
 {
-- 
1.9.3



* [PATCH v4 5/9] x86: Hook up epoll_ctl_batch syscall
  2015-03-10  1:49 [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1 Fam Zheng
                   ` (3 preceding siblings ...)
  2015-03-10  1:49 ` [PATCH v4 4/9] epoll: Add implementation for epoll_ctl_batch Fam Zheng
@ 2015-03-10  1:49 ` Fam Zheng
  2015-03-10  1:49 ` [PATCH v4 6/9] epoll: Add implementation for epoll_pwait1 Fam Zheng
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Fam Zheng @ 2015-03-10  1:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Alexander Viro, Andrew Morton, Kees Cook, Andy Lutomirski,
	David Herrmann, Alexei Starovoitov, Miklos Szeredi,
	David Drysdale, Oleg Nesterov, David S. Miller, Vivek Goyal,
	Mike Frysinger, Theodore Ts'o, Heiko Carstens,
	Rasmus Villemoes, Rashika Kheria, Hugh Dickins,
	Mathieu Desnoyers, Fam Zheng, Peter Zijlstra, linux-fsdevel,
	linux-api, Josh Triplett, Michael Kerrisk (man-pages),
	Paolo Bonzini, Omar Sandoval, Jonathan Corbet, shane.seymour,
	dan.j.rosenberg

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 arch/x86/syscalls/syscall_32.tbl | 1 +
 arch/x86/syscalls/syscall_64.tbl | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index b3560ec..fe809f6 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -365,3 +365,4 @@
 356	i386	memfd_create		sys_memfd_create
 357	i386	bpf			sys_bpf
 358	i386	execveat		sys_execveat			stub32_execveat
+359	i386	epoll_ctl_batch		sys_epoll_ctl_batch
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 8d656fb..67b2ea4 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -329,6 +329,7 @@
 320	common	kexec_file_load		sys_kexec_file_load
 321	common	bpf			sys_bpf
 322	64	execveat		stub_execveat
+323	64	epoll_ctl_batch		sys_epoll_ctl_batch
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
-- 
1.9.3



* [PATCH v4 6/9] epoll: Add implementation for epoll_pwait1
  2015-03-10  1:49 [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1 Fam Zheng
                   ` (4 preceding siblings ...)
  2015-03-10  1:49 ` [PATCH v4 5/9] x86: Hook up epoll_ctl_batch syscall Fam Zheng
@ 2015-03-10  1:49 ` Fam Zheng
  2015-03-10  1:49 ` [PATCH v4 7/9] x86: Hook up epoll_pwait1 syscall Fam Zheng
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Fam Zheng @ 2015-03-10  1:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Alexander Viro, Andrew Morton, Kees Cook, Andy Lutomirski,
	David Herrmann, Alexei Starovoitov, Miklos Szeredi,
	David Drysdale, Oleg Nesterov, David S. Miller, Vivek Goyal,
	Mike Frysinger, Theodore Ts'o, Heiko Carstens,
	Rasmus Villemoes, Rashika Kheria, Hugh Dickins,
	Mathieu Desnoyers, Fam Zheng, Peter Zijlstra, linux-fsdevel,
	linux-api, Josh Triplett, Michael Kerrisk (man-pages),
	Paolo Bonzini, Omar Sandoval, Jonathan Corbet, shane.seymour,
	dan.j.rosenberg

This is a new variant of epoll_pwait which has a flags parameter and
packs a number of parameters into a structure.

The main advantage of it over existing epoll_pwait is about timeout:
epoll_pwait expects a relative millisecond value, while epoll_pwait1
accepts 1) a timespec which is in nanosecond granularity; 2) a clockid
to allow using a clock other than CLOCK_MONOTONIC.

The 'flags' field in params is reserved for now and must be zero. The
next step would be allowing absolute timeout value.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 fs/eventpoll.c                 | 39 ++++++++++++++++++++++++++++++++++++++-
 include/linux/syscalls.h       |  5 +++++
 include/uapi/linux/eventpoll.h |  8 ++++++++
 3 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 54dc63f..06a59fc 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2085,7 +2085,6 @@ SYSCALL_DEFINE4(epoll_ctl_batch, int, epfd, int, flags,
 	cmd_size = sizeof(struct epoll_ctl_cmd) * ncmds;
 	/* TODO: optimize for small arguments like select/poll with a stack
 	 * allocated buffer */
-
 	kcmds = kmalloc(cmd_size, GFP_KERNEL);
 	if (!kcmds)
 		return -ENOMEM;
@@ -2119,6 +2118,44 @@ out:
 	return ret;
 }
 
+SYSCALL_DEFINE5(epoll_pwait1, int, epfd, int, flags,
+		struct epoll_event __user *, events,
+		int, maxevents,
+		struct epoll_wait_params __user *, params)
+{
+	struct epoll_wait_params p;
+	ktime_t kt = { 0 };
+	sigset_t sigmask;
+	struct timespec timeout;
+
+	if (flags)
+		return -EINVAL;
+	if (!params)
+		return -EINVAL;
+	if (copy_from_user(&p, params, sizeof(p)))
+		return -EFAULT;
+	if (p.size != sizeof(p))
+		return -EINVAL;
+	if (p.sigmask) {
+		if (copy_from_user(&sigmask, p.sigmask, sizeof(sigmask)))
+			return -EFAULT;
+		if (p.sigsetsize != sizeof(sigmask))
+			return -EINVAL;
+	}
+	if (p.timeout) {
+		if (copy_from_user(&timeout, p.timeout, sizeof(timeout)))
+			return -EFAULT;
+		if (!timespec_valid(&timeout))
+			return -EINVAL;
+		kt = timespec_to_ktime(timeout);
+	} else {
+		kt = ns_to_ktime(-1);
+	}
+
+	return epoll_pwait_do(epfd, events, maxevents, p.clockid,
+			      kt, p.sigmask ? &sigmask : NULL);
+}
+
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
 			struct epoll_event __user *, events,
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 7d784e3..a4823d9 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -13,6 +13,7 @@
 
 struct epoll_event;
 struct epoll_ctl_cmd;
+struct epoll_wait_params;
 struct iattr;
 struct inode;
 struct iocb;
@@ -635,6 +636,10 @@ asmlinkage long sys_epoll_pwait(int epfd, struct epoll_event __user *events,
 				int maxevents, int timeout,
 				const sigset_t __user *sigmask,
 				size_t sigsetsize);
+asmlinkage long sys_epoll_pwait1(int epfd, int flags,
+				 struct epoll_event __user *events,
+				 int maxevents,
+				 struct epoll_wait_params __user *params);
 asmlinkage long sys_epoll_ctl_batch(int epfd, int flags,
 				    int ncmds,
 				    struct epoll_ctl_cmd __user *cmds);
diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
index 4e18b17..05ae035 100644
--- a/include/uapi/linux/eventpoll.h
+++ b/include/uapi/linux/eventpoll.h
@@ -72,6 +72,14 @@ struct epoll_ctl_cmd {
 	int result;
 } EPOLL_PACKED;
 
+struct epoll_wait_params {
+	int size;
+	int clockid;
+	struct timespec *timeout;
+	sigset_t *sigmask;
+	size_t sigsetsize;
+} EPOLL_PACKED;
+
 #ifdef CONFIG_PM_SLEEP
 static inline void ep_take_care_of_epollwakeup(struct epoll_event *epev)
 {
-- 
1.9.3



* [PATCH v4 7/9] x86: Hook up epoll_pwait1 syscall
  2015-03-10  1:49 [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1 Fam Zheng
                   ` (5 preceding siblings ...)
  2015-03-10  1:49 ` [PATCH v4 6/9] epoll: Add implementation for epoll_pwait1 Fam Zheng
@ 2015-03-10  1:49 ` Fam Zheng
  2015-03-10  1:49 ` [PATCH v4 8/9] epoll: Add compat version implementation of epoll_pwait1 Fam Zheng
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Fam Zheng @ 2015-03-10  1:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Alexander Viro, Andrew Morton, Kees Cook, Andy Lutomirski,
	David Herrmann, Alexei Starovoitov, Miklos Szeredi,
	David Drysdale, Oleg Nesterov, David S. Miller, Vivek Goyal,
	Mike Frysinger, Theodore Ts'o, Heiko Carstens,
	Rasmus Villemoes, Rashika Kheria, Hugh Dickins,
	Mathieu Desnoyers, Fam Zheng, Peter Zijlstra, linux-fsdevel,
	linux-api, Josh Triplett, Michael Kerrisk (man-pages),
	Paolo Bonzini, Omar Sandoval, Jonathan Corbet, shane.seymour,
	dan.j.rosenberg

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 arch/x86/syscalls/syscall_32.tbl | 1 +
 arch/x86/syscalls/syscall_64.tbl | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index fe809f6..bf912d8 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -366,3 +366,4 @@
 357	i386	bpf			sys_bpf
 358	i386	execveat		sys_execveat			stub32_execveat
 359	i386	epoll_ctl_batch		sys_epoll_ctl_batch
+360	i386	epoll_pwait1		sys_epoll_pwait1
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 67b2ea4..9246ad5 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -330,6 +330,7 @@
 321	common	bpf			sys_bpf
 322	64	execveat		stub_execveat
 323	64	epoll_ctl_batch		sys_epoll_ctl_batch
+324	64	epoll_pwait1		sys_epoll_pwait1
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
-- 
1.9.3



* [PATCH v4 8/9] epoll: Add compat version implementation of epoll_pwait1
  2015-03-10  1:49 [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1 Fam Zheng
                   ` (6 preceding siblings ...)
  2015-03-10  1:49 ` [PATCH v4 7/9] x86: Hook up epoll_pwait1 syscall Fam Zheng
@ 2015-03-10  1:49 ` Fam Zheng
  2015-03-10  1:49 ` [PATCH v4 9/9] x86: Hook up 32 bit compat epoll_pwait1 syscall Fam Zheng
  2015-03-12 15:02 ` [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1 Jason Baron
  9 siblings, 0 replies; 16+ messages in thread
From: Fam Zheng @ 2015-03-10  1:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Alexander Viro, Andrew Morton, Kees Cook, Andy Lutomirski,
	David Herrmann, Alexei Starovoitov, Miklos Szeredi,
	David Drysdale, Oleg Nesterov, David S. Miller, Vivek Goyal,
	Mike Frysinger, Theodore Ts'o, Heiko Carstens,
	Rasmus Villemoes, Rashika Kheria, Hugh Dickins,
	Mathieu Desnoyers, Fam Zheng, Peter Zijlstra, linux-fsdevel,
	linux-api, Josh Triplett, Michael Kerrisk (man-pages),
	Paolo Bonzini, Omar Sandoval, Jonathan Corbet, shane.seymour,
	dan.j.rosenberg

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 fs/eventpoll.c         | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/compat.h |  6 ++++++
 2 files changed, 56 insertions(+)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 06a59fc..b837ea4 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2178,6 +2178,56 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
 	return epoll_pwait_do(epfd, events, maxevents, CLOCK_MONOTONIC, kt,
 			      sigmask ? &ksigmask : NULL);
 }
+
+struct compat_epoll_wait_params {
+	int size;
+	int clockid;
+	compat_uptr_t timeout;
+	compat_uptr_t sigmask;
+	compat_size_t sigsetsize;
+} EPOLL_PACKED;
+
+COMPAT_SYSCALL_DEFINE5(epoll_pwait1, int, epfd, int, flags,
+		       struct epoll_event __user *, events,
+		       int, maxevents,
+		       struct compat_epoll_wait_params __user *, params)
+{
+	struct compat_epoll_wait_params p;
+
+	ktime_t kt = { 0 };
+	sigset_t sigmask;
+	compat_sigset_t compat_sigmask;
+	struct timespec timeout;
+
+	if (flags)
+		return -EINVAL;
+	if (!params)
+		return -EINVAL;
+	if (copy_from_user(&p, params, sizeof(p)))
+		return -EFAULT;
+	if (p.size != sizeof(p))
+		return -EINVAL;
+	if (p.sigmask) {
+		if (copy_from_user(&compat_sigmask, compat_ptr(p.sigmask),
+				   sizeof(compat_sigmask)))
+			return -EFAULT;
+		if (p.sigsetsize != sizeof(compat_sigmask))
+			return -EINVAL;
+		sigset_from_compat(&sigmask, &compat_sigmask);
+	}
+	if (p.timeout) {
+		if (compat_get_timespec(&timeout, compat_ptr(p.timeout)))
+			return -EFAULT;
+		if (!timespec_valid(&timeout))
+			return -EINVAL;
+		kt = timespec_to_ktime(timeout);
+	} else {
+		kt = ns_to_ktime(-1);
+	}
+
+	return epoll_pwait_do(epfd, events, maxevents, p.clockid,
+			      kt, p.sigmask ? &sigmask : NULL);
+}
 #endif
 
 static int __init eventpoll_init(void)
diff --git a/include/linux/compat.h b/include/linux/compat.h
index ab25814..649c5b2 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -452,6 +452,12 @@ asmlinkage long compat_sys_epoll_pwait(int epfd,
 			const compat_sigset_t __user *sigmask,
 			compat_size_t sigsetsize);
 
+struct compat_epoll_wait_params;
+asmlinkage long compat_sys_epoll_pwait1(int epfd, int flags,
+			struct epoll_event __user *events,
+			int maxevents,
+			struct compat_epoll_wait_params __user *params);
+
 asmlinkage long compat_sys_utime(const char __user *filename,
 				 struct compat_utimbuf __user *t);
 asmlinkage long compat_sys_utimensat(unsigned int dfd,
-- 
1.9.3



* [PATCH v4 9/9] x86: Hook up 32 bit compat epoll_pwait1 syscall
  2015-03-10  1:49 [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1 Fam Zheng
                   ` (7 preceding siblings ...)
  2015-03-10  1:49 ` [PATCH v4 8/9] epoll: Add compat version implementation of epoll_pwait1 Fam Zheng
@ 2015-03-10  1:49 ` Fam Zheng
  2015-03-12 15:02 ` [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1 Jason Baron
  9 siblings, 0 replies; 16+ messages in thread
From: Fam Zheng @ 2015-03-10  1:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Alexander Viro, Andrew Morton, Kees Cook, Andy Lutomirski,
	David Herrmann, Alexei Starovoitov, Miklos Szeredi,
	David Drysdale, Oleg Nesterov, David S. Miller, Vivek Goyal,
	Mike Frysinger, Theodore Ts'o, Heiko Carstens,
	Rasmus Villemoes, Rashika Kheria, Hugh Dickins,
	Mathieu Desnoyers, Fam Zheng, Peter Zijlstra, linux-fsdevel,
	linux-api, Josh Triplett, Michael Kerrisk (man-pages),
	Paolo Bonzini, Omar Sandoval, Jonathan Corbet, shane.seymour,
	dan.j.rosenberg

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 arch/x86/syscalls/syscall_32.tbl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index bf912d8..5728c2e 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -366,4 +366,4 @@
 357	i386	bpf			sys_bpf
 358	i386	execveat		sys_execveat			stub32_execveat
 359	i386	epoll_ctl_batch		sys_epoll_ctl_batch
-360	i386	epoll_pwait1		sys_epoll_pwait1
+360	i386	epoll_pwait1		sys_epoll_pwait1		compat_sys_epoll_pwait1
-- 
1.9.3



* Re: [PATCH v4 4/9] epoll: Add implementation for epoll_ctl_batch
  2015-03-10  1:49 ` [PATCH v4 4/9] epoll: Add implementation for epoll_ctl_batch Fam Zheng
@ 2015-03-10 13:59   ` Dan Rosenberg
  2015-03-11  2:23     ` Fam Zheng
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Rosenberg @ 2015-03-10 13:59 UTC (permalink / raw)
  To: Fam Zheng, linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Alexander Viro, Andrew Morton, Kees Cook, Andy Lutomirski,
	David Herrmann, Alexei Starovoitov, Miklos Szeredi,
	David Drysdale, Oleg Nesterov, David S. Miller, Vivek Goyal,
	Mike Frysinger, Theodore Ts'o, Heiko Carstens,
	Rasmus Villemoes, Rashika Kheria, Hugh Dickins,
	Mathieu Desnoyers, Peter Zijlstra, linux-fsdevel, linux-api,
	Josh Triplett, Michael Kerrisk (man-pages),
	Paolo Bonzini, Omar Sandoval, Jonathan Corbet, shane.seymour

On 03/09/2015 09:49 PM, Fam Zheng wrote:
> +	if (!cmds || ncmds <= 0 || ncmds > EP_MAX_BATCH)
> +		return -EINVAL;
> +	cmd_size = sizeof(struct epoll_ctl_cmd) * ncmds;
> +	/* TODO: optimize for small arguments like select/poll with a stack
> +	 * allocated buffer */
> +
> +	kcmds = kmalloc(cmd_size, GFP_KERNEL);
> +	if (!kcmds)
> +		return -ENOMEM;
You probably want to define EP_MAX_BATCH as some sane value much less
than INT_MAX/(sizeof(struct epoll_ctl_cmd)). While this avoids the
integer overflow from before, any user can cause the kernel to kmalloc
up to INT_MAX bytes. Probably not a huge deal because it's freed at the
end of the syscall, but generally not a great idea.
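
To make the concern concrete, here is a minimal userspace sketch of the kind of
bounded check being suggested. It assumes a hypothetical cap,
EXAMPLE_MAX_BATCH, and a userspace mirror of the proposed struct
(example_ctl_cmd); neither name nor the value 1024 comes from the patch, and
the thread does not settle on a number.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the proposed struct epoll_ctl_cmd (illustrative only). */
struct example_ctl_cmd {
	int flags;
	int op;
	int fd;
	uint32_t events;
	uint64_t data;
	int result;
};

/* Hypothetical cap on commands per batch; the value is an assumption. */
#define EXAMPLE_MAX_BATCH 1024

/*
 * Validate ncmds and compute the buffer size for the batch.
 * Returns 0 and stores the byte count in *bytes, or -1 if ncmds is
 * non-positive or exceeds the cap. With a small cap the multiplication
 * cannot overflow and the allocation stays modest.
 */
static int example_batch_bytes(int ncmds, size_t *bytes)
{
	if (ncmds <= 0 || ncmds > EXAMPLE_MAX_BATCH)
		return -1;
	*bytes = sizeof(struct example_ctl_cmd) * (size_t)ncmds;
	return 0;
}
```

With a cap like this the worst-case allocation is EXAMPLE_MAX_BATCH times the
struct size, rather than anything up to INT_MAX bytes.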



* Re: [PATCH v4 4/9] epoll: Add implementation for epoll_ctl_batch
  2015-03-10 13:59   ` Dan Rosenberg
@ 2015-03-11  2:23     ` Fam Zheng
  0 siblings, 0 replies; 16+ messages in thread
From: Fam Zheng @ 2015-03-11  2:23 UTC (permalink / raw)
  To: Dan Rosenberg; +Cc: linux-kernel, famz

On Tue, 03/10 09:59, Dan Rosenberg wrote:
> On 03/09/2015 09:49 PM, Fam Zheng wrote:
> > +	if (!cmds || ncmds <= 0 || ncmds > EP_MAX_BATCH)
> > +		return -EINVAL;
> > +	cmd_size = sizeof(struct epoll_ctl_cmd) * ncmds;
> > +	/* TODO: optimize for small arguments like select/poll with a stack
> > +	 * allocated buffer */
> > +
> > +	kcmds = kmalloc(cmd_size, GFP_KERNEL);
> > +	if (!kcmds)
> > +		return -ENOMEM;
> You probably want to define EP_MAX_BATCH as some sane value much less
> than INT_MAX/(sizeof(struct epoll_ctl_cmd)). While this avoids the
> integer overflow from before, any user can cause the kernel to kmalloc
> up to INT_MAX bytes. Probably not a huge deal because it's freed at the
> end of the syscall, but generally not a great idea.
> 

Yeah, makes sense, any suggested value?

Thanks,
Fam


* Re: [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1
  2015-03-10  1:49 [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1 Fam Zheng
                   ` (8 preceding siblings ...)
  2015-03-10  1:49 ` [PATCH v4 9/9] x86: Hook up 32 bit compat epoll_pwait1 syscall Fam Zheng
@ 2015-03-12 15:02 ` Jason Baron
  2015-03-13 11:31   ` Fam Zheng
  9 siblings, 1 reply; 16+ messages in thread
From: Jason Baron @ 2015-03-12 15:02 UTC (permalink / raw)
  To: Fam Zheng, linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Alexander Viro, Andrew Morton, Kees Cook, Andy Lutomirski,
	David Herrmann, Alexei Starovoitov, Miklos Szeredi,
	David Drysdale, Oleg Nesterov, David S. Miller, Vivek Goyal,
	Mike Frysinger, Theodore Ts'o, Heiko Carstens,
	Rasmus Villemoes, Rashika Kheria, Hugh Dickins,
	Mathieu Desnoyers, Peter Zijlstra, linux-fsdevel, linux-api,
	Josh Triplett, Michael Kerrisk (man-pages),
	Paolo Bonzini, Omar Sandoval, Jonathan Corbet, shane.seymour,
	dan.j.rosenberg

On 03/09/2015 09:49 PM, Fam Zheng wrote:
>
> Benchmark for epoll_pwait1
> ==========================
>
> By running fio tests inside VM with both original and modified QEMU, we can
> compare their difference in performance.
>
> With a small VM setup [t1], the original QEMU (ppoll based) has a 4k read
> latency overhead around 37 us. In this setup, the main loop polls 10~20 fds.
>
> With a slightly larger VM instance [t2] - attached a virtio-serial device so
> that there are 80~90 fds in the main loop - the original QEMU has a latency
> overhead around 49 us. By adding more such devices [t3], we can see the latency
> go even higher - 83 us with ~200 FDs.
>
> After modifying QEMU to use epoll_pwait1 and testing again, the latency numbers
> are 36us, 37us and 47us for t1, t2 and t3 respectively.
>
>

Hi,

So it sounds like you are comparing original qemu code (which was using
ppoll) vs. using epoll with these new syscalls. Curious if you have numbers
comparing the existing epoll (with say the timerfd in your epoll set), so
we can see the improvement relative to epoll.

Thanks,

-Jason


* Re: [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1
  2015-03-12 15:02 ` [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1 Jason Baron
@ 2015-03-13 11:31   ` Fam Zheng
  2015-03-13 14:46     ` Jason Baron
  0 siblings, 1 reply; 16+ messages in thread
From: Fam Zheng @ 2015-03-13 11:31 UTC (permalink / raw)
  To: Jason Baron
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Alexander Viro, Andrew Morton, Kees Cook, Andy Lutomirski,
	David Herrmann, Alexei Starovoitov, Miklos Szeredi,
	David Drysdale, Oleg Nesterov, David S. Miller, Vivek Goyal,
	Mike Frysinger, Theodore Ts'o, Heiko Carstens,
	Rasmus Villemoes, Rashika Kheria, Hugh Dickins,
	Mathieu Desnoyers, Peter Zijlstra, linux-fsdevel, linux-api,
	Josh Triplett, Michael Kerrisk (man-pages),
	Paolo Bonzini, Omar Sandoval, Jonathan Corbet, shane.seymour,
	dan.j.rosenberg

On Thu, 03/12 11:02, Jason Baron wrote:
> On 03/09/2015 09:49 PM, Fam Zheng wrote:
> >
> > Benchmark for epoll_pwait1
> > ==========================
> >
> > By running fio tests inside VM with both original and modified QEMU, we can
> > compare their difference in performance.
> >
> > With a small VM setup [t1], the original QEMU (ppoll based) has a 4k read
> > latency overhead around 37 us. In this setup, the main loop polls 10~20 fds.
> >
> > With a slightly larger VM instance [t2] - attached a virtio-serial device so
> > that there are 80~90 fds in the main loop - the original QEMU has a latency
> > overhead around 49 us. By adding more such devices [t3], we can see the latency
> > go even higher - 83 us with ~200 FDs.
> >
> > After modifying QEMU to use epoll_pwait1 and testing again, the latency numbers
> > are 36us, 37us and 47us for t1, t2 and t3 respectively.
> >
> >
> 
> Hi,
> 
> So it sounds like you are comparing original qemu code (which was using
> ppoll) vs. using epoll with these new syscalls. Curious if you have numbers
> comparing the existing epoll (with say the timerfd in your epoll set), so
> we can see the improvement relative to epoll.

I did compare them, but they are too close to see differences. The improvements
in epoll_pwait1 don't really help the hot path of guest IO, but they do
affect the program timer precision, which is used in various device emulations
in QEMU.

Although it's kind of subtle and difficult to summarize here, I can give an
example in the IO throttling implementation in QEMU, to show the significance:

The throttling algorithm computes a duration for the next IO, which is used to
arm a timer in order to delay the request a bit. As timers are always rounded
*UP* to the effective granularity, the timeout being 1ms in epoll_pwait is just
too coarse and will lead to severe inaccuracy. With epoll_pwait1, we can avoid
the rounding-up.
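
The rounding effect described above can be sketched in a few lines. This is an
illustration only, not QEMU's actual throttling code; the helper names
example_ns_to_ms_round_up and example_rounding_error_ns are made up here.

```c
#include <stdint.h>

#define NSEC_PER_MSEC 1000000LL

/*
 * Round a non-negative nanosecond duration up to whole milliseconds, as a
 * caller of epoll_pwait has to do so that it never wakes up too early.
 */
static int64_t example_ns_to_ms_round_up(int64_t ns)
{
	return (ns + NSEC_PER_MSEC - 1) / NSEC_PER_MSEC;
}

/* Extra delay, in nanoseconds, introduced by that rounding. */
static int64_t example_rounding_error_ns(int64_t ns)
{
	return example_ns_to_ms_round_up(ns) * NSEC_PER_MSEC - ns;
}
```

For instance, a computed delay of 100 us becomes a 1 ms wait, so the request
is released 900 us later than intended; with a nanosecond timespec no such
rounding is needed.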

I think this idea could be pretty generally desired by other applications, too.

Regarding the epoll_ctl_batch improvement, again, it is not going to disrupt
the numbers in the small workload I managed to test.

Of course, if you have a specific application scenario in mind, I will try it. :)

Thanks,
Fam


* Re: [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1
  2015-03-13 11:31   ` Fam Zheng
@ 2015-03-13 14:46     ` Jason Baron
  2015-03-13 14:56       ` Paolo Bonzini
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Baron @ 2015-03-13 14:46 UTC (permalink / raw)
  To: Fam Zheng
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Alexander Viro, Andrew Morton, Kees Cook, Andy Lutomirski,
	David Herrmann, Alexei Starovoitov, Miklos Szeredi,
	David Drysdale, Oleg Nesterov, David S. Miller, Vivek Goyal,
	Mike Frysinger, Theodore Ts'o, Heiko Carstens,
	Rasmus Villemoes, Rashika Kheria, Hugh Dickins,
	Mathieu Desnoyers, Peter Zijlstra, linux-fsdevel, linux-api,
	Josh Triplett, Michael Kerrisk (man-pages),
	Paolo Bonzini, Omar Sandoval, Jonathan Corbet, shane.seymour,
	dan.j.rosenberg


On 03/13/2015 07:31 AM, Fam Zheng wrote:
> On Thu, 03/12 11:02, Jason Baron wrote:
>> On 03/09/2015 09:49 PM, Fam Zheng wrote:
>>
>> Hi,
>>
>> So it sounds like you are comparing original qemu code (which was using
>> ppoll) vs. using epoll with these new syscalls. Curious if you have numbers
>> comparing the existing epoll (with say the timerfd in your epoll set), so
>> we can see the improvement relative to epoll.
> I did compare them, but they are too close to see differences. The improvements
> in epoll_pwait1 don't really help the hot path of guest IO, but they do
> affect the program timer precision, which is used in various device emulations
> in QEMU.
>
> Although it's kind of subtle and difficult to summarize here, I can give an
> example in the IO throttling implementation in QEMU, to show the significance:
>
> The throttling algorithm computes a duration for the next IO, which is used to
> arm a timer in order to delay the request a bit. As timers are always rounded
> *UP* to the effective granularity, the timeout being 1ms in epoll_pwait is just
> too coarse and will lead to severe inaccuracy. With epoll_pwait1, we can avoid
> the rounding-up.

right, but we could use the timerfd here to get the desired precision.

> I think this idea could be pretty generally desired by other applications, too.
>
> Regarding the epoll_ctl_batch improvement, again, it is not going to disrupt
> the numbers in the small workload I managed to test.
>
> Of course, if you have a specific application scenario in mind, I will try it. :)

I want to understand what new functionality these syscalls offer over
what we have now. I mean, we could show a micro-benchmark where
these matter, but is that enough to justify these new syscalls, given that
I think we could implement library wrappers around what we have now
to do what you are proposing here?
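
As a rough illustration of the library-wrapper idea, here is a minimal
userspace sketch that mimics the batch semantics described in patch 4/9 (run
the commands in order, stop at the first failure, return the number that
succeeded) using only the existing epoll_ctl. The names example_ctl_cmd and
example_ctl_batch are hypothetical, and the struct is a simplified stand-in
for the proposed one. Unlike the proposed syscall, this still costs one
epoll_ctl call per command.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Simplified, illustrative stand-in for the proposed struct epoll_ctl_cmd. */
struct example_ctl_cmd {
	int op;           /* EPOLL_CTL_ADD, EPOLL_CTL_MOD or EPOLL_CTL_DEL */
	int fd;
	uint32_t events;
	uint64_t data;
	int result;       /* 0 on success, negative errno on failure */
};

/*
 * Execute cmds[0..ncmds) in order via epoll_ctl and stop at the first
 * failure. Returns the number of commands that succeeded; commands after
 * the failing one are not executed and their result field is left untouched.
 */
static int example_ctl_batch(int epfd, struct example_ctl_cmd *cmds, int ncmds)
{
	int done = 0;

	for (int i = 0; i < ncmds; i++) {
		struct epoll_event ev = { .events = cmds[i].events };

		ev.data.u64 = cmds[i].data;
		if (epoll_ctl(epfd, cmds[i].op, cmds[i].fd, &ev) < 0) {
			cmds[i].result = -errno;
			break;
		}
		cmds[i].result = 0;
		done++;
	}
	return done;
}
```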

Thanks,

-Jason


* Re: [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1
  2015-03-13 14:46     ` Jason Baron
@ 2015-03-13 14:56       ` Paolo Bonzini
  0 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2015-03-13 14:56 UTC (permalink / raw)
  To: Jason Baron, Fam Zheng
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Alexander Viro, Andrew Morton, Kees Cook, Andy Lutomirski,
	David Herrmann, Alexei Starovoitov, Miklos Szeredi,
	David Drysdale, Oleg Nesterov, David S. Miller, Vivek Goyal,
	Mike Frysinger, Theodore Ts'o, Heiko Carstens,
	Rasmus Villemoes, Rashika Kheria, Hugh Dickins,
	Mathieu Desnoyers, Peter Zijlstra, linux-fsdevel, linux-api,
	Josh Triplett, Michael Kerrisk (man-pages),
	Omar Sandoval, Jonathan Corbet, shane.seymour, dan.j.rosenberg



On 13/03/2015 15:46, Jason Baron wrote:
> > The throttling algorithm computes a duration for the next IO, which is used to
> > arm a timer in order to delay the request a bit. As timers are always rounded
> > *UP* to the effective granularity, the timeout being 1ms in epoll_pwait is just
> > too coarse and will lead to severe inaccuracy. With epoll_pwait1, we can avoid
> > the rounding-up.
> 
> right, but we could use the timerfd here to get the desired precision.

Fam, didn't you see slowdowns with few file descriptors when using
epoll_ctl+epoll_wait+timerfd compared to ppoll?

Do they disappear or improve with epoll_ctl_batch and epoll_pwait1?

Paolo


Thread overview: 16+ messages
2015-03-10  1:49 [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1 Fam Zheng
2015-03-10  1:49 ` [PATCH v4 1/9] epoll: Extract epoll_wait_do and epoll_pwait_do Fam Zheng
2015-03-10  1:49 ` [PATCH v4 2/9] epoll: Specify clockid explicitly Fam Zheng
2015-03-10  1:49 ` [PATCH v4 3/9] epoll: Extract ep_ctl_do Fam Zheng
2015-03-10  1:49 ` [PATCH v4 4/9] epoll: Add implementation for epoll_ctl_batch Fam Zheng
2015-03-10 13:59   ` Dan Rosenberg
2015-03-11  2:23     ` Fam Zheng
2015-03-10  1:49 ` [PATCH v4 5/9] x86: Hook up epoll_ctl_batch syscall Fam Zheng
2015-03-10  1:49 ` [PATCH v4 6/9] epoll: Add implementation for epoll_pwait1 Fam Zheng
2015-03-10  1:49 ` [PATCH v4 7/9] x86: Hook up epoll_pwait1 syscall Fam Zheng
2015-03-10  1:49 ` [PATCH v4 8/9] epoll: Add compat version implementation of epoll_pwait1 Fam Zheng
2015-03-10  1:49 ` [PATCH v4 9/9] x86: Hook up 32 bit compat epoll_pwait1 syscall Fam Zheng
2015-03-12 15:02 ` [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1 Jason Baron
2015-03-13 11:31   ` Fam Zheng
2015-03-13 14:46     ` Jason Baron
2015-03-13 14:56       ` Paolo Bonzini
