All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
       [not found] <1420705550-24245-1-git-send-email-famz@redhat.com>
  2015-01-08  9:12   ` Miklos Szeredi
@ 2015-01-08  9:12   ` Miklos Szeredi
  0 siblings, 0 replies; 39+ messages in thread
From: Miklos Szeredi @ 2015-01-08  9:12 UTC (permalink / raw)
  To: Fam Zheng
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Alexander Viro, Andrew Morton, Juri Lelli, Zach Brown,
	David Drysdale, Kees Cook, Alexei Starovoitov, David Herrmann,
	Dario Faggioli, Theodore Ts'o, Peter Zijlstra, Vivek Goyal,
	Mike Frysinger, Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers, Fabian Frederick, Josh Triplett,
	David S. Miller, linux-fsdevel, linux-api, Paolo Bonzini,
	Stefan Hajnoczi, Michael Kerrisk (man-pages)

On Thu, 2015-01-08 at 16:25 +0800, Fam Zheng wrote:
> Applications could use epoll interface when then need to poll a big number of
> files in their main loops, to achieve better performance than ppoll(2). Except
> for one concern: epoll only takes timeout parameters in microseconds, rather
> than nanoseconds.
> 
> That is a drawback we should address. For a real case in QEMU, we run into a
> scalability issue with ppoll(2) when many devices are attached to guest, in
> which case many host fds, such as virtual disk images and sockets, need to be
> polled by the main loop. As a result we are looking at switching to epoll, but
> the coarse timeout precision is a trouble, as explained below. 
> 
> We're already using prctl(PR_SET_TIMERSLACK, 1) which is necessary to implement
> timers in the main loop; and we call ppoll(2) with the next firing timer as
> timeout, so when ppoll(2) returns, we know that we have more work to do (either
> handling IO events, or fire a timer callback). This is natual and efficient,
> except that ppoll(2) itself is slow.
> 
> Now that we want to switch to epoll, to speed up the polling. However the timer
> slack setting will be effectively undone, because that way we will have to
> round up the timeout to microseconds honoring timer contract. But consequently,
> this hurts the general responsiveness.
> 
> Note: there are two alternatives, without changing kernel:
> 
> 1) Leading ppoll(2), with the epollfd only and a nanosecond timeout. It won't
> be slow as one fd is polled. No more scalability issue. And if there are
> events, we know from ppoll(2)'s return, then we do the epoll_wait(2) with
> timeout=0; otherwise, there can't be events for the epoll, skip the following
> epoll_wait and just continue with other work.
> 
> 2) Setup and add a timerfd to epoll, then we do epoll_wait(..., timeout=-1).
> The timerfd will hopefully force epoll_wait to return when it timeouts, even if
> no other events have arrived. This will inheritly give us timerfd's precision.
> Note that for each poll, the desired timeout is different because the next
> timer is different, so that, before each epoll_wait(2), there will be a
> timerfd_settime syscall to set it to a proper value.
> 
> Unfortunately, both approaches require one more syscall per iteration, compared
> to the original single ppoll(2), cost of which is unneglectable when we talk
> about nanosecond granularity.

Please consider adding a "flags" argument to the new syscall (and
returning EINVAL if non-zero).  See this article, which shows that
extended syscalls almost always want flags, and they often get it only
on the second try:

http://lwn.net/Articles/585415/

Thanks,
Miklos

P.S. stray apostrophes in To: and Cc: lines seems to be causing trouble.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-08  9:12   ` Miklos Szeredi
  0 siblings, 0 replies; 39+ messages in thread
From: Miklos Szeredi @ 2015-01-08  9:12 UTC (permalink / raw)
  To: Fam Zheng
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Alexander Viro, Andrew Morton, Juri Lelli, Zach Brown,
	David Drysdale, Kees Cook, Alexei Starovoitov, David Herrmann,
	Dario Faggioli, Theodore Ts'o, Peter Zijlstra, Vivek Goyal,
	Mike Frysinger, Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers, Fabian Frederick, Josh Triplett,

On Thu, 2015-01-08 at 16:25 +0800, Fam Zheng wrote:
> Applications could use epoll interface when then need to poll a big number of
> files in their main loops, to achieve better performance than ppoll(2). Except
> for one concern: epoll only takes timeout parameters in microseconds, rather
> than nanoseconds.
> 
> That is a drawback we should address. For a real case in QEMU, we run into a
> scalability issue with ppoll(2) when many devices are attached to guest, in
> which case many host fds, such as virtual disk images and sockets, need to be
> polled by the main loop. As a result we are looking at switching to epoll, but
> the coarse timeout precision is a trouble, as explained below. 
> 
> We're already using prctl(PR_SET_TIMERSLACK, 1) which is necessary to implement
> timers in the main loop; and we call ppoll(2) with the next firing timer as
> timeout, so when ppoll(2) returns, we know that we have more work to do (either
> handling IO events, or fire a timer callback). This is natual and efficient,
> except that ppoll(2) itself is slow.
> 
> Now that we want to switch to epoll, to speed up the polling. However the timer
> slack setting will be effectively undone, because that way we will have to
> round up the timeout to microseconds honoring timer contract. But consequently,
> this hurts the general responsiveness.
> 
> Note: there are two alternatives, without changing kernel:
> 
> 1) Leading ppoll(2), with the epollfd only and a nanosecond timeout. It won't
> be slow as one fd is polled. No more scalability issue. And if there are
> events, we know from ppoll(2)'s return, then we do the epoll_wait(2) with
> timeout=0; otherwise, there can't be events for the epoll, skip the following
> epoll_wait and just continue with other work.
> 
> 2) Setup and add a timerfd to epoll, then we do epoll_wait(..., timeout=-1).
> The timerfd will hopefully force epoll_wait to return when it timeouts, even if
> no other events have arrived. This will inheritly give us timerfd's precision.
> Note that for each poll, the desired timeout is different because the next
> timer is different, so that, before each epoll_wait(2), there will be a
> timerfd_settime syscall to set it to a proper value.
> 
> Unfortunately, both approaches require one more syscall per iteration, compared
> to the original single ppoll(2), cost of which is unneglectable when we talk
> about nanosecond granularity.

Please consider adding a "flags" argument to the new syscall (and
returning EINVAL if non-zero).  See this article, which shows that
extended syscalls almost always want flags, and they often get it only
on the second try:

http://lwn.net/Articles/585415/

Thanks,
Miklos

P.S. stray apostrophes in To: and Cc: lines seems to be causing trouble.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-08  9:12   ` Miklos Szeredi
  0 siblings, 0 replies; 39+ messages in thread
From: Miklos Szeredi @ 2015-01-08  9:12 UTC (permalink / raw)
  To: Fam Zheng
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Alexander Viro, Andrew Morton, Juri Lelli, Zach Brown,
	David Drysdale, Kees Cook, Alexei Starovoitov, David Herrmann,
	Dario Faggioli, Theodore Ts'o, Peter Zijlstra, Vivek Goyal,
	Mike Frysinger, Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers, Fabian Frederick, Josh Triplett

On Thu, 2015-01-08 at 16:25 +0800, Fam Zheng wrote:
> Applications could use epoll interface when then need to poll a big number of
> files in their main loops, to achieve better performance than ppoll(2). Except
> for one concern: epoll only takes timeout parameters in microseconds, rather
> than nanoseconds.
> 
> That is a drawback we should address. For a real case in QEMU, we run into a
> scalability issue with ppoll(2) when many devices are attached to guest, in
> which case many host fds, such as virtual disk images and sockets, need to be
> polled by the main loop. As a result we are looking at switching to epoll, but
> the coarse timeout precision is a trouble, as explained below. 
> 
> We're already using prctl(PR_SET_TIMERSLACK, 1) which is necessary to implement
> timers in the main loop; and we call ppoll(2) with the next firing timer as
> timeout, so when ppoll(2) returns, we know that we have more work to do (either
> handling IO events, or fire a timer callback). This is natual and efficient,
> except that ppoll(2) itself is slow.
> 
> Now that we want to switch to epoll, to speed up the polling. However the timer
> slack setting will be effectively undone, because that way we will have to
> round up the timeout to microseconds honoring timer contract. But consequently,
> this hurts the general responsiveness.
> 
> Note: there are two alternatives, without changing kernel:
> 
> 1) Leading ppoll(2), with the epollfd only and a nanosecond timeout. It won't
> be slow as one fd is polled. No more scalability issue. And if there are
> events, we know from ppoll(2)'s return, then we do the epoll_wait(2) with
> timeout=0; otherwise, there can't be events for the epoll, skip the following
> epoll_wait and just continue with other work.
> 
> 2) Setup and add a timerfd to epoll, then we do epoll_wait(..., timeout=-1).
> The timerfd will hopefully force epoll_wait to return when it timeouts, even if
> no other events have arrived. This will inheritly give us timerfd's precision.
> Note that for each poll, the desired timeout is different because the next
> timer is different, so that, before each epoll_wait(2), there will be a
> timerfd_settime syscall to set it to a proper value.
> 
> Unfortunately, both approaches require one more syscall per iteration, compared
> to the original single ppoll(2), cost of which is unneglectable when we talk
> about nanosecond granularity.

Please consider adding a "flags" argument to the new syscall (and
returning EINVAL if non-zero).  See this article, which shows that
extended syscalls almost always want flags, and they often get it only
on the second try:

http://lwn.net/Articles/585415/

Thanks,
Miklos

P.S. stray apostrophes in To: and Cc: lines seems to be causing trouble.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-08 11:07     ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 39+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-08 11:07 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Fam Zheng, lkml, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, Alexander Viro, Andrew Morton, Juri Lelli, Zach Brown,
	David Drysdale, Kees Cook, Alexei Starovoitov, David Herrmann,
	Dario Faggioli, Theodore Ts'o, Peter Zijlstra, Vivek Goyal,
	Mike Frysinger, Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers, Fabian Frederick, Josh Triplett,
	David S. Miller, linux-fsdevel, Linux API, Paolo Bonzini,
	Stefan Hajnoczi

On 8 January 2015 at 10:12, Miklos Szeredi <mszeredi@suse.cz> wrote:
> On Thu, 2015-01-08 at 16:25 +0800, Fam Zheng wrote:
>> Applications could use epoll interface when then need to poll a big number of
>> files in their main loops, to achieve better performance than ppoll(2). Except
>> for one concern: epoll only takes timeout parameters in microseconds, rather
>> than nanoseconds.
>>
>> That is a drawback we should address. For a real case in QEMU, we run into a
>> scalability issue with ppoll(2) when many devices are attached to guest, in
>> which case many host fds, such as virtual disk images and sockets, need to be
>> polled by the main loop. As a result we are looking at switching to epoll, but
>> the coarse timeout precision is a trouble, as explained below.
>>
>> We're already using prctl(PR_SET_TIMERSLACK, 1) which is necessary to implement
>> timers in the main loop; and we call ppoll(2) with the next firing timer as
>> timeout, so when ppoll(2) returns, we know that we have more work to do (either
>> handling IO events, or fire a timer callback). This is natual and efficient,
>> except that ppoll(2) itself is slow.
>>
>> Now that we want to switch to epoll, to speed up the polling. However the timer
>> slack setting will be effectively undone, because that way we will have to
>> round up the timeout to microseconds honoring timer contract. But consequently,
>> this hurts the general responsiveness.
>>
>> Note: there are two alternatives, without changing kernel:
>>
>> 1) Leading ppoll(2), with the epollfd only and a nanosecond timeout. It won't
>> be slow as one fd is polled. No more scalability issue. And if there are
>> events, we know from ppoll(2)'s return, then we do the epoll_wait(2) with
>> timeout=0; otherwise, there can't be events for the epoll, skip the following
>> epoll_wait and just continue with other work.
>>
>> 2) Setup and add a timerfd to epoll, then we do epoll_wait(..., timeout=-1).
>> The timerfd will hopefully force epoll_wait to return when it timeouts, even if
>> no other events have arrived. This will inheritly give us timerfd's precision.
>> Note that for each poll, the desired timeout is different because the next
>> timer is different, so that, before each epoll_wait(2), there will be a
>> timerfd_settime syscall to set it to a proper value.
>>
>> Unfortunately, both approaches require one more syscall per iteration, compared
>> to the original single ppoll(2), cost of which is unneglectable when we talk
>> about nanosecond granularity.
>
> Please consider adding a "flags" argument to the new syscall (and
> returning EINVAL if non-zero).  See this article, which shows that
> extended syscalls almost always want flags, and they often get it only
> on the second try:
>
> http://lwn.net/Articles/585415/

Yes, please ensure that the new call gets a flags argument. We've made
this mistake way too many times in the past.

Thanks,

Michael




-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-08 11:07     ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 39+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-08 11:07 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Fam Zheng, lkml, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86-DgEjT+Ai2ygdnm+yROfE0A, Alexander Viro, Andrew Morton,
	Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers, Fabian Frederick

On 8 January 2015 at 10:12, Miklos Szeredi <mszeredi-AlSwsSmVLrQ@public.gmane.org> wrote:
> On Thu, 2015-01-08 at 16:25 +0800, Fam Zheng wrote:
>> Applications could use epoll interface when then need to poll a big number of
>> files in their main loops, to achieve better performance than ppoll(2). Except
>> for one concern: epoll only takes timeout parameters in microseconds, rather
>> than nanoseconds.
>>
>> That is a drawback we should address. For a real case in QEMU, we run into a
>> scalability issue with ppoll(2) when many devices are attached to guest, in
>> which case many host fds, such as virtual disk images and sockets, need to be
>> polled by the main loop. As a result we are looking at switching to epoll, but
>> the coarse timeout precision is a trouble, as explained below.
>>
>> We're already using prctl(PR_SET_TIMERSLACK, 1) which is necessary to implement
>> timers in the main loop; and we call ppoll(2) with the next firing timer as
>> timeout, so when ppoll(2) returns, we know that we have more work to do (either
>> handling IO events, or fire a timer callback). This is natual and efficient,
>> except that ppoll(2) itself is slow.
>>
>> Now that we want to switch to epoll, to speed up the polling. However the timer
>> slack setting will be effectively undone, because that way we will have to
>> round up the timeout to microseconds honoring timer contract. But consequently,
>> this hurts the general responsiveness.
>>
>> Note: there are two alternatives, without changing kernel:
>>
>> 1) Leading ppoll(2), with the epollfd only and a nanosecond timeout. It won't
>> be slow as one fd is polled. No more scalability issue. And if there are
>> events, we know from ppoll(2)'s return, then we do the epoll_wait(2) with
>> timeout=0; otherwise, there can't be events for the epoll, skip the following
>> epoll_wait and just continue with other work.
>>
>> 2) Setup and add a timerfd to epoll, then we do epoll_wait(..., timeout=-1).
>> The timerfd will hopefully force epoll_wait to return when it timeouts, even if
>> no other events have arrived. This will inheritly give us timerfd's precision.
>> Note that for each poll, the desired timeout is different because the next
>> timer is different, so that, before each epoll_wait(2), there will be a
>> timerfd_settime syscall to set it to a proper value.
>>
>> Unfortunately, both approaches require one more syscall per iteration, compared
>> to the original single ppoll(2), cost of which is unneglectable when we talk
>> about nanosecond granularity.
>
> Please consider adding a "flags" argument to the new syscall (and
> returning EINVAL if non-zero).  See this article, which shows that
> extended syscalls almost always want flags, and they often get it only
> on the second try:
>
> http://lwn.net/Articles/585415/

Yes, please ensure that the new call gets a flags argument. We've made
this mistake way too many times in the past.

Thanks,

Michael




-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-08 11:07     ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 39+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-08 11:07 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Fam Zheng, lkml, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86-DgEjT+Ai2ygdnm+yROfE0A, Alexander Viro, Andrew Morton,
	Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers

On 8 January 2015 at 10:12, Miklos Szeredi <mszeredi-AlSwsSmVLrQ@public.gmane.org> wrote:
> On Thu, 2015-01-08 at 16:25 +0800, Fam Zheng wrote:
>> Applications could use epoll interface when then need to poll a big number of
>> files in their main loops, to achieve better performance than ppoll(2). Except
>> for one concern: epoll only takes timeout parameters in microseconds, rather
>> than nanoseconds.
>>
>> That is a drawback we should address. For a real case in QEMU, we run into a
>> scalability issue with ppoll(2) when many devices are attached to guest, in
>> which case many host fds, such as virtual disk images and sockets, need to be
>> polled by the main loop. As a result we are looking at switching to epoll, but
>> the coarse timeout precision is a trouble, as explained below.
>>
>> We're already using prctl(PR_SET_TIMERSLACK, 1) which is necessary to implement
>> timers in the main loop; and we call ppoll(2) with the next firing timer as
>> timeout, so when ppoll(2) returns, we know that we have more work to do (either
>> handling IO events, or fire a timer callback). This is natual and efficient,
>> except that ppoll(2) itself is slow.
>>
>> Now that we want to switch to epoll, to speed up the polling. However the timer
>> slack setting will be effectively undone, because that way we will have to
>> round up the timeout to microseconds honoring timer contract. But consequently,
>> this hurts the general responsiveness.
>>
>> Note: there are two alternatives, without changing kernel:
>>
>> 1) Leading ppoll(2), with the epollfd only and a nanosecond timeout. It won't
>> be slow as one fd is polled. No more scalability issue. And if there are
>> events, we know from ppoll(2)'s return, then we do the epoll_wait(2) with
>> timeout=0; otherwise, there can't be events for the epoll, skip the following
>> epoll_wait and just continue with other work.
>>
>> 2) Setup and add a timerfd to epoll, then we do epoll_wait(..., timeout=-1).
>> The timerfd will hopefully force epoll_wait to return when it timeouts, even if
>> no other events have arrived. This will inheritly give us timerfd's precision.
>> Note that for each poll, the desired timeout is different because the next
>> timer is different, so that, before each epoll_wait(2), there will be a
>> timerfd_settime syscall to set it to a proper value.
>>
>> Unfortunately, both approaches require one more syscall per iteration, compared
>> to the original single ppoll(2), cost of which is unneglectable when we talk
>> about nanosecond granularity.
>
> Please consider adding a "flags" argument to the new syscall (and
> returning EINVAL if non-zero).  See this article, which shows that
> extended syscalls almost always want flags, and they often get it only
> on the second try:
>
> http://lwn.net/Articles/585415/

Yes, please ensure that the new call gets a flags argument. We've made
this mistake way too many times in the past.

Thanks,

Michael




-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
  2015-01-08  9:12   ` Miklos Szeredi
@ 2015-01-08 17:57     ` Andy Lutomirski
  -1 siblings, 0 replies; 39+ messages in thread
From: Andy Lutomirski @ 2015-01-08 17:57 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Fam Zheng, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Alexander Viro, Andrew Morton,
	Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers, Fabian Frederick, Josh Triplett,
	David S. Miller, Linux FS Devel, Linux API, Paolo Bonzini,
	Stefan Hajnoczi, Michael Kerrisk (man-pages)

On Thu, Jan 8, 2015 at 1:12 AM, Miklos Szeredi <mszeredi@suse.cz> wrote:
> On Thu, 2015-01-08 at 16:25 +0800, Fam Zheng wrote:
>> Applications could use epoll interface when then need to poll a big number of
>> files in their main loops, to achieve better performance than ppoll(2). Except
>> for one concern: epoll only takes timeout parameters in microseconds, rather
>> than nanoseconds.
>>
>> That is a drawback we should address. For a real case in QEMU, we run into a
>> scalability issue with ppoll(2) when many devices are attached to guest, in
>> which case many host fds, such as virtual disk images and sockets, need to be
>> polled by the main loop. As a result we are looking at switching to epoll, but
>> the coarse timeout precision is a trouble, as explained below.
>>
>> We're already using prctl(PR_SET_TIMERSLACK, 1) which is necessary to implement
>> timers in the main loop; and we call ppoll(2) with the next firing timer as
>> timeout, so when ppoll(2) returns, we know that we have more work to do (either
>> handling IO events, or fire a timer callback). This is natual and efficient,
>> except that ppoll(2) itself is slow.
>>
>> Now that we want to switch to epoll, to speed up the polling. However the timer
>> slack setting will be effectively undone, because that way we will have to
>> round up the timeout to microseconds honoring timer contract. But consequently,
>> this hurts the general responsiveness.
>>
>> Note: there are two alternatives, without changing kernel:
>>
>> 1) Leading ppoll(2), with the epollfd only and a nanosecond timeout. It won't
>> be slow as one fd is polled. No more scalability issue. And if there are
>> events, we know from ppoll(2)'s return, then we do the epoll_wait(2) with
>> timeout=0; otherwise, there can't be events for the epoll, skip the following
>> epoll_wait and just continue with other work.
>>
>> 2) Setup and add a timerfd to epoll, then we do epoll_wait(..., timeout=-1).
>> The timerfd will hopefully force epoll_wait to return when it timeouts, even if
>> no other events have arrived. This will inheritly give us timerfd's precision.
>> Note that for each poll, the desired timeout is different because the next
>> timer is different, so that, before each epoll_wait(2), there will be a
>> timerfd_settime syscall to set it to a proper value.
>>
>> Unfortunately, both approaches require one more syscall per iteration, compared
>> to the original single ppoll(2), cost of which is unneglectable when we talk
>> about nanosecond granularity.

I'd like to see a more ambitious change, since the timer isn't the
only problem like this.  Specifically, I'd like a syscall that does a
list of epoll-related things and then waits.  The list of things could
include, at least:

 - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
want to turn on and off their requests for events on a somewhat
regular basis.

 - timerfd_settime actions: this allows a single syscall to wait and
adjust *both* monotonic and real-time wakeups.

Would this make sense?  It could look like:

int epoll_mod_and_pwait(int epfd,
  struct epoll_event *events, int maxevents,
  struct epoll_command *commands, int ncommands,
  const sigset_t *sigmask);

--Andy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-08 17:57     ` Andy Lutomirski
  0 siblings, 0 replies; 39+ messages in thread
From: Andy Lutomirski @ 2015-01-08 17:57 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Fam Zheng, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Alexander Viro, Andrew Morton,
	Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers, Fab

On Thu, Jan 8, 2015 at 1:12 AM, Miklos Szeredi <mszeredi@suse.cz> wrote:
> On Thu, 2015-01-08 at 16:25 +0800, Fam Zheng wrote:
>> Applications could use epoll interface when then need to poll a big number of
>> files in their main loops, to achieve better performance than ppoll(2). Except
>> for one concern: epoll only takes timeout parameters in microseconds, rather
>> than nanoseconds.
>>
>> That is a drawback we should address. For a real case in QEMU, we run into a
>> scalability issue with ppoll(2) when many devices are attached to guest, in
>> which case many host fds, such as virtual disk images and sockets, need to be
>> polled by the main loop. As a result we are looking at switching to epoll, but
>> the coarse timeout precision is a trouble, as explained below.
>>
>> We're already using prctl(PR_SET_TIMERSLACK, 1) which is necessary to implement
>> timers in the main loop; and we call ppoll(2) with the next firing timer as
>> timeout, so when ppoll(2) returns, we know that we have more work to do (either
>> handling IO events, or fire a timer callback). This is natual and efficient,
>> except that ppoll(2) itself is slow.
>>
>> Now that we want to switch to epoll, to speed up the polling. However the timer
>> slack setting will be effectively undone, because that way we will have to
>> round up the timeout to microseconds honoring timer contract. But consequently,
>> this hurts the general responsiveness.
>>
>> Note: there are two alternatives, without changing kernel:
>>
>> 1) Leading ppoll(2), with the epollfd only and a nanosecond timeout. It won't
>> be slow as one fd is polled. No more scalability issue. And if there are
>> events, we know from ppoll(2)'s return, then we do the epoll_wait(2) with
>> timeout=0; otherwise, there can't be events for the epoll, skip the following
>> epoll_wait and just continue with other work.
>>
>> 2) Setup and add a timerfd to epoll, then we do epoll_wait(..., timeout=-1).
>> The timerfd will hopefully force epoll_wait to return when it timeouts, even if
>> no other events have arrived. This will inheritly give us timerfd's precision.
>> Note that for each poll, the desired timeout is different because the next
>> timer is different, so that, before each epoll_wait(2), there will be a
>> timerfd_settime syscall to set it to a proper value.
>>
>> Unfortunately, both approaches require one more syscall per iteration, compared
>> to the original single ppoll(2), cost of which is unneglectable when we talk
>> about nanosecond granularity.

I'd like to see a more ambitious change, since the timer isn't the
only problem like this.  Specifically, I'd like a syscall that does a
list of epoll-related things and then waits.  The list of things could
include, at least:

 - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
want to turn on and off their requests for events on a somewhat
regular basis.

 - timerfd_settime actions: this allows a single syscall to wait and
adjust *both* monotonic and real-time wakeups.

Would this make sense?  It could look like:

int epoll_mod_and_pwait(int epfd,
  struct epoll_event *events, int maxevents,
  struct epoll_command *commands, int ncommands,
  const sigset_t *sigmask);

--Andy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-08 18:42       ` josh-iaAMLnmF4UmaiuxdJuQwMA
  0 siblings, 0 replies; 39+ messages in thread
From: josh @ 2015-01-08 18:42 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Miklos Szeredi, Fam Zheng, linux-kernel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, X86 ML, Alexander Viro,
	Andrew Morton, Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers, Fabian Frederick, David S. Miller,
	Linux FS Devel, Linux API, Paolo Bonzini, Stefan Hajnoczi,
	Michael Kerrisk (man-pages)

On Thu, Jan 08, 2015 at 09:57:24AM -0800, Andy Lutomirski wrote:
> On Thu, Jan 8, 2015 at 1:12 AM, Miklos Szeredi <mszeredi@suse.cz> wrote:
> > On Thu, 2015-01-08 at 16:25 +0800, Fam Zheng wrote:
> >> Applications could use epoll interface when then need to poll a big number of
> >> files in their main loops, to achieve better performance than ppoll(2). Except
> >> for one concern: epoll only takes timeout parameters in microseconds, rather
> >> than nanoseconds.
> >>
> >> That is a drawback we should address. For a real case in QEMU, we run into a
> >> scalability issue with ppoll(2) when many devices are attached to guest, in
> >> which case many host fds, such as virtual disk images and sockets, need to be
> >> polled by the main loop. As a result we are looking at switching to epoll, but
> >> the coarse timeout precision is a trouble, as explained below.
> >>
> >> We're already using prctl(PR_SET_TIMERSLACK, 1) which is necessary to implement
> >> timers in the main loop; and we call ppoll(2) with the next firing timer as
> >> timeout, so when ppoll(2) returns, we know that we have more work to do (either
> >> handling IO events, or fire a timer callback). This is natual and efficient,
> >> except that ppoll(2) itself is slow.
> >>
> >> Now that we want to switch to epoll, to speed up the polling. However the timer
> >> slack setting will be effectively undone, because that way we will have to
> >> round up the timeout to microseconds honoring timer contract. But consequently,
> >> this hurts the general responsiveness.
> >>
> >> Note: there are two alternatives, without changing kernel:
> >>
> >> 1) Leading ppoll(2), with the epollfd only and a nanosecond timeout. It won't
> >> be slow as one fd is polled. No more scalability issue. And if there are
> >> events, we know from ppoll(2)'s return, then we do the epoll_wait(2) with
> >> timeout=0; otherwise, there can't be events for the epoll, skip the following
> >> epoll_wait and just continue with other work.
> >>
> >> 2) Setup and add a timerfd to epoll, then we do epoll_wait(..., timeout=-1).
> >> The timerfd will hopefully force epoll_wait to return when it timeouts, even if
> >> no other events have arrived. This will inheritly give us timerfd's precision.
> >> Note that for each poll, the desired timeout is different because the next
> >> timer is different, so that, before each epoll_wait(2), there will be a
> >> timerfd_settime syscall to set it to a proper value.
> >>
> >> Unfortunately, both approaches require one more syscall per iteration, compared
> >> to the original single ppoll(2), cost of which is unneglectable when we talk
> >> about nanosecond granularity.
> 
> I'd like to see a more ambitious change, since the timer isn't the
> only problem like this.  Specifically, I'd like a syscall that does a
> list of epoll-related things and then waits.  The list of things could
> include, at least:
> 
>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> want to turn on and off their requests for events on a somewhat
> regular basis.
> 
>  - timerfd_settime actions: this allows a single syscall to wait and
> adjust *both* monotonic and real-time wakeups.
> 
> Would this make sense?  It could look like:
> 
> int epoll_mod_and_pwait(int epfd,
>   struct epoll_event *events, int maxevents,
>   struct epoll_command *commands, int ncommands,
>   const sigset_t *sigmask);

That's a complicated syscall.  (And it also doesn't have room for the
flags argument.)

At that point, why not just have a syscall like this:

struct syscall {
	unsigned long num;
	unsigned long params[6];
};

int sys_many(size_t count, struct syscall *syscalls, int *results, unsigned long flags);

I think that has been discussed in the past.

Or, these days, that might be better done via eBPF, which would avoid
the need for flags like "return on error"; an eBPF program could decide
how to proceed after each call.

- Josh Triplett

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-08 18:42       ` josh-iaAMLnmF4UmaiuxdJuQwMA
  0 siblings, 0 replies; 39+ messages in thread
From: josh-iaAMLnmF4UmaiuxdJuQwMA @ 2015-01-08 18:42 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Miklos Szeredi, Fam Zheng, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Alexander Viro, Andrew Morton, Juri Lelli, Zach Brown,
	David Drysdale, Kees Cook, Alexei Starovoitov, David Herrmann,
	Dario Faggioli, Theodore Ts'o, Peter Zijlstra, Vivek Goyal,
	Mike Frysinger, Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers

On Thu, Jan 08, 2015 at 09:57:24AM -0800, Andy Lutomirski wrote:
> On Thu, Jan 8, 2015 at 1:12 AM, Miklos Szeredi <mszeredi-AlSwsSmVLrQ@public.gmane.org> wrote:
> > On Thu, 2015-01-08 at 16:25 +0800, Fam Zheng wrote:
> >> Applications could use epoll interface when then need to poll a big number of
> >> files in their main loops, to achieve better performance than ppoll(2). Except
> >> for one concern: epoll only takes timeout parameters in microseconds, rather
> >> than nanoseconds.
> >>
> >> That is a drawback we should address. For a real case in QEMU, we run into a
> >> scalability issue with ppoll(2) when many devices are attached to guest, in
> >> which case many host fds, such as virtual disk images and sockets, need to be
> >> polled by the main loop. As a result we are looking at switching to epoll, but
> >> the coarse timeout precision is a trouble, as explained below.
> >>
> >> We're already using prctl(PR_SET_TIMERSLACK, 1) which is necessary to implement
> >> timers in the main loop; and we call ppoll(2) with the next firing timer as
> >> timeout, so when ppoll(2) returns, we know that we have more work to do (either
> >> handling IO events, or fire a timer callback). This is natual and efficient,
> >> except that ppoll(2) itself is slow.
> >>
> >> Now that we want to switch to epoll, to speed up the polling. However the timer
> >> slack setting will be effectively undone, because that way we will have to
> >> round up the timeout to microseconds honoring timer contract. But consequently,
> >> this hurts the general responsiveness.
> >>
> >> Note: there are two alternatives, without changing kernel:
> >>
> >> 1) Leading ppoll(2), with the epollfd only and a nanosecond timeout. It won't
> >> be slow as one fd is polled. No more scalability issue. And if there are
> >> events, we know from ppoll(2)'s return, then we do the epoll_wait(2) with
> >> timeout=0; otherwise, there can't be events for the epoll, skip the following
> >> epoll_wait and just continue with other work.
> >>
> >> 2) Setup and add a timerfd to epoll, then we do epoll_wait(..., timeout=-1).
> >> The timerfd will hopefully force epoll_wait to return when it timeouts, even if
> >> no other events have arrived. This will inheritly give us timerfd's precision.
> >> Note that for each poll, the desired timeout is different because the next
> >> timer is different, so that, before each epoll_wait(2), there will be a
> >> timerfd_settime syscall to set it to a proper value.
> >>
> >> Unfortunately, both approaches require one more syscall per iteration, compared
> >> to the original single ppoll(2), cost of which is unneglectable when we talk
> >> about nanosecond granularity.
> 
> I'd like to see a more ambitious change, since the timer isn't the
> only problem like this.  Specifically, I'd like a syscall that does a
> list of epoll-related things and then waits.  The list of things could
> include, at least:
> 
>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> want to turn on and off their requests for events on a somewhat
> regular basis.
> 
>  - timerfd_settime actions: this allows a single syscall to wait and
> adjust *both* monotonic and real-time wakeups.
> 
> Would this make sense?  It could look like:
> 
> int epoll_mod_and_pwait(int epfd,
>   struct epoll_event *events, int maxevents,
>   struct epoll_command *commands, int ncommands,
>   const sigset_t *sigmask);

That's a complicated syscall.  (And it also doesn't have room for the
flags argument.)

At that point, why not just have a syscall like this:

struct syscall {
	unsigned long num;
	unsigned long params[6];
};

int sys_many(size_t count, struct syscall *syscalls, int *results, unsigned long flags);

I think that has been discussed in the past.

Or, these days, that might be better done via eBPF, which would avoid
the need for flags like "return on error"; an eBPF program could decide
how to proceed after each call.

- Josh Triplett

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-08 18:42       ` josh-iaAMLnmF4UmaiuxdJuQwMA
  0 siblings, 0 replies; 39+ messages in thread
From: josh-iaAMLnmF4UmaiuxdJuQwMA @ 2015-01-08 18:42 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Miklos Szeredi, Fam Zheng, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Alexander Viro, Andrew Morton, Juri Lelli, Zach Brown,
	David Drysdale, Kees Cook, Alexei Starovoitov, David Herrmann,
	Dario Faggioli, Theodore Ts'o, Peter Zijlstra, Vivek Goyal,
	Mike Frysinger, Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers

On Thu, Jan 08, 2015 at 09:57:24AM -0800, Andy Lutomirski wrote:
> On Thu, Jan 8, 2015 at 1:12 AM, Miklos Szeredi <mszeredi-AlSwsSmVLrQ@public.gmane.org> wrote:
> > On Thu, 2015-01-08 at 16:25 +0800, Fam Zheng wrote:
> >> Applications could use epoll interface when then need to poll a big number of
> >> files in their main loops, to achieve better performance than ppoll(2). Except
> >> for one concern: epoll only takes timeout parameters in microseconds, rather
> >> than nanoseconds.
> >>
> >> That is a drawback we should address. For a real case in QEMU, we run into a
> >> scalability issue with ppoll(2) when many devices are attached to guest, in
> >> which case many host fds, such as virtual disk images and sockets, need to be
> >> polled by the main loop. As a result we are looking at switching to epoll, but
> >> the coarse timeout precision is a trouble, as explained below.
> >>
> >> We're already using prctl(PR_SET_TIMERSLACK, 1) which is necessary to implement
> >> timers in the main loop; and we call ppoll(2) with the next firing timer as
> >> timeout, so when ppoll(2) returns, we know that we have more work to do (either
> >> handling IO events, or fire a timer callback). This is natual and efficient,
> >> except that ppoll(2) itself is slow.
> >>
> >> Now that we want to switch to epoll, to speed up the polling. However the timer
> >> slack setting will be effectively undone, because that way we will have to
> >> round up the timeout to microseconds honoring timer contract. But consequently,
> >> this hurts the general responsiveness.
> >>
> >> Note: there are two alternatives, without changing kernel:
> >>
> >> 1) Leading ppoll(2), with the epollfd only and a nanosecond timeout. It won't
> >> be slow as one fd is polled. No more scalability issue. And if there are
> >> events, we know from ppoll(2)'s return, then we do the epoll_wait(2) with
> >> timeout=0; otherwise, there can't be events for the epoll, skip the following
> >> epoll_wait and just continue with other work.
> >>
> >> 2) Setup and add a timerfd to epoll, then we do epoll_wait(..., timeout=-1).
> >> The timerfd will hopefully force epoll_wait to return when it timeouts, even if
> >> no other events have arrived. This will inheritly give us timerfd's precision.
> >> Note that for each poll, the desired timeout is different because the next
> >> timer is different, so that, before each epoll_wait(2), there will be a
> >> timerfd_settime syscall to set it to a proper value.
> >>
> >> Unfortunately, both approaches require one more syscall per iteration, compared
> >> to the original single ppoll(2), cost of which is unneglectable when we talk
> >> about nanosecond granularity.
> 
> I'd like to see a more ambitious change, since the timer isn't the
> only problem like this.  Specifically, I'd like a syscall that does a
> list of epoll-related things and then waits.  The list of things could
> include, at least:
> 
>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> want to turn on and off their requests for events on a somewhat
> regular basis.
> 
>  - timerfd_settime actions: this allows a single syscall to wait and
> adjust *both* monotonic and real-time wakeups.
> 
> Would this make sense?  It could look like:
> 
> int epoll_mod_and_pwait(int epfd,
>   struct epoll_event *events, int maxevents,
>   struct epoll_command *commands, int ncommands,
>   const sigset_t *sigmask);

That's a complicated syscall.  (And it also doesn't have room for the
flags argument.)

At that point, why not just have a syscall like this:

struct syscall {
	unsigned long num;
	unsigned long params[6];
};

int sys_many(size_t count, struct syscall *syscalls, int *results, unsigned long flags);

I think that has been discussed in the past.

Or, these days, that might be better done via eBPF, which would avoid
the need for flags like "return on error"; an eBPF program could decide
how to proceed after each call.

- Josh Triplett

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
  2015-01-08 18:42       ` josh-iaAMLnmF4UmaiuxdJuQwMA
  (?)
@ 2015-01-08 19:31         ` Alexei Starovoitov
  -1 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2015-01-08 19:31 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Andy Lutomirski, Miklos Szeredi, Fam Zheng, linux-kernel,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Alexander Viro, Andrew Morton, Juri Lelli, Zach Brown,
	David Drysdale, Kees Cook, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers, Fabian Frederick, David S. Miller,
	Linux FS Devel, Linux API, Paolo Bonzini, Stefan Hajnoczi,
	Michael Kerrisk (man-pages)

On Thu, Jan 8, 2015 at 10:42 AM,  <josh@joshtriplett.org> wrote:
>> I'd like to see a more ambitious change, since the timer isn't the
>> only problem like this.  Specifically, I'd like a syscall that does a
>> list of epoll-related things and then waits.  The list of things could
>> include, at least:
>>
>>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
>> want to turn on and off their requests for events on a somewhat
>> regular basis.
>>
>>  - timerfd_settime actions: this allows a single syscall to wait and
>> adjust *both* monotonic and real-time wakeups.
>>
>> Would this make sense?  It could look like:
>>
>> int epoll_mod_and_pwait(int epfd,
>>   struct epoll_event *events, int maxevents,
>>   struct epoll_command *commands, int ncommands,
>>   const sigset_t *sigmask);
>
> That's a complicated syscall.  (And it also doesn't have room for the
> flags argument.)
>
> At that point, why not just have a syscall like this:
>
> struct syscall {
>         unsigned long num;
>         unsigned long params[6];
> };
>
> int sys_many(size_t count, struct syscall *syscalls, int *results, unsigned long flags);
>
> I think that has been discussed in the past.
>
> Or, these days, that might be better done via eBPF, which would avoid
> the need for flags like "return on error"; an eBPF program could decide
> how to proceed after each call.

I'm afraid that will break seccomp or will make it much more
complicated.
Also I think syscall latency with the latest improvements is
actually quite fast, so chaining of syscalls is probably an overkill.
Same goes to Andy's argument of doing immediate CTL_MOD.
Let user space do it. It makes sense to combine things only
when atomicity is needed. Here it's not the case.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-08 19:31         ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2015-01-08 19:31 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Andy Lutomirski, Miklos Szeredi, Fam Zheng, linux-kernel,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Alexander Viro, Andrew Morton, Juri Lelli, Zach Brown,
	David Drysdale, Kees Cook, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers

On Thu, Jan 8, 2015 at 10:42 AM,  <josh@joshtriplett.org> wrote:
>> I'd like to see a more ambitious change, since the timer isn't the
>> only problem like this.  Specifically, I'd like a syscall that does a
>> list of epoll-related things and then waits.  The list of things could
>> include, at least:
>>
>>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
>> want to turn on and off their requests for events on a somewhat
>> regular basis.
>>
>>  - timerfd_settime actions: this allows a single syscall to wait and
>> adjust *both* monotonic and real-time wakeups.
>>
>> Would this make sense?  It could look like:
>>
>> int epoll_mod_and_pwait(int epfd,
>>   struct epoll_event *events, int maxevents,
>>   struct epoll_command *commands, int ncommands,
>>   const sigset_t *sigmask);
>
> That's a complicated syscall.  (And it also doesn't have room for the
> flags argument.)
>
> At that point, why not just have a syscall like this:
>
> struct syscall {
>         unsigned long num;
>         unsigned long params[6];
> };
>
> int sys_many(size_t count, struct syscall *syscalls, int *results, unsigned long flags);
>
> I think that has been discussed in the past.
>
> Or, these days, that might be better done via eBPF, which would avoid
> the need for flags like "return on error"; an eBPF program could decide
> how to proceed after each call.

I'm afraid that will break seccomp or will make it much more
complicated.
Also I think syscall latency with the latest improvements is
actually quite fast, so chaining of syscalls is probably an overkill.
Same goes to Andy's argument of doing immediate CTL_MOD.
Let user space do it. It makes sense to combine things only
when atomicity is needed. Here it's not the case.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-08 19:31         ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2015-01-08 19:31 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Andy Lutomirski, Miklos Szeredi, Fam Zheng, linux-kernel,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Alexander Viro, Andrew Morton, Juri Lelli, Zach Brown,
	David Drysdale, Kees Cook, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov

On Thu, Jan 8, 2015 at 10:42 AM,  <josh@joshtriplett.org> wrote:
>> I'd like to see a more ambitious change, since the timer isn't the
>> only problem like this.  Specifically, I'd like a syscall that does a
>> list of epoll-related things and then waits.  The list of things could
>> include, at least:
>>
>>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
>> want to turn on and off their requests for events on a somewhat
>> regular basis.
>>
>>  - timerfd_settime actions: this allows a single syscall to wait and
>> adjust *both* monotonic and real-time wakeups.
>>
>> Would this make sense?  It could look like:
>>
>> int epoll_mod_and_pwait(int epfd,
>>   struct epoll_event *events, int maxevents,
>>   struct epoll_command *commands, int ncommands,
>>   const sigset_t *sigmask);
>
> That's a complicated syscall.  (And it also doesn't have room for the
> flags argument.)
>
> At that point, why not just have a syscall like this:
>
> struct syscall {
>         unsigned long num;
>         unsigned long params[6];
> };
>
> int sys_many(size_t count, struct syscall *syscalls, int *results, unsigned long flags);
>
> I think that has been discussed in the past.
>
> Or, these days, that might be better done via eBPF, which would avoid
> the need for flags like "return on error"; an eBPF program could decide
> how to proceed after each call.

I'm afraid that will break seccomp or will make it much more
complicated.
Also I think syscall latency with the latest improvements is
actually quite fast, so chaining of syscalls is probably an overkill.
Same goes to Andy's argument of doing immediate CTL_MOD.
Let user space do it. It makes sense to combine things only
when atomicity is needed. Here it's not the case.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
  2015-01-08 18:42       ` josh-iaAMLnmF4UmaiuxdJuQwMA
  (?)
@ 2015-01-08 19:42         ` Andy Lutomirski
  -1 siblings, 0 replies; 39+ messages in thread
From: Andy Lutomirski @ 2015-01-08 19:42 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Michael Kerrisk, Mathieu Desnoyers, Zach Brown, Fam Zheng,
	Vivek Goyal, Ingo Molnar, linux-kernel, Alexander Viro,
	Mike Frysinger, Linux FS Devel, Alexei Starovoitov,
	Rasmus Villemoes, Stefan Hajnoczi, Dario Faggioli, Paolo Bonzini,
	Juri Lelli, X86 ML, Oleg Nesterov, Heiko Carstens,
	H. Peter Anvin, Andrew Morton, Linux API, David S. Miller,
	Fabian Frederick, Thomas Gleixner, Peter Zijlstra,
	Miklos Szeredi, Theodore Ts'o, David Herrmann,
	David Drysdale, Kees Cook

On Jan 8, 2015 10:42 AM, <josh@joshtriplett.org> wrote:
>
> On Thu, Jan 08, 2015 at 09:57:24AM -0800, Andy Lutomirski wrote:
> > On Thu, Jan 8, 2015 at 1:12 AM, Miklos Szeredi <mszeredi@suse.cz> wrote:
> > > On Thu, 2015-01-08 at 16:25 +0800, Fam Zheng wrote:
> > >> Applications could use epoll interface when then need to poll a big number of
> > >> files in their main loops, to achieve better performance than ppoll(2). Except
> > >> for one concern: epoll only takes timeout parameters in microseconds, rather
> > >> than nanoseconds.
> > >>
> > >> That is a drawback we should address. For a real case in QEMU, we run into a
> > >> scalability issue with ppoll(2) when many devices are attached to guest, in
> > >> which case many host fds, such as virtual disk images and sockets, need to be
> > >> polled by the main loop. As a result we are looking at switching to epoll, but
> > >> the coarse timeout precision is a trouble, as explained below.
> > >>
> > >> We're already using prctl(PR_SET_TIMERSLACK, 1) which is necessary to implement
> > >> timers in the main loop; and we call ppoll(2) with the next firing timer as
> > >> timeout, so when ppoll(2) returns, we know that we have more work to do (either
> > >> handling IO events, or fire a timer callback). This is natual and efficient,
> > >> except that ppoll(2) itself is slow.
> > >>
> > >> Now that we want to switch to epoll, to speed up the polling. However the timer
> > >> slack setting will be effectively undone, because that way we will have to
> > >> round up the timeout to microseconds honoring timer contract. But consequently,
> > >> this hurts the general responsiveness.
> > >>
> > >> Note: there are two alternatives, without changing kernel:
> > >>
> > >> 1) Leading ppoll(2), with the epollfd only and a nanosecond timeout. It won't
> > >> be slow as one fd is polled. No more scalability issue. And if there are
> > >> events, we know from ppoll(2)'s return, then we do the epoll_wait(2) with
> > >> timeout=0; otherwise, there can't be events for the epoll, skip the following
> > >> epoll_wait and just continue with other work.
> > >>
> > >> 2) Setup and add a timerfd to epoll, then we do epoll_wait(..., timeout=-1).
> > >> The timerfd will hopefully force epoll_wait to return when it timeouts, even if
> > >> no other events have arrived. This will inheritly give us timerfd's precision.
> > >> Note that for each poll, the desired timeout is different because the next
> > >> timer is different, so that, before each epoll_wait(2), there will be a
> > >> timerfd_settime syscall to set it to a proper value.
> > >>
> > >> Unfortunately, both approaches require one more syscall per iteration, compared
> > >> to the original single ppoll(2), cost of which is unneglectable when we talk
> > >> about nanosecond granularity.
> >
> > I'd like to see a more ambitious change, since the timer isn't the
> > only problem like this.  Specifically, I'd like a syscall that does a
> > list of epoll-related things and then waits.  The list of things could
> > include, at least:
> >
> >  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> > want to turn on and off their requests for events on a somewhat
> > regular basis.
> >
> >  - timerfd_settime actions: this allows a single syscall to wait and
> > adjust *both* monotonic and real-time wakeups.
> >
> > Would this make sense?  It could look like:
> >
> > int epoll_mod_and_pwait(int epfd,
> >   struct epoll_event *events, int maxevents,
> >   struct epoll_command *commands, int ncommands,
> >   const sigset_t *sigmask);
>
> That's a complicated syscall.  (And it also doesn't have room for the
> flags argument.)
>
> At that point, why not just have a syscall like this:
>
> struct syscall {
>         unsigned long num;
>         unsigned long params[6];
> };

Too minimize locking and such.  The user / kernel transitions are very
fast in some configurations, and I suspect that all the fd lookups,
epoll data structure locks, etc are important, too.

This particular pattern (EPOLL_CTL_MOD, then timerfd_settime, then
epoll_pwait) is so common that optimizing it from three syscalls to
one (as opposed to two as in your patch) seems like it could be
worthwhile.

syscall entry + exit latency is pretty good (~180 cycles total, maybe)
these days, but only in some configurations.  In others, it blows up
pretty badly.  I'm slowly working on improving that.

Also, simplicity is a win. The multi-syscall thing has never gone
smoothly, and, as Alexei pointed out, it's a big mess for seccomp.

>
> int sys_many(size_t count, struct syscall *syscalls, int *results, unsigned long flags);
>
> I think that has been discussed in the past.
>
> Or, these days, that might be better done via eBPF, which would avoid
> the need for flags like "return on error"; an eBPF program could decide
> how to proceed after each call.
>
> - Josh Triplett

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-08 19:42         ` Andy Lutomirski
  0 siblings, 0 replies; 39+ messages in thread
From: Andy Lutomirski @ 2015-01-08 19:42 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Michael Kerrisk, Mathieu Desnoyers, Zach Brown, Fam Zheng,
	Vivek Goyal, Ingo Molnar, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Alexander Viro, Mike Frysinger, Linux FS Devel,
	Alexei Starovoitov, Rasmus Villemoes, Stefan Hajnoczi,
	Dario Faggioli, Paolo Bonzini, Juri Lelli, X86 ML, Oleg Nesterov,
	Heiko Carstens, H. Peter Anvin, Andrew Morton, Linux API,
	David S. Miller

On Jan 8, 2015 10:42 AM, <josh-iaAMLnmF4UmaiuxdJuQwMA@public.gmane.org> wrote:
>
> On Thu, Jan 08, 2015 at 09:57:24AM -0800, Andy Lutomirski wrote:
> > On Thu, Jan 8, 2015 at 1:12 AM, Miklos Szeredi <mszeredi-AlSwsSmVLrQ@public.gmane.org> wrote:
> > > On Thu, 2015-01-08 at 16:25 +0800, Fam Zheng wrote:
> > >> Applications could use epoll interface when then need to poll a big number of
> > >> files in their main loops, to achieve better performance than ppoll(2). Except
> > >> for one concern: epoll only takes timeout parameters in microseconds, rather
> > >> than nanoseconds.
> > >>
> > >> That is a drawback we should address. For a real case in QEMU, we run into a
> > >> scalability issue with ppoll(2) when many devices are attached to guest, in
> > >> which case many host fds, such as virtual disk images and sockets, need to be
> > >> polled by the main loop. As a result we are looking at switching to epoll, but
> > >> the coarse timeout precision is a trouble, as explained below.
> > >>
> > >> We're already using prctl(PR_SET_TIMERSLACK, 1) which is necessary to implement
> > >> timers in the main loop; and we call ppoll(2) with the next firing timer as
> > >> timeout, so when ppoll(2) returns, we know that we have more work to do (either
> > >> handling IO events, or fire a timer callback). This is natual and efficient,
> > >> except that ppoll(2) itself is slow.
> > >>
> > >> Now that we want to switch to epoll, to speed up the polling. However the timer
> > >> slack setting will be effectively undone, because that way we will have to
> > >> round up the timeout to microseconds honoring timer contract. But consequently,
> > >> this hurts the general responsiveness.
> > >>
> > >> Note: there are two alternatives, without changing kernel:
> > >>
> > >> 1) Leading ppoll(2), with the epollfd only and a nanosecond timeout. It won't
> > >> be slow as one fd is polled. No more scalability issue. And if there are
> > >> events, we know from ppoll(2)'s return, then we do the epoll_wait(2) with
> > >> timeout=0; otherwise, there can't be events for the epoll, skip the following
> > >> epoll_wait and just continue with other work.
> > >>
> > >> 2) Setup and add a timerfd to epoll, then we do epoll_wait(..., timeout=-1).
> > >> The timerfd will hopefully force epoll_wait to return when it timeouts, even if
> > >> no other events have arrived. This will inheritly give us timerfd's precision.
> > >> Note that for each poll, the desired timeout is different because the next
> > >> timer is different, so that, before each epoll_wait(2), there will be a
> > >> timerfd_settime syscall to set it to a proper value.
> > >>
> > >> Unfortunately, both approaches require one more syscall per iteration, compared
> > >> to the original single ppoll(2), cost of which is unneglectable when we talk
> > >> about nanosecond granularity.
> >
> > I'd like to see a more ambitious change, since the timer isn't the
> > only problem like this.  Specifically, I'd like a syscall that does a
> > list of epoll-related things and then waits.  The list of things could
> > include, at least:
> >
> >  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> > want to turn on and off their requests for events on a somewhat
> > regular basis.
> >
> >  - timerfd_settime actions: this allows a single syscall to wait and
> > adjust *both* monotonic and real-time wakeups.
> >
> > Would this make sense?  It could look like:
> >
> > int epoll_mod_and_pwait(int epfd,
> >   struct epoll_event *events, int maxevents,
> >   struct epoll_command *commands, int ncommands,
> >   const sigset_t *sigmask);
>
> That's a complicated syscall.  (And it also doesn't have room for the
> flags argument.)
>
> At that point, why not just have a syscall like this:
>
> struct syscall {
>         unsigned long num;
>         unsigned long params[6];
> };

Too minimize locking and such.  The user / kernel transitions are very
fast in some configurations, and I suspect that all the fd lookups,
epoll data structure locks, etc are important, too.

This particular pattern (EPOLL_CTL_MOD, then timerfd_settime, then
epoll_pwait) is so common that optimizing it from three syscalls to
one (as opposed to two as in your patch) seems like it could be
worthwhile.

syscall entry + exit latency is pretty good (~180 cycles total, maybe)
these days, but only in some configurations.  In others, it blows up
pretty badly.  I'm slowly working on improving that.

Also, simplicity is a win. The multi-syscall thing has never gone
smoothly, and, as Alexei pointed out, it's a big mess for seccomp.

>
> int sys_many(size_t count, struct syscall *syscalls, int *results, unsigned long flags);
>
> I think that has been discussed in the past.
>
> Or, these days, that might be better done via eBPF, which would avoid
> the need for flags like "return on error"; an eBPF program could decide
> how to proceed after each call.
>
> - Josh Triplett

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-08 19:42         ` Andy Lutomirski
  0 siblings, 0 replies; 39+ messages in thread
From: Andy Lutomirski @ 2015-01-08 19:42 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Michael Kerrisk, Mathieu Desnoyers, Zach Brown, Fam Zheng,
	Vivek Goyal, Ingo Molnar, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Alexander Viro, Mike Frysinger, Linux FS Devel,
	Alexei Starovoitov, Rasmus Villemoes, Stefan Hajnoczi,
	Dario Faggioli, Paolo Bonzini, Juri Lelli, X86 ML, Oleg Nesterov,
	Heiko Carstens, H. Peter Anvin, Andrew Morton, Linux API,
	David S. Miller

On Jan 8, 2015 10:42 AM, <josh-iaAMLnmF4UmaiuxdJuQwMA@public.gmane.org> wrote:
>
> On Thu, Jan 08, 2015 at 09:57:24AM -0800, Andy Lutomirski wrote:
> > On Thu, Jan 8, 2015 at 1:12 AM, Miklos Szeredi <mszeredi-AlSwsSmVLrQ@public.gmane.org> wrote:
> > > On Thu, 2015-01-08 at 16:25 +0800, Fam Zheng wrote:
> > >> Applications could use epoll interface when then need to poll a big number of
> > >> files in their main loops, to achieve better performance than ppoll(2). Except
> > >> for one concern: epoll only takes timeout parameters in microseconds, rather
> > >> than nanoseconds.
> > >>
> > >> That is a drawback we should address. For a real case in QEMU, we run into a
> > >> scalability issue with ppoll(2) when many devices are attached to guest, in
> > >> which case many host fds, such as virtual disk images and sockets, need to be
> > >> polled by the main loop. As a result we are looking at switching to epoll, but
> > >> the coarse timeout precision is a trouble, as explained below.
> > >>
> > >> We're already using prctl(PR_SET_TIMERSLACK, 1) which is necessary to implement
> > >> timers in the main loop; and we call ppoll(2) with the next firing timer as
> > >> timeout, so when ppoll(2) returns, we know that we have more work to do (either
> > >> handling IO events, or fire a timer callback). This is natual and efficient,
> > >> except that ppoll(2) itself is slow.
> > >>
> > >> Now that we want to switch to epoll, to speed up the polling. However the timer
> > >> slack setting will be effectively undone, because that way we will have to
> > >> round up the timeout to microseconds honoring timer contract. But consequently,
> > >> this hurts the general responsiveness.
> > >>
> > >> Note: there are two alternatives, without changing kernel:
> > >>
> > >> 1) Leading ppoll(2), with the epollfd only and a nanosecond timeout. It won't
> > >> be slow as one fd is polled. No more scalability issue. And if there are
> > >> events, we know from ppoll(2)'s return, then we do the epoll_wait(2) with
> > >> timeout=0; otherwise, there can't be events for the epoll, skip the following
> > >> epoll_wait and just continue with other work.
> > >>
> > >> 2) Setup and add a timerfd to epoll, then we do epoll_wait(..., timeout=-1).
> > >> The timerfd will hopefully force epoll_wait to return when it timeouts, even if
> > >> no other events have arrived. This will inheritly give us timerfd's precision.
> > >> Note that for each poll, the desired timeout is different because the next
> > >> timer is different, so that, before each epoll_wait(2), there will be a
> > >> timerfd_settime syscall to set it to a proper value.
> > >>
> > >> Unfortunately, both approaches require one more syscall per iteration, compared
> > >> to the original single ppoll(2), cost of which is unneglectable when we talk
> > >> about nanosecond granularity.
> >
> > I'd like to see a more ambitious change, since the timer isn't the
> > only problem like this.  Specifically, I'd like a syscall that does a
> > list of epoll-related things and then waits.  The list of things could
> > include, at least:
> >
> >  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> > want to turn on and off their requests for events on a somewhat
> > regular basis.
> >
> >  - timerfd_settime actions: this allows a single syscall to wait and
> > adjust *both* monotonic and real-time wakeups.
> >
> > Would this make sense?  It could look like:
> >
> > int epoll_mod_and_pwait(int epfd,
> >   struct epoll_event *events, int maxevents,
> >   struct epoll_command *commands, int ncommands,
> >   const sigset_t *sigmask);
>
> That's a complicated syscall.  (And it also doesn't have room for the
> flags argument.)
>
> At that point, why not just have a syscall like this:
>
> struct syscall {
>         unsigned long num;
>         unsigned long params[6];
> };

Too minimize locking and such.  The user / kernel transitions are very
fast in some configurations, and I suspect that all the fd lookups,
epoll data structure locks, etc are important, too.

This particular pattern (EPOLL_CTL_MOD, then timerfd_settime, then
epoll_pwait) is so common that optimizing it from three syscalls to
one (as opposed to two as in your patch) seems like it could be
worthwhile.

syscall entry + exit latency is pretty good (~180 cycles total, maybe)
these days, but only in some configurations.  In others, it blows up
pretty badly.  I'm slowly working on improving that.

Also, simplicity is a win. The multi-syscall thing has never gone
smoothly, and, as Alexei pointed out, it's a big mess for seccomp.

>
> int sys_many(size_t count, struct syscall *syscalls, int *results, unsigned long flags);
>
> I think that has been discussed in the past.
>
> Or, these days, that might be better done via eBPF, which would avoid
> the need for flags like "return on error"; an eBPF program could decide
> how to proceed after each call.
>
> - Josh Triplett

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-09  1:25       ` Fam Zheng
  0 siblings, 0 replies; 39+ messages in thread
From: Fam Zheng @ 2015-01-09  1:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Miklos Szeredi, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Alexander Viro, Andrew Morton,
	Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers, Fabian Frederick, Josh Triplett,
	David S. Miller, Linux FS Devel, Linux API, Paolo Bonzini,
	Stefan Hajnoczi, Michael Kerrisk (man-pages)

On Thu, 01/08 09:57, Andy Lutomirski wrote:
> On Thu, Jan 8, 2015 at 1:12 AM, Miklos Szeredi <mszeredi@suse.cz> wrote:
> > On Thu, 2015-01-08 at 16:25 +0800, Fam Zheng wrote:
> >> Applications could use epoll interface when then need to poll a big number of
> >> files in their main loops, to achieve better performance than ppoll(2). Except
> >> for one concern: epoll only takes timeout parameters in microseconds, rather
> >> than nanoseconds.
> >>
> >> That is a drawback we should address. For a real case in QEMU, we run into a
> >> scalability issue with ppoll(2) when many devices are attached to guest, in
> >> which case many host fds, such as virtual disk images and sockets, need to be
> >> polled by the main loop. As a result we are looking at switching to epoll, but
> >> the coarse timeout precision is a trouble, as explained below.
> >>
> >> We're already using prctl(PR_SET_TIMERSLACK, 1) which is necessary to implement
> >> timers in the main loop; and we call ppoll(2) with the next firing timer as
> >> timeout, so when ppoll(2) returns, we know that we have more work to do (either
> >> handling IO events, or fire a timer callback). This is natual and efficient,
> >> except that ppoll(2) itself is slow.
> >>
> >> Now that we want to switch to epoll, to speed up the polling. However the timer
> >> slack setting will be effectively undone, because that way we will have to
> >> round up the timeout to microseconds honoring timer contract. But consequently,
> >> this hurts the general responsiveness.
> >>
> >> Note: there are two alternatives, without changing kernel:
> >>
> >> 1) Leading ppoll(2), with the epollfd only and a nanosecond timeout. It won't
> >> be slow as one fd is polled. No more scalability issue. And if there are
> >> events, we know from ppoll(2)'s return, then we do the epoll_wait(2) with
> >> timeout=0; otherwise, there can't be events for the epoll, skip the following
> >> epoll_wait and just continue with other work.
> >>
> >> 2) Setup and add a timerfd to epoll, then we do epoll_wait(..., timeout=-1).
> >> The timerfd will hopefully force epoll_wait to return when it timeouts, even if
> >> no other events have arrived. This will inheritly give us timerfd's precision.
> >> Note that for each poll, the desired timeout is different because the next
> >> timer is different, so that, before each epoll_wait(2), there will be a
> >> timerfd_settime syscall to set it to a proper value.
> >>
> >> Unfortunately, both approaches require one more syscall per iteration, compared
> >> to the original single ppoll(2), cost of which is unneglectable when we talk
> >> about nanosecond granularity.
> 
> I'd like to see a more ambitious change, since the timer isn't the
> only problem like this.  Specifically, I'd like a syscall that does a
> list of epoll-related things and then waits.  The list of things could
> include, at least:
> 
>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> want to turn on and off their requests for events on a somewhat
> regular basis.

This sounds good to me.

> 
>  - timerfd_settime actions: this allows a single syscall to wait and
> adjust *both* monotonic and real-time wakeups.

I'm not sure, doesn't this break orthogonality between epoll and timerfd?

> 
> Would this make sense?  It could look like:
> 
> int epoll_mod_and_pwait(int epfd,
>   struct epoll_event *events, int maxevents,
>   struct epoll_command *commands, int ncommands,
>   const sigset_t *sigmask);

What about flags?

Fam

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-09  1:25       ` Fam Zheng
  0 siblings, 0 replies; 39+ messages in thread
From: Fam Zheng @ 2015-01-09  1:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Miklos Szeredi, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Alexander Viro, Andrew Morton, Juri Lelli, Zach Brown,
	David Drysdale, Kees Cook, Alexei Starovoitov, David Herrmann,
	Dario Faggioli, Theodore Ts'o, Peter Zijlstra, Vivek Goyal,
	Mike Frysinger, Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers

On Thu, 01/08 09:57, Andy Lutomirski wrote:
> On Thu, Jan 8, 2015 at 1:12 AM, Miklos Szeredi <mszeredi-AlSwsSmVLrQ@public.gmane.org> wrote:
> > On Thu, 2015-01-08 at 16:25 +0800, Fam Zheng wrote:
> >> Applications could use epoll interface when then need to poll a big number of
> >> files in their main loops, to achieve better performance than ppoll(2). Except
> >> for one concern: epoll only takes timeout parameters in microseconds, rather
> >> than nanoseconds.
> >>
> >> That is a drawback we should address. For a real case in QEMU, we run into a
> >> scalability issue with ppoll(2) when many devices are attached to guest, in
> >> which case many host fds, such as virtual disk images and sockets, need to be
> >> polled by the main loop. As a result we are looking at switching to epoll, but
> >> the coarse timeout precision is a trouble, as explained below.
> >>
> >> We're already using prctl(PR_SET_TIMERSLACK, 1) which is necessary to implement
> >> timers in the main loop; and we call ppoll(2) with the next firing timer as
> >> timeout, so when ppoll(2) returns, we know that we have more work to do (either
> >> handling IO events, or fire a timer callback). This is natual and efficient,
> >> except that ppoll(2) itself is slow.
> >>
> >> Now that we want to switch to epoll, to speed up the polling. However the timer
> >> slack setting will be effectively undone, because that way we will have to
> >> round up the timeout to microseconds honoring timer contract. But consequently,
> >> this hurts the general responsiveness.
> >>
> >> Note: there are two alternatives, without changing kernel:
> >>
> >> 1) Leading ppoll(2), with the epollfd only and a nanosecond timeout. It won't
> >> be slow as one fd is polled. No more scalability issue. And if there are
> >> events, we know from ppoll(2)'s return, then we do the epoll_wait(2) with
> >> timeout=0; otherwise, there can't be events for the epoll, skip the following
> >> epoll_wait and just continue with other work.
> >>
> >> 2) Setup and add a timerfd to epoll, then we do epoll_wait(..., timeout=-1).
> >> The timerfd will hopefully force epoll_wait to return when it timeouts, even if
> >> no other events have arrived. This will inheritly give us timerfd's precision.
> >> Note that for each poll, the desired timeout is different because the next
> >> timer is different, so that, before each epoll_wait(2), there will be a
> >> timerfd_settime syscall to set it to a proper value.
> >>
> >> Unfortunately, both approaches require one more syscall per iteration, compared
> >> to the original single ppoll(2), cost of which is unneglectable when we talk
> >> about nanosecond granularity.
> 
> I'd like to see a more ambitious change, since the timer isn't the
> only problem like this.  Specifically, I'd like a syscall that does a
> list of epoll-related things and then waits.  The list of things could
> include, at least:
> 
>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> want to turn on and off their requests for events on a somewhat
> regular basis.

This sounds good to me.

> 
>  - timerfd_settime actions: this allows a single syscall to wait and
> adjust *both* monotonic and real-time wakeups.

I'm not sure, doesn't this break orthogonality between epoll and timerfd?

> 
> Would this make sense?  It could look like:
> 
> int epoll_mod_and_pwait(int epfd,
>   struct epoll_event *events, int maxevents,
>   struct epoll_command *commands, int ncommands,
>   const sigset_t *sigmask);

What about flags?

Fam

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-09  1:28         ` Andy Lutomirski
  0 siblings, 0 replies; 39+ messages in thread
From: Andy Lutomirski @ 2015-01-09  1:28 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Miklos Szeredi, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Alexander Viro, Andrew Morton,
	Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers, Fabian Frederick, Josh Triplett,
	David S. Miller, Linux FS Devel, Linux API, Paolo Bonzini,
	Stefan Hajnoczi, Michael Kerrisk (man-pages)

On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz@redhat.com> wrote:
> On Thu, 01/08 09:57, Andy Lutomirski wrote:
>> On Thu, Jan 8, 2015 at 1:12 AM, Miklos Szeredi <mszeredi@suse.cz> wrote:
>> > On Thu, 2015-01-08 at 16:25 +0800, Fam Zheng wrote:
>> >> Applications could use epoll interface when then need to poll a big number of
>> >> files in their main loops, to achieve better performance than ppoll(2). Except
>> >> for one concern: epoll only takes timeout parameters in microseconds, rather
>> >> than nanoseconds.
>> >>
>> >> That is a drawback we should address. For a real case in QEMU, we run into a
>> >> scalability issue with ppoll(2) when many devices are attached to guest, in
>> >> which case many host fds, such as virtual disk images and sockets, need to be
>> >> polled by the main loop. As a result we are looking at switching to epoll, but
>> >> the coarse timeout precision is a trouble, as explained below.
>> >>
>> >> We're already using prctl(PR_SET_TIMERSLACK, 1) which is necessary to implement
>> >> timers in the main loop; and we call ppoll(2) with the next firing timer as
>> >> timeout, so when ppoll(2) returns, we know that we have more work to do (either
>> >> handling IO events, or fire a timer callback). This is natual and efficient,
>> >> except that ppoll(2) itself is slow.
>> >>
>> >> Now that we want to switch to epoll, to speed up the polling. However the timer
>> >> slack setting will be effectively undone, because that way we will have to
>> >> round up the timeout to microseconds honoring timer contract. But consequently,
>> >> this hurts the general responsiveness.
>> >>
>> >> Note: there are two alternatives, without changing kernel:
>> >>
>> >> 1) Leading ppoll(2), with the epollfd only and a nanosecond timeout. It won't
>> >> be slow as one fd is polled. No more scalability issue. And if there are
>> >> events, we know from ppoll(2)'s return, then we do the epoll_wait(2) with
>> >> timeout=0; otherwise, there can't be events for the epoll, skip the following
>> >> epoll_wait and just continue with other work.
>> >>
>> >> 2) Setup and add a timerfd to epoll, then we do epoll_wait(..., timeout=-1).
>> >> The timerfd will hopefully force epoll_wait to return when it timeouts, even if
>> >> no other events have arrived. This will inheritly give us timerfd's precision.
>> >> Note that for each poll, the desired timeout is different because the next
>> >> timer is different, so that, before each epoll_wait(2), there will be a
>> >> timerfd_settime syscall to set it to a proper value.
>> >>
>> >> Unfortunately, both approaches require one more syscall per iteration, compared
>> >> to the original single ppoll(2), cost of which is unneglectable when we talk
>> >> about nanosecond granularity.
>>
>> I'd like to see a more ambitious change, since the timer isn't the
>> only problem like this.  Specifically, I'd like a syscall that does a
>> list of epoll-related things and then waits.  The list of things could
>> include, at least:
>>
>>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
>> want to turn on and off their requests for events on a somewhat
>> regular basis.
>
> This sounds good to me.
>
>>
>>  - timerfd_settime actions: this allows a single syscall to wait and
>> adjust *both* monotonic and real-time wakeups.
>
> I'm not sure, doesn't this break orthogonality between epoll and timerfd?

Yes.  It's not very elegant, and more elegant ideas are welcome.

>
>>
>> Would this make sense?  It could look like:
>>
>> int epoll_mod_and_pwait(int epfd,
>>   struct epoll_event *events, int maxevents,
>>   struct epoll_command *commands, int ncommands,
>>   const sigset_t *sigmask);
>
> What about flags?
>

No room.  Maybe it should just be a struct for everything instead of
separate args.

> Fam



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-09  1:28         ` Andy Lutomirski
  0 siblings, 0 replies; 39+ messages in thread
From: Andy Lutomirski @ 2015-01-09  1:28 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Miklos Szeredi, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Alexander Viro, Andrew Morton, Juri Lelli, Zach Brown,
	David Drysdale, Kees Cook, Alexei Starovoitov, David Herrmann,
	Dario Faggioli, Theodore Ts'o, Peter Zijlstra, Vivek Goyal,
	Mike Frysinger, Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers

On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Thu, 01/08 09:57, Andy Lutomirski wrote:
>> On Thu, Jan 8, 2015 at 1:12 AM, Miklos Szeredi <mszeredi-AlSwsSmVLrQ@public.gmane.org> wrote:
>> > On Thu, 2015-01-08 at 16:25 +0800, Fam Zheng wrote:
>> >> Applications could use epoll interface when then need to poll a big number of
>> >> files in their main loops, to achieve better performance than ppoll(2). Except
>> >> for one concern: epoll only takes timeout parameters in microseconds, rather
>> >> than nanoseconds.
>> >>
>> >> That is a drawback we should address. For a real case in QEMU, we run into a
>> >> scalability issue with ppoll(2) when many devices are attached to guest, in
>> >> which case many host fds, such as virtual disk images and sockets, need to be
>> >> polled by the main loop. As a result we are looking at switching to epoll, but
>> >> the coarse timeout precision is a trouble, as explained below.
>> >>
>> >> We're already using prctl(PR_SET_TIMERSLACK, 1) which is necessary to implement
>> >> timers in the main loop; and we call ppoll(2) with the next firing timer as
>> >> timeout, so when ppoll(2) returns, we know that we have more work to do (either
>> >> handling IO events, or fire a timer callback). This is natual and efficient,
>> >> except that ppoll(2) itself is slow.
>> >>
>> >> Now that we want to switch to epoll, to speed up the polling. However the timer
>> >> slack setting will be effectively undone, because that way we will have to
>> >> round up the timeout to microseconds honoring timer contract. But consequently,
>> >> this hurts the general responsiveness.
>> >>
>> >> Note: there are two alternatives, without changing kernel:
>> >>
>> >> 1) Leading ppoll(2), with the epollfd only and a nanosecond timeout. It won't
>> >> be slow as one fd is polled. No more scalability issue. And if there are
>> >> events, we know from ppoll(2)'s return, then we do the epoll_wait(2) with
>> >> timeout=0; otherwise, there can't be events for the epoll, skip the following
>> >> epoll_wait and just continue with other work.
>> >>
>> >> 2) Setup and add a timerfd to epoll, then we do epoll_wait(..., timeout=-1).
>> >> The timerfd will hopefully force epoll_wait to return when it timeouts, even if
>> >> no other events have arrived. This will inheritly give us timerfd's precision.
>> >> Note that for each poll, the desired timeout is different because the next
>> >> timer is different, so that, before each epoll_wait(2), there will be a
>> >> timerfd_settime syscall to set it to a proper value.
>> >>
>> >> Unfortunately, both approaches require one more syscall per iteration, compared
>> >> to the original single ppoll(2), cost of which is unneglectable when we talk
>> >> about nanosecond granularity.
>>
>> I'd like to see a more ambitious change, since the timer isn't the
>> only problem like this.  Specifically, I'd like a syscall that does a
>> list of epoll-related things and then waits.  The list of things could
>> include, at least:
>>
>>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
>> want to turn on and off their requests for events on a somewhat
>> regular basis.
>
> This sounds good to me.
>
>>
>>  - timerfd_settime actions: this allows a single syscall to wait and
>> adjust *both* monotonic and real-time wakeups.
>
> I'm not sure, doesn't this break orthogonality between epoll and timerfd?

Yes.  It's not very elegant, and more elegant ideas are welcome.

>
>>
>> Would this make sense?  It could look like:
>>
>> int epoll_mod_and_pwait(int epfd,
>>   struct epoll_event *events, int maxevents,
>>   struct epoll_command *commands, int ncommands,
>>   const sigset_t *sigmask);
>
> What about flags?
>

No room.  Maybe it should just be a struct for everything instead of
separate args.

> Fam



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
  2015-01-09  1:28         ` Andy Lutomirski
@ 2015-01-09  1:52           ` Fam Zheng
  -1 siblings, 0 replies; 39+ messages in thread
From: Fam Zheng @ 2015-01-09  1:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Miklos Szeredi, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Alexander Viro, Andrew Morton,
	Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers, Fabian Frederick, Josh Triplett,
	David S. Miller, Linux FS Devel, Linux API, Paolo Bonzini,
	Stefan Hajnoczi, Michael Kerrisk (man-pages)

On Thu, 01/08 17:28, Andy Lutomirski wrote:
> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz@redhat.com> wrote:
> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
> >> I'd like to see a more ambitious change, since the timer isn't the
> >> only problem like this.  Specifically, I'd like a syscall that does a
> >> list of epoll-related things and then waits.  The list of things could
> >> include, at least:
> >>
> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> >> want to turn on and off their requests for events on a somewhat
> >> regular basis.
> >
> > This sounds good to me.
> >
> >>
> >>  - timerfd_settime actions: this allows a single syscall to wait and
> >> adjust *both* monotonic and real-time wakeups.
> >
> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
> 
> Yes.  It's not very elegant, and more elegant ideas are welcome.

What is the purpose of embedding timerfd operation here? Modifying timerfd
for each poll doesn't sound a common pattern to me.

> 
> >
> >>
> >> Would this make sense?  It could look like:
> >>
> >> int epoll_mod_and_pwait(int epfd,
> >>   struct epoll_event *events, int maxevents,
> >>   struct epoll_command *commands, int ncommands,
> >>   const sigset_t *sigmask);
> >
> > What about flags?
> >
> 
> No room.  Maybe it should just be a struct for everything instead of
> separate args.

Also no room for timeout. A single struct sounds the only way to go.

Fam

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-09  1:52           ` Fam Zheng
  0 siblings, 0 replies; 39+ messages in thread
From: Fam Zheng @ 2015-01-09  1:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Miklos Szeredi, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Alexander Viro, Andrew Morton,
	Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers

On Thu, 01/08 17:28, Andy Lutomirski wrote:
> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz@redhat.com> wrote:
> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
> >> I'd like to see a more ambitious change, since the timer isn't the
> >> only problem like this.  Specifically, I'd like a syscall that does a
> >> list of epoll-related things and then waits.  The list of things could
> >> include, at least:
> >>
> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> >> want to turn on and off their requests for events on a somewhat
> >> regular basis.
> >
> > This sounds good to me.
> >
> >>
> >>  - timerfd_settime actions: this allows a single syscall to wait and
> >> adjust *both* monotonic and real-time wakeups.
> >
> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
> 
> Yes.  It's not very elegant, and more elegant ideas are welcome.

What is the purpose of embedding timerfd operation here? Modifying timerfd
for each poll doesn't sound a common pattern to me.

> 
> >
> >>
> >> Would this make sense?  It could look like:
> >>
> >> int epoll_mod_and_pwait(int epfd,
> >>   struct epoll_event *events, int maxevents,
> >>   struct epoll_command *commands, int ncommands,
> >>   const sigset_t *sigmask);
> >
> > What about flags?
> >
> 
> No room.  Maybe it should just be a struct for everything instead of
> separate args.

Also no room for timeout. A single struct sounds the only way to go.

Fam

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-09  2:24             ` Andy Lutomirski
  0 siblings, 0 replies; 39+ messages in thread
From: Andy Lutomirski @ 2015-01-09  2:24 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Miklos Szeredi, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Alexander Viro, Andrew Morton,
	Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers, Fabian Frederick, Josh Triplett,
	David S. Miller, Linux FS Devel, Linux API, Paolo Bonzini,
	Stefan Hajnoczi, Michael Kerrisk (man-pages)

On Thu, Jan 8, 2015 at 5:52 PM, Fam Zheng <famz@redhat.com> wrote:
> On Thu, 01/08 17:28, Andy Lutomirski wrote:
>> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz@redhat.com> wrote:
>> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
>> >> I'd like to see a more ambitious change, since the timer isn't the
>> >> only problem like this.  Specifically, I'd like a syscall that does a
>> >> list of epoll-related things and then waits.  The list of things could
>> >> include, at least:
>> >>
>> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
>> >> want to turn on and off their requests for events on a somewhat
>> >> regular basis.
>> >
>> > This sounds good to me.
>> >
>> >>
>> >>  - timerfd_settime actions: this allows a single syscall to wait and
>> >> adjust *both* monotonic and real-time wakeups.
>> >
>> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
>>
>> Yes.  It's not very elegant, and more elegant ideas are welcome.
>
> What is the purpose of embedding timerfd operation here? Modifying timerfd
> for each poll doesn't sound a common pattern to me.

Setting a timeout is definitely a common pattern, hence this thread.
But the current timeout interface sucks, and people should really use
absolute time.  (My epoll software uses absolute time.)  But then
users need to decide whether to have their timeout based on the
monotonic clock or the realtime clock (or something else entirely).
Some bigger programs may want both -- they may have internal events
queued for certain times and for certain timeouts, and those should
use realtime and monotonic respectively.  Heck, users may also want
separate slack values on those.

Timerfd is the only thing we have right now that is anywhere near
flexible enough.  Obviously if epoll became fancy enough, then we
could do away with the timerfd entirely here.

>
>>
>> >
>> >>
>> >> Would this make sense?  It could look like:
>> >>
>> >> int epoll_mod_and_pwait(int epfd,
>> >>   struct epoll_event *events, int maxevents,
>> >>   struct epoll_command *commands, int ncommands,
>> >>   const sigset_t *sigmask);
>> >
>> > What about flags?
>> >
>>
>> No room.  Maybe it should just be a struct for everything instead of
>> separate args.
>
> Also no room for timeout. A single struct sounds the only way to go.

That's what timerfd is for.  I think it would be a bit weird to
support "timeout" and detailed timerfd control.

--Andy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-09  2:24             ` Andy Lutomirski
  0 siblings, 0 replies; 39+ messages in thread
From: Andy Lutomirski @ 2015-01-09  2:24 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Miklos Szeredi, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Alexander Viro, Andrew Morton, Juri Lelli, Zach Brown,
	David Drysdale, Kees Cook, Alexei Starovoitov, David Herrmann,
	Dario Faggioli, Theodore Ts'o, Peter Zijlstra, Vivek Goyal,
	Mike Frysinger, Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers

On Thu, Jan 8, 2015 at 5:52 PM, Fam Zheng <famz-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Thu, 01/08 17:28, Andy Lutomirski wrote:
>> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
>> >> I'd like to see a more ambitious change, since the timer isn't the
>> >> only problem like this.  Specifically, I'd like a syscall that does a
>> >> list of epoll-related things and then waits.  The list of things could
>> >> include, at least:
>> >>
>> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
>> >> want to turn on and off their requests for events on a somewhat
>> >> regular basis.
>> >
>> > This sounds good to me.
>> >
>> >>
>> >>  - timerfd_settime actions: this allows a single syscall to wait and
>> >> adjust *both* monotonic and real-time wakeups.
>> >
>> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
>>
>> Yes.  It's not very elegant, and more elegant ideas are welcome.
>
> What is the purpose of embedding timerfd operation here? Modifying timerfd
> for each poll doesn't sound a common pattern to me.

Setting a timeout is definitely a common pattern, hence this thread.
But the current timeout interface sucks, and people should really use
absolute time.  (My epoll software uses absolute time.)  But then
users need to decide whether to have their timeout based on the
monotonic clock or the realtime clock (or something else entirely).
Some bigger programs may want both -- they may have internal events
queued for certain times and for certain timeouts, and those should
use realtime and monotonic respectively.  Heck, users may also want
separate slack values on those.

Timerfd is the only thing we have right now that is anywhere near
flexible enough.  Obviously if epoll became fancy enough, then we
could do away with the timerfd entirely here.

>
>>
>> >
>> >>
>> >> Would this make sense?  It could look like:
>> >>
>> >> int epoll_mod_and_pwait(int epfd,
>> >>   struct epoll_event *events, int maxevents,
>> >>   struct epoll_command *commands, int ncommands,
>> >>   const sigset_t *sigmask);
>> >
>> > What about flags?
>> >
>>
>> No room.  Maybe it should just be a struct for everything instead of
>> separate args.
>
> Also no room for timeout. A single struct sounds the only way to go.

That's what timerfd is for.  I think it would be a bit weird to
support "timeout" and detailed timerfd control.

--Andy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
  2015-01-09  2:24             ` Andy Lutomirski
@ 2015-01-09  4:49               ` Fam Zheng
  -1 siblings, 0 replies; 39+ messages in thread
From: Fam Zheng @ 2015-01-09  4:49 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Miklos Szeredi, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Alexander Viro, Andrew Morton,
	Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers, Fabian Frederick, Josh Triplett,
	David S. Miller, Linux FS Devel, Linux API, Paolo Bonzini,
	Stefan Hajnoczi, Michael Kerrisk (man-pages)

On Thu, 01/08 18:24, Andy Lutomirski wrote:
> On Thu, Jan 8, 2015 at 5:52 PM, Fam Zheng <famz@redhat.com> wrote:
> > On Thu, 01/08 17:28, Andy Lutomirski wrote:
> >> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz@redhat.com> wrote:
> >> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
> >> >> I'd like to see a more ambitious change, since the timer isn't the
> >> >> only problem like this.  Specifically, I'd like a syscall that does a
> >> >> list of epoll-related things and then waits.  The list of things could
> >> >> include, at least:
> >> >>
> >> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> >> >> want to turn on and off their requests for events on a somewhat
> >> >> regular basis.
> >> >
> >> > This sounds good to me.
> >> >
> >> >>
> >> >>  - timerfd_settime actions: this allows a single syscall to wait and
> >> >> adjust *both* monotonic and real-time wakeups.
> >> >
> >> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
> >>
> >> Yes.  It's not very elegant, and more elegant ideas are welcome.
> >
> > What is the purpose of embedding timerfd operation here? Modifying timerfd
> > for each poll doesn't sound a common pattern to me.
> 
> Setting a timeout is definitely a common pattern, hence this thread.
> But the current timeout interface sucks, and people should really use
> absolute time.  (My epoll software uses absolute time.)  But then
> users need to decide whether to have their timeout based on the
> monotonic clock or the realtime clock (or something else entirely).
> Some bigger programs may want both -- they may have internal events
> queued for certain times and for certain timeouts, and those should
> use realtime and monotonic respectively.  Heck, users may also want
> separate slack values on those.
> 
> Timerfd is the only thing we have right now that is anywhere near
> flexible enough.  Obviously if epoll became fancy enough, then we
> could do away with the timerfd entirely here.
> 
> >
> >>
> >> >
> >> >>
> >> >> Would this make sense?  It could look like:
> >> >>
> >> >> int epoll_mod_and_pwait(int epfd,
> >> >>   struct epoll_event *events, int maxevents,
> >> >>   struct epoll_command *commands, int ncommands,
> >> >>   const sigset_t *sigmask);
> >> >
> >> > What about flags?
> >> >
> >>
> >> No room.  Maybe it should just be a struct for everything instead of
> >> separate args.
> >
> > Also no room for timeout. A single struct sounds the only way to go.
> 
> That's what timerfd is for.  I think it would be a bit weird to
> support "timeout" and detailed timerfd control.

I see what you mean. Thanks.

I still don't like hooking timerfd in the interface. Besides the unclean
interface, it also feels cubersome and overkill to let users setup and add a
dedicated timerfd to implement timeout.

How about this:

int epoll_mod_wait(int epfd, struct epoll_mod_wait_data *data);

struct epoll_mod_wait_data {
	struct epoll_event *events;
	int maxevents;
	struct epoll_mod_cmd {
		int op,
		int fd;
		void *data;
	} *cmds;
	int ncmds;
	int flags;
	sigset_t *sigmask;
};

Commands ops are:

	EPOLL_CTL_ADD
		@fd is the fd to modify; @data is epoll_event.
	EPOLL_CTL_MOD
		@fd is the fd to modify; @data is epoll_event.
	EPOLL_CTL_DEL
		@fd is the fd to modify; @data is epoll_event.

	EPOLL_CTL_SET_TIMEOUT
		@fd is ignored, @data is timespec.
		Clock type and relative/absolute are selected by flags as below.

Flags are given to override timeout defaults:
	EPOLL_FL_MONOTONIC_CLOCK
		If set, don't use realtime clock, use monotonic clock.
	EPOLL_FL_ABSOLUTE_TIMEOUT
		If set, don't use relative timeout, use absolute timeout.

Thanks,
Fam

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-09  4:49               ` Fam Zheng
  0 siblings, 0 replies; 39+ messages in thread
From: Fam Zheng @ 2015-01-09  4:49 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Miklos Szeredi, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Alexander Viro, Andrew Morton,
	Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers

On Thu, 01/08 18:24, Andy Lutomirski wrote:
> On Thu, Jan 8, 2015 at 5:52 PM, Fam Zheng <famz@redhat.com> wrote:
> > On Thu, 01/08 17:28, Andy Lutomirski wrote:
> >> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz@redhat.com> wrote:
> >> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
> >> >> I'd like to see a more ambitious change, since the timer isn't the
> >> >> only problem like this.  Specifically, I'd like a syscall that does a
> >> >> list of epoll-related things and then waits.  The list of things could
> >> >> include, at least:
> >> >>
> >> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> >> >> want to turn on and off their requests for events on a somewhat
> >> >> regular basis.
> >> >
> >> > This sounds good to me.
> >> >
> >> >>
> >> >>  - timerfd_settime actions: this allows a single syscall to wait and
> >> >> adjust *both* monotonic and real-time wakeups.
> >> >
> >> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
> >>
> >> Yes.  It's not very elegant, and more elegant ideas are welcome.
> >
> > What is the purpose of embedding timerfd operation here? Modifying timerfd
> > for each poll doesn't sound a common pattern to me.
> 
> Setting a timeout is definitely a common pattern, hence this thread.
> But the current timeout interface sucks, and people should really use
> absolute time.  (My epoll software uses absolute time.)  But then
> users need to decide whether to have their timeout based on the
> monotonic clock or the realtime clock (or something else entirely).
> Some bigger programs may want both -- they may have internal events
> queued for certain times and for certain timeouts, and those should
> use realtime and monotonic respectively.  Heck, users may also want
> separate slack values on those.
> 
> Timerfd is the only thing we have right now that is anywhere near
> flexible enough.  Obviously if epoll became fancy enough, then we
> could do away with the timerfd entirely here.
> 
> >
> >>
> >> >
> >> >>
> >> >> Would this make sense?  It could look like:
> >> >>
> >> >> int epoll_mod_and_pwait(int epfd,
> >> >>   struct epoll_event *events, int maxevents,
> >> >>   struct epoll_command *commands, int ncommands,
> >> >>   const sigset_t *sigmask);
> >> >
> >> > What about flags?
> >> >
> >>
> >> No room.  Maybe it should just be a struct for everything instead of
> >> separate args.
> >
> > Also no room for timeout. A single struct sounds the only way to go.
> 
> That's what timerfd is for.  I think it would be a bit weird to
> support "timeout" and detailed timerfd control.

I see what you mean. Thanks.

I still don't like hooking timerfd in the interface. Besides the unclean
interface, it also feels cubersome and overkill to let users setup and add a
dedicated timerfd to implement timeout.

How about this:

int epoll_mod_wait(int epfd, struct epoll_mod_wait_data *data);

struct epoll_mod_wait_data {
	struct epoll_event *events;
	int maxevents;
	struct epoll_mod_cmd {
		int op,
		int fd;
		void *data;
	} *cmds;
	int ncmds;
	int flags;
	sigset_t *sigmask;
};

Commands ops are:

	EPOLL_CTL_ADD
		@fd is the fd to modify; @data is epoll_event.
	EPOLL_CTL_MOD
		@fd is the fd to modify; @data is epoll_event.
	EPOLL_CTL_DEL
		@fd is the fd to modify; @data is epoll_event.

	EPOLL_CTL_SET_TIMEOUT
		@fd is ignored, @data is timespec.
		Clock type and relative/absolute are selected by flags as below.

Flags are given to override timeout defaults:
	EPOLL_FL_MONOTONIC_CLOCK
		If set, don't use realtime clock, use monotonic clock.
	EPOLL_FL_ABSOLUTE_TIMEOUT
		If set, don't use relative timeout, use absolute timeout.

Thanks,
Fam

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
  2015-01-09  4:49               ` Fam Zheng
  (?)
@ 2015-01-09  5:21                 ` Josh Triplett
  -1 siblings, 0 replies; 39+ messages in thread
From: Josh Triplett @ 2015-01-09  5:21 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Andy Lutomirski, Miklos Szeredi, linux-kernel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, X86 ML, Alexander Viro,
	Andrew Morton, Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers, Fabian Frederick, David S. Miller,
	Linux FS Devel, Linux API, Paolo Bonzini, Stefan Hajnoczi,
	Michael Kerrisk (man-pages)

On Fri, Jan 09, 2015 at 12:49:08PM +0800, Fam Zheng wrote:
> On Thu, 01/08 18:24, Andy Lutomirski wrote:
> > On Thu, Jan 8, 2015 at 5:52 PM, Fam Zheng <famz@redhat.com> wrote:
> > > On Thu, 01/08 17:28, Andy Lutomirski wrote:
> > >> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz@redhat.com> wrote:
> > >> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
> > >> >> I'd like to see a more ambitious change, since the timer isn't the
> > >> >> only problem like this.  Specifically, I'd like a syscall that does a
> > >> >> list of epoll-related things and then waits.  The list of things could
> > >> >> include, at least:
> > >> >>
> > >> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> > >> >> want to turn on and off their requests for events on a somewhat
> > >> >> regular basis.
> > >> >
> > >> > This sounds good to me.
> > >> >
> > >> >>
> > >> >>  - timerfd_settime actions: this allows a single syscall to wait and
> > >> >> adjust *both* monotonic and real-time wakeups.
> > >> >
> > >> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
> > >>
> > >> Yes.  It's not very elegant, and more elegant ideas are welcome.
> > >
> > > What is the purpose of embedding timerfd operation here? Modifying timerfd
> > > for each poll doesn't sound a common pattern to me.
> > 
> > Setting a timeout is definitely a common pattern, hence this thread.
> > But the current timeout interface sucks, and people should really use
> > absolute time.  (My epoll software uses absolute time.)  But then
> > users need to decide whether to have their timeout based on the
> > monotonic clock or the realtime clock (or something else entirely).
> > Some bigger programs may want both -- they may have internal events
> > queued for certain times and for certain timeouts, and those should
> > use realtime and monotonic respectively.  Heck, users may also want
> > separate slack values on those.
> > 
> > Timerfd is the only thing we have right now that is anywhere near
> > flexible enough.  Obviously if epoll became fancy enough, then we
> > could do away with the timerfd entirely here.
> > 
> > >
> > >>
> > >> >
> > >> >>
> > >> >> Would this make sense?  It could look like:
> > >> >>
> > >> >> int epoll_mod_and_pwait(int epfd,
> > >> >>   struct epoll_event *events, int maxevents,
> > >> >>   struct epoll_command *commands, int ncommands,
> > >> >>   const sigset_t *sigmask);
> > >> >
> > >> > What about flags?
> > >> >
> > >>
> > >> No room.  Maybe it should just be a struct for everything instead of
> > >> separate args.
> > >
> > > Also no room for timeout. A single struct sounds the only way to go.
> > 
> > That's what timerfd is for.  I think it would be a bit weird to
> > support "timeout" and detailed timerfd control.
> 
> I see what you mean. Thanks.
> 
> I still don't like hooking timerfd in the interface. Besides the unclean
> interface, it also feels cubersome and overkill to let users setup and add a
> dedicated timerfd to implement timeout.
> 
> How about this:
> 
> int epoll_mod_wait(int epfd, struct epoll_mod_wait_data *data);
> 
> struct epoll_mod_wait_data {
> 	struct epoll_event *events;
> 	int maxevents;
> 	struct epoll_mod_cmd {
> 		int op,
> 		int fd;
> 		void *data;
> 	} *cmds;
> 	int ncmds;
> 	int flags;
> 	sigset_t *sigmask;
> };
> 
> Commands ops are:
> 
> 	EPOLL_CTL_ADD
> 		@fd is the fd to modify; @data is epoll_event.
> 	EPOLL_CTL_MOD
> 		@fd is the fd to modify; @data is epoll_event.
> 	EPOLL_CTL_DEL
> 		@fd is the fd to modify; @data is epoll_event.
> 
> 	EPOLL_CTL_SET_TIMEOUT
> 		@fd is ignored, @data is timespec.
> 		Clock type and relative/absolute are selected by flags as below.
> 
> Flags are given to override timeout defaults:
> 	EPOLL_FL_MONOTONIC_CLOCK
> 		If set, don't use realtime clock, use monotonic clock.
> 	EPOLL_FL_ABSOLUTE_TIMEOUT
> 		If set, don't use relative timeout, use absolute timeout.

I'd suggest using an "int clockid" field instead, like timerfd_settime;
even if it only accepts CLOCK_REALTIME and CLOCK_MONOTONIC, if it needs
extending in the future, it'd be painful to have to remap new CLOCK_*
constants into the EPOLL_FL_* namespace.  (I do think dropping timeouts
in favor of timerfds makes things more nicely orthogonal, but epoll_wait
already has a timeout parameter, so *shrug*.)

Also, I think that structure has too many levels of indirection; it'd
produce many unnecessary cache misses; considering you're trying to
eliminate the overhead of one or two extra syscalls, you don't want to
introduce a pile of unnecessary cache misses in the processes.  I'd
suggest inlining cmds as an array at the end of the structure, and
turning "void *data" into an inline epoll_event.  (Or, you could use
"events" as an in/out parameter.)

You could drop EPOLL_CTL_SET_TIMEOUT, and just include a clockid and
timespec directly in the top-level structure.

And I'd suggest either making flags a top-level parameter or putting it
at the start of the structure, to make future extension easier.

</bikeshed>

- Josh Triplett

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-09  5:21                 ` Josh Triplett
  0 siblings, 0 replies; 39+ messages in thread
From: Josh Triplett @ 2015-01-09  5:21 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Andy Lutomirski, Miklos Szeredi, linux-kernel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, X86 ML, Alexander Viro,
	Andrew Morton, Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov, Mathieu

On Fri, Jan 09, 2015 at 12:49:08PM +0800, Fam Zheng wrote:
> On Thu, 01/08 18:24, Andy Lutomirski wrote:
> > On Thu, Jan 8, 2015 at 5:52 PM, Fam Zheng <famz@redhat.com> wrote:
> > > On Thu, 01/08 17:28, Andy Lutomirski wrote:
> > >> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz@redhat.com> wrote:
> > >> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
> > >> >> I'd like to see a more ambitious change, since the timer isn't the
> > >> >> only problem like this.  Specifically, I'd like a syscall that does a
> > >> >> list of epoll-related things and then waits.  The list of things could
> > >> >> include, at least:
> > >> >>
> > >> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> > >> >> want to turn on and off their requests for events on a somewhat
> > >> >> regular basis.
> > >> >
> > >> > This sounds good to me.
> > >> >
> > >> >>
> > >> >>  - timerfd_settime actions: this allows a single syscall to wait and
> > >> >> adjust *both* monotonic and real-time wakeups.
> > >> >
> > >> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
> > >>
> > >> Yes.  It's not very elegant, and more elegant ideas are welcome.
> > >
> > > What is the purpose of embedding timerfd operation here? Modifying timerfd
> > > for each poll doesn't sound a common pattern to me.
> > 
> > Setting a timeout is definitely a common pattern, hence this thread.
> > But the current timeout interface sucks, and people should really use
> > absolute time.  (My epoll software uses absolute time.)  But then
> > users need to decide whether to have their timeout based on the
> > monotonic clock or the realtime clock (or something else entirely).
> > Some bigger programs may want both -- they may have internal events
> > queued for certain times and for certain timeouts, and those should
> > use realtime and monotonic respectively.  Heck, users may also want
> > separate slack values on those.
> > 
> > Timerfd is the only thing we have right now that is anywhere near
> > flexible enough.  Obviously if epoll became fancy enough, then we
> > could do away with the timerfd entirely here.
> > 
> > >
> > >>
> > >> >
> > >> >>
> > >> >> Would this make sense?  It could look like:
> > >> >>
> > >> >> int epoll_mod_and_pwait(int epfd,
> > >> >>   struct epoll_event *events, int maxevents,
> > >> >>   struct epoll_command *commands, int ncommands,
> > >> >>   const sigset_t *sigmask);
> > >> >
> > >> > What about flags?
> > >> >
> > >>
> > >> No room.  Maybe it should just be a struct for everything instead of
> > >> separate args.
> > >
> > > Also no room for timeout. A single struct sounds the only way to go.
> > 
> > That's what timerfd is for.  I think it would be a bit weird to
> > support "timeout" and detailed timerfd control.
> 
> I see what you mean. Thanks.
> 
> I still don't like hooking timerfd in the interface. Besides the unclean
> interface, it also feels cubersome and overkill to let users setup and add a
> dedicated timerfd to implement timeout.
> 
> How about this:
> 
> int epoll_mod_wait(int epfd, struct epoll_mod_wait_data *data);
> 
> struct epoll_mod_wait_data {
> 	struct epoll_event *events;
> 	int maxevents;
> 	struct epoll_mod_cmd {
> 		int op,
> 		int fd;
> 		void *data;
> 	} *cmds;
> 	int ncmds;
> 	int flags;
> 	sigset_t *sigmask;
> };
> 
> Commands ops are:
> 
> 	EPOLL_CTL_ADD
> 		@fd is the fd to modify; @data is epoll_event.
> 	EPOLL_CTL_MOD
> 		@fd is the fd to modify; @data is epoll_event.
> 	EPOLL_CTL_DEL
> 		@fd is the fd to modify; @data is epoll_event.
> 
> 	EPOLL_CTL_SET_TIMEOUT
> 		@fd is ignored, @data is timespec.
> 		Clock type and relative/absolute are selected by flags as below.
> 
> Flags are given to override timeout defaults:
> 	EPOLL_FL_MONOTONIC_CLOCK
> 		If set, don't use realtime clock, use monotonic clock.
> 	EPOLL_FL_ABSOLUTE_TIMEOUT
> 		If set, don't use relative timeout, use absolute timeout.

I'd suggest using an "int clockid" field instead, like timerfd_settime;
even if it only accepts CLOCK_REALTIME and CLOCK_MONOTONIC, if it needs
extending in the future, it'd be painful to have to remap new CLOCK_*
constants into the EPOLL_FL_* namespace.  (I do think dropping timeouts
in favor of timerfds makes things more nicely orthogonal, but epoll_wait
already has a timeout parameter, so *shrug*.)

Also, I think that structure has too many levels of indirection; it'd
produce many unnecessary cache misses; considering you're trying to
eliminate the overhead of one or two extra syscalls, you don't want to
introduce a pile of unnecessary cache misses in the processes.  I'd
suggest inlining cmds as an array at the end of the structure, and
turning "void *data" into an inline epoll_event.  (Or, you could use
"events" as an in/out parameter.)

You could drop EPOLL_CTL_SET_TIMEOUT, and just include a clockid and
timespec directly in the top-level structure.

And I'd suggest either making flags a top-level parameter or putting it
at the start of the structure, to make future extension easier.

</bikeshed>

- Josh Triplett

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-09  5:21                 ` Josh Triplett
  0 siblings, 0 replies; 39+ messages in thread
From: Josh Triplett @ 2015-01-09  5:21 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Andy Lutomirski, Miklos Szeredi, linux-kernel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, X86 ML, Alexander Viro,
	Andrew Morton, Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov, Mathieu

On Fri, Jan 09, 2015 at 12:49:08PM +0800, Fam Zheng wrote:
> On Thu, 01/08 18:24, Andy Lutomirski wrote:
> > On Thu, Jan 8, 2015 at 5:52 PM, Fam Zheng <famz@redhat.com> wrote:
> > > On Thu, 01/08 17:28, Andy Lutomirski wrote:
> > >> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz@redhat.com> wrote:
> > >> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
> > >> >> I'd like to see a more ambitious change, since the timer isn't the
> > >> >> only problem like this.  Specifically, I'd like a syscall that does a
> > >> >> list of epoll-related things and then waits.  The list of things could
> > >> >> include, at least:
> > >> >>
> > >> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> > >> >> want to turn on and off their requests for events on a somewhat
> > >> >> regular basis.
> > >> >
> > >> > This sounds good to me.
> > >> >
> > >> >>
> > >> >>  - timerfd_settime actions: this allows a single syscall to wait and
> > >> >> adjust *both* monotonic and real-time wakeups.
> > >> >
> > >> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
> > >>
> > >> Yes.  It's not very elegant, and more elegant ideas are welcome.
> > >
> > > What is the purpose of embedding timerfd operation here? Modifying timerfd
> > > for each poll doesn't sound a common pattern to me.
> > 
> > Setting a timeout is definitely a common pattern, hence this thread.
> > But the current timeout interface sucks, and people should really use
> > absolute time.  (My epoll software uses absolute time.)  But then
> > users need to decide whether to have their timeout based on the
> > monotonic clock or the realtime clock (or something else entirely).
> > Some bigger programs may want both -- they may have internal events
> > queued for certain times and for certain timeouts, and those should
> > use realtime and monotonic respectively.  Heck, users may also want
> > separate slack values on those.
> > 
> > Timerfd is the only thing we have right now that is anywhere near
> > flexible enough.  Obviously if epoll became fancy enough, then we
> > could do away with the timerfd entirely here.
> > 
> > >
> > >>
> > >> >
> > >> >>
> > >> >> Would this make sense?  It could look like:
> > >> >>
> > >> >> int epoll_mod_and_pwait(int epfd,
> > >> >>   struct epoll_event *events, int maxevents,
> > >> >>   struct epoll_command *commands, int ncommands,
> > >> >>   const sigset_t *sigmask);
> > >> >
> > >> > What about flags?
> > >> >
> > >>
> > >> No room.  Maybe it should just be a struct for everything instead of
> > >> separate args.
> > >
> > > Also no room for timeout. A single struct sounds the only way to go.
> > 
> > That's what timerfd is for.  I think it would be a bit weird to
> > support "timeout" and detailed timerfd control.
> 
> I see what you mean. Thanks.
> 
> I still don't like hooking timerfd in the interface. Besides the unclean
> interface, it also feels cubersome and overkill to let users setup and add a
> dedicated timerfd to implement timeout.
> 
> How about this:
> 
> int epoll_mod_wait(int epfd, struct epoll_mod_wait_data *data);
> 
> struct epoll_mod_wait_data {
> 	struct epoll_event *events;
> 	int maxevents;
> 	struct epoll_mod_cmd {
> 		int op,
> 		int fd;
> 		void *data;
> 	} *cmds;
> 	int ncmds;
> 	int flags;
> 	sigset_t *sigmask;
> };
> 
> Commands ops are:
> 
> 	EPOLL_CTL_ADD
> 		@fd is the fd to modify; @data is epoll_event.
> 	EPOLL_CTL_MOD
> 		@fd is the fd to modify; @data is epoll_event.
> 	EPOLL_CTL_DEL
> 		@fd is the fd to modify; @data is epoll_event.
> 
> 	EPOLL_CTL_SET_TIMEOUT
> 		@fd is ignored, @data is timespec.
> 		Clock type and relative/absolute are selected by flags as below.
> 
> Flags are given to override timeout defaults:
> 	EPOLL_FL_MONOTONIC_CLOCK
> 		If set, don't use realtime clock, use monotonic clock.
> 	EPOLL_FL_ABSOLUTE_TIMEOUT
> 		If set, don't use relative timeout, use absolute timeout.

I'd suggest using an "int clockid" field instead, like timerfd_settime;
even if it only accepts CLOCK_REALTIME and CLOCK_MONOTONIC, if it needs
extending in the future, it'd be painful to have to remap new CLOCK_*
constants into the EPOLL_FL_* namespace.  (I do think dropping timeouts
in favor of timerfds makes things more nicely orthogonal, but epoll_wait
already has a timeout parameter, so *shrug*.)

Also, I think that structure has too many levels of indirection; it'd
produce many unnecessary cache misses; considering you're trying to
eliminate the overhead of one or two extra syscalls, you don't want to
introduce a pile of unnecessary cache misses in the processes.  I'd
suggest inlining cmds as an array at the end of the structure, and
turning "void *data" into an inline epoll_event.  (Or, you could use
"events" as an in/out parameter.)

You could drop EPOLL_CTL_SET_TIMEOUT, and just include a clockid and
timespec directly in the top-level structure.

And I'd suggest either making flags a top-level parameter or putting it
at the start of the structure, to make future extension easier.

</bikeshed>

- Josh Triplett

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
  2015-01-09  5:21                 ` Josh Triplett
  (?)
@ 2015-01-12  8:24                   ` Fam Zheng
  -1 siblings, 0 replies; 39+ messages in thread
From: Fam Zheng @ 2015-01-12  8:24 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Andy Lutomirski, Miklos Szeredi, linux-kernel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, X86 ML, Alexander Viro,
	Andrew Morton, Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers, Fabian Frederick, David S. Miller,
	Linux FS Devel, Linux API, Paolo Bonzini, Stefan Hajnoczi,
	Michael Kerrisk (man-pages)

On Thu, 01/08 21:21, Josh Triplett wrote:
> On Fri, Jan 09, 2015 at 12:49:08PM +0800, Fam Zheng wrote:
> > On Thu, 01/08 18:24, Andy Lutomirski wrote:
> > > On Thu, Jan 8, 2015 at 5:52 PM, Fam Zheng <famz@redhat.com> wrote:
> > > > On Thu, 01/08 17:28, Andy Lutomirski wrote:
> > > >> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz@redhat.com> wrote:
> > > >> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
> > > >> >> I'd like to see a more ambitious change, since the timer isn't the
> > > >> >> only problem like this.  Specifically, I'd like a syscall that does a
> > > >> >> list of epoll-related things and then waits.  The list of things could
> > > >> >> include, at least:
> > > >> >>
> > > >> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> > > >> >> want to turn on and off their requests for events on a somewhat
> > > >> >> regular basis.
> > > >> >
> > > >> > This sounds good to me.
> > > >> >
> > > >> >>
> > > >> >>  - timerfd_settime actions: this allows a single syscall to wait and
> > > >> >> adjust *both* monotonic and real-time wakeups.
> > > >> >
> > > >> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
> > > >>
> > > >> Yes.  It's not very elegant, and more elegant ideas are welcome.
> > > >
> > > > What is the purpose of embedding timerfd operation here? Modifying timerfd
> > > > for each poll doesn't sound a common pattern to me.
> > > 
> > > Setting a timeout is definitely a common pattern, hence this thread.
> > > But the current timeout interface sucks, and people should really use
> > > absolute time.  (My epoll software uses absolute time.)  But then
> > > users need to decide whether to have their timeout based on the
> > > monotonic clock or the realtime clock (or something else entirely).
> > > Some bigger programs may want both -- they may have internal events
> > > queued for certain times and for certain timeouts, and those should
> > > use realtime and monotonic respectively.  Heck, users may also want
> > > separate slack values on those.
> > > 
> > > Timerfd is the only thing we have right now that is anywhere near
> > > flexible enough.  Obviously if epoll became fancy enough, then we
> > > could do away with the timerfd entirely here.
> > > 
> > > >
> > > >>
> > > >> >
> > > >> >>
> > > >> >> Would this make sense?  It could look like:
> > > >> >>
> > > >> >> int epoll_mod_and_pwait(int epfd,
> > > >> >>   struct epoll_event *events, int maxevents,
> > > >> >>   struct epoll_command *commands, int ncommands,
> > > >> >>   const sigset_t *sigmask);
> > > >> >
> > > >> > What about flags?
> > > >> >
> > > >>
> > > >> No room.  Maybe it should just be a struct for everything instead of
> > > >> separate args.
> > > >
> > > > Also no room for timeout. A single struct sounds the only way to go.
> > > 
> > > That's what timerfd is for.  I think it would be a bit weird to
> > > support "timeout" and detailed timerfd control.
> > 
> > I see what you mean. Thanks.
> > 
> > I still don't like hooking timerfd in the interface. Besides the unclean
> > interface, it also feels cubersome and overkill to let users setup and add a
> > dedicated timerfd to implement timeout.
> > 
> > How about this:
> > 
> > int epoll_mod_wait(int epfd, struct epoll_mod_wait_data *data);
> > 
> > struct epoll_mod_wait_data {
> > 	struct epoll_event *events;
> > 	int maxevents;
> > 	struct epoll_mod_cmd {
> > 		int op,
> > 		int fd;
> > 		void *data;
> > 	} *cmds;
> > 	int ncmds;
> > 	int flags;
> > 	sigset_t *sigmask;
> > };
> > 
> > Commands ops are:
> > 
> > 	EPOLL_CTL_ADD
> > 		@fd is the fd to modify; @data is epoll_event.
> > 	EPOLL_CTL_MOD
> > 		@fd is the fd to modify; @data is epoll_event.
> > 	EPOLL_CTL_DEL
> > 		@fd is the fd to modify; @data is epoll_event.
> > 
> > 	EPOLL_CTL_SET_TIMEOUT
> > 		@fd is ignored, @data is timespec.
> > 		Clock type and relative/absolute are selected by flags as below.
> > 
> > Flags are given to override timeout defaults:
> > 	EPOLL_FL_MONOTONIC_CLOCK
> > 		If set, don't use realtime clock, use monotonic clock.
> > 	EPOLL_FL_ABSOLUTE_TIMEOUT
> > 		If set, don't use relative timeout, use absolute timeout.
> 
> I'd suggest using an "int clockid" field instead, like timerfd_settime;
> even if it only accepts CLOCK_REALTIME and CLOCK_MONOTONIC, if it needs
> extending in the future, it'd be painful to have to remap new CLOCK_*
> constants into the EPOLL_FL_* namespace.  (I do think dropping timeouts
> in favor of timerfds makes things more nicely orthogonal, but epoll_wait
> already has a timeout parameter, so *shrug*.)
> 
> Also, I think that structure has too many levels of indirection; it'd
> produce many unnecessary cache misses; considering you're trying to
> eliminate the overhead of one or two extra syscalls, you don't want to
> introduce a pile of unnecessary cache misses in the processes.  I'd
> suggest inlining cmds as an array at the end of the structure, and
> turning "void *data" into an inline epoll_event.  (Or, you could use
> "events" as an in/out parameter.)
> 
> You could drop EPOLL_CTL_SET_TIMEOUT, and just include a clockid and
> timespec directly in the top-level structure.
> 
> And I'd suggest either making flags a top-level parameter or putting it
> at the start of the structure, to make future extension easier.

Makes sense to me, thanks.

Also the number of cmds are undecided until we do a copy_from_user for the
header fields before another one for specified number of cmds. So I think it's
better to move ncmds and cmds to top level parameter.

Fam

> 
> </bikeshed>
> 
> - Josh Triplett

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-12  8:24                   ` Fam Zheng
  0 siblings, 0 replies; 39+ messages in thread
From: Fam Zheng @ 2015-01-12  8:24 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Andy Lutomirski, Miklos Szeredi, linux-kernel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, X86 ML, Alexander Viro,
	Andrew Morton, Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov, Mathieu

On Thu, 01/08 21:21, Josh Triplett wrote:
> On Fri, Jan 09, 2015 at 12:49:08PM +0800, Fam Zheng wrote:
> > On Thu, 01/08 18:24, Andy Lutomirski wrote:
> > > On Thu, Jan 8, 2015 at 5:52 PM, Fam Zheng <famz@redhat.com> wrote:
> > > > On Thu, 01/08 17:28, Andy Lutomirski wrote:
> > > >> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz@redhat.com> wrote:
> > > >> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
> > > >> >> I'd like to see a more ambitious change, since the timer isn't the
> > > >> >> only problem like this.  Specifically, I'd like a syscall that does a
> > > >> >> list of epoll-related things and then waits.  The list of things could
> > > >> >> include, at least:
> > > >> >>
> > > >> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> > > >> >> want to turn on and off their requests for events on a somewhat
> > > >> >> regular basis.
> > > >> >
> > > >> > This sounds good to me.
> > > >> >
> > > >> >>
> > > >> >>  - timerfd_settime actions: this allows a single syscall to wait and
> > > >> >> adjust *both* monotonic and real-time wakeups.
> > > >> >
> > > >> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
> > > >>
> > > >> Yes.  It's not very elegant, and more elegant ideas are welcome.
> > > >
> > > > What is the purpose of embedding timerfd operation here? Modifying timerfd
> > > > for each poll doesn't sound a common pattern to me.
> > > 
> > > Setting a timeout is definitely a common pattern, hence this thread.
> > > But the current timeout interface sucks, and people should really use
> > > absolute time.  (My epoll software uses absolute time.)  But then
> > > users need to decide whether to have their timeout based on the
> > > monotonic clock or the realtime clock (or something else entirely).
> > > Some bigger programs may want both -- they may have internal events
> > > queued for certain times and for certain timeouts, and those should
> > > use realtime and monotonic respectively.  Heck, users may also want
> > > separate slack values on those.
> > > 
> > > Timerfd is the only thing we have right now that is anywhere near
> > > flexible enough.  Obviously if epoll became fancy enough, then we
> > > could do away with the timerfd entirely here.
> > > 
> > > >
> > > >>
> > > >> >
> > > >> >>
> > > >> >> Would this make sense?  It could look like:
> > > >> >>
> > > >> >> int epoll_mod_and_pwait(int epfd,
> > > >> >>   struct epoll_event *events, int maxevents,
> > > >> >>   struct epoll_command *commands, int ncommands,
> > > >> >>   const sigset_t *sigmask);
> > > >> >
> > > >> > What about flags?
> > > >> >
> > > >>
> > > >> No room.  Maybe it should just be a struct for everything instead of
> > > >> separate args.
> > > >
> > > > Also no room for timeout. A single struct sounds the only way to go.
> > > 
> > > That's what timerfd is for.  I think it would be a bit weird to
> > > support "timeout" and detailed timerfd control.
> > 
> > I see what you mean. Thanks.
> > 
> > I still don't like hooking timerfd in the interface. Besides the unclean
> > interface, it also feels cubersome and overkill to let users setup and add a
> > dedicated timerfd to implement timeout.
> > 
> > How about this:
> > 
> > int epoll_mod_wait(int epfd, struct epoll_mod_wait_data *data);
> > 
> > struct epoll_mod_wait_data {
> > 	struct epoll_event *events;
> > 	int maxevents;
> > 	struct epoll_mod_cmd {
> > 		int op,
> > 		int fd;
> > 		void *data;
> > 	} *cmds;
> > 	int ncmds;
> > 	int flags;
> > 	sigset_t *sigmask;
> > };
> > 
> > Commands ops are:
> > 
> > 	EPOLL_CTL_ADD
> > 		@fd is the fd to modify; @data is epoll_event.
> > 	EPOLL_CTL_MOD
> > 		@fd is the fd to modify; @data is epoll_event.
> > 	EPOLL_CTL_DEL
> > 		@fd is the fd to modify; @data is epoll_event.
> > 
> > 	EPOLL_CTL_SET_TIMEOUT
> > 		@fd is ignored, @data is timespec.
> > 		Clock type and relative/absolute are selected by flags as below.
> > 
> > Flags are given to override timeout defaults:
> > 	EPOLL_FL_MONOTONIC_CLOCK
> > 		If set, don't use realtime clock, use monotonic clock.
> > 	EPOLL_FL_ABSOLUTE_TIMEOUT
> > 		If set, don't use relative timeout, use absolute timeout.
> 
> I'd suggest using an "int clockid" field instead, like timerfd_settime;
> even if it only accepts CLOCK_REALTIME and CLOCK_MONOTONIC, if it needs
> extending in the future, it'd be painful to have to remap new CLOCK_*
> constants into the EPOLL_FL_* namespace.  (I do think dropping timeouts
> in favor of timerfds makes things more nicely orthogonal, but epoll_wait
> already has a timeout parameter, so *shrug*.)
> 
> Also, I think that structure has too many levels of indirection; it'd
> produce many unnecessary cache misses; considering you're trying to
> eliminate the overhead of one or two extra syscalls, you don't want to
> introduce a pile of unnecessary cache misses in the processes.  I'd
> suggest inlining cmds as an array at the end of the structure, and
> turning "void *data" into an inline epoll_event.  (Or, you could use
> "events" as an in/out parameter.)
> 
> You could drop EPOLL_CTL_SET_TIMEOUT, and just include a clockid and
> timespec directly in the top-level structure.
> 
> And I'd suggest either making flags a top-level parameter or putting it
> at the start of the structure, to make future extension easier.

Makes sense to me, thanks.

Also the number of cmds are undecided until we do a copy_from_user for the
header fields before another one for specified number of cmds. So I think it's
better to move ncmds and cmds to top level parameter.

Fam

> 
> </bikeshed>
> 
> - Josh Triplett

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-12  8:24                   ` Fam Zheng
  0 siblings, 0 replies; 39+ messages in thread
From: Fam Zheng @ 2015-01-12  8:24 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Andy Lutomirski, Miklos Szeredi, linux-kernel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, X86 ML, Alexander Viro,
	Andrew Morton, Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov, Mathieu

On Thu, 01/08 21:21, Josh Triplett wrote:
> On Fri, Jan 09, 2015 at 12:49:08PM +0800, Fam Zheng wrote:
> > On Thu, 01/08 18:24, Andy Lutomirski wrote:
> > > On Thu, Jan 8, 2015 at 5:52 PM, Fam Zheng <famz@redhat.com> wrote:
> > > > On Thu, 01/08 17:28, Andy Lutomirski wrote:
> > > >> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz@redhat.com> wrote:
> > > >> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
> > > >> >> I'd like to see a more ambitious change, since the timer isn't the
> > > >> >> only problem like this.  Specifically, I'd like a syscall that does a
> > > >> >> list of epoll-related things and then waits.  The list of things could
> > > >> >> include, at least:
> > > >> >>
> > > >> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> > > >> >> want to turn on and off their requests for events on a somewhat
> > > >> >> regular basis.
> > > >> >
> > > >> > This sounds good to me.
> > > >> >
> > > >> >>
> > > >> >>  - timerfd_settime actions: this allows a single syscall to wait and
> > > >> >> adjust *both* monotonic and real-time wakeups.
> > > >> >
> > > >> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
> > > >>
> > > >> Yes.  It's not very elegant, and more elegant ideas are welcome.
> > > >
> > > > What is the purpose of embedding timerfd operation here? Modifying timerfd
> > > > for each poll doesn't sound a common pattern to me.
> > > 
> > > Setting a timeout is definitely a common pattern, hence this thread.
> > > But the current timeout interface sucks, and people should really use
> > > absolute time.  (My epoll software uses absolute time.)  But then
> > > users need to decide whether to have their timeout based on the
> > > monotonic clock or the realtime clock (or something else entirely).
> > > Some bigger programs may want both -- they may have internal events
> > > queued for certain times and for certain timeouts, and those should
> > > use realtime and monotonic respectively.  Heck, users may also want
> > > separate slack values on those.
> > > 
> > > Timerfd is the only thing we have right now that is anywhere near
> > > flexible enough.  Obviously if epoll became fancy enough, then we
> > > could do away with the timerfd entirely here.
> > > 
> > > >
> > > >>
> > > >> >
> > > >> >>
> > > >> >> Would this make sense?  It could look like:
> > > >> >>
> > > >> >> int epoll_mod_and_pwait(int epfd,
> > > >> >>   struct epoll_event *events, int maxevents,
> > > >> >>   struct epoll_command *commands, int ncommands,
> > > >> >>   const sigset_t *sigmask);
> > > >> >
> > > >> > What about flags?
> > > >> >
> > > >>
> > > >> No room.  Maybe it should just be a struct for everything instead of
> > > >> separate args.
> > > >
> > > > Also no room for timeout. A single struct sounds the only way to go.
> > > 
> > > That's what timerfd is for.  I think it would be a bit weird to
> > > support "timeout" and detailed timerfd control.
> > 
> > I see what you mean. Thanks.
> > 
> > I still don't like hooking timerfd in the interface. Besides the unclean
> > interface, it also feels cubersome and overkill to let users setup and add a
> > dedicated timerfd to implement timeout.
> > 
> > How about this:
> > 
> > int epoll_mod_wait(int epfd, struct epoll_mod_wait_data *data);
> > 
> > struct epoll_mod_wait_data {
> > 	struct epoll_event *events;
> > 	int maxevents;
> > 	struct epoll_mod_cmd {
> > 		int op,
> > 		int fd;
> > 		void *data;
> > 	} *cmds;
> > 	int ncmds;
> > 	int flags;
> > 	sigset_t *sigmask;
> > };
> > 
> > Commands ops are:
> > 
> > 	EPOLL_CTL_ADD
> > 		@fd is the fd to modify; @data is epoll_event.
> > 	EPOLL_CTL_MOD
> > 		@fd is the fd to modify; @data is epoll_event.
> > 	EPOLL_CTL_DEL
> > 		@fd is the fd to modify; @data is epoll_event.
> > 
> > 	EPOLL_CTL_SET_TIMEOUT
> > 		@fd is ignored, @data is timespec.
> > 		Clock type and relative/absolute are selected by flags as below.
> > 
> > Flags are given to override timeout defaults:
> > 	EPOLL_FL_MONOTONIC_CLOCK
> > 		If set, don't use realtime clock, use monotonic clock.
> > 	EPOLL_FL_ABSOLUTE_TIMEOUT
> > 		If set, don't use relative timeout, use absolute timeout.
> 
> I'd suggest using an "int clockid" field instead, like timerfd_settime;
> even if it only accepts CLOCK_REALTIME and CLOCK_MONOTONIC, if it needs
> extending in the future, it'd be painful to have to remap new CLOCK_*
> constants into the EPOLL_FL_* namespace.  (I do think dropping timeouts
> in favor of timerfds makes things more nicely orthogonal, but epoll_wait
> already has a timeout parameter, so *shrug*.)
> 
> Also, I think that structure has too many levels of indirection; it'd
> produce many unnecessary cache misses; considering you're trying to
> eliminate the overhead of one or two extra syscalls, you don't want to
> introduce a pile of unnecessary cache misses in the processes.  I'd
> suggest inlining cmds as an array at the end of the structure, and
> turning "void *data" into an inline epoll_event.  (Or, you could use
> "events" as an in/out parameter.)
> 
> You could drop EPOLL_CTL_SET_TIMEOUT, and just include a clockid and
> timespec directly in the top-level structure.
> 
> And I'd suggest either making flags a top-level parameter or putting it
> at the start of the structure, to make future extension easier.

Makes sense to me, thanks.

Also the number of cmds are undecided until we do a copy_from_user for the
header fields before another one for specified number of cmds. So I think it's
better to move ncmds and cmds to top level parameter.

Fam

> 
> </bikeshed>
> 
> - Josh Triplett

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
  2015-01-12  8:24                   ` Fam Zheng
  (?)
@ 2015-01-12 10:08                     ` Josh Triplett
  -1 siblings, 0 replies; 39+ messages in thread
From: Josh Triplett @ 2015-01-12 10:08 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Andy Lutomirski, Miklos Szeredi, linux-kernel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, X86 ML, Alexander Viro,
	Andrew Morton, Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers, Fabian Frederick, David S. Miller,
	Linux FS Devel, Linux API, Paolo Bonzini, Stefan Hajnoczi,
	Michael Kerrisk (man-pages)

On Mon, Jan 12, 2015 at 04:24:00PM +0800, Fam Zheng wrote:
> On Thu, 01/08 21:21, Josh Triplett wrote:
> > On Fri, Jan 09, 2015 at 12:49:08PM +0800, Fam Zheng wrote:
> > > On Thu, 01/08 18:24, Andy Lutomirski wrote:
> > > > On Thu, Jan 8, 2015 at 5:52 PM, Fam Zheng <famz@redhat.com> wrote:
> > > > > On Thu, 01/08 17:28, Andy Lutomirski wrote:
> > > > >> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz@redhat.com> wrote:
> > > > >> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
> > > > >> >> I'd like to see a more ambitious change, since the timer isn't the
> > > > >> >> only problem like this.  Specifically, I'd like a syscall that does a
> > > > >> >> list of epoll-related things and then waits.  The list of things could
> > > > >> >> include, at least:
> > > > >> >>
> > > > >> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> > > > >> >> want to turn on and off their requests for events on a somewhat
> > > > >> >> regular basis.
> > > > >> >
> > > > >> > This sounds good to me.
> > > > >> >
> > > > >> >>
> > > > >> >>  - timerfd_settime actions: this allows a single syscall to wait and
> > > > >> >> adjust *both* monotonic and real-time wakeups.
> > > > >> >
> > > > >> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
> > > > >>
> > > > >> Yes.  It's not very elegant, and more elegant ideas are welcome.
> > > > >
> > > > > What is the purpose of embedding timerfd operation here? Modifying timerfd
> > > > > for each poll doesn't sound a common pattern to me.
> > > > 
> > > > Setting a timeout is definitely a common pattern, hence this thread.
> > > > But the current timeout interface sucks, and people should really use
> > > > absolute time.  (My epoll software uses absolute time.)  But then
> > > > users need to decide whether to have their timeout based on the
> > > > monotonic clock or the realtime clock (or something else entirely).
> > > > Some bigger programs may want both -- they may have internal events
> > > > queued for certain times and for certain timeouts, and those should
> > > > use realtime and monotonic respectively.  Heck, users may also want
> > > > separate slack values on those.
> > > > 
> > > > Timerfd is the only thing we have right now that is anywhere near
> > > > flexible enough.  Obviously if epoll became fancy enough, then we
> > > > could do away with the timerfd entirely here.
> > > > 
> > > > >
> > > > >>
> > > > >> >
> > > > >> >>
> > > > >> >> Would this make sense?  It could look like:
> > > > >> >>
> > > > >> >> int epoll_mod_and_pwait(int epfd,
> > > > >> >>   struct epoll_event *events, int maxevents,
> > > > >> >>   struct epoll_command *commands, int ncommands,
> > > > >> >>   const sigset_t *sigmask);
> > > > >> >
> > > > >> > What about flags?
> > > > >> >
> > > > >>
> > > > >> No room.  Maybe it should just be a struct for everything instead of
> > > > >> separate args.
> > > > >
> > > > > Also no room for timeout. A single struct sounds the only way to go.
> > > > 
> > > > That's what timerfd is for.  I think it would be a bit weird to
> > > > support "timeout" and detailed timerfd control.
> > > 
> > > I see what you mean. Thanks.
> > > 
> > > I still don't like hooking timerfd in the interface. Besides the unclean
> > > interface, it also feels cubersome and overkill to let users setup and add a
> > > dedicated timerfd to implement timeout.
> > > 
> > > How about this:
> > > 
> > > int epoll_mod_wait(int epfd, struct epoll_mod_wait_data *data);
> > > 
> > > struct epoll_mod_wait_data {
> > > 	struct epoll_event *events;
> > > 	int maxevents;
> > > 	struct epoll_mod_cmd {
> > > 		int op,
> > > 		int fd;
> > > 		void *data;
> > > 	} *cmds;
> > > 	int ncmds;
> > > 	int flags;
> > > 	sigset_t *sigmask;
> > > };
> > > 
> > > Commands ops are:
> > > 
> > > 	EPOLL_CTL_ADD
> > > 		@fd is the fd to modify; @data is epoll_event.
> > > 	EPOLL_CTL_MOD
> > > 		@fd is the fd to modify; @data is epoll_event.
> > > 	EPOLL_CTL_DEL
> > > 		@fd is the fd to modify; @data is epoll_event.
> > > 
> > > 	EPOLL_CTL_SET_TIMEOUT
> > > 		@fd is ignored, @data is timespec.
> > > 		Clock type and relative/absolute are selected by flags as below.
> > > 
> > > Flags are given to override timeout defaults:
> > > 	EPOLL_FL_MONOTONIC_CLOCK
> > > 		If set, don't use realtime clock, use monotonic clock.
> > > 	EPOLL_FL_ABSOLUTE_TIMEOUT
> > > 		If set, don't use relative timeout, use absolute timeout.
> > 
> > I'd suggest using an "int clockid" field instead, like timerfd_settime;
> > even if it only accepts CLOCK_REALTIME and CLOCK_MONOTONIC, if it needs
> > extending in the future, it'd be painful to have to remap new CLOCK_*
> > constants into the EPOLL_FL_* namespace.  (I do think dropping timeouts
> > in favor of timerfds makes things more nicely orthogonal, but epoll_wait
> > already has a timeout parameter, so *shrug*.)
> > 
> > Also, I think that structure has too many levels of indirection; it'd
> > produce many unnecessary cache misses; considering you're trying to
> > eliminate the overhead of one or two extra syscalls, you don't want to
> > introduce a pile of unnecessary cache misses in the processes.  I'd
> > suggest inlining cmds as an array at the end of the structure, and
> > turning "void *data" into an inline epoll_event.  (Or, you could use
> > "events" as an in/out parameter.)
> > 
> > You could drop EPOLL_CTL_SET_TIMEOUT, and just include a clockid and
> > timespec directly in the top-level structure.
> > 
> > And I'd suggest either making flags a top-level parameter or putting it
> > at the start of the structure, to make future extension easier.
> 
> Makes sense to me, thanks.
> 
> Also the number of cmds are undecided until we do a copy_from_user for the
> header fields before another one for specified number of cmds. So I think it's
> better to move ncmds and cmds to top level parameter.

That seems like an even better idea, yeah.

- Josh Triplett

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-12 10:08                     ` Josh Triplett
  0 siblings, 0 replies; 39+ messages in thread
From: Josh Triplett @ 2015-01-12 10:08 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Andy Lutomirski, Miklos Szeredi, linux-kernel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, X86 ML, Alexander Viro,
	Andrew Morton, Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov, Mathieu

On Mon, Jan 12, 2015 at 04:24:00PM +0800, Fam Zheng wrote:
> On Thu, 01/08 21:21, Josh Triplett wrote:
> > On Fri, Jan 09, 2015 at 12:49:08PM +0800, Fam Zheng wrote:
> > > On Thu, 01/08 18:24, Andy Lutomirski wrote:
> > > > On Thu, Jan 8, 2015 at 5:52 PM, Fam Zheng <famz@redhat.com> wrote:
> > > > > On Thu, 01/08 17:28, Andy Lutomirski wrote:
> > > > >> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz@redhat.com> wrote:
> > > > >> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
> > > > >> >> I'd like to see a more ambitious change, since the timer isn't the
> > > > >> >> only problem like this.  Specifically, I'd like a syscall that does a
> > > > >> >> list of epoll-related things and then waits.  The list of things could
> > > > >> >> include, at least:
> > > > >> >>
> > > > >> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> > > > >> >> want to turn on and off their requests for events on a somewhat
> > > > >> >> regular basis.
> > > > >> >
> > > > >> > This sounds good to me.
> > > > >> >
> > > > >> >>
> > > > >> >>  - timerfd_settime actions: this allows a single syscall to wait and
> > > > >> >> adjust *both* monotonic and real-time wakeups.
> > > > >> >
> > > > >> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
> > > > >>
> > > > >> Yes.  It's not very elegant, and more elegant ideas are welcome.
> > > > >
> > > > > What is the purpose of embedding timerfd operation here? Modifying timerfd
> > > > > for each poll doesn't sound a common pattern to me.
> > > > 
> > > > Setting a timeout is definitely a common pattern, hence this thread.
> > > > But the current timeout interface sucks, and people should really use
> > > > absolute time.  (My epoll software uses absolute time.)  But then
> > > > users need to decide whether to have their timeout based on the
> > > > monotonic clock or the realtime clock (or something else entirely).
> > > > Some bigger programs may want both -- they may have internal events
> > > > queued for certain times and for certain timeouts, and those should
> > > > use realtime and monotonic respectively.  Heck, users may also want
> > > > separate slack values on those.
> > > > 
> > > > Timerfd is the only thing we have right now that is anywhere near
> > > > flexible enough.  Obviously if epoll became fancy enough, then we
> > > > could do away with the timerfd entirely here.
> > > > 
> > > > >
> > > > >>
> > > > >> >
> > > > >> >>
> > > > >> >> Would this make sense?  It could look like:
> > > > >> >>
> > > > >> >> int epoll_mod_and_pwait(int epfd,
> > > > >> >>   struct epoll_event *events, int maxevents,
> > > > >> >>   struct epoll_command *commands, int ncommands,
> > > > >> >>   const sigset_t *sigmask);
> > > > >> >
> > > > >> > What about flags?
> > > > >> >
> > > > >>
> > > > >> No room.  Maybe it should just be a struct for everything instead of
> > > > >> separate args.
> > > > >
> > > > > Also no room for timeout. A single struct sounds the only way to go.
> > > > 
> > > > That's what timerfd is for.  I think it would be a bit weird to
> > > > support "timeout" and detailed timerfd control.
> > > 
> > > I see what you mean. Thanks.
> > > 
> > > I still don't like hooking timerfd in the interface. Besides the unclean
> > > interface, it also feels cubersome and overkill to let users setup and add a
> > > dedicated timerfd to implement timeout.
> > > 
> > > How about this:
> > > 
> > > int epoll_mod_wait(int epfd, struct epoll_mod_wait_data *data);
> > > 
> > > struct epoll_mod_wait_data {
> > > 	struct epoll_event *events;
> > > 	int maxevents;
> > > 	struct epoll_mod_cmd {
> > > 		int op,
> > > 		int fd;
> > > 		void *data;
> > > 	} *cmds;
> > > 	int ncmds;
> > > 	int flags;
> > > 	sigset_t *sigmask;
> > > };
> > > 
> > > Commands ops are:
> > > 
> > > 	EPOLL_CTL_ADD
> > > 		@fd is the fd to modify; @data is epoll_event.
> > > 	EPOLL_CTL_MOD
> > > 		@fd is the fd to modify; @data is epoll_event.
> > > 	EPOLL_CTL_DEL
> > > 		@fd is the fd to modify; @data is epoll_event.
> > > 
> > > 	EPOLL_CTL_SET_TIMEOUT
> > > 		@fd is ignored, @data is timespec.
> > > 		Clock type and relative/absolute are selected by flags as below.
> > > 
> > > Flags are given to override timeout defaults:
> > > 	EPOLL_FL_MONOTONIC_CLOCK
> > > 		If set, don't use realtime clock, use monotonic clock.
> > > 	EPOLL_FL_ABSOLUTE_TIMEOUT
> > > 		If set, don't use relative timeout, use absolute timeout.
> > 
> > I'd suggest using an "int clockid" field instead, like timerfd_settime;
> > even if it only accepts CLOCK_REALTIME and CLOCK_MONOTONIC, if it needs
> > extending in the future, it'd be painful to have to remap new CLOCK_*
> > constants into the EPOLL_FL_* namespace.  (I do think dropping timeouts
> > in favor of timerfds makes things more nicely orthogonal, but epoll_wait
> > already has a timeout parameter, so *shrug*.)
> > 
> > Also, I think that structure has too many levels of indirection; it'd
> > produce many unnecessary cache misses; considering you're trying to
> > eliminate the overhead of one or two extra syscalls, you don't want to
> > introduce a pile of unnecessary cache misses in the processes.  I'd
> > suggest inlining cmds as an array at the end of the structure, and
> > turning "void *data" into an inline epoll_event.  (Or, you could use
> > "events" as an in/out parameter.)
> > 
> > You could drop EPOLL_CTL_SET_TIMEOUT, and just include a clockid and
> > timespec directly in the top-level structure.
> > 
> > And I'd suggest either making flags a top-level parameter or putting it
> > at the start of the structure, to make future extension easier.
> 
> Makes sense to me, thanks.
> 
> Also the number of cmds are undecided until we do a copy_from_user for the
> header fields before another one for specified number of cmds. So I think it's
> better to move ncmds and cmds to top level parameter.

That seems like an even better idea, yeah.

- Josh Triplett

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-12 10:08                     ` Josh Triplett
  0 siblings, 0 replies; 39+ messages in thread
From: Josh Triplett @ 2015-01-12 10:08 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Andy Lutomirski, Miklos Szeredi, linux-kernel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, X86 ML, Alexander Viro,
	Andrew Morton, Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov, Mathieu

On Mon, Jan 12, 2015 at 04:24:00PM +0800, Fam Zheng wrote:
> On Thu, 01/08 21:21, Josh Triplett wrote:
> > On Fri, Jan 09, 2015 at 12:49:08PM +0800, Fam Zheng wrote:
> > > On Thu, 01/08 18:24, Andy Lutomirski wrote:
> > > > On Thu, Jan 8, 2015 at 5:52 PM, Fam Zheng <famz@redhat.com> wrote:
> > > > > On Thu, 01/08 17:28, Andy Lutomirski wrote:
> > > > >> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz@redhat.com> wrote:
> > > > >> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
> > > > >> >> I'd like to see a more ambitious change, since the timer isn't the
> > > > >> >> only problem like this.  Specifically, I'd like a syscall that does a
> > > > >> >> list of epoll-related things and then waits.  The list of things could
> > > > >> >> include, at least:
> > > > >> >>
> > > > >> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> > > > >> >> want to turn on and off their requests for events on a somewhat
> > > > >> >> regular basis.
> > > > >> >
> > > > >> > This sounds good to me.
> > > > >> >
> > > > >> >>
> > > > >> >>  - timerfd_settime actions: this allows a single syscall to wait and
> > > > >> >> adjust *both* monotonic and real-time wakeups.
> > > > >> >
> > > > >> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
> > > > >>
> > > > >> Yes.  It's not very elegant, and more elegant ideas are welcome.
> > > > >
> > > > > What is the purpose of embedding timerfd operation here? Modifying timerfd
> > > > > for each poll doesn't sound a common pattern to me.
> > > > 
> > > > Setting a timeout is definitely a common pattern, hence this thread.
> > > > But the current timeout interface sucks, and people should really use
> > > > absolute time.  (My epoll software uses absolute time.)  But then
> > > > users need to decide whether to have their timeout based on the
> > > > monotonic clock or the realtime clock (or something else entirely).
> > > > Some bigger programs may want both -- they may have internal events
> > > > queued for certain times and for certain timeouts, and those should
> > > > use realtime and monotonic respectively.  Heck, users may also want
> > > > separate slack values on those.
> > > > 
> > > > Timerfd is the only thing we have right now that is anywhere near
> > > > flexible enough.  Obviously if epoll became fancy enough, then we
> > > > could do away with the timerfd entirely here.
> > > > 
> > > > >
> > > > >>
> > > > >> >
> > > > >> >>
> > > > >> >> Would this make sense?  It could look like:
> > > > >> >>
> > > > >> >> int epoll_mod_and_pwait(int epfd,
> > > > >> >>   struct epoll_event *events, int maxevents,
> > > > >> >>   struct epoll_command *commands, int ncommands,
> > > > >> >>   const sigset_t *sigmask);
> > > > >> >
> > > > >> > What about flags?
> > > > >> >
> > > > >>
> > > > >> No room.  Maybe it should just be a struct for everything instead of
> > > > >> separate args.
> > > > >
> > > > > Also no room for timeout. A single struct sounds the only way to go.
> > > > 
> > > > That's what timerfd is for.  I think it would be a bit weird to
> > > > support "timeout" and detailed timerfd control.
> > > 
> > > I see what you mean. Thanks.
> > > 
> > > I still don't like hooking timerfd in the interface. Besides the unclean
> > > interface, it also feels cubersome and overkill to let users setup and add a
> > > dedicated timerfd to implement timeout.
> > > 
> > > How about this:
> > > 
> > > int epoll_mod_wait(int epfd, struct epoll_mod_wait_data *data);
> > > 
> > > struct epoll_mod_wait_data {
> > > 	struct epoll_event *events;
> > > 	int maxevents;
> > > 	struct epoll_mod_cmd {
> > > 		int op,
> > > 		int fd;
> > > 		void *data;
> > > 	} *cmds;
> > > 	int ncmds;
> > > 	int flags;
> > > 	sigset_t *sigmask;
> > > };
> > > 
> > > Commands ops are:
> > > 
> > > 	EPOLL_CTL_ADD
> > > 		@fd is the fd to modify; @data is epoll_event.
> > > 	EPOLL_CTL_MOD
> > > 		@fd is the fd to modify; @data is epoll_event.
> > > 	EPOLL_CTL_DEL
> > > 		@fd is the fd to modify; @data is epoll_event.
> > > 
> > > 	EPOLL_CTL_SET_TIMEOUT
> > > 		@fd is ignored, @data is timespec.
> > > 		Clock type and relative/absolute are selected by flags as below.
> > > 
> > > Flags are given to override timeout defaults:
> > > 	EPOLL_FL_MONOTONIC_CLOCK
> > > 		If set, don't use realtime clock, use monotonic clock.
> > > 	EPOLL_FL_ABSOLUTE_TIMEOUT
> > > 		If set, don't use relative timeout, use absolute timeout.
> > 
> > I'd suggest using an "int clockid" field instead, like timerfd_settime;
> > even if it only accepts CLOCK_REALTIME and CLOCK_MONOTONIC, if it needs
> > extending in the future, it'd be painful to have to remap new CLOCK_*
> > constants into the EPOLL_FL_* namespace.  (I do think dropping timeouts
> > in favor of timerfds makes things more nicely orthogonal, but epoll_wait
> > already has a timeout parameter, so *shrug*.)
> > 
> > Also, I think that structure has too many levels of indirection; it'd
> > produce many unnecessary cache misses; considering you're trying to
> > eliminate the overhead of one or two extra syscalls, you don't want to
> > introduce a pile of unnecessary cache misses in the processes.  I'd
> > suggest inlining cmds as an array at the end of the structure, and
> > turning "void *data" into an inline epoll_event.  (Or, you could use
> > "events" as an in/out parameter.)
> > 
> > You could drop EPOLL_CTL_SET_TIMEOUT, and just include a clockid and
> > timespec directly in the top-level structure.
> > 
> > And I'd suggest either making flags a top-level parameter or putting it
> > at the start of the structure, to make future extension easier.
> 
> Makes sense to me, thanks.
> 
> Also the number of cmds are undecided until we do a copy_from_user for the
> header fields before another one for specified number of cmds. So I think it's
> better to move ncmds and cmds to top level parameter.

That seems like an even better idea, yeah.

- Josh Triplett

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
  2015-01-12 10:08                     ` Josh Triplett
  (?)
@ 2015-01-12 13:23                       ` Fam Zheng
  -1 siblings, 0 replies; 39+ messages in thread
From: Fam Zheng @ 2015-01-12 13:23 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Andy Lutomirski, Miklos Szeredi, linux-kernel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, X86 ML, Alexander Viro,
	Andrew Morton, Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers, Fabian Frederick, David S. Miller,
	Linux FS Devel, Linux API, Paolo Bonzini, Stefan Hajnoczi,
	Michael Kerrisk (man-pages)

On Mon, 01/12 02:08, Josh Triplett wrote:
> On Mon, Jan 12, 2015 at 04:24:00PM +0800, Fam Zheng wrote:
> > On Thu, 01/08 21:21, Josh Triplett wrote:
> > > On Fri, Jan 09, 2015 at 12:49:08PM +0800, Fam Zheng wrote:
> > > > On Thu, 01/08 18:24, Andy Lutomirski wrote:
> > > > > On Thu, Jan 8, 2015 at 5:52 PM, Fam Zheng <famz@redhat.com> wrote:
> > > > > > On Thu, 01/08 17:28, Andy Lutomirski wrote:
> > > > > >> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz@redhat.com> wrote:
> > > > > >> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
> > > > > >> >> I'd like to see a more ambitious change, since the timer isn't the
> > > > > >> >> only problem like this.  Specifically, I'd like a syscall that does a
> > > > > >> >> list of epoll-related things and then waits.  The list of things could
> > > > > >> >> include, at least:
> > > > > >> >>
> > > > > >> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> > > > > >> >> want to turn on and off their requests for events on a somewhat
> > > > > >> >> regular basis.
> > > > > >> >
> > > > > >> > This sounds good to me.
> > > > > >> >
> > > > > >> >>
> > > > > >> >>  - timerfd_settime actions: this allows a single syscall to wait and
> > > > > >> >> adjust *both* monotonic and real-time wakeups.
> > > > > >> >
> > > > > >> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
> > > > > >>
> > > > > >> Yes.  It's not very elegant, and more elegant ideas are welcome.
> > > > > >
> > > > > > What is the purpose of embedding timerfd operation here? Modifying timerfd
> > > > > > for each poll doesn't sound a common pattern to me.
> > > > > 
> > > > > Setting a timeout is definitely a common pattern, hence this thread.
> > > > > But the current timeout interface sucks, and people should really use
> > > > > absolute time.  (My epoll software uses absolute time.)  But then
> > > > > users need to decide whether to have their timeout based on the
> > > > > monotonic clock or the realtime clock (or something else entirely).
> > > > > Some bigger programs may want both -- they may have internal events
> > > > > queued for certain times and for certain timeouts, and those should
> > > > > use realtime and monotonic respectively.  Heck, users may also want
> > > > > separate slack values on those.
> > > > > 
> > > > > Timerfd is the only thing we have right now that is anywhere near
> > > > > flexible enough.  Obviously if epoll became fancy enough, then we
> > > > > could do away with the timerfd entirely here.
> > > > > 
> > > > > >
> > > > > >>
> > > > > >> >
> > > > > >> >>
> > > > > >> >> Would this make sense?  It could look like:
> > > > > >> >>
> > > > > >> >> int epoll_mod_and_pwait(int epfd,
> > > > > >> >>   struct epoll_event *events, int maxevents,
> > > > > >> >>   struct epoll_command *commands, int ncommands,
> > > > > >> >>   const sigset_t *sigmask);
> > > > > >> >
> > > > > >> > What about flags?
> > > > > >> >
> > > > > >>
> > > > > >> No room.  Maybe it should just be a struct for everything instead of
> > > > > >> separate args.
> > > > > >
> > > > > > Also no room for timeout. A single struct sounds the only way to go.
> > > > > 
> > > > > That's what timerfd is for.  I think it would be a bit weird to
> > > > > support "timeout" and detailed timerfd control.
> > > > 
> > > > I see what you mean. Thanks.
> > > > 
> > > > I still don't like hooking timerfd in the interface. Besides the unclean
> > > > interface, it also feels cubersome and overkill to let users setup and add a
> > > > dedicated timerfd to implement timeout.
> > > > 
> > > > How about this:
> > > > 
> > > > int epoll_mod_wait(int epfd, struct epoll_mod_wait_data *data);
> > > > 
> > > > struct epoll_mod_wait_data {
> > > > 	struct epoll_event *events;
> > > > 	int maxevents;
> > > > 	struct epoll_mod_cmd {
> > > > 		int op,
> > > > 		int fd;
> > > > 		void *data;
> > > > 	} *cmds;
> > > > 	int ncmds;
> > > > 	int flags;
> > > > 	sigset_t *sigmask;
> > > > };
> > > > 
> > > > Commands ops are:
> > > > 
> > > > 	EPOLL_CTL_ADD
> > > > 		@fd is the fd to modify; @data is epoll_event.
> > > > 	EPOLL_CTL_MOD
> > > > 		@fd is the fd to modify; @data is epoll_event.
> > > > 	EPOLL_CTL_DEL
> > > > 		@fd is the fd to modify; @data is epoll_event.
> > > > 
> > > > 	EPOLL_CTL_SET_TIMEOUT
> > > > 		@fd is ignored, @data is timespec.
> > > > 		Clock type and relative/absolute are selected by flags as below.
> > > > 
> > > > Flags are given to override timeout defaults:
> > > > 	EPOLL_FL_MONOTONIC_CLOCK
> > > > 		If set, don't use realtime clock, use monotonic clock.
> > > > 	EPOLL_FL_ABSOLUTE_TIMEOUT
> > > > 		If set, don't use relative timeout, use absolute timeout.
> > > 
> > > I'd suggest using an "int clockid" field instead, like timerfd_settime;
> > > even if it only accepts CLOCK_REALTIME and CLOCK_MONOTONIC, if it needs
> > > extending in the future, it'd be painful to have to remap new CLOCK_*
> > > constants into the EPOLL_FL_* namespace.  (I do think dropping timeouts
> > > in favor of timerfds makes things more nicely orthogonal, but epoll_wait
> > > already has a timeout parameter, so *shrug*.)
> > > 
> > > Also, I think that structure has too many levels of indirection; it'd
> > > produce many unnecessary cache misses; considering you're trying to
> > > eliminate the overhead of one or two extra syscalls, you don't want to
> > > introduce a pile of unnecessary cache misses in the processes.  I'd
> > > suggest inlining cmds as an array at the end of the structure, and
> > > turning "void *data" into an inline epoll_event.  (Or, you could use
> > > "events" as an in/out parameter.)
> > > 
> > > You could drop EPOLL_CTL_SET_TIMEOUT, and just include a clockid and
> > > timespec directly in the top-level structure.
> > > 
> > > And I'd suggest either making flags a top-level parameter or putting it
> > > at the start of the structure, to make future extension easier.
> > 
> > Makes sense to me, thanks.
> > 
> > Also the number of cmds are undecided until we do a copy_from_user for the
> > header fields before another one for specified number of cmds. So I think it's
> > better to move ncmds and cmds to top level parameter.
> 
> That seems like an even better idea, yeah.
> 

One more question I'm not sure regarding the semantics: should we make the
syscall atomic?  I.e if one of the cmds failed even before wait, or if all the
cmds are executed, but the eventual wait failed, should we revert the commands'
effect? Or return overall result and result for each cmd if failed?  Or just
claim that it's possible for a first few cmds to be effective even on error?

It will be way more complicated to make it atomic, so I'd like to be clear what
we should do. Ideas?

Thanks,
Fam

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-12 13:23                       ` Fam Zheng
  0 siblings, 0 replies; 39+ messages in thread
From: Fam Zheng @ 2015-01-12 13:23 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Andy Lutomirski, Miklos Szeredi,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, X86 ML, Alexander Viro,
	Andrew Morton, Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov, Mathieu

On Mon, 01/12 02:08, Josh Triplett wrote:
> On Mon, Jan 12, 2015 at 04:24:00PM +0800, Fam Zheng wrote:
> > On Thu, 01/08 21:21, Josh Triplett wrote:
> > > On Fri, Jan 09, 2015 at 12:49:08PM +0800, Fam Zheng wrote:
> > > > On Thu, 01/08 18:24, Andy Lutomirski wrote:
> > > > > On Thu, Jan 8, 2015 at 5:52 PM, Fam Zheng <famz-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > > > > On Thu, 01/08 17:28, Andy Lutomirski wrote:
> > > > > >> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > > > >> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
> > > > > >> >> I'd like to see a more ambitious change, since the timer isn't the
> > > > > >> >> only problem like this.  Specifically, I'd like a syscall that does a
> > > > > >> >> list of epoll-related things and then waits.  The list of things could
> > > > > >> >> include, at least:
> > > > > >> >>
> > > > > >> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> > > > > >> >> want to turn on and off their requests for events on a somewhat
> > > > > >> >> regular basis.
> > > > > >> >
> > > > > >> > This sounds good to me.
> > > > > >> >
> > > > > >> >>
> > > > > >> >>  - timerfd_settime actions: this allows a single syscall to wait and
> > > > > >> >> adjust *both* monotonic and real-time wakeups.
> > > > > >> >
> > > > > >> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
> > > > > >>
> > > > > >> Yes.  It's not very elegant, and more elegant ideas are welcome.
> > > > > >
> > > > > > What is the purpose of embedding timerfd operation here? Modifying timerfd
> > > > > > for each poll doesn't sound a common pattern to me.
> > > > > 
> > > > > Setting a timeout is definitely a common pattern, hence this thread.
> > > > > But the current timeout interface sucks, and people should really use
> > > > > absolute time.  (My epoll software uses absolute time.)  But then
> > > > > users need to decide whether to have their timeout based on the
> > > > > monotonic clock or the realtime clock (or something else entirely).
> > > > > Some bigger programs may want both -- they may have internal events
> > > > > queued for certain times and for certain timeouts, and those should
> > > > > use realtime and monotonic respectively.  Heck, users may also want
> > > > > separate slack values on those.
> > > > > 
> > > > > Timerfd is the only thing we have right now that is anywhere near
> > > > > flexible enough.  Obviously if epoll became fancy enough, then we
> > > > > could do away with the timerfd entirely here.
> > > > > 
> > > > > >
> > > > > >>
> > > > > >> >
> > > > > >> >>
> > > > > >> >> Would this make sense?  It could look like:
> > > > > >> >>
> > > > > >> >> int epoll_mod_and_pwait(int epfd,
> > > > > >> >>   struct epoll_event *events, int maxevents,
> > > > > >> >>   struct epoll_command *commands, int ncommands,
> > > > > >> >>   const sigset_t *sigmask);
> > > > > >> >
> > > > > >> > What about flags?
> > > > > >> >
> > > > > >>
> > > > > >> No room.  Maybe it should just be a struct for everything instead of
> > > > > >> separate args.
> > > > > >
> > > > > > Also no room for timeout. A single struct sounds the only way to go.
> > > > > 
> > > > > That's what timerfd is for.  I think it would be a bit weird to
> > > > > support "timeout" and detailed timerfd control.
> > > > 
> > > > I see what you mean. Thanks.
> > > > 
> > > > I still don't like hooking timerfd in the interface. Besides the unclean
> > > > interface, it also feels cubersome and overkill to let users setup and add a
> > > > dedicated timerfd to implement timeout.
> > > > 
> > > > How about this:
> > > > 
> > > > int epoll_mod_wait(int epfd, struct epoll_mod_wait_data *data);
> > > > 
> > > > struct epoll_mod_wait_data {
> > > > 	struct epoll_event *events;
> > > > 	int maxevents;
> > > > 	struct epoll_mod_cmd {
> > > > 		int op,
> > > > 		int fd;
> > > > 		void *data;
> > > > 	} *cmds;
> > > > 	int ncmds;
> > > > 	int flags;
> > > > 	sigset_t *sigmask;
> > > > };
> > > > 
> > > > Commands ops are:
> > > > 
> > > > 	EPOLL_CTL_ADD
> > > > 		@fd is the fd to modify; @data is epoll_event.
> > > > 	EPOLL_CTL_MOD
> > > > 		@fd is the fd to modify; @data is epoll_event.
> > > > 	EPOLL_CTL_DEL
> > > > 		@fd is the fd to modify; @data is epoll_event.
> > > > 
> > > > 	EPOLL_CTL_SET_TIMEOUT
> > > > 		@fd is ignored, @data is timespec.
> > > > 		Clock type and relative/absolute are selected by flags as below.
> > > > 
> > > > Flags are given to override timeout defaults:
> > > > 	EPOLL_FL_MONOTONIC_CLOCK
> > > > 		If set, don't use realtime clock, use monotonic clock.
> > > > 	EPOLL_FL_ABSOLUTE_TIMEOUT
> > > > 		If set, don't use relative timeout, use absolute timeout.
> > > 
> > > I'd suggest using an "int clockid" field instead, like timerfd_settime;
> > > even if it only accepts CLOCK_REALTIME and CLOCK_MONOTONIC, if it needs
> > > extending in the future, it'd be painful to have to remap new CLOCK_*
> > > constants into the EPOLL_FL_* namespace.  (I do think dropping timeouts
> > > in favor of timerfds makes things more nicely orthogonal, but epoll_wait
> > > already has a timeout parameter, so *shrug*.)
> > > 
> > > Also, I think that structure has too many levels of indirection; it'd
> > > produce many unnecessary cache misses; considering you're trying to
> > > eliminate the overhead of one or two extra syscalls, you don't want to
> > > introduce a pile of unnecessary cache misses in the processes.  I'd
> > > suggest inlining cmds as an array at the end of the structure, and
> > > turning "void *data" into an inline epoll_event.  (Or, you could use
> > > "events" as an in/out parameter.)
> > > 
> > > You could drop EPOLL_CTL_SET_TIMEOUT, and just include a clockid and
> > > timespec directly in the top-level structure.
> > > 
> > > And I'd suggest either making flags a top-level parameter or putting it
> > > at the start of the structure, to make future extension easier.
> > 
> > Makes sense to me, thanks.
> > 
> > Also the number of cmds are undecided until we do a copy_from_user for the
> > header fields before another one for specified number of cmds. So I think it's
> > better to move ncmds and cmds to top level parameter.
> 
> That seems like an even better idea, yeah.
> 

One more question I'm not sure regarding the semantics: should we make the
syscall atomic?  I.e if one of the cmds failed even before wait, or if all the
cmds are executed, but the eventual wait failed, should we revert the commands'
effect? Or return overall result and result for each cmd if failed?  Or just
claim that it's possible for a first few cmds to be effective even on error?

It will be way more complicated to make it atomic, so I'd like to be clear what
we should do. Ideas?

Thanks,
Fam

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-12 13:23                       ` Fam Zheng
  0 siblings, 0 replies; 39+ messages in thread
From: Fam Zheng @ 2015-01-12 13:23 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Andy Lutomirski, Miklos Szeredi,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, X86 ML, Alexander Viro,
	Andrew Morton, Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov, Mathieu

On Mon, 01/12 02:08, Josh Triplett wrote:
> On Mon, Jan 12, 2015 at 04:24:00PM +0800, Fam Zheng wrote:
> > On Thu, 01/08 21:21, Josh Triplett wrote:
> > > On Fri, Jan 09, 2015 at 12:49:08PM +0800, Fam Zheng wrote:
> > > > On Thu, 01/08 18:24, Andy Lutomirski wrote:
> > > > > On Thu, Jan 8, 2015 at 5:52 PM, Fam Zheng <famz-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > > > > On Thu, 01/08 17:28, Andy Lutomirski wrote:
> > > > > >> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > > > >> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
> > > > > >> >> I'd like to see a more ambitious change, since the timer isn't the
> > > > > >> >> only problem like this.  Specifically, I'd like a syscall that does a
> > > > > >> >> list of epoll-related things and then waits.  The list of things could
> > > > > >> >> include, at least:
> > > > > >> >>
> > > > > >> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> > > > > >> >> want to turn on and off their requests for events on a somewhat
> > > > > >> >> regular basis.
> > > > > >> >
> > > > > >> > This sounds good to me.
> > > > > >> >
> > > > > >> >>
> > > > > >> >>  - timerfd_settime actions: this allows a single syscall to wait and
> > > > > >> >> adjust *both* monotonic and real-time wakeups.
> > > > > >> >
> > > > > >> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
> > > > > >>
> > > > > >> Yes.  It's not very elegant, and more elegant ideas are welcome.
> > > > > >
> > > > > > What is the purpose of embedding timerfd operation here? Modifying timerfd
> > > > > > for each poll doesn't sound a common pattern to me.
> > > > > 
> > > > > Setting a timeout is definitely a common pattern, hence this thread.
> > > > > But the current timeout interface sucks, and people should really use
> > > > > absolute time.  (My epoll software uses absolute time.)  But then
> > > > > users need to decide whether to have their timeout based on the
> > > > > monotonic clock or the realtime clock (or something else entirely).
> > > > > Some bigger programs may want both -- they may have internal events
> > > > > queued for certain times and for certain timeouts, and those should
> > > > > use realtime and monotonic respectively.  Heck, users may also want
> > > > > separate slack values on those.
> > > > > 
> > > > > Timerfd is the only thing we have right now that is anywhere near
> > > > > flexible enough.  Obviously if epoll became fancy enough, then we
> > > > > could do away with the timerfd entirely here.
> > > > > 
> > > > > >
> > > > > >>
> > > > > >> >
> > > > > >> >>
> > > > > >> >> Would this make sense?  It could look like:
> > > > > >> >>
> > > > > >> >> int epoll_mod_and_pwait(int epfd,
> > > > > >> >>   struct epoll_event *events, int maxevents,
> > > > > >> >>   struct epoll_command *commands, int ncommands,
> > > > > >> >>   const sigset_t *sigmask);
> > > > > >> >
> > > > > >> > What about flags?
> > > > > >> >
> > > > > >>
> > > > > >> No room.  Maybe it should just be a struct for everything instead of
> > > > > >> separate args.
> > > > > >
> > > > > > Also no room for timeout. A single struct sounds the only way to go.
> > > > > 
> > > > > That's what timerfd is for.  I think it would be a bit weird to
> > > > > support "timeout" and detailed timerfd control.
> > > > 
> > > > I see what you mean. Thanks.
> > > > 
> > > > I still don't like hooking timerfd in the interface. Besides the unclean
> > > > interface, it also feels cubersome and overkill to let users setup and add a
> > > > dedicated timerfd to implement timeout.
> > > > 
> > > > How about this:
> > > > 
> > > > int epoll_mod_wait(int epfd, struct epoll_mod_wait_data *data);
> > > > 
> > > > struct epoll_mod_wait_data {
> > > > 	struct epoll_event *events;
> > > > 	int maxevents;
> > > > 	struct epoll_mod_cmd {
> > > > 		int op,
> > > > 		int fd;
> > > > 		void *data;
> > > > 	} *cmds;
> > > > 	int ncmds;
> > > > 	int flags;
> > > > 	sigset_t *sigmask;
> > > > };
> > > > 
> > > > Commands ops are:
> > > > 
> > > > 	EPOLL_CTL_ADD
> > > > 		@fd is the fd to modify; @data is epoll_event.
> > > > 	EPOLL_CTL_MOD
> > > > 		@fd is the fd to modify; @data is epoll_event.
> > > > 	EPOLL_CTL_DEL
> > > > 		@fd is the fd to modify; @data is epoll_event.
> > > > 
> > > > 	EPOLL_CTL_SET_TIMEOUT
> > > > 		@fd is ignored, @data is timespec.
> > > > 		Clock type and relative/absolute are selected by flags as below.
> > > > 
> > > > Flags are given to override timeout defaults:
> > > > 	EPOLL_FL_MONOTONIC_CLOCK
> > > > 		If set, don't use realtime clock, use monotonic clock.
> > > > 	EPOLL_FL_ABSOLUTE_TIMEOUT
> > > > 		If set, don't use relative timeout, use absolute timeout.
> > > 
> > > I'd suggest using an "int clockid" field instead, like timerfd_settime;
> > > even if it only accepts CLOCK_REALTIME and CLOCK_MONOTONIC, if it needs
> > > extending in the future, it'd be painful to have to remap new CLOCK_*
> > > constants into the EPOLL_FL_* namespace.  (I do think dropping timeouts
> > > in favor of timerfds makes things more nicely orthogonal, but epoll_wait
> > > already has a timeout parameter, so *shrug*.)
> > > 
> > > Also, I think that structure has too many levels of indirection; it'd
> > > produce many unnecessary cache misses; considering you're trying to
> > > eliminate the overhead of one or two extra syscalls, you don't want to
> > > introduce a pile of unnecessary cache misses in the processes.  I'd
> > > suggest inlining cmds as an array at the end of the structure, and
> > > turning "void *data" into an inline epoll_event.  (Or, you could use
> > > "events" as an in/out parameter.)
> > > 
> > > You could drop EPOLL_CTL_SET_TIMEOUT, and just include a clockid and
> > > timespec directly in the top-level structure.
> > > 
> > > And I'd suggest either making flags a top-level parameter or putting it
> > > at the start of the structure, to make future extension easier.
> > 
> > Makes sense to me, thanks.
> > 
> > Also the number of cmds are undecided until we do a copy_from_user for the
> > header fields before another one for specified number of cmds. So I think it's
> > better to move ncmds and cmds to top level parameter.
> 
> That seems like an even better idea, yeah.
> 

One more question I'm not sure regarding the semantics: should we make the
syscall atomic?  I.e if one of the cmds failed even before wait, or if all the
cmds are executed, but the eventual wait failed, should we revert the commands'
effect? Or return overall result and result for each cmd if failed?  Or just
claim that it's possible for a first few cmds to be effective even on error?

It will be way more complicated to make it atomic, so I'd like to be clear what
we should do. Ideas?

Thanks,
Fam

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2015-01-12 13:24 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1420705550-24245-1-git-send-email-famz@redhat.com>
2015-01-08  9:12 ` [PATCH 0/3] epoll: Add epoll_pwait1 syscall Miklos Szeredi
2015-01-08  9:12   ` Miklos Szeredi
2015-01-08  9:12   ` Miklos Szeredi
2015-01-08 11:07   ` Michael Kerrisk (man-pages)
2015-01-08 11:07     ` Michael Kerrisk (man-pages)
2015-01-08 11:07     ` Michael Kerrisk (man-pages)
2015-01-08 17:57   ` Andy Lutomirski
2015-01-08 17:57     ` Andy Lutomirski
2015-01-08 18:42     ` josh
2015-01-08 18:42       ` josh-iaAMLnmF4UmaiuxdJuQwMA
2015-01-08 18:42       ` josh-iaAMLnmF4UmaiuxdJuQwMA
2015-01-08 19:31       ` Alexei Starovoitov
2015-01-08 19:31         ` Alexei Starovoitov
2015-01-08 19:31         ` Alexei Starovoitov
2015-01-08 19:42       ` Andy Lutomirski
2015-01-08 19:42         ` Andy Lutomirski
2015-01-08 19:42         ` Andy Lutomirski
2015-01-09  1:25     ` Fam Zheng
2015-01-09  1:25       ` Fam Zheng
2015-01-09  1:28       ` Andy Lutomirski
2015-01-09  1:28         ` Andy Lutomirski
2015-01-09  1:52         ` Fam Zheng
2015-01-09  1:52           ` Fam Zheng
2015-01-09  2:24           ` Andy Lutomirski
2015-01-09  2:24             ` Andy Lutomirski
2015-01-09  4:49             ` Fam Zheng
2015-01-09  4:49               ` Fam Zheng
2015-01-09  5:21               ` Josh Triplett
2015-01-09  5:21                 ` Josh Triplett
2015-01-09  5:21                 ` Josh Triplett
2015-01-12  8:24                 ` Fam Zheng
2015-01-12  8:24                   ` Fam Zheng
2015-01-12  8:24                   ` Fam Zheng
2015-01-12 10:08                   ` Josh Triplett
2015-01-12 10:08                     ` Josh Triplett
2015-01-12 10:08                     ` Josh Triplett
2015-01-12 13:23                     ` Fam Zheng
2015-01-12 13:23                       ` Fam Zheng
2015-01-12 13:23                       ` Fam Zheng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.