* [PATCH] epoll: add exclusive wakeups flag
@ 2015-12-08  3:23 Jason Baron
  2015-12-08  3:23   ` Jason Baron
  2016-01-28  7:16   ` Michael Kerrisk (man-pages)
  0 siblings, 2 replies; 31+ messages in thread
From: Jason Baron @ 2015-12-08  3:23 UTC
  To: akpm
  Cc: mingo, peterz, viro, mtk.manpages, normalperson, m, corbet, luto,
	torvalds, hagen, linux-kernel, linux-fsdevel, linux-api

Hi,

Re-post of an old series addressing thundering herd issues when sharing
an event source fd amongst multiple epoll fds. Last posting was here
for reference: https://lkml.org/lkml/2015/2/25/56
 
The patch herein drops the core scheduler 'rotate' changes I had previously
proposed as this patch seems performant without those.

I was prompted to re-post this because Madars Vitolins reported some good
speedups with this patch using Enduro/X application. His writeup is here:
https://mvitolin.wordpress.com/2015/12/05/endurox-testing-epollexclusive-flag/

Thanks,

-Jason

Sample epoll_ctl text:

EPOLLEXCLUSIVE
        Sets an exclusive wakeup mode for the epfd file descriptor that is
	being attached to the target file descriptor, fd. Thus, when an
	event occurs and multiple epfd file descriptors are attached to the
	same target file using EPOLLEXCLUSIVE, one or more epfds will receive
	an event with epoll_wait(2). The default in this scenario (when
	EPOLLEXCLUSIVE is not set) is for all epfds to receive an event.
	EPOLLEXCLUSIVE may only be specified with the op EPOLL_CTL_ADD.
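A minimal user-space sketch of the registration described in the text above, assuming a Linux system whose kernel carries this patch. The helper names epoll_add_exclusive() and attach_exclusive_all() are hypothetical; the fallback value of EPOLLEXCLUSIVE is the one this patch adds to include/uapi/linux/eventpoll.h.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Fallback for C library headers that predate the flag; the value is
 * the one this patch adds to include/uapi/linux/eventpoll.h. */
#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)
#endif

/*
 * Hypothetical helper: add 'fd' to the interest list of 'epfd' in
 * exclusive wakeup mode. The flag is passed with EPOLL_CTL_ADD only,
 * as the proposed man-page text requires.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int epoll_add_exclusive(int epfd, int fd, uint32_t events)
{
    struct epoll_event ev = {0};

    ev.events = events | EPOLLEXCLUSIVE;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/*
 * Hypothetical helper: attach the same target fd to each of the 'n'
 * epoll instances in 'epfds' (typically one per worker thread or
 * process). Stops and returns -1 at the first failure.
 */
static int attach_exclusive_all(const int *epfds, int n, int fd, uint32_t events)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        if (epoll_add_exclusive(epfds[i], fd, events) == -1)
            return -1;
    return 0;
}
```

In a real server each epoll instance would normally belong to a different worker blocked in epoll_wait(2), and the shared fd would be something like a listening socket or the read end of a pipe.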

Jason Baron (1):
  epoll: add EPOLLEXCLUSIVE flag

 fs/eventpoll.c                 | 24 +++++++++++++++++++++---
 include/uapi/linux/eventpoll.h |  3 +++
 2 files changed, 24 insertions(+), 3 deletions(-)

-- 
2.6.1


* [PATCH] epoll: add EPOLLEXCLUSIVE flag
@ 2015-12-08  3:23   ` Jason Baron
  0 siblings, 0 replies; 31+ messages in thread
From: Jason Baron @ 2015-12-08  3:23 UTC
  To: akpm
  Cc: mingo, peterz, viro, mtk.manpages, normalperson, m, corbet, luto,
	torvalds, hagen, linux-kernel, linux-fsdevel, linux-api

Currently, epoll file descriptors or epfds (the fd returned from
epoll_create[1]()) that are added to a shared wakeup source are always
added in a non-exclusive manner. This means that when we have multiple
epfds attached to a shared fd source they are all woken up. This creates
thundering herd type behavior.

Introduce a new 'EPOLLEXCLUSIVE' flag that can be passed as part of the
'event' argument during an epoll_ctl() EPOLL_CTL_ADD operation. This new
flag allows for exclusive wakeups when there are multiple epfds attached to
a shared fd event source.

The implementation walks the list of exclusive waiters, and queues an
event to each epfd, until it finds the first waiter that has threads
blocked on it via epoll_wait(). The idea is to search for threads which are
idle and ready to process the wakeup events. Thus, we queue an event to at
least 1 epfd, but may still potentially queue an event to all epfds that
are attached to the shared fd source.
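The walk described above can be modelled in a few lines of user-space C. This is a toy sketch, not kernel code: it assumes that non-exclusive entries sit ahead of exclusive ones on the target's wait queue (add_wait_queue() inserts at the head, add_wait_queue_exclusive() at the tail) and that every epfd is interested in the event. struct toy_waiter and toy_wake() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one entry on the target fd's wait queue. */
struct toy_waiter {
    bool exclusive;     /* registered with EPOLLEXCLUSIVE */
    bool has_sleeper;   /* a thread is blocked in epoll_wait() on this epfd */
    bool queued;        /* output: the event was queued to this epfd */
};

/*
 * Every entry visited gets the event queued; the walk stops at the
 * first exclusive entry that had a blocked thread. Returns the number
 * of epfds that received the event.
 */
static size_t toy_wake(struct toy_waiter *q, size_t n)
{
    size_t queued = 0;

    assert(q != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        q[i].queued = true;
        queued++;
        if (q[i].exclusive && q[i].has_sleeper)
            break;
    }
    return queued;
}
```

With two idle exclusive epfds followed by one that has a blocked thread, the event is queued to all three and the walk stops there; with no blocked threads at all it reaches every epfd, which is the "may still potentially queue an event to all epfds" case.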

Performance testing was done by Madars Vitolins using a modified version of
Enduro/X. The use of the 'EPOLLEXCLUSIVE' flag reduced the length of this
particular workload from 860s down to 24s.

Tested-by: Madars Vitolins <m@silodev.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
---
 fs/eventpoll.c                 | 24 +++++++++++++++++++++---
 include/uapi/linux/eventpoll.h |  3 +++
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 1e009ca..ae1dbcf 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -92,7 +92,7 @@
  */
 
 /* Epoll private bits inside the event mask */
-#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET)
+#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET | EPOLLEXCLUSIVE)
 
 /* Maximum number of nesting allowed inside epoll sets */
 #define EP_MAX_NESTS 4
@@ -1002,6 +1002,7 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
 	unsigned long flags;
 	struct epitem *epi = ep_item_from_wait(wait);
 	struct eventpoll *ep = epi->ep;
+	int ewake = 0;
 
 	if ((unsigned long)key & POLLFREE) {
 		ep_pwq_from_wait(wait)->whead = NULL;
@@ -1066,8 +1067,10 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
 	 * Wake up ( if active ) both the eventpoll wait list and the ->poll()
 	 * wait list.
 	 */
-	if (waitqueue_active(&ep->wq))
+	if (waitqueue_active(&ep->wq)) {
+		ewake = 1;
 		wake_up_locked(&ep->wq);
+	}
 	if (waitqueue_active(&ep->poll_wait))
 		pwake++;
 
@@ -1078,6 +1081,9 @@ out_unlock:
 	if (pwake)
 		ep_poll_safewake(&ep->poll_wait);
 
+	if (epi->event.events & EPOLLEXCLUSIVE)
+		return ewake;
+
 	return 1;
 }
 
@@ -1095,7 +1101,10 @@ static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
 		init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
 		pwq->whead = whead;
 		pwq->base = epi;
-		add_wait_queue(whead, &pwq->wait);
+		if (epi->event.events & EPOLLEXCLUSIVE)
+			add_wait_queue_exclusive(whead, &pwq->wait);
+		else
+			add_wait_queue(whead, &pwq->wait);
 		list_add_tail(&pwq->llink, &epi->pwqlist);
 		epi->nwait++;
 	} else {
@@ -1862,6 +1871,15 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 		goto error_tgt_fput;
 
 	/*
+	 * epoll adds to the wakeup queue at EPOLL_CTL_ADD time only,
+	 * so EPOLLEXCLUSIVE is not allowed for a EPOLL_CTL_MOD operation.
+	 * Also, we do not currently supported nested exclusive wakeups.
+	 */
+	if ((epds.events & EPOLLEXCLUSIVE) && (op == EPOLL_CTL_MOD ||
+		(op == EPOLL_CTL_ADD && is_file_epoll(tf.file))))
+		goto error_tgt_fput;
+
+	/*
 	 * At this point it is safe to assume that the "private_data" contains
 	 * our own data structure.
 	 */
diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
index bc81fb2..1c31549 100644
--- a/include/uapi/linux/eventpoll.h
+++ b/include/uapi/linux/eventpoll.h
@@ -26,6 +26,9 @@
 #define EPOLL_CTL_DEL 2
 #define EPOLL_CTL_MOD 3
 
+/* Set exclusive wakeup mode for the target file descriptor */
+#define EPOLLEXCLUSIVE (1 << 28)
+
 /*
  * Request the handling of system wakeup events so as to prevent system suspends
  * from happening while those events are being processed.
-- 
2.6.1

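The ordering that the ep_ptable_queue_proc() hunk relies on can be illustrated with a toy array-based queue. This is a sketch under stated assumptions: add_wait_queue() is modelled as a head insertion and add_wait_queue_exclusive() as a tail insertion, which is how the kernel helpers place entries; struct toy_queue and the toy_* functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX 16

/* Toy wait queue: ids[0] is the head, ids[len - 1] the tail. */
struct toy_queue {
    int  ids[TOY_MAX];
    bool excl[TOY_MAX];
    int  len;
};

/* Models add_wait_queue(): insert at the head. */
static void toy_add(struct toy_queue *q, int id)
{
    assert(q->len < TOY_MAX);
    memmove(&q->ids[1], &q->ids[0], (size_t)q->len * sizeof q->ids[0]);
    memmove(&q->excl[1], &q->excl[0], (size_t)q->len * sizeof q->excl[0]);
    q->ids[0] = id;
    q->excl[0] = false;
    q->len++;
}

/* Models add_wait_queue_exclusive(): append at the tail. */
static void toy_add_exclusive(struct toy_queue *q, int id)
{
    assert(q->len < TOY_MAX);
    q->ids[q->len] = id;
    q->excl[q->len] = true;
    q->len++;
}

/* Mirrors the branch this patch adds to ep_ptable_queue_proc(). */
static void toy_queue_proc(struct toy_queue *q, int id, bool epollexclusive)
{
    if (epollexclusive)
        toy_add_exclusive(q, id);
    else
        toy_add(q, id);
}
```

Registering ids 1 (exclusive), 2, 3 (exclusive) and 4 in that order yields the queue 4, 2, 1, 3: every non-exclusive entry ends up ahead of every exclusive one, so a wakeup visits all non-exclusive epfds before the exclusive walk begins.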

* Re: [PATCH] epoll: add exclusive wakeups flag
@ 2016-01-28  7:16   ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 31+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-01-28  7:16 UTC
  To: Jason Baron, akpm
  Cc: mtk.manpages, mingo, peterz, viro, normalperson, m, corbet, luto,
	torvalds, hagen, linux-kernel, linux-fsdevel, linux-api

Hi Jason,

On 12/08/2015 04:23 AM, Jason Baron wrote:
> Hi,
> 
> Re-post of an old series addressing thundering herd issues when sharing
> an event source fd amongst multiple epoll fds. Last posting was here
> for reference: https://lkml.org/lkml/2015/2/25/56
>  
> The patch herein drops the core scheduler 'rotate' changes I had previously
> proposed as this patch seems performant without those.
> 
> I was prompted to re-post this because Madars Vitolins reported some good
> speedups with this patch using Enduro/X application. His writeup is here:
> https://mvitolin.wordpress.com/2015/12/05/endurox-testing-epollexclusive-flag/
> 
> Thanks,
> 
> -Jason
> 
> Sample epoll_ctl text:

Thanks for the proposed text. I have some questions about points
that are not quite clear to me.

> EPOLLEXCLUSIVE
>         Sets an exclusive wakeup mode for the epfd file descriptor that is
> 	being attached to the target file descriptor, fd. Thus, when an
> 	event occurs and multiple epfd file descriptors are attached to the
> 	same target file using EPOLLEXCLUSIVE, one or more epfds will receive
> 	an event with epoll_wait(2). The default in this scenario (when
> 	EPOLLEXCLUSIVE is not set) is for all epfds to receive an event.
> 	EPOLLEXCLUSIVE may only be specified with the op EPOLL_CTL_ADD.

So, assuming an FD is present in the interest list of multiple (say 6)
epoll FDs, and some (say 3) of those attachments were done using
EPOLLEXCLUSIVE. Which of the following statements are correct:

(a) It's guaranteed that *none* of the epoll FDs that did NOT specify
    EPOLLEXCLUSIVE will receive an event.

(b) It's guaranteed that *all* of the epoll FDs that did NOT specify
    EPOLLEXCLUSIVE will receive an event.

(c) From 1 to 3 of the epoll FDs that did specify EPOLLEXCLUSIVE
    will receive an event.

(d) Exactly one epoll FD that did specify EPOLLEXCLUSIVE will get
    an event, and it is indeterminate which one.

I suppose one point I'm trying to uncover in the above is: what is
the scope of EPOLLEXCLUSIVE? Is it just applicable for one process's
FD, or is it setting an attribute in the epoll "interest list" record
for that FD that affects notification behavior across all processes?

And then:

(1) What are the semantics of EPOLLEXCLUSIVE if the added FD becomes
    disabled via EPOLLONESHOT (or explicitly via EPOLL_CTL_MOD with
    the 'events' field set to 0)?

(2) The source code contains a comment "we do not currently supported 
    nested exclusive wakeups". Could you elaborate on this point? It
    sounds like something that should be documented.

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

* Re: [PATCH] epoll: add exclusive wakeups flag
  2016-01-28  7:16   ` Michael Kerrisk (man-pages)
  (?)
@ 2016-01-28 17:57   ` Jason Baron
  2016-01-29  8:14     ` Michael Kerrisk (man-pages)
  -1 siblings, 1 reply; 31+ messages in thread
From: Jason Baron @ 2016-01-28 17:57 UTC
  To: Michael Kerrisk (man-pages), akpm
  Cc: mingo, peterz, viro, normalperson, m, corbet, luto, torvalds,
	hagen, linux-kernel, linux-fsdevel, linux-api

Hi,

On 01/28/2016 02:16 AM, Michael Kerrisk (man-pages) wrote:
> Hi Jason,
> 
> On 12/08/2015 04:23 AM, Jason Baron wrote:
>> Hi,
>>
>> Re-post of an old series addressing thundering herd issues when sharing
>> an event source fd amongst multiple epoll fds. Last posting was here
>> for reference: https://lkml.org/lkml/2015/2/25/56
>>  
>> The patch herein drops the core scheduler 'rotate' changes I had previously
>> proposed as this patch seems performant without those.
>>
>> I was prompted to re-post this because Madars Vitolins reported some good
>> speedups with this patch using Enduro/X application. His writeup is here:
>> https://mvitolin.wordpress.com/2015/12/05/endurox-testing-epollexclusive-flag/
>>
>> Thanks,
>>
>> -Jason
>>
>> Sample epoll_ctl text:
> 
> Thanks for the proposed text. I have some questions about points
> that are not quite clear to me.
> 
>> EPOLLEXCLUSIVE
>>         Sets an exclusive wakeup mode for the epfd file descriptor that is
>> 	being attached to the target file descriptor, fd. Thus, when an
>> 	event occurs and multiple epfd file descriptors are attached to the
>> 	same target file using EPOLLEXCLUSIVE, one or more epfds will receive
>> 	an event with epoll_wait(2). The default in this scenario (when
>> 	EPOLLEXCLUSIVE is not set) is for all epfds to receive an event.
>> 	EPOLLEXCLUSIVE may only be specified with the op EPOLL_CTL_ADD.
> 
> So, assuming an FD is present in the interest list of multiple (say 6)
> epoll FDs, and some (say 3) of those attachments were done using
> EPOLLEXCLUSIVE. Which of the following statements are correct:
> 
> (a) It's guaranteed that *none* of the epoll FDs that did NOT specify
>     EPOLLEXCLUSIVE will receive an event.
> 
> (b) It's guaranteed that *all* of the epoll FDs that did NOT specify
>     EPOLLEXCLUSIVE will receive an event.
> 
> (c) From 1 to 3 of the epoll FDs that did specify EPOLLEXCLUSIVE
>     will receive an event.
> 
> (d) Exactly one epoll FD that did specify EPOLLEXCLUSIVE will get
>     an event, and it is indeterminate which one.
> 

So b and c. All the non-exclusive adds will get it and at least 1 of the
exclusive adds will as well.
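The answer can be checked from user space with a sketch of Michael's scenario: one pipe read end in the interest lists of six epoll instances, three of them added with EPOLLEXCLUSIVE. This assumes Linux with this patch applied; count_ready_epfds() is a hypothetical name. Because no thread is blocked in epoll_wait() when the byte is written, the exclusive walk never finds an active waiter, so one would expect all six instances to report the fd; the guarantee stated above is only that the three non-exclusive instances and at least one exclusive instance do.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)   /* value added by this patch */
#endif

/*
 * Hypothetical demo: add a pipe's read end to six epoll instances (the
 * first three with EPOLLEXCLUSIVE), write one byte, then poll each
 * instance with a zero timeout. Returns how many reported the fd ready,
 * or -1 on a setup error.
 */
static int count_ready_epfds(void)
{
    enum { NEPFD = 6, NEXCL = 3 };
    int p[2], epfds[NEPFD], ready = 0, ret = -1;

    if (pipe(p) == -1)
        return -1;
    for (int i = 0; i < NEPFD; i++)
        epfds[i] = -1;

    for (int i = 0; i < NEPFD; i++) {
        struct epoll_event ev = {0};

        epfds[i] = epoll_create1(0);
        if (epfds[i] == -1)
            goto out;
        ev.events = EPOLLIN | (i < NEXCL ? EPOLLEXCLUSIVE : 0);
        ev.data.fd = p[0];
        if (epoll_ctl(epfds[i], EPOLL_CTL_ADD, p[0], &ev) == -1)
            goto out;
    }

    /* No thread is blocked in epoll_wait() while this wakeup happens. */
    if (write(p[1], "x", 1) != 1)
        goto out;

    for (int i = 0; i < NEPFD; i++) {
        struct epoll_event evout;

        if (epoll_wait(epfds[i], &evout, 1, 0) == 1)
            ready++;
    }
    ret = ready;
out:
    for (int i = 0; i < NEPFD; i++)
        if (epfds[i] != -1)
            close(epfds[i]);
    close(p[0]);
    close(p[1]);
    return ret;
}
```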

> I suppose one point I'm trying to uncover in the above is: what is
> the scope of EPOLLEXCLUSIVE? Is it just applicable for one process's
> FD, or is it setting an attribute in the epoll "interest list" record
> for that FD that affects notification behavior across all processes?
>

Right - so 'EPOLLEXCLUSIVE' will affect other epoll sets that are also
using 'EPOLLEXCLUSIVE' against the same fd, but will have no effect
on epoll sets connected to that fd that do not specify it.


> And then:
> 
> (1) What are the semantics of EPOLLEXCLUSIVE if the added FD becomes
>     disabled via EPOLLONESHOT (or explicitly via EPOLL_CTL_MOD with
>     the 'events' field set to 0)?
>

In the case of EPOLLEXCLUSIVE and EPOLLONESHOT, one would have to re-arm
at least one of the threads that were woken up by doing EPOLL_CTL_MOD to
guarantee further wakeups.

And likewise with an EPOLL_CTL_MOD with 'events' all set to 0, one
would need to either re-arm the thread that set the 'events' field to 0
(by setting back to non-zero), or re-arm in at least one other thread
via EPOLL_CTL_MOD (or delete and add).

> (2) The source code contains a comment "we do not currently supported 
>     nested exclusive wakeups". Could you elaborate on this point? It
>     sounds like something that should be documented.

So I was just trying to say that we return -EINVAL if you try to do an
EPOLL_CTL_ADD with EPOLLEXCLUSIVE and the 'fd' argument is an epoll fd
returned via epoll_create().
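A user-space restatement of the check, as it appears in this patch's epoll_ctl() hunk, may make the rule easier to document. It is a sketch: exclusive_request_rejected() is a hypothetical name, and its target_is_epoll argument stands in for the kernel's is_file_epoll(tf.file). In the cases where it returns true, the patch makes epoll_ctl() fail with EINVAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/epoll.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)   /* value added by this patch */
#endif

/*
 * Returns true when this patch rejects the request: EPOLLEXCLUSIVE is
 * not allowed with EPOLL_CTL_MOD, and not allowed with EPOLL_CTL_ADD
 * when the target fd is itself an epoll instance (no nested exclusive
 * wakeups).
 */
static bool exclusive_request_rejected(int op, uint32_t events,
                                       bool target_is_epoll)
{
    if (!(events & EPOLLEXCLUSIVE))
        return false;
    return op == EPOLL_CTL_MOD ||
           (op == EPOLL_CTL_ADD && target_is_epoll);
}
```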

Thanks,

-Jason

* Re: [PATCH] epoll: add exclusive wakeups flag
  2016-01-28 17:57   ` Jason Baron
@ 2016-01-29  8:14     ` Michael Kerrisk (man-pages)
  2016-02-01 19:42         ` Jason Baron
  2016-03-10 18:53       ` Jason Baron
  0 siblings, 2 replies; 31+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-01-29  8:14 UTC
  To: Jason Baron, akpm
  Cc: mtk.manpages, mingo, peterz, viro, normalperson, m, corbet, luto,
	torvalds, hagen, linux-kernel, linux-fsdevel, linux-api

Hello Jason,
On 01/28/2016 06:57 PM, Jason Baron wrote:
> Hi,
> 
> On 01/28/2016 02:16 AM, Michael Kerrisk (man-pages) wrote:
>> Hi Jason,
>>
>> On 12/08/2015 04:23 AM, Jason Baron wrote:
>>> Hi,
>>>
>>> Re-post of an old series addressing thundering herd issues when sharing
>>> an event source fd amongst multiple epoll fds. Last posting was here
>>> for reference: https://lkml.org/lkml/2015/2/25/56
>>>  
>>> The patch herein drops the core scheduler 'rotate' changes I had previously
>>> proposed as this patch seems performant without those.
>>>
>>> I was prompted to re-post this because Madars Vitolins reported some good
>>> speedups with this patch using Enduro/X application. His writeup is here:
>>> https://mvitolin.wordpress.com/2015/12/05/endurox-testing-epollexclusive-flag/
>>>
>>> Thanks,
>>>
>>> -Jason
>>>
>>> Sample epoll_ctl text:
>>
>> Thanks for the proposed text. I have some questions about points
>> that are not quite clear to me.
>>
>>> EPOLLEXCLUSIVE
>>>         Sets an exclusive wakeup mode for the epfd file descriptor that is
>>> 	being attached to the target file descriptor, fd. Thus, when an
>>> 	event occurs and multiple epfd file descriptors are attached to the
>>> 	same target file using EPOLLEXCLUSIVE, one or more epfds will receive
>>> 	an event with epoll_wait(2). The default in this scenario (when
>>> 	EPOLLEXCLUSIVE is not set) is for all epfds to receive an event.
>>> 	EPOLLEXCLUSIVE may only be specified with the op EPOLL_CTL_ADD.
>>
>> So, assuming an FD is present in the interest list of multiple (say 6)
>> epoll FDs, and some (say 3) of those attachments were done using
>> EPOLLEXCLUSIVE. Which of the following statements are correct:
>>
>> (a) It's guaranteed that *none* of the epoll FDs that did NOT specify
>>     EPOLLEXCLUSIVE will receive an event.
>>
>> (b) It's guaranteed that *all* of the epoll FDs that did NOT specify
>>     EPOLLEXCLUSIVE will receive an event.
>>
>> (c) From 1 to 3 of the epoll FDs that did specify EPOLLEXCLUSIVE
>>     will receive an event.
>>
>> (d) Exactly one epoll FD that did specify EPOLLEXCLUSIVE will get
>>     an event, and it is indeterminate which one.
>>
> 
> So b and c. All the non-exclusive adds will get it and at least 1 of the
> exclusive adds will as well.

So is it fair to say that the expected use case is that all epoll sets
would use EPOLLEXCLUSIVE?

>> I suppose one point I'm trying to uncover in the above is: what is
>> the scope of EPOLLEXCLUSIVE? Is it just applicable for one process's
>> FD, or is it setting an attribute in the epoll "interest list" record
>> for that FD that affects notification behavior across all processes?
>>
> 
> Right - so 'EPOLLEXCLUSIVE' will affect other epoll sets that are also
> using 'EPOLLEXCLUSIVE' against the same fd, but will have no effect
> on epoll sets connected to that fd that do not specify it.
> 
> 
>> And then:
>>
>> (1) What are the semantics of EPOLLEXCLUSIVE if the added FD becomes
>>     disabled via EPOLLONESHOT (or explicitly via EPOLL_CTL_MOD with
>>     the 'events' field set to 0)?
>>
> 
> In the case of EPOLLEXCLUSIVE and EPOLLONESHOT, one would have to re-arm
> at least one of the threads that were woken up by doing EPOLL_CTL_MOD to
> guarantee further wakeups.
> 
> And likewise with an EPOLL_CTL_MOD with 'events' all set to 0, one
> would need to either re-arm the thread that set the 'events' field to 0
> (by setting back to non-zero), or re-arm in at least one other thread
> via EPOLL_CTL_MOD (or delete and add).

Okay -- so when an EPOLLEXCLUSIVE FD becomes disarmed it is possible
to re-enable with EPOLL_CTL_MOD; one doesn't need to delete and re-add
the FD.

>> (2) The source code contains a comment "we do not currently supported 
>>     nested exclusive wakeups". Could you elaborate on this point? It
>>     sounds like something that should be documented.
> 
> So I was just trying to say that we return -EINVAL if you try to do an
> EPOLL_CTL_ADD with EPOLLEXCLUSIVE and the 'fd' argument is an epoll fd
> returned via epoll_create().

Okay -- that definitely belongs in the man page.

I'll work up a text, but would like to get input about the "use case"
question above.

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

* Re: [PATCH] epoll: add exclusive wakeups flag
@ 2016-02-01 19:42         ` Jason Baron
  0 siblings, 0 replies; 31+ messages in thread
From: Jason Baron @ 2016-02-01 19:42 UTC
  To: Michael Kerrisk (man-pages), akpm, torvalds
  Cc: mingo, peterz, viro, normalperson, m, corbet, luto, hagen,
	linux-kernel, linux-fsdevel, linux-api



On 01/29/2016 03:14 AM, Michael Kerrisk (man-pages) wrote:
> Hello Jason,
> On 01/28/2016 06:57 PM, Jason Baron wrote:
>> Hi,
>>
>> On 01/28/2016 02:16 AM, Michael Kerrisk (man-pages) wrote:
>>> Hi Jason,
>>>
>>> On 12/08/2015 04:23 AM, Jason Baron wrote:
>>>> Hi,
>>>>
>>>> Re-post of an old series addressing thundering herd issues when sharing
>>>> an event source fd amongst multiple epoll fds. Last posting was here
>>>> for reference: https://lkml.org/lkml/2015/2/25/56
>>>>  
>>>> The patch herein drops the core scheduler 'rotate' changes I had previously
>>>> proposed as this patch seems performant without those.
>>>>
>>>> I was prompted to re-post this because Madars Vitolins reported some good
>>>> speedups with this patch using Enduro/X application. His writeup is here:
>>>> https://mvitolin.wordpress.com/2015/12/05/endurox-testing-epollexclusive-flag/
>>>>
>>>> Thanks,
>>>>
>>>> -Jason
>>>>
>>>> Sample epoll_ctl text:
>>>
>>> Thanks for the proposed text. I have some questions about points
>>> that are not quite clear to me.
>>>
>>>> EPOLLEXCLUSIVE
>>>>         Sets an exclusive wakeup mode for the epfd file descriptor that is
>>>> 	being attached to the target file descriptor, fd. Thus, when an
>>>> 	event occurs and multiple epfd file descriptors are attached to the
>>>> 	same target file using EPOLLEXCLUSIVE, one or more epfds will receive
>>>> 	an event with epoll_wait(2). The default in this scenario (when
>>>> 	EPOLLEXCLUSIVE is not set) is for all epfds to receive an event.
>>>> 	EPOLLEXCLUSIVE may only be specified with the op EPOLL_CTL_ADD.
>>>
>>> So, assuming an FD is present in the interest list of multiple (say 6)
>>> epoll FDs, and some (say 3) of those attachments were done using
>>> EPOLLEXCLUSIVE. Which of the following statements are correct:
>>>
>>> (a) It's guaranteed that *none* of the epoll FDs that did NOT specify
>>>     EPOLLEXCLUSIVE will receive an event.
>>>
>>> (b) It's guaranteed that *all* of the epoll FDs that did NOT specify
>>>     EPOLLEXCLUSIVE will receive an event.
>>>
>>> (c) From 1 to 3 of the epoll FDs that did specify EPOLLEXCLUSIVE
>>>     will receive an event.
>>>
>>> (d) Exactly one epoll FD that did specify EPOLLEXCLUSIVE will get
>>>     an event, and it is indeterminate which one.
>>>
>>
>> So b and c. All the non-exclusive adds will get it and at least 1 of the
>> exclusive adds will as well.
> 
> So is it fair to say that the expected use case is that all epoll sets
> would use EPOLLEXCLUSIVE?
> 
>>> I suppose one point I'm trying to uncover in the above is: what is
>>> the scope of EPOLLEXCLUSIVE? Is it just applicable for one process's
>>> FD, or is it setting an attribute in the epoll "interest list" record
>>> for that FD that affects notification behavior across all processes?
>>>
>>
>> Right - so 'EPOLLEXCLUSIVE' will affect other epoll sets that are also
>> using 'EPOLLEXCLUSIVE' against the same fd, but will have no effect
>> on epoll sets connected to that fd that do not specify it.
>>
>>
>>> And then:
>>>
>>> (1) What are the semantics of EPOLLEXCLUSIVE if the added FD becomes
>>>     disabled via EPOLLONESHOT (or explicitly via EPOLL_CTL_MOD with
>>>     the 'events' field set to 0)?
>>>
>>
>> In the case of EPOLLEXCLUSIVE and EPOLLONESHOT, one would have to re-arm
>> at least one of the threads that were woken up by doing EPOLL_CTL_MOD to
>> guarantee further wakeups.
>>
>> And likewise with an EPOLL_CTL_MOD with 'events' all set to 0, one
>> would need to either re-arm the thread that set the 'events' field to 0
>> (by setting back to non-zero), or re-arm in at least one other thread
>> via EPOLL_CTL_MOD (or delete and add).
> 
> Okay -- so when an EPOLLEXCLUSIVE FD becomes disarmed it is possible
> to re-enable with EPOLL_CTL_MOD; one doesn't need to delete and re-add
> the FD.
> 
>>> (2) The source code contains a comment "we do not currently supported 
>>>     nested exclusive wakeups". Could you elaborate on this point? It
>>>     sounds like something that should be documented.
>>
>> So I was just trying to say that we return -EINVAL if you try to do an
>> EPOLL_CTL_ADD with EPOLLEXCLUSIVE and the 'fd' argument is an epoll fd
>> returned via epoll_create().
> 
> Okay -- that definitely belongs in the man page.
> 
> I'll work up a text, but would like to get input about the "use case"
> question above.
> 

Hi Michael,

So the current EPOLLEXCLUSIVE interface (added in 4.5-rc1 as commit
df0108c, "epoll: add EPOLLEXCLUSIVE flag") is lacking as currently
implemented, and that would affect the docs here.

The issue is that if epoll() waiters create different POLL* sets and
register them as exclusive against the same target fd, the current
implementation will stop waking any further waiters once it finds the
first idle waiter. This means that waiters will miss wakeups in the
case where the interest sets differ. The common use-case we've had so far for
this has set all the interest sets in the same way.

For example, when we wake up a pipe for reading we do:

wake_up_interruptible_sync_poll(&pipe->wait, POLLIN | POLLRDNORM);

So if one epoll set, epfd, is added with EPOLLEXCLUSIVE to pipe p with
POLLIN, and a second set, epfd2, is added to pipe p with EPOLLEXCLUSIVE |
POLLRDNORM, only epfd may receive the wakeup, since the current
implementation stops after it finds any intersection of events with
a waiter that is blocked in epoll_wait().
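To make the missed wakeup concrete, here is a small user-space model (not kernel code) of the exclusive walk as described above: waiters are visited in queue order, a waiter whose interest set does not intersect the wakeup key is skipped, and the walk stops at the first idle waiter that matches. The struct and function names are hypothetical, and the POLLRDNORM fallback assumes the Linux value of 0x040.

```c
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef POLLRDNORM
#define POLLRDNORM 0x040	/* Linux value; assumption for strict builds */
#endif

/* Hypothetical model of one exclusive epoll instance on the target fd. */
struct model_waiter {
	unsigned int interest;	/* POLL* bits registered with EPOLL_CTL_ADD */
	bool idle;		/* a thread is blocked in epoll_wait() on it */
	bool woken;		/* set by the model when that thread is woken */
};

/*
 * Models the 4.5-rc1 behavior: stop at the first idle waiter whose
 * interest set intersects the key, whatever bits later waiters asked
 * for.  Returns the number of waiters woken (0 or 1).
 */
static size_t wake_exclusive_current(struct model_waiter *w, size_t n,
				     unsigned int key)
{
	for (size_t i = 0; i < n; i++) {
		if (!(w[i].interest & key))
			continue;	/* no intersection: keep walking */
		if (!w[i].idle)
			continue;	/* nobody blocked here: keep walking */
		w[i].woken = true;
		return 1;		/* exclusive: stop after one wakeup */
	}
	return 0;
}
```

With the pipe wakeup key POLLIN | POLLRDNORM, an epfd registered for POLLIN that sits first in the queue absorbs the wakeup, and the epfd registered for POLLRDNORM is never visited.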

We could potentially address this by requiring all epoll waiters that
are added to p to pass the same set of POLL* events, i.e. the first
EPOLL_CTL_ADD that passes EPOLLEXCLUSIVE establishes the set of POLL*
flags to be used by any other epfds that are added as EPOLLEXCLUSIVE.
However, I think that would be a somewhat confusing interface: we would
have to reference count the number of users of that set, so userspace
would have to keep track of that count, or we would need a more complex
interface. It also adds some shared state that we'd have to store
somewhere, and I don't think anybody will want to bloat
__wait_queue_head for this.

Instead, I think we could simply restrict EPOLLEXCLUSIVE so that it can
only be specified with EPOLLIN and/or EPOLLOUT. That way, if the wakeup
includes 'POLLIN' and not 'POLLOUT', we can stop once we hit the first
idle waiter that specifies the EPOLLIN bit, since any remaining waiters
that only have 'POLLOUT' set wouldn't need to be woken. Likewise, we can
do the same thing if 'POLLOUT' is in the wakeup bit set and not
'POLLIN'. If both 'POLLOUT' and 'POLLIN' are set in the wakeup bit set
(there is at least one example of this I saw in fs/pipe.c), then we
just wake the entire exclusive list. Having both 'POLLOUT' and 'POLLIN'
set should not be on any performance-critical path, so I think that's
ok (in fs/pipe.c it's in pipe_release()).
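The restricted rule can be sketched as a user-space model too. This is not the patch's code: wake_proposed(), struct model_waiter and MODEL_INOUT_BITS are hypothetical names, and the array is assumed to list non-exclusive waiters first, mirroring how exclusive entries are appended to the tail of the wait queue. It also reflects the answer given earlier in the thread: every matching non-exclusive waiter gets the event, plus at least one exclusive waiter.

```c
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>

/* Same bits as the patch's EPOLLINOUT_BITS. */
#define MODEL_INOUT_BITS (POLLIN | POLLOUT)

struct model_waiter {
	unsigned int interest;	/* POLLIN and/or POLLOUT */
	bool exclusive;		/* added with EPOLLEXCLUSIVE */
	bool idle;		/* a thread is blocked in epoll_wait() */
	bool notified;		/* event queued on this epfd (and maybe woken) */
};

/*
 * Every matching waiter visited is notified.  The walk stops after the
 * first idle matching exclusive waiter, unless the key carries both
 * POLLIN and POLLOUT, in which case the whole list is walked.
 * Returns the number of waiters notified.
 */
static size_t wake_proposed(struct model_waiter *w, size_t n, unsigned int key)
{
	bool wake_all = (key & MODEL_INOUT_BITS) == MODEL_INOUT_BITS;
	size_t notified = 0;

	for (size_t i = 0; i < n; i++) {
		if (!(w[i].interest & key))
			continue;	/* e.g. a POLLOUT-only waiter on POLLIN */
		w[i].notified = true;
		notified++;
		if (w[i].exclusive && w[i].idle && !wake_all)
			break;		/* one idle exclusive waiter is enough */
	}
	return notified;
}
```

The both-bits case wakes everyone, which matches the argument that it only happens on slow paths such as pipe_release().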

Since epoll waiters are going to be interested in events other than
POLLIN and POLLOUT as well, those can still be added by doing a 'dup()'
on the target fd and adding the duplicate as one normally would with
EPOLL_CTL_ADD. Since I think that POLLIN and POLLOUT are really the only
events for which we want to reduce wakeups, the dup'ed fd would perhaps
only need to be added by one of the waiter threads. So this would look
something like (rough pseudo-code):

int p[2], dup_p;
pipe(p);

for each thread do:
	epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], EPOLLEXCLUSIVE | EPOLLIN);

dup_p = dup(p[0]);

pick one (or more) threads and do:
	epoll_ctl(epfd, EPOLL_CTL_ADD, dup_p, EPOLLERR | EPOLLHUP);


So I don't think this adds too much complexity to user-space, and it
keeps the kernel implementation small. I think this change is really
about balancing POLLIN events (and maybe POLLOUT), so having user-space
call that out explicitly seems ok to me.
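The pseudo-code above can be fleshed out as a small C sketch. Assumptions: one epoll instance per worker thread (thread creation is left out), the helper name setup_exclusive_waiters() is hypothetical, and EPOLLEXCLUSIVE falls back to its uapi value (1u << 28) only when the libc headers predate it. The later man-page text in this thread allows EPOLLERR/EPOLLHUP alongside EPOLLEXCLUSIVE, in which case the dup() step is no longer required, but it remains harmless.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)	/* uapi value, for older headers */
#endif

/*
 * Hypothetical helper: creates a pipe and n epoll instances (one per
 * worker), attaches the read end to each with EPOLLEXCLUSIVE | EPOLLIN,
 * then adds a dup() of the read end, without EPOLLEXCLUSIVE, to
 * epfds[0] only so that one worker also sees EPOLLERR/EPOLLHUP.
 * Returns 0 on success, -1 on failure (cleanup omitted for brevity).
 */
static int setup_exclusive_waiters(int pipefd[2], int *epfds, size_t n,
				   int *dup_fd)
{
	struct epoll_event ev = { 0 };

	if (n == 0 || pipe(pipefd) == -1)
		return -1;

	for (size_t i = 0; i < n; i++) {
		epfds[i] = epoll_create1(EPOLL_CLOEXEC);
		if (epfds[i] == -1)
			return -1;
		ev.events = EPOLLEXCLUSIVE | EPOLLIN;
		ev.data.fd = pipefd[0];
		if (epoll_ctl(epfds[i], EPOLL_CTL_ADD, pipefd[0], &ev) == -1)
			return -1;
	}

	/* A new fd number for the same open file is a distinct epoll key. */
	*dup_fd = dup(pipefd[0]);
	if (*dup_fd == -1)
		return -1;
	ev.events = EPOLLERR | EPOLLHUP;
	ev.data.fd = *dup_fd;
	return epoll_ctl(epfds[0], EPOLL_CTL_ADD, *dup_fd, &ev);
}
```

Because epoll keys its interest list on the (file, fd number) pair, adding the dup'ed descriptor to an epoll set that already watches p[0] succeeds rather than failing with EEXIST.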

The kernel patch below should hopefully explain this better.

Thanks,

-Jason


diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index ae1dbcf..4f19793 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -94,6 +94,11 @@
 /* Epoll private bits inside the event mask */
 #define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET | EPOLLEXCLUSIVE)

+#define EPOLLINOUT_BITS (POLLIN | POLLOUT)
+
+#define EPOLLEXCLUSIVE_OK_BITS (EPOLLINOUT_BITS | EPOLLWAKEUP | EPOLLET | \
+				EPOLLEXCLUSIVE)
+
 /* Maximum number of nesting allowed inside epoll sets */
 #define EP_MAX_NESTS 4

@@ -1068,7 +1073,8 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
 	 * wait list.
 	 */
 	if (waitqueue_active(&ep->wq)) {
-		ewake = 1;
+		if ((((unsigned long)key & EPOLLINOUT_BITS) != EPOLLINOUT_BITS))
+			ewake = 1;
 		wake_up_locked(&ep->wq);
 	}
 	if (waitqueue_active(&ep->poll_wait))
@@ -1875,9 +1881,13 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 	 * so EPOLLEXCLUSIVE is not allowed for a EPOLL_CTL_MOD operation.
 	 * Also, we do not currently supported nested exclusive wakeups.
 	 */
-	if ((epds.events & EPOLLEXCLUSIVE) && (op == EPOLL_CTL_MOD ||
-		(op == EPOLL_CTL_ADD && is_file_epoll(tf.file))))
-		goto error_tgt_fput;
+	if (epds.events & EPOLLEXCLUSIVE) {
+		if (op == EPOLL_CTL_MOD)
+			goto error_tgt_fput;
+		if (op == EPOLL_CTL_ADD && (is_file_epoll(tf.file) ||
+				(epds.events & ~EPOLLEXCLUSIVE_OK_BITS)))
+			goto error_tgt_fput;
+	}

 	/*
 	 * At this point it is safe to assume that the "private_data" contains
@@ -1935,7 +1945,8 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 	switch (op) {
 	case EPOLL_CTL_ADD:
 		if (!epi) {
-			epds.events |= POLLERR | POLLHUP;
+			if (!(epds.events & EPOLLEXCLUSIVE))
+				epds.events |= POLLERR | POLLHUP;
 			error = ep_insert(ep, &epds, tf.file, fd, full_check);
 		} else
 			error = -EEXIST;
@@ -1950,8 +1961,10 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 		break;
 	case EPOLL_CTL_MOD:
 		if (epi) {
-			epds.events |= POLLERR | POLLHUP;
-			error = ep_modify(ep, epi, &epds);
+			if (!(epi->event.events & EPOLLEXCLUSIVE)) {
+				epds.events |= POLLERR | POLLHUP;
+				error = ep_modify(ep, epi, &epds);
+			}
 		} else
 			error = -ENOENT;
 		break;
-- 
2.6.1


* Re: [PATCH] epoll: add exclusive wakeups flag
  2016-01-29  8:14     ` Michael Kerrisk (man-pages)
  2016-02-01 19:42         ` Jason Baron
@ 2016-03-10 18:53       ` Jason Baron
  2016-03-10 19:47           ` Michael Kerrisk (man-pages)
  2016-03-10 19:58           ` Michael Kerrisk (man-pages)
  1 sibling, 2 replies; 31+ messages in thread
From: Jason Baron @ 2016-03-10 18:53 UTC
  To: Michael Kerrisk (man-pages), akpm
  Cc: mingo, peterz, viro, normalperson, m, corbet, luto, torvalds,
	hagen, linux-kernel, linux-fsdevel, linux-api

Hi Michael,

On 01/29/2016 03:14 AM, Michael Kerrisk (man-pages) wrote:
> Hello Jason,
> On 01/28/2016 06:57 PM, Jason Baron wrote:
>> Hi,
>>
>> On 01/28/2016 02:16 AM, Michael Kerrisk (man-pages) wrote:
>>> Hi Jason,
>>>
>>> On 12/08/2015 04:23 AM, Jason Baron wrote:
>>>> Hi,
>>>>
>>>> Re-post of an old series addressing thundering herd issues when sharing
>>>> an event source fd amongst multiple epoll fds. Last posting was here
>>>> for reference: https://lkml.org/lkml/2015/2/25/56
>>>>  
>>>> The patch herein drops the core scheduler 'rotate' changes I had previously
>>>> proposed as this patch seems performant without those.
>>>>
>>>> I was prompted to re-post this because Madars Vitolins reported some good
>>>> speedups with this patch using Enduro/X application. His writeup is here:
>>>> https://mvitolin.wordpress.com/2015/12/05/endurox-testing-epollexclusive-flag/
>>>>
>>>> Thanks,
>>>>
>>>> -Jason
>>>>
>>>> Sample epoll_ctl text:
>>>
>>> Thanks for the proposed text. I have some questions about points
>>> that are not quite clear to me.
>>>
>>>> EPOLLEXCLUSIVE
>>>>         Sets an exclusive wakeup mode for the epfd file descriptor that is
>>>> 	being attached to the target file descriptor, fd. Thus, when an
>>>> 	event occurs and multiple epfd file descriptors are attached to the
>>>> 	same target file using EPOLLEXCLUSIVE, one or more epfds will receive
>>>> 	an event with epoll_wait(2). The default in this scenario (when
>>>> 	EPOLLEXCLUSIVE is not set) is for all epfds to receive an event.
>>>> 	EPOLLEXCLUSIVE may only be specified with the op EPOLL_CTL_ADD.
>>>
>>> So, assuming an FD is present in the interest list of multiple (say 6)
>>> epoll FDs, and some (say 3) of those attachments were done using
>>> EPOLLEXCLUSIVE. Which of the following statements are correct:
>>>
>>> (a) It's guaranteed that *none* of the epoll FDs that did NOT specify
>>>     EPOLLEXCLUSIVE will receive an event.
>>>
>>> (b) It's guaranteed that *all* of the epoll FDs that did NOT specify
>>>     EPOLLEXCLUSIVE will receive an event.
>>>
>>> (c) From 1 to 3 of the epoll FDs that did specify EPOLLEXCLUSIVE
>>>     will receive an event.
>>>
>>> (d) Exactly one epoll FD that did specify EPOLLEXCLUSIVE will get
>>>     an event, and it is indeterminate which one.
>>>
>>
>> So b and c. All the non-exclusive adds will get it and at least 1 of the
>> exclusive adds will as well.
> 
> So is it fair to say that the expected use case is that all epoll sets
> would use EPOLLEXCLUSIVE?
> 
>>> I suppose one point I'm trying to uncover in the above is: what is
>>> the scope of EPOLLEXCLUSIVE? Is it just applicable for one process's
>>> FD, or is it setting an attribute in the epoll "interest list" record
>>> for that FD that affects notification behavior across all processes?
>>>
>>
>> Right - so 'EPOLLEXCLUSIVE' will affect other epoll sets that are also
>> using 'EPOLLEXCLUSIVE' against the same fd, but will have no effect
>> on epoll sets connected to fd that do not specify it.
>>
>>
>>> And then:
>>>
>>> (1) What are the semantics of EPOLLEXCLUSIVE if the added FD becomes
>>>     disabled via EPOLLONESHOT (or explicitly via EPOLL_CTL_MOD with
>>>     the 'events' field set to 0)?
>>>
>>
>> In the case of EPOLLEXCLUSIVE and EPOLLONESHOT, one would have to re-arm
>> at least 1 of threads that was woken up by doing EPOLL_CTL_MOD to
>> guarantee further wakeups.
>>
>> And like-wise with an EPOLL_CTL_MOD with 'events' all set to 0, one
>> would need to either re-arm the thread that set the 'events' field to 0
>> (by setting back to non-zero), or re-arm in at least one other thread
>> via EPOLL_CTL_MOD (or delete and add).
> 
> Okay -- so when an EPOLLEXCLUSIVE FD becomes disarmed it is possible
> to re-enable with EPOLL_CTL_MOD; one doesn't need to delete and re-add
> the FD.
> 
>>> (2) The source code contains a comment "we do not currently supported 
>>>     nested exclusive wakeups". Could you elaborate on this point? It
>>>     sounds like something that should be documented.
>>
>> So I was just trying to say that we return -EINVAL if you try to do an
>> EPOLL_CTL_ADD with EPOLLEXCLUSIVE and the 'fd' argument is an epoll fd
>> returned via epoll_create().
> 
> Okay -- that definitely belongs in the man page.
> 
> I'll work up a text, but would like to get input about the "use case"
> question above.
> 
> Cheers,
> 
> Michael
> 
> 
> 

Ok, here's some updated text:

EPOLLEXCLUSIVE

Sets an exclusive wakeup mode for the epfd file descriptor that is being
attached to the target file descriptor, fd. When a wakeup event occurs
and multiple epfd file descriptors are attached to the same target file
using EPOLLEXCLUSIVE, one or more epfds will receive an event with
epoll_wait(2). The default in this scenario (when EPOLLEXCLUSIVE is not
set) is for all epfds to receive an event.

The events supported by EPOLLEXCLUSIVE are: EPOLLIN, EPOLLOUT, EPOLLERR,
EPOLLHUP, EPOLLWAKEUP, and EPOLLET. epoll_wait(2) will always wait for
EPOLLERR and EPOLLHUP; it is not necessary to set them in events. If
EPOLLEXCLUSIVE is set using epoll_ctl(2), then a subsequent
EPOLL_CTL_MOD on the same epfd, fd pair will return -EINVAL. An
epoll_ctl(2) that specifies EPOLLEXCLUSIVE in events and specifies the
target file descriptor fd as an epoll instance will return -EINVAL
as well.
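The rules in the text above can be restated as a small decision function. This is a user-space model only, not the kernel's epoll_ctl() code; model_exclusive_check() and MODEL_EXCL_OK_BITS are hypothetical names, and the allowed-flag set is taken from the text above (which, unlike the patch posted earlier in the thread, includes EPOLLERR and EPOLLHUP). Following kernel convention, the model returns -EINVAL rather than setting errno.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/epoll.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)	/* uapi value, for older headers */
#endif

/* Flags the text says may accompany EPOLLEXCLUSIVE. */
#define MODEL_EXCL_OK_BITS ((uint32_t)(EPOLLIN | EPOLLOUT | EPOLLERR | \
	EPOLLHUP | EPOLLWAKEUP | EPOLLET | EPOLLEXCLUSIVE))

/*
 * target_is_epoll: fd is itself an epoll instance.
 * added_exclusive: the existing epfd/fd entry was added with
 * EPOLLEXCLUSIVE (only meaningful for EPOLL_CTL_MOD).
 * Returns 0 if the call would be allowed, -EINVAL otherwise.
 */
static int model_exclusive_check(int op, uint32_t events,
				 bool target_is_epoll, bool added_exclusive)
{
	if (events & EPOLLEXCLUSIVE) {
		if (op == EPOLL_CTL_MOD)
			return -EINVAL;	/* only valid with EPOLL_CTL_ADD */
		if (op == EPOLL_CTL_ADD &&
		    (target_is_epoll || (events & ~MODEL_EXCL_OK_BITS)))
			return -EINVAL;	/* nested epoll or unsupported flag */
	}
	if (op == EPOLL_CTL_MOD && added_exclusive)
		return -EINVAL;		/* exclusive entries can't be modified */
	return 0;
}
```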

Thanks,

-Jason


* Re: [PATCH] epoll: add exclusive wakeups flag
@ 2016-03-10 19:47           ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 31+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-03-10 19:47 UTC
  To: Daniel Borkmann, akpm
  Cc: mtk.manpages, mingo, peterz, viro, normalperson, m, corbet, luto,
	torvalds, hagen, linux-kernel, linux-fsdevel, linux-api

Hi Jason,

> Ok, here's some updated text:
> 
> EPOLLEXCLUSIVE
> 
> Sets an exclusive wakeup mode for the epfd file descriptor that is being
> attached to the target file descriptor, fd. When a wakeup event occurs
> and multiple epfd file descriptors are attached to the same target file
> using EPOLLEXCLUSIVE, one or more epfds will receive an event with
> epoll_wait(2). The default in this scenario (when EPOLLEXCLUSIVE is not
> set) is for all epfds to receive an event.
> 
> The events supported by EPOLLEXCLUSIVE are: EPOLLIN, EPOLLOUT, EPOLLERR,
> EPOLLHUP, EPOLLWAKEUP, and EPOLLET. epoll_wait(2) will always wait for
> EPOLLERR and EPOLLHUP; it is not necessary to set it in events. If
> EPOLLEXCLUSIVE is set using epoll_ctl(2), then a subsequent
> EPOLL_CTL_MOD on the same epfd, fd pair will retrun -EINVAL. An
> epoll_ctl(2) that specifies EPOLLEXCLUSIVE in events and specifies the
> target file descriptor fd as an epoll instance will return -EINVAL
> as well.

So, I worked that up into the following text:

       EPOLLEXCLUSIVE (since Linux 4.5)
              Sets  an  exclusive  wakeup  mode  for  the  epoll  file
              descriptor  that  is  being  attached to the target file
              descriptor, fd.  When a wakeup event occurs and multiple
              epoll  file  descriptors are attached to the same target
              file using EPOLLEXCLUSIVE, one or more of the epoll file
              descriptors  will  receive  an event with epoll_wait(2).
              The default in this scenario (when EPOLLEXCLUSIVE is not
              set)  is  for  all  epoll file descriptors to receive an
              event.  EPOLLEXCLUSIVE is thus useful for avoiding thun‐
              dering herd problems in certain scenarios.

              If  the  same  file  descriptor  is  in  multiple  epoll
              instances, some with the EPOLLEXCLUSIVE flag, and others
              without,  then  events  will  be  provided  to all epoll
              instances that did not specify  EPOLLEXCLUSIVE,  and  at
              least  one  of  the  epoll  instances  that  did specify
              EPOLLEXCLUSIVE.

              The following values may  be  specified  in  conjunction
              with EPOLLEXCLUSIVE: EPOLLIN, EPOLLOUT, EPOLLWAKEUP, and
              EPOLLET.  EPOLLHUP and EPOLLERR can also  be  specified,
              but  are  ignored (as usual).  Attempts to specify other
              values in events yield an error.  EPOLLEXCLUSIVE may  be
              used  only  in  an  EPOLL_CTL_ADD operation; attempts to
              employ  it  with  EPOLL_CTL_MOD  yield  an  error.    If
              EPOLLEXCLUSIVE has been set using epoll_ctl(2), then a
              subsequent EPOLL_CTL_MOD on the same epfd, fd pair
              yields an error.  An epoll_ctl(2) that specifies
              EPOLLEXCLUSIVE in events and specifies the target file
              descriptor fd as an epoll instance will likewise fail.
              The error in all of these cases is EINVAL.

   ERRORS
       EINVAL An invalid event type was specified along with  EPOLLEX‐
              CLUSIVE in events.

       EINVAL op was EPOLL_CTL_MOD and events included EPOLLEXCLUSIVE.

       EINVAL op  was  EPOLL_CTL_MOD  and  the EPOLLEXCLUSIVE flag has
              previously been applied to this epfd, fd pair.

       EINVAL EPOLLEXCLUSIVE was specified in events and fd refers to
              an epoll instance.
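A rough way to check the ERRORS entries above against a running kernel is to provoke each case on a pipe and record errno. This sketch assumes Linux 4.5 or later with the EPOLLEXCLUSIVE restrictions discussed in this thread; ctl_errno() and probe_exclusive_einval() are hypothetical helpers, and EPOLLRDHUP is just one example of a flag outside the allowed set.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)	/* uapi value, for older headers */
#endif

/* Runs epoll_ctl() and returns 0 on success or the errno it set. */
static int ctl_errno(int epfd, int op, int fd, uint32_t events)
{
	struct epoll_event ev = { .events = events, .data.fd = fd };

	return epoll_ctl(epfd, op, fd, &ev) == -1 ? errno : 0;
}

/*
 * Stores the outcome of the four EINVAL cases in res[0..3].
 * Returns -1 if setup fails (cleanup omitted on that path), else 0.
 */
static int probe_exclusive_einval(int res[4])
{
	int p[2], ep = epoll_create1(0), ep2 = epoll_create1(0);

	if (ep == -1 || ep2 == -1 || pipe(p) == -1)
		return -1;

	/* 1: an unsupported event type together with EPOLLEXCLUSIVE */
	res[0] = ctl_errno(ep, EPOLL_CTL_ADD, p[0],
			   EPOLLEXCLUSIVE | EPOLLIN | EPOLLRDHUP);
	/* a valid exclusive entry for the next two cases */
	if (ctl_errno(ep, EPOLL_CTL_ADD, p[0], EPOLLEXCLUSIVE | EPOLLIN) != 0)
		return -1;
	/* 2: EPOLL_CTL_MOD with EPOLLEXCLUSIVE in events */
	res[1] = ctl_errno(ep, EPOLL_CTL_MOD, p[0], EPOLLEXCLUSIVE | EPOLLIN);
	/* 3: EPOLL_CTL_MOD on a pair added with EPOLLEXCLUSIVE */
	res[2] = ctl_errno(ep, EPOLL_CTL_MOD, p[0], EPOLLIN);
	/* 4: EPOLLEXCLUSIVE with another epoll instance as the target */
	res[3] = ctl_errno(ep, EPOLL_CTL_ADD, ep2, EPOLLEXCLUSIVE | EPOLLIN);

	close(p[0]);
	close(p[1]);
	close(ep);
	close(ep2);
	return 0;
}
```

On a kernel that implements the restrictions, all four entries of res should hold EINVAL.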

Is there anything that needs to be fixed in the above text?

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: [PATCH] epoll: add exclusive wakeups flag
@ 2016-03-10 19:58           ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 31+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-03-10 19:58 UTC
  To: Jason Baron, akpm
  Cc: mtk.manpages, mingo, peterz, viro, normalperson, m, corbet, luto,
	torvalds, hagen, linux-kernel, linux-fsdevel, linux-api

On 03/10/2016 07:53 PM, Jason Baron wrote:
> Hi Michael,
> 
> On 01/29/2016 03:14 AM, Michael Kerrisk (man-pages) wrote:
>> Hello Jason,
>> On 01/28/2016 06:57 PM, Jason Baron wrote:
>>> Hi,
>>>
>>> On 01/28/2016 02:16 AM, Michael Kerrisk (man-pages) wrote:
>>>> Hi Jason,
>>>>
>>>> On 12/08/2015 04:23 AM, Jason Baron wrote:
>>>>> Hi,
>>>>>
>>>>> Re-post of an old series addressing thundering herd issues when sharing
>>>>> an event source fd amongst multiple epoll fds. Last posting was here
>>>>> for reference: https://lkml.org/lkml/2015/2/25/56
>>>>>  
>>>>> The patch herein drops the core scheduler 'rotate' changes I had previously
>>>>> proposed as this patch seems performant without those.
>>>>>
>>>>> I was prompted to re-post this because Madars Vitolins reported some good
>>>>> speedups with this patch using Enduro/X application. His writeup is here:
>>>>> https://mvitolin.wordpress.com/2015/12/05/endurox-testing-epollexclusive-flag/
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -Jason
>>>>>
>>>>> Sample epoll_ctl text:
>>>>
>>>> Thanks for the proposed text. I have some questions about points
>>>> that are not quite clear to me.
>>>>
>>>>> EPOLLEXCLUSIVE
>>>>>         Sets an exclusive wakeup mode for the epfd file descriptor that is
>>>>> 	being attached to the target file descriptor, fd. Thus, when an
>>>>> 	event occurs and multiple epfd file descriptors are attached to the
>>>>> 	same target file using EPOLLEXCLUSIVE, one or more epfds will receive
>>>>> 	an event with epoll_wait(2). The default in this scenario (when
>>>>> 	EPOLLEXCLUSIVE is not set) is for all epfds to receive an event.
>>>>> 	EPOLLEXCLUSIVE may only be specified with the op EPOLL_CTL_ADD.
>>>>
>>>> So, assuming an FD is present in the interest list of multiple (say 6)
>>>> epoll FDs, and some (say 3) of those attachments were done using
>>>> EPOLLEXCLUSIVE. Which of the following statements are correct:
>>>>
>>>> (a) It's guaranteed that *none* of the epoll FDs that did NOT specify
>>>>     EPOLLEXCLUSIVE will receive an event.
>>>>
>>>> (b) It's guaranteed that *all* of the epoll FDs that did NOT specify
>>>>     EPOLLEXCLUSIVE will receive an event.
>>>>
>>>> (c) From 1 to 3 of the epoll FDs that did specify EPOLLEXCLUSIVE
>>>>     will receive an event.
>>>>
>>>> (d) Exactly one epoll FD that did specify EPOLLEXCLUSIVE will get
>>>>     an event, and it is indeterminate which one.
>>>>
>>>
>>> So b and c. All the non-exclusive adds will get it and at least 1 of the
>>> exclusive adds will as well.
>>
>> So is it fair to say that the expected use case is that all epoll sets
>> would use EPOLLEXCLUSIVE?
>>
>>>> I suppose one point I'm trying to uncover in the above is: what is
>>>> the scope of EPOLLEXCLUSIVE? Is it just applicable for one process's
>>>> FD, or is it setting an attribute in the epoll "interest list" record
>>>> for that FD that affects notification behavior across all processes?
>>>>
>>>
>>> Right - so 'EPOLLEXCLUSIVE' will affect other epoll sets that are also
>>> using 'EPOLLEXCLUSIVE' against the same fd, but will have no effect
>>> on epoll sets connected to fd that do not specify it.
>>>
>>>
>>>> And then:
>>>>
>>>> (1) What are the semantics of EPOLLEXCLUSIVE if the added FD becomes
>>>>     disabled via EPOLLONESHOT (or explicitly via EPOLL_CTL_MOD with
>>>>     the 'events' field set to 0)?
>>>>
>>>
>>> In the case of EPOLLEXCLUSIVE and EPOLLONESHOT, one would have to re-arm
>>> at least 1 of the threads that were woken up by doing EPOLL_CTL_MOD to
>>> guarantee further wakeups.
>>>
>>> And likewise with an EPOLL_CTL_MOD with 'events' all set to 0, one
>>> would need to either re-arm the thread that set the 'events' field to 0
>>> (by setting back to non-zero), or re-arm in at least one other thread
>>> via EPOLL_CTL_MOD (or delete and add).
>>
>> Okay -- so when an EPOLLEXCLUSIVE FD becomes disarmed it is possible
>> to re-enable with EPOLL_CTL_MOD; one doesn't need to delete and re-add
>> the FD.
>>
>>>> (2) The source code contains a comment "we do not currently supported 
>>>>     nested exclusive wakeups". Could you elaborate on this point? It
>>>>     sounds like something that should be documented.
>>>
>>> So I was just trying to say that we return -EINVAL if you try to do an
>>> EPOLL_CTL_ADD with EPOLLEXCLUSIVE and the 'fd' argument is an epoll fd
>>> returned via epoll_create().
>>
>> Okay -- that definitely belongs in the man page.
>>
>> I'll work up a text, but would like to get input about the "use case"
>> question above.
>>
>> Cheers,
>>
>> Michael
>>
>>
>>
> 
> Ok, here's some updated text:
> 
> EPOLLEXCLUSIVE
> 
> Sets an exclusive wakeup mode for the epfd file descriptor that is being
> attached to the target file descriptor, fd. When a wakeup event occurs
> and multiple epfd file descriptors are attached to the same target file
> using EPOLLEXCLUSIVE, one or more epfds will receive an event with
> epoll_wait(2). The default in this scenario (when EPOLLEXCLUSIVE is not
> set) is for all epfds to receive an event.
> 
> The events supported by EPOLLEXCLUSIVE are: EPOLLIN, EPOLLOUT, EPOLLERR,
> EPOLLHUP, EPOLLWAKEUP, and EPOLLET. epoll_wait(2) will always wait for
> EPOLLERR and EPOLLHUP; it is not necessary to set them in events. If
> EPOLLEXCLUSIVE is set using epoll_ctl(2), then a subsequent
> EPOLL_CTL_MOD on the same epfd, fd pair will return -EINVAL. An
> epoll_ctl(2) that specifies EPOLLEXCLUSIVE in events and specifies the
> target file descriptor fd as an epoll instance will return -EINVAL
> as well.
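
The "one or more" behaviour described above, together with the "b and c" answer earlier in the thread, can be observed with a small single-threaded sketch like the following. It assumes Linux 4.5 or later. The helper names and the choice of three instances per group (NEPOLL) are hypothetical. No thread is blocked in epoll_wait(2) when the byte is written. For that reason the sketch only checks two things: every non-exclusive instance reports the event, and at least one exclusive instance does. It makes no assumption about which exclusive instances, or how many, receive it.

```c
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28) /* value from <linux/eventpoll.h> */
#endif

#define NEPOLL 3 /* epoll instances per group (arbitrary choice) */

/* Returns 1 if epfd currently has an event ready; a zero timeout
   means epoll_wait() does not block. */
static int has_event(int epfd)
{
    struct epoll_event ev;
    return epoll_wait(epfd, &ev, 1, 0) > 0;
}

/* Attach one pipe's read end to NEPOLL instances with EPOLLEXCLUSIVE
   and NEPOLL without, write one byte, then count which instances
   report the event. Returns 0 on success, -1 on setup failure. */
static int count_wakeups(int *n_excl, int *n_plain)
{
    int p[2], excl[NEPOLL], plain[NEPOLL], rc = 0;
    if (pipe(p) == -1)
        return -1;
    for (int i = 0; i < NEPOLL; i++) {
        struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE, .data.fd = p[0] };
        excl[i] = epoll_create1(0);
        if (excl[i] == -1 || epoll_ctl(excl[i], EPOLL_CTL_ADD, p[0], &ev) == -1)
            rc = -1;
        ev.events = EPOLLIN; /* ordinary, non-exclusive attachment */
        plain[i] = epoll_create1(0);
        if (plain[i] == -1 || epoll_ctl(plain[i], EPOLL_CTL_ADD, p[0], &ev) == -1)
            rc = -1;
    }
    if (rc == 0 && write(p[1], "x", 1) != 1)
        rc = -1;
    *n_excl = *n_plain = 0;
    for (int i = 0; i < NEPOLL; i++) {
        if (rc == 0) {
            *n_excl += has_event(excl[i]);
            *n_plain += has_event(plain[i]);
        }
        close(excl[i]); /* close(-1) merely fails with EBADF */
        close(plain[i]);
    }
    close(p[0]);
    close(p[1]);
    return rc;
}
```

With real worker threads blocked in epoll_wait(2), the exclusive group would typically see fewer wakeups. That is the thundering-herd case the flag targets.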

By the way, in the code you have

        case EPOLL_CTL_MOD:
                if (epi) { 
                        if (!(epi->event.events & EPOLLEXCLUSIVE)) {
                                epds.events |= POLLERR | POLLHUP;
                                error = ep_modify(ep, epi, &epds);
                        }

I think the "if" here is redundant. IIUC, earlier in the code you
disallow EPOLL_CTL_MOD with EPOLLEXCLUSIVE.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] epoll: add exclusive wakeups flag
@ 2016-03-10 20:40             ` Jason Baron
  0 siblings, 0 replies; 31+ messages in thread
From: Jason Baron @ 2016-03-10 20:40 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages), akpm
  Cc: mingo, peterz, viro, normalperson, m, corbet, luto, torvalds,
	hagen, linux-kernel, linux-fsdevel, linux-api



On 03/10/2016 02:58 PM, Michael Kerrisk (man-pages) wrote:
> On 03/10/2016 07:53 PM, Jason Baron wrote:
>> Hi Michael,
>>
>> On 01/29/2016 03:14 AM, Michael Kerrisk (man-pages) wrote:
>>> Hello Jason,
>>> On 01/28/2016 06:57 PM, Jason Baron wrote:
>>>> Hi,
>>>>
>>>> On 01/28/2016 02:16 AM, Michael Kerrisk (man-pages) wrote:
>>>>> Hi Jason,
>>>>>
>>>>> On 12/08/2015 04:23 AM, Jason Baron wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Re-post of an old series addressing thundering herd issues when sharing
>>>>>> an event source fd amongst multiple epoll fds. Last posting was here
>>>>>> for reference: https://lkml.org/lkml/2015/2/25/56
>>>>>>  
>>>>>> The patch herein drops the core scheduler 'rotate' changes I had previously
>>>>>> proposed as this patch seems performant without those.
>>>>>>
>>>>>> I was prompted to re-post this because Madars Vitolins reported some good
>>>>>> speedups with this patch using the Enduro/X application. His writeup is here:
>>>>>> https://mvitolin.wordpress.com/2015/12/05/endurox-testing-epollexclusive-flag/
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> -Jason
>>>>>>
>>>>>> Sample epoll_ctl text:
>>>>>
>>>>> Thanks for the proposed text. I have some questions about points
>>>>> that are not quite clear to me.
>>>>>
>>>>>> EPOLLEXCLUSIVE
>>>>>>         Sets an exclusive wakeup mode for the epfd file descriptor that is
>>>>>> 	being attached to the target file descriptor, fd. Thus, when an
>>>>>> 	event occurs and multiple epfd file descriptors are attached to the
>>>>>> 	same target file using EPOLLEXCLUSIVE, one or more epfds will receive
>>>>>> 	an event with epoll_wait(2). The default in this scenario (when
>>>>>> 	EPOLLEXCLUSIVE is not set) is for all epfds to receive an event.
>>>>>> 	EPOLLEXCLUSIVE may only be specified with the op EPOLL_CTL_ADD.
>>>>>
>>>>> So, assuming an FD is present in the interest list of multiple (say 6)
>>>>> epoll FDs, and some (say 3) of those attachments were done using
>>>>> EPOLLEXCLUSIVE. Which of the following statements are correct:
>>>>>
>>>>> (a) It's guaranteed that *none* of the epoll FDs that did NOT specify
>>>>>     EPOLLEXCLUSIVE will receive an event.
>>>>>
>>>>> (b) It's guaranteed that *all* of the epoll FDs that did NOT specify
>>>>>     EPOLLEXCLUSIVE will receive an event.
>>>>>
>>>>> (c) From 1 to 3 of the epoll FDs that did specify EPOLLEXCLUSIVE
>>>>>     will receive an event.
>>>>>
>>>>> (d) Exactly one epoll FD that did specify EPOLLEXCLUSIVE will get
>>>>>     an event, and it is indeterminate which one.
>>>>>
>>>>
>>>> So b and c. All the non-exclusive adds will get it and at least 1 of the
>>>> exclusive adds will as well.
>>>
>>> So is it fair to say that the expected use case is that all epoll sets
>>> would use EPOLLEXCLUSIVE?
>>>
>>>>> I suppose one point I'm trying to uncover in the above is: what is
>>>>> the scope of EPOLLEXCLUSIVE? Is it just applicable for one process's
>>>>> FD, or is it setting an attribute in the epoll "interest list" record
>>>>> for that FD that affects notification behavior across all processes?
>>>>>
>>>>
>>>> Right - so 'EPOLLEXCLUSIVE' will affect other epoll sets that are also
>>>> using 'EPOLLEXCLUSIVE' against the same fd, but will have no effect
>>>> on epoll sets connected to fd that do not specify it.
>>>>
>>>>
>>>>> And then:
>>>>>
>>>>> (1) What are the semantics of EPOLLEXCLUSIVE if the added FD becomes
>>>>>     disabled via EPOLLONESHOT (or explicitly via EPOLL_CTL_MOD with
>>>>>     the 'events' field set to 0)?
>>>>>
>>>>
>>>> In the case of EPOLLEXCLUSIVE and EPOLLONESHOT, one would have to re-arm
>>>> at least 1 of the threads that were woken up by doing EPOLL_CTL_MOD to
>>>> guarantee further wakeups.
>>>>
>>>> And likewise with an EPOLL_CTL_MOD with 'events' all set to 0, one
>>>> would need to either re-arm the thread that set the 'events' field to 0
>>>> (by setting back to non-zero), or re-arm in at least one other thread
>>>> via EPOLL_CTL_MOD (or delete and add).
>>>
>>> Okay -- so when an EPOLLEXCLUSIVE FD becomes disarmed it is possible
>>> to re-enable with EPOLL_CTL_MOD; one doesn't need to delete and re-add
>>> the FD.
>>>
>>>>> (2) The source code contains a comment "we do not currently supported 
>>>>>     nested exclusive wakeups". Could you elaborate on this point? It
>>>>>     sounds like something that should be documented.
>>>>
>>>> So I was just trying to say that we return -EINVAL if you try to do an
>>>> EPOLL_CTL_ADD with EPOLLEXCLUSIVE and the 'fd' argument is an epoll fd
>>>> returned via epoll_create().
>>>
>>> Okay -- that definitely belongs in the man page.
>>>
>>> I'll work up a text, but would like to get input about the "use case"
>>> question above.
>>>
>>> Cheers,
>>>
>>> Michael
>>>
>>>
>>>
>>
>> Ok, here's some updated text:
>>
>> EPOLLEXCLUSIVE
>>
>> Sets an exclusive wakeup mode for the epfd file descriptor that is being
>> attached to the target file descriptor, fd. When a wakeup event occurs
>> and multiple epfd file descriptors are attached to the same target file
>> using EPOLLEXCLUSIVE, one or more epfds will receive an event with
>> epoll_wait(2). The default in this scenario (when EPOLLEXCLUSIVE is not
>> set) is for all epfds to receive an event.
>>
>> The events supported by EPOLLEXCLUSIVE are: EPOLLIN, EPOLLOUT, EPOLLERR,
>> EPOLLHUP, EPOLLWAKEUP, and EPOLLET. epoll_wait(2) will always wait for
>> EPOLLERR and EPOLLHUP; it is not necessary to set them in events. If
>> EPOLLEXCLUSIVE is set using epoll_ctl(2), then a subsequent
>> EPOLL_CTL_MOD on the same epfd, fd pair will return -EINVAL. An
>> epoll_ctl(2) that specifies EPOLLEXCLUSIVE in events and specifies the
>> target file descriptor fd as an epoll instance will return -EINVAL
>> as well.
> 
> By the way, in the code you have
> 
>         case EPOLL_CTL_MOD:
>                 if (epi) { 
>                         if (!(epi->event.events & EPOLLEXCLUSIVE)) {
>                                 epds.events |= POLLERR | POLLHUP;
>                                 error = ep_modify(ep, epi, &epds);
>                         }
> 
> I think the "if" here is redundant. IIUC, earlier in the code you
> disallow EPOLL_CTL_MOD with EPOLLEXCLUSIVE.
> 
> Cheers,
> 
> Michael
> 
> 

Hi Michael,

So the previous check ensures that you cannot add the EPOLLEXCLUSIVE
flag to the events via an EPOLL_CTL_MOD operation, including when
EPOLLEXCLUSIVE is not in the existing events set. This check, by
contrast, ensures that you cannot modify an existing set that already
has the EPOLLEXCLUSIVE flag, even if the new events do not include it.
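
The two checks described above can be told apart from user space. What follows is a minimal sketch, assuming Linux 4.5 or later; the helper and struct names are hypothetical. The first MOD adds EPOLLEXCLUSIVE to a plain entry, which the earlier check rejects. The second MOD omits EPOLLEXCLUSIVE but targets an entry that was added with it, which the check in the EPOLL_CTL_MOD case rejects. A third, ordinary MOD still succeeds.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28) /* value from <linux/eventpoll.h> */
#endif

/* Hypothetical epoll_ctl() wrapper: returns 0 on success, else errno. */
static int ctl(int epfd, int op, int fd, unsigned int events)
{
    struct epoll_event ev = { .events = events, .data.fd = fd };
    return epoll_ctl(epfd, op, fd, &ev) == 0 ? 0 : errno;
}

struct mod_results {
    int add_plus_excl;  /* MOD adding EPOLLEXCLUSIVE to a plain entry */
    int mod_excl_entry; /* MOD of an entry that was added exclusively */
    int mod_plain;      /* ordinary MOD of a plain entry */
};

static struct mod_results check_mod_rules(void)
{
    struct mod_results r = { -1, -1, -1 };
    int p[2];
    if (pipe(p) == -1)
        return r;
    int epfd = epoll_create1(0);
    if (epfd != -1 &&
        ctl(epfd, EPOLL_CTL_ADD, p[0], EPOLLIN) == 0 &&
        ctl(epfd, EPOLL_CTL_ADD, p[1], EPOLLOUT | EPOLLEXCLUSIVE) == 0) {
        /* Rejected up front: new events contain EPOLLEXCLUSIVE. */
        r.add_plus_excl = ctl(epfd, EPOLL_CTL_MOD, p[0], EPOLLIN | EPOLLEXCLUSIVE);
        /* Rejected in the MOD case: existing entry is exclusive. */
        r.mod_excl_entry = ctl(epfd, EPOLL_CTL_MOD, p[1], EPOLLOUT);
        /* Allowed: neither the old nor the new events are exclusive. */
        r.mod_plain = ctl(epfd, EPOLL_CTL_MOD, p[0], EPOLLIN | EPOLLET);
    }
    if (epfd != -1)
        close(epfd);
    close(p[0]);
    close(p[1]);
    return r;
}
```

Both rejected operations fail with EINVAL, so they cannot be distinguished by errno alone. The sketch tells them apart only by which entry and which events mask it passes.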

Thanks,

-Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] epoll: add exclusive wakeups flag
@ 2016-03-11 20:30               ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 31+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-03-11 20:30 UTC (permalink / raw)
  To: Jason Baron
  Cc: Andrew Morton, Ingo Molnar, Peter Zijlstra, Al Viro, Eric Wong,
	Madars Vitolins, Jonathan Corbet, Andy Lutomirski,
	Linus Torvalds, Hagen Paul Pfeifer, lkml, linux-fsdevel,
	Linux API

>> By the way, in the code you have
>>
>>         case EPOLL_CTL_MOD:
>>                 if (epi) {
>>                         if (!(epi->event.events & EPOLLEXCLUSIVE)) {
>>                                 epds.events |= POLLERR | POLLHUP;
>>                                 error = ep_modify(ep, epi, &epds);
>>                         }
>>
>> I think the "if" here is redundant. IIUC, earlier in the code you
>> disallow EPOLL_CTL_MOD with EPOLLEXCLUSIVE.
>>
>> Cheers,
>>
>> Michael
>>
>>
>
> Hi Michael,
>
> So the previous check ensures that you cannot add the EPOLLEXCLUSIVE
> flag to the events via an EPOLL_CTL_MOD operation, including when
> EPOLLEXCLUSIVE is not in the existing events set. This check, by
> contrast, ensures that you cannot modify an existing set that already
> has the EPOLLEXCLUSIVE flag, even if the new events do not include it.

Hmmm - I misread the code, it seems :-/. Could you please carefully
check the man page text I sent earlier in this thread. Maybe I
injected some errors into the text.

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] epoll: add exclusive wakeups flag
       [not found]                   ` <56E6D0ED.20609@akamai.com>
@ 2016-03-14 17:47                     ` Michael Kerrisk (man-pages)
  2016-03-14 19:32                         ` Jason Baron
  0 siblings, 1 reply; 31+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-03-14 17:47 UTC (permalink / raw)
  To: Jason Baron, Andrew Morton
  Cc: mtk.manpages, mingo, peterz, viro, normalperson, m, corbet, luto,
	torvalds, hagen, linux-kernel, linux-fsdevel, linux-api

[Restoring CC, which I see I accidentally dropped, one iteration back.]

Hi Jason,

Thanks for the review. I've tweaked one piece to respond to your
feedback. But I also have another new question below.

On 03/15/2016 03:55 AM, Jason Baron wrote:
> On 03/11/2016 06:25 PM, Michael Kerrisk (man-pages) wrote:
>> On 03/11/2016 09:51 PM, Jason Baron wrote:
>>> On 03/11/2016 03:30 PM, Michael Kerrisk (man-pages) wrote:

[...]

> Hi Michael,
> 
> Looks good. One comment below.
> 
> Thanks,
> 
>>        EPOLLEXCLUSIVE (since Linux 4.5)
>>               Sets  an  exclusive  wakeup  mode  for  the  epoll  file
>>               descriptor  that  is  being  attached to the target file
>>               descriptor, fd.  When a wakeup event occurs and multiple
>>               epoll  file  descriptors are attached to the same target
>>               file using EPOLLEXCLUSIVE, one or more of the epoll file
>>               descriptors  will  receive  an event with epoll_wait(2).
>>               The default in this scenario (when EPOLLEXCLUSIVE is not
>>               set)  is  for  all  epoll file descriptors to receive an
>>               event.  EPOLLEXCLUSIVE is thus useful for avoiding thun‐
>>               dering herd problems in certain scenarios.
>>
>>               If  the  same  file  descriptor  is  in  multiple  epoll
>>               instances, some with the EPOLLEXCLUSIVE flag, and others
>>               without, then events will be provided to all epoll
>>               instances that did not specify  EPOLLEXCLUSIVE,  and  at
>>               least  one  of  the  epoll  instances  that  did specify
>>               EPOLLEXCLUSIVE.
>>
>>               The following values may  be  specified  in  conjunction
>>               with EPOLLEXCLUSIVE: EPOLLIN, EPOLLOUT, EPOLLWAKEUP, and
>>               EPOLLET.  EPOLLHUP and EPOLLERR can also  be  specified,
>>               but  are  ignored (as usual).  Attempts to specify other
> 
> I'm not sure 'ignored' is the right wording here. 'EPOLLHUP' and
> 'EPOLLERR' are always included in the set of events when something is
> added as EPOLLEXCLUSIVE. This is consistent with the non-EPOLLEXCLUSIVE
> add case. 

Yes.

> So 'EPOLLHUP' and 'EPOLLERR' may be specified but will be
> included in the set of events on an add, whether they are specified or not.

Yes. I understand your discomfort with the word "ignored", but the
problem was that, because it made special mention of EPOLLHUP and EPOLLERR,
your proposed text made it sound as though EPOLLEXCLUSIVE somehow was
special with respect to these two flags. I wanted to clarify that it is not.
How about this:

              The following values may  be  specified  in  conjunction
              with EPOLLEXCLUSIVE: EPOLLIN, EPOLLOUT, EPOLLWAKEUP, and
              EPOLLET.  EPOLLHUP and EPOLLERR can also  be  specified,
              but  this  is  not  required: as usual, these events are
              always reported if they  occur,  regardless  of  whether
              they are specified in events.
?

>>               values in events yield an error.  EPOLLEXCLUSIVE may  be
>>               used  only  in  an  EPOLL_CTL_ADD operation; attempts to
>>               employ  it  with  EPOLL_CTL_MOD  yield  an  error.    If
>>               EPOLLEXCLUSIVE has been set using epoll_ctl(2), then a subse‐
>>               quent EPOLL_CTL_MOD on the same epfd, fd pair yields  an
>>               error.  An epoll_ctl(2) that specifies EPOLLEXCLUSIVE in
>>               events and specifies the target file descriptor fd as an
>>               epoll  instance will likewise fail.  The error in all of
>>               these cases is EINVAL.
>>
>>    ERRORS
>>        EINVAL An invalid event type was specified along with  EPOLLEX‐
>>               CLUSIVE in events.
>>
>>        EINVAL op was EPOLL_CTL_MOD and events included EPOLLEXCLUSIVE.
>>
>>        EINVAL op  was  EPOLL_CTL_MOD  and  the EPOLLEXCLUSIVE flag has
>>               previously been applied to this epfd, fd pair.
>>
>>        EINVAL EPOLLEXCLUSIVE was specified in events and fd refers
>>               to an epoll instance.

Returning to the second sentence in this description:

              When a wakeup event occurs and multiple epoll file descrip‐
              tors are attached to the same target file using EPOLLEXCLU‐
              SIVE, one or  more  of  the  epoll  file  descriptors  will
              receive  an  event with epoll_wait(2).

There is a point that is unclear to me: what does "target file" refer to?
Is it an open file description (aka open file table entry) or an inode?
I suspect the former, but it was not clear in your original text.

To make this point even clearer, here are two scenarios I'm thinking of.
In each case, we're talking of monitoring the read end of a FIFO.

===

Scenario 1:

We have three processes each of which
1. Creates an epoll instance
2. Opens the read end of the FIFO
3. Adds the read end of the FIFO to the epoll instance, specifying
   EPOLLEXCLUSIVE

When input becomes available on the FIFO, how many processes
get a wakeup?

===

Scenario 3

A parent process opens the read end of a FIFO and then calls
fork() three times to create three children. Each child then:

1. Creates an epoll instance
2. Adds the read end of the FIFO to the epoll instance, specifying
EPOLLEXCLUSIVE

When input becomes available on the FIFO, how many processes
get a wakeup?

===

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

* Re: [PATCH] epoll: add exclusive wakeups flag
@ 2016-03-14 19:32                         ` Jason Baron
  0 siblings, 0 replies; 31+ messages in thread
From: Jason Baron @ 2016-03-14 19:32 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages), Andrew Morton
  Cc: mingo, peterz, viro, normalperson, m, corbet, luto, torvalds,
	hagen, linux-kernel, linux-fsdevel, linux-api



On 03/14/2016 01:47 PM, Michael Kerrisk (man-pages) wrote:
> [Restoring CC, which I see I accidentally dropped, one iteration back.]
> 
> Hi Jason,
> 
> Thanks for the review. I've tweaked one piece to respond to your
> feedback. But I also have another new question below.
> 
> On 03/15/2016 03:55 AM, Jason Baron wrote:
>> On 03/11/2016 06:25 PM, Michael Kerrisk (man-pages) wrote:
>>> On 03/11/2016 09:51 PM, Jason Baron wrote:
>>>> On 03/11/2016 03:30 PM, Michael Kerrisk (man-pages) wrote:
> 
> [...]
> 
>> Hi Michael,
>>
>> Looks good. One comment below.
>>
>> Thanks,
>>
>>>        EPOLLEXCLUSIVE (since Linux 4.5)
>>>               Sets  an  exclusive  wakeup  mode  for  the  epoll  file
>>>               descriptor  that  is  being  attached to the target file
>>>               descriptor, fd.  When a wakeup event occurs and multiple
>>>               epoll  file  descriptors are attached to the same target
>>>               file using EPOLLEXCLUSIVE, one or more of the epoll file
>>>               descriptors  will  receive  an event with epoll_wait(2).
>>>               The default in this scenario (when EPOLLEXCLUSIVE is not
>>>               set)  is  for  all  epoll file descriptors to receive an
>>>               event.  EPOLLEXCLUSIVE is thus useful for avoiding thun‐
>>>               dering herd problems in certain scenarios.
>>>
>>>               If  the  same  file  descriptor  is  in  multiple  epoll
>>>               instances, some with the EPOLLEXCLUSIVE flag, and others
>>>               without,  then  events  will  be provided  to  all  epoll
>>>               instances that did not specify  EPOLLEXCLUSIVE,  and  at
>>>               least  one  of  the  epoll  instances  that  did specify
>>>               EPOLLEXCLUSIVE.
>>>
>>>               The following values may  be  specified  in  conjunction
>>>               with EPOLLEXCLUSIVE: EPOLLIN, EPOLLOUT, EPOLLWAKEUP, and
>>>               EPOLLET.  EPOLLHUP and EPOLLERR can also  be  specified,
>>>               but  are  ignored (as usual).  Attempts to specify other
>>
>> I'm not sure 'ignored' is the right wording here. 'EPOLLHUP' and
>> 'EPOLLERR' are always included in the set of events when something is
>> added as EPOLLEXCLUSIVE. This is consistent with the non-EPOLLEXCLUSIVE
>> add case. 
> 
> Yes.
> 
>> So 'EPOLLHUP' and 'EPOLLERR' may be specified but will be
>> included in the set of events on an add, whether they are specified or not.
> 
> Yes. I understand your discomfort with the word "ignored", but the
> problem was that, because it made special mention of EPOLLHUP and EPOLLERR,
> your proposed text made it sound as though EPOLLEXCLUSIVE somehow was
> special with respect to these two flags. I wanted to clarify that it is not.
> How about this:
> 
>               The following values may  be  specified  in  conjunction
>               with EPOLLEXCLUSIVE: EPOLLIN, EPOLLOUT, EPOLLWAKEUP, and
>               EPOLLET.  EPOLLHUP and EPOLLERR can also  be  specified,
>               but  this  is  not  required: as usual, these events are
>               always reported if they  occur,  regardless  of  whether
>               they are specified in events.
> ?

Yes, nothing special here with respect to EPOLLHUP and EPOLLERR. So this
looks fine to me.

> 
>>>               values in events yield an error.  EPOLLEXCLUSIVE may  be
>>>               used  only  in  an  EPOLL_CTL_ADD operation; attempts to
>>>               employ  it  with  EPOLL_CTL_MOD  yield  an  error.    If
>>>               EPOLLEXCLUSIVE has been set using epoll_ctl(2), then a subse‐
>>>               quent EPOLL_CTL_MOD on the same epfd, fd pair yields  an
>>>               error.  An epoll_ctl(2) that specifies EPOLLEXCLUSIVE in
>>>               events and specifies the target file descriptor fd as an
>>>               epoll  instance will likewise fail.  The error in all of
>>>               these cases is EINVAL.
>>>
>>>    ERRORS
>>>        EINVAL An invalid event type was specified along with  EPOLLEX‐
>>>               CLUSIVE in events.
>>>
>>>        EINVAL op was EPOLL_CTL_MOD and events included EPOLLEXCLUSIVE.
>>>
>>>        EINVAL op  was  EPOLL_CTL_MOD  and  the EPOLLEXCLUSIVE flag has
>>>               previously been applied to this epfd, fd pair.
>>>
>>>        EINVAL EPOLLEXCLUSIVE was specified in events and fd refers
>>>               to an epoll instance.
> 
> Returning to the second sentence in this description:
> 
>               When a wakeup event occurs and multiple epoll file descrip‐
>               tors are attached to the same target file using EPOLLEXCLU‐
>               SIVE, one or  more  of  the  epoll  file  descriptors  will
>               receive  an  event with epoll_wait(2).
> 
> There is a point that is unclear to me: what does "target file" refer to?
> Is it an open file description (aka open file table entry) or an inode?
> I suspect the former, but it was not clear in your original text.
>

So from epoll's perspective, the wakeups are associated with a 'wait
queue'. So if the open() and subsequent EPOLL_CTL_ADD (which is done via
file->poll()) results in adding to the same 'wait queue' then we will
get 'exclusive' wakeup behavior.

So in general, I think the answer here is that it's associated with the
inode (I couldn't say with 100% certainty without really looking at all
file->poll() implementations). Certainly, with the 'FIFO' example below,
the two scenarios will have the same behavior with respect to
EPOLLEXCLUSIVE.

Also, the 'non-exclusive' mode would be subject to the same question of
which wait queue the epfd is associated with...

Thanks,

-Jason

> To make this point even clearer, here are two scenarios I'm thinking of.
> In each case, we're talking of monitoring the read end of a FIFO.
> 
> ===
> 
> Scenario 1:
> 
> We have three processes each of which
> 1. Creates an epoll instance
> 2. Opens the read end of the FIFO
> 3. Adds the read end of the FIFO to the epoll instance, specifying
>    EPOLLEXCLUSIVE
> 
> When input becomes available on the FIFO, how many processes
> get a wakeup?
> 
> ===
> 
> Scenario 3
> 
> A parent process opens the read end of a FIFO and then calls
> fork() three times to create three children. Each child then:
> 
> 1. Creates an epoll instance
> 2. Adds the read end of the FIFO to the epoll instance, specifying
> EPOLLEXCLUSIVE
> 
> When input becomes available on the FIFO, how many processes
> get a wakeup?
> 
> ===
> 
> Cheers,
> 
> Michael
> 

* Re: [PATCH] epoll: add exclusive wakeups flag
  2016-03-14 19:32                         ` Jason Baron
@ 2016-03-14 20:01                           ` Michael Kerrisk (man-pages)
  -1 siblings, 0 replies; 31+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-03-14 20:01 UTC (permalink / raw)
  To: Jason Baron, Andrew Morton
  Cc: mtk.manpages, mingo, peterz, viro, normalperson, m, corbet, luto,
	torvalds, hagen, linux-kernel, linux-fsdevel, linux-api

Hi Jason,

On 03/15/2016 08:32 AM, Jason Baron wrote:
> 
> 
> On 03/14/2016 01:47 PM, Michael Kerrisk (man-pages) wrote:
>> [Restoring CC, which I see I accidentally dropped, one iteration back.]

[...]

>>>>               values in events yield an error.  EPOLLEXCLUSIVE may  be
>>>>               used  only  in  an  EPOLL_CTL_ADD operation; attempts to
>>>>               employ  it  with  EPOLL_CTL_MOD  yield  an  error.    If
>>>>               EPOLLEXCLUSIVE has been set using epoll_ctl(2), then a subse‐
>>>>               quent EPOLL_CTL_MOD on the same epfd, fd pair yields  an
>>>>               error.  An epoll_ctl(2) that specifies EPOLLEXCLUSIVE in
>>>>               events and specifies the target file descriptor fd as an
>>>>               epoll  instance will likewise fail.  The error in all of
>>>>               these cases is EINVAL.
>>>>
>>>>    ERRORS
>>>>        EINVAL An invalid event type was specified along with  EPOLLEX‐
>>>>               CLUSIVE in events.
>>>>
>>>>        EINVAL op was EPOLL_CTL_MOD and events included EPOLLEXCLUSIVE.
>>>>
>>>>        EINVAL op  was  EPOLL_CTL_MOD  and  the EPOLLEXCLUSIVE flag has
>>>>               previously been applied to this epfd, fd pair.
>>>>
>>>>        EINVAL EPOLLEXCLUSIVE was specified in events and fd refers
>>>>               to an epoll instance.
>>
>> Returning to the second sentence in this description:
>>
>>               When a wakeup event occurs and multiple epoll file descrip‐
>>               tors are attached to the same target file using EPOLLEXCLU‐
>>               SIVE, one or  more  of  the  epoll  file  descriptors  will
>>               receive  an  event with epoll_wait(2).
>>
>> There is a point that is unclear to me: what does "target file" refer to?
>> Is it an open file description (aka open file table entry) or an inode?
>> I suspect the former, but it was not clear in your original text.
>>
> 
> So from epoll's perspective, the wakeups are associated with a 'wait
> queue'. So if the open() and subsequent EPOLL_CTL_ADD (which is done via
> file->poll()) results in adding to the same 'wait queue' then we will
> get 'exclusive' wakeup behavior.
> 
> So in general, I think the answer here is that it's associated with the
> inode (I couldn't say with 100% certainty without really looking at all
> file->poll() implementations). Certainly, with the 'FIFO' example below,
> the two scenarios will have the same behavior with respect to
> EPOLLEXCLUSIVE.

So, in both scenarios, *one or more* processes will get a wakeup?
(I'll try to add something to the text to clarify the detail we're 
discussing.)

> Also, the 'non-exclusive' mode would be subject to the same question of
> which wait queue the epfd is associated with...

I'm not sure what point you are trying to make here.

Cheers,

Michael


>> To make this point even clearer, here are two scenarios I'm thinking of.
>> In each case, we're talking of monitoring the read end of a FIFO.
>>
>> ===
>>
>> Scenario 1:
>>
>> We have three processes each of which
>> 1. Creates an epoll instance
>> 2. Opens the read end of the FIFO
>> 3. Adds the read end of the FIFO to the epoll instance, specifying
>>    EPOLLEXCLUSIVE
>>
>> When input becomes available on the FIFO, how many processes
>> get a wakeup?
>>
>> ===
>>
>> Scenario 3
>>
>> A parent process opens the read end of a FIFO and then calls
>> fork() three times to create three children. Each child then:
>>
>> 1. Creates an epoll instance
>> 2. Adds the read end of the FIFO to the epoll instance, specifying
>> EPOLLEXCLUSIVE
>>
>> When input becomes available on the FIFO, how many processes
>> get a wakeup?
>>
>> ===
>>
>> Cheers,
>>
>> Michael
>>
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

* Re: [PATCH] epoll: add exclusive wakeups flag
@ 2016-03-14 21:03                             ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 31+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-03-14 21:03 UTC (permalink / raw)
  To: Jason Baron, Andrew Morton
  Cc: mtk.manpages, mingo, peterz, viro, normalperson, m, corbet, luto,
	torvalds, hagen, linux-kernel, linux-fsdevel, linux-api

Hi Jason,

On 03/15/2016 09:01 AM, Michael Kerrisk (man-pages) wrote:
> Hi Jason,
> 
> On 03/15/2016 08:32 AM, Jason Baron wrote:
>>
>>
>> On 03/14/2016 01:47 PM, Michael Kerrisk (man-pages) wrote:
>>> [Restoring CC, which I see I accidentally dropped, one iteration back.]

[...]

>>> Returning to the second sentence in this description:
>>>
>>>               When a wakeup event occurs and multiple epoll file descrip‐
>>>               tors are attached to the same target file using EPOLLEXCLU‐
>>>               SIVE, one or  more  of  the  epoll  file  descriptors  will
>>>               receive  an  event with epoll_wait(2).
>>>
>>> There is a point that is unclear to me: what does "target file" refer to?
>>> Is it an open file description (aka open file table entry) or an inode?
>>> I suspect the former, but it was not clear in your original text.
>>>
>>
>> So from epoll's perspective, the wakeups are associated with a 'wait
>> queue'. So if the open() and subsequent EPOLL_CTL_ADD (which is done via
>> file->poll()) results in adding to the same 'wait queue' then we will
>> get 'exclusive' wakeup behavior.
>>
>> So in general, I think the answer here is that it's associated with the
>> inode (I couldn't say with 100% certainty without really looking at all
>> file->poll() implementations). Certainly, with the 'FIFO' example below,
>> the two scenarios will have the same behavior with respect to
>> EPOLLEXCLUSIVE.

So, I was actually a little surprised by this, and went away and tested
this point. It appears to me that the two scenarios described below
do NOT have the same behavior with respect to EPOLLEXCLUSIVE. See below.

> So, in both scenarios, *one or more* processes will get a wakeup?
> (I'll try to add something to the text to clarify the detail we're 
> discussing.)
> 
>> Also, the 'non-exclusive' mode would be subject to the same question of
>> which wait queue the epfd is associated with...
> 
> I'm not sure what point you are trying to make here.
> 
> Cheers,
> 
> Michael
> 
> 
>>> To make this point even clearer, here are two scenarios I'm thinking of.
>>> In each case, we're talking of monitoring the read end of a FIFO.
>>>
>>> ===
>>>
>>> Scenario 1:
>>>
>>> We have three processes each of which
>>> 1. Creates an epoll instance
>>> 2. Opens the read end of the FIFO
>>> 3. Adds the read end of the FIFO to the epoll instance, specifying
>>>    EPOLLEXCLUSIVE
>>>
>>> When input becomes available on the FIFO, how many processes
>>> get a wakeup?

When I test this scenario, all three processes get a wakeup.

>>> ===
>>>
>>> Scenario 3
>>>
>>> A parent process opens the read end of a FIFO and then calls
>>> fork() three times to create three children. Each child then:
>>>
>>> 1. Creates an epoll instance
>>> 2. Adds the read end of the FIFO to the epoll instance, specifying
>>> EPOLLEXCLUSIVE
>>>
>>> When input becomes available on the FIFO, how many processes
>>> get a wakeup?

When I test this scenario, one process gets a wakeup.

In other words, "target file" appears to mean open file description
(aka open file table entry), not inode.

This is actually what I suspected might be the case, but now I am
puzzled. Given what I've discovered and what you suggest are the
semantics, is the implementation correct? (I suspect that it is,
but it is at odds with your statement above.) My test programs are
inline below.

Cheers,

Michael

============

/* t_EPOLLEXCLUSIVE_multipen.c

   Licensed under GNU GPLv2 or later.
*/
#include <sys/epoll.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

#define usageErr(msg, progName) \
                        do { fprintf(stderr, "Usage: "); \
                             fprintf(stderr, msg, progName); \
                             exit(EXIT_FAILURE); } while (0)

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1 << 28)
#endif

int
main(int argc, char *argv[])
{
    int fd, epfd, nready;
    struct epoll_event ev, rev;

    if (argc != 2 || strcmp(argv[1], "--help") == 0)
        usageErr("%s <FIFO>\n", argv[0]);

    epfd = epoll_create(2);
    if (epfd == -1)
        errExit("epoll_create");

    fd = open(argv[1], O_RDONLY);
    if (fd == -1)
        errExit("open");
    printf("Opened %s\n", argv[1]);

    ev.events = EPOLLIN | EPOLLEXCLUSIVE;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
        errExit("epoll_ctl");

    nready = epoll_wait(epfd, &rev, 1, -1);
    if (nready == -1)
        errExit("epoll_wait");
    printf("epoll_wait() returned %d\n", nready);

    exit(EXIT_SUCCESS);
}

===============

/* t_EPOLLEXCLUSIVE_fork.c 

   Licensed under GNU GPLv2 or later.
*/

#include <sys/epoll.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

#define usageErr(msg, progName) \
                        do { fprintf(stderr, "Usage: "); \
                             fprintf(stderr, msg, progName); \
                             exit(EXIT_FAILURE); } while (0)

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1 << 28)
#endif

int
main(int argc, char *argv[])
{
    int fd, epfd, nready;
    struct epoll_event ev, rev;
    int cnum;

    if (argc != 2 || strcmp(argv[1], "--help") == 0)
        usageErr("%s <FIFO>\n", argv[0]);

    fd = open(argv[1], O_RDONLY);
    if (fd == -1)
        errExit("open");
    printf("Opened %s\n", argv[1]);

    for (cnum = 0; cnum < 3; cnum++) {
        switch (fork()) {
        case -1:
            errExit("fork");

        case 0: /* Child */
            epfd = epoll_create(2);
            if (epfd == -1)
                errExit("epoll_create");

            ev.events = EPOLLIN | EPOLLEXCLUSIVE;
            if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
                errExit("epoll_ctl");

            nready = epoll_wait(epfd, &rev, 1, -1);
            if (nready == -1)
                errExit("epoll_wait");
            printf("Child %d: epoll_wait() returned %d\n", cnum, nready);
            exit(EXIT_SUCCESS);

        default:
            break;
        }
    }

    wait(NULL);
    wait(NULL);
    wait(NULL);

    exit(EXIT_SUCCESS);
}

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] epoll: add exclusive wakeups flag
@ 2016-03-14 21:03                             ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 31+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-03-14 21:03 UTC (permalink / raw)
  To: Jason Baron, Andrew Morton
  Cc: mtk.manpages, mingo, peterz, viro, normalperson, m, corbet, luto,
	torvalds, hagen, linux-kernel, linux-fsdevel, linux-api

Hi Jason,

On 03/15/2016 09:01 AM, Michael Kerrisk (man-pages) wrote:
> Hi Jason,
> 
> On 03/15/2016 08:32 AM, Jason Baron wrote:
>>
>>
>> On 03/14/2016 01:47 PM, Michael Kerrisk (man-pages) wrote:
>>> [Restoring CC, which I see I accidentally dropped, one iteration back.]

[...]

>>> Returning to the second sentence in this description:
>>>
>>>               When a wakeup event occurs and multiple epoll file descrip‐
>>>               tors are attached to the same target file using EPOLLEXCLU‐
>>>               SIVE, one or  more  of  the  epoll  file  descriptors  will
>>>               receive  an  event with epoll_wait(2).
>>>
>>> There is a point that is unclear to me: what does "target file" refer to?
>>> Is it an open file description (aka open file table entry) or an inode?
>>> I suspect the former, but it was not clear in your original text.
>>>
>>
>> So from epoll's perspective, the wakeups are associated with a 'wait
>> queue'. So if the open() and subsequent EPOLL_CTL_ADD (which is done via
>> file->poll()) results in adding to the same 'wait queue' then we will
>> get 'exclusive' wakeup behavior.
>>
>> So in general, I think the answer here is that it's associated with the
>> inode (I couldn't say with 100% certainty without really looking at all
>> file->poll() implementations). Certainly, with the 'FIFO' example below,
>> the two scenarios will have the same behavior with respect to
>> EPOLLEXCLUSIVE.

So, I was actually a little surprised by this, and went away and tested
this point. It appears to me that the two scenarios described below
do NOT have the same behavior with respect to EPOLLEXCLUSIVE. See below.

> So, in both scenarios, *one or more* processes will get a wakeup?
> (I'll try to add something to the text to clarify the detail we're 
> discussing.)
> 
>> Also, the 'non-exclusive' mode would be subject to the same question of
>> which wait queue the epfd is associated with...
> 
> I'm not sure of the point you are trying to make here?
> 
> Cheers,
> 
> Michael
> 
> 
>>> To make this point even clearer, here are two scenarios I'm thinking of.
>>> In each case, we're talking of monitoring the read end of a FIFO.
>>>
>>> ===
>>>
>>> Scenario 1:
>>>
>>> We have three processes each of which
>>> 1. Creates an epoll instance
>>> 2. Opens the read end of the FIFO
>>> 3. Adds the read end of the FIFO to the epoll instance, specifying
>>>    EPOLLEXCLUSIVE
>>>
>>> When input becomes available on the FIFO, how many processes
>>> get a wakeup?

When I test this scenario, all three processes get a wakeup.

>>> ===
>>>
>>> Scenario 3
>>>
>>> A parent process opens the read end of a FIFO and then calls
>>> fork() three times to create three children. Each child then:
>>>
>>> 1. Creates an epoll instance
>>> 2. Adds the read end of the FIFO to the epoll instance, specifying
>>>    EPOLLEXCLUSIVE
>>>
>>> When input becomes available on the FIFO, how many processes
>>> get a wakeup?

When I test this scenario, one process gets a wakeup.

In other words, "target file" appears to mean open file description
(aka open file table entry), not inode.

This is actually what I suspected might be the case, but now I am
puzzled. Given what I've discovered and what you suggest are the
semantics, is the implementation correct? (I suspect that it is,
but it is at odds with your statement above.) My test programs are
inline below.

Cheers,

Michael

============

/* t_EPOLLEXCLUSIVE_multipen.c

   Licensed under GNU GPLv2 or later.
*/
#include <sys/epoll.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

#define usageErr(msg, progName) \
                        do { fprintf(stderr, "Usage: "); \
                             fprintf(stderr, msg, progName); \
                             exit(EXIT_FAILURE); } while (0)

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1 << 28)
#endif

int
main(int argc, char *argv[])
{
    int fd, epfd, nready;
    struct epoll_event ev, rev;

    if (argc != 2 || strcmp(argv[1], "--help") == 0)
        usageErr("%s <FIFO>\n", argv[0]);

    epfd = epoll_create(2);
    if (epfd == -1)
        errExit("epoll_create");

    fd = open(argv[1], O_RDONLY);
    if (fd == -1)
        errExit("open");
    printf("Opened %s\n", argv[1]);

    ev.events = EPOLLIN | EPOLLEXCLUSIVE;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
        errExit("epoll_ctl");

    nready = epoll_wait(epfd, &rev, 1, -1);
    if (nready == -1)
        errExit("epoll_wait");
    printf("epoll_wait() returned %d\n", nready);

    exit(EXIT_SUCCESS);
}

===============

/* t_EPOLLEXCLUSIVE_fork.c 

   Licensed under GNU GPLv2 or later.
*/

#include <sys/epoll.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

#define usageErr(msg, progName) \
                        do { fprintf(stderr, "Usage: "); \
                             fprintf(stderr, msg, progName); \
                             exit(EXIT_FAILURE); } while (0)

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1 << 28)
#endif

int
main(int argc, char *argv[])
{
    int fd, epfd, nready;
    struct epoll_event ev, rev;
    int cnum;

    if (argc != 2 || strcmp(argv[1], "--help") == 0)
        usageErr("%s <FIFO>\n", argv[0]);

    fd = open(argv[1], O_RDONLY);
    if (fd == -1)
        errExit("open");
    printf("Opened %s\n", argv[1]);

    for (cnum = 0; cnum < 3; cnum++) {
        switch (fork()) {
        case -1:
            errExit("fork");

        case 0: /* Child */
            epfd = epoll_create(2);
            if (epfd == -1)
                errExit("epoll_create");

            ev.events = EPOLLIN | EPOLLEXCLUSIVE;
            if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
                errExit("epoll_ctl");

            nready = epoll_wait(epfd, &rev, 1, -1);
            if (nready == -1)
                errExit("epoll_wait");
            printf("Child %d: epoll_wait() returned %d\n", cnum, nready);
            exit(EXIT_SUCCESS);

        default:
            break;
        }
    }

    wait(NULL);
    wait(NULL);
    wait(NULL);

    exit(EXIT_SUCCESS);
}

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] epoll: add exclusive wakeups flag
  2016-03-14 21:03                             ` Michael Kerrisk (man-pages)
  (?)
@ 2016-03-14 22:35                             ` Jason Baron
  2016-03-14 23:09                               ` Madars Vitolins
  2016-03-14 23:26                                 ` Michael Kerrisk (man-pages)
  -1 siblings, 2 replies; 31+ messages in thread
From: Jason Baron @ 2016-03-14 22:35 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages), Andrew Morton
  Cc: mingo, peterz, viro, normalperson, m, corbet, luto, torvalds,
	hagen, linux-kernel, linux-fsdevel, linux-api

Hi Michael,

On 03/14/2016 05:03 PM, Michael Kerrisk (man-pages) wrote:
> Hi Jason,
> 
> On 03/15/2016 09:01 AM, Michael Kerrisk (man-pages) wrote:
>> Hi Jason,
>>
>> On 03/15/2016 08:32 AM, Jason Baron wrote:
>>>
>>>
>>> On 03/14/2016 01:47 PM, Michael Kerrisk (man-pages) wrote:
>>>> [Restoring CC, which I see I accidentally dropped, one iteration back.]
> 
> [...]
> 
>>>> Returning to the second sentence in this description:
>>>>
>>>>               When a wakeup event occurs and multiple epoll file descrip‐
>>>>               tors are attached to the same target file using EPOLLEXCLU‐
>>>>               SIVE, one or  more  of  the  epoll  file  descriptors  will
>>>>               receive  an  event with epoll_wait(2).
>>>>
>>>> There is a point that is unclear to me: what does "target file" refer to?
>>>> Is it an open file description (aka open file table entry) or an inode?
>>>> I suspect the former, but it was not clear in your original text.
>>>>
>>>
>>> So from epoll's perspective, the wakeups are associated with a 'wait
>>> queue'. So if the open() and subsequent EPOLL_CTL_ADD (which is done via
>>> file->poll()) results in adding to the same 'wait queue' then we will
>>> get 'exclusive' wakeup behavior.
>>>
>>> So in general, I think the answer here is that it's associated with the
>>> inode (I couldn't say with 100% certainty without really looking at all
>>> file->poll() implementations). Certainly, with the 'FIFO' example below,
>>> the two scenarios will have the same behavior with respect to
>>> EPOLLEXCLUSIVE.
> 
> So, I was actually a little surprised by this, and went away and tested
> this point. It appears to me that the two scenarios described below
> do NOT have the same behavior with respect to EPOLLEXCLUSIVE. See below.
> 
>> So, in both scenarios, *one or more* processes will get a wakeup?
>> (I'll try to add something to the text to clarify the detail we're 
>> discussing.)
>>
>>> Also, the 'non-exclusive' mode would be subject to the same question of
>>> which wait queue the epfd is associated with...
>>
>> I'm not sure of the point you are trying to make here?
>>
>> Cheers,
>>
>> Michael
>>
>>
>>>> To make this point even clearer, here are two scenarios I'm thinking of.
>>>> In each case, we're talking of monitoring the read end of a FIFO.
>>>>
>>>> ===
>>>>
>>>> Scenario 1:
>>>>
>>>> We have three processes each of which
>>>> 1. Creates an epoll instance
>>>> 2. Opens the read end of the FIFO
>>>> 3. Adds the read end of the FIFO to the epoll instance, specifying
>>>>    EPOLLEXCLUSIVE
>>>>
>>>> When input becomes available on the FIFO, how many processes
>>>> get a wakeup?
> 
> When I test this scenario, all three processes get a wakeup.
> 
>>>> ===
>>>>
>>>> Scenario 3
>>>>
>>>> A parent process opens the read end of a FIFO and then calls
>>>> fork() three times to create three children. Each child then:
>>>>
>>>> 1. Creates an epoll instance
>>>> 2. Adds the read end of the FIFO to the epoll instance, specifying
>>>>    EPOLLEXCLUSIVE
>>>>
>>>> When input becomes available on the FIFO, how many processes
>>>> get a wakeup?
> 
> When I test this scenario, one process gets a wakeup.
> 
> In other words, "target file" appears to mean open file description
> (aka open file table entry), not inode.
> 
> This is actually what I suspected might be the case, but now I am
> puzzled. Given what I've discovered and what you suggest are the
> semantics, is the implementation correct? (I suspect that it is,
> but it is at odds with your statement above.) My test programs are
> inline below.
> 
> Cheers,
> 
> Michael
> 

Thanks for the test cases. So in your first test case, you are exiting
immediately after epoll_wait() returns, and that exit is what causes
the next wakeup. Then the 2nd process returns from epoll_wait() and
exits, which causes the 3rd wakeup.

So the extra wakeups are not coming from the write directly, but
instead from the readers doing a close() as they exit. If you do some sort of sleep
after the epoll_wait() you can confirm the behavior. So I believe this
is working as expected.
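The suggested check can be packaged as a small self-contained harness. This is a sketch under stated assumptions, not one of the programs posted in this thread: the names count_wakeups() and child_wait(), the 100 ms start-up delay and the timeout value are all made up. Each child adds the FIFO read end to its own epoll instance with EPOLLIN | EPOLLEXCLUSIVE; a child that is woken stays alive past the others' timeout, so its exit (and the implied close()) cannot be what wakes them. The separate_open argument selects Scenario 1 (each child calls open()) or the fork() scenario (all children share the parent's open file description). The delay before the write is only a crude way to let the children block; a child that is not yet waiting when the byte arrives will still see the unconsumed data and count as woken.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)
#endif

static void sleep_ms(int ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Child: wait for EPOLLIN on fd. On a wakeup, report one byte on 'report'
 * and then stay alive for another timeout_ms, so that this child's exit
 * (which closes its FIFO descriptor) only happens after the other children
 * have already timed out. Never returns. */
static void child_wait(int fd, int report, int timeout_ms)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE };
    struct epoll_event rev;
    int epfd = epoll_create1(0);

    if (epfd == -1 || epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
        _exit(EXIT_FAILURE);
    if (epoll_wait(epfd, &rev, 1, timeout_ms) == 1) {
        if (write(report, "w", 1) != 1)
            _exit(EXIT_FAILURE);
        sleep_ms(timeout_ms);
    }
    _exit(EXIT_SUCCESS);
}

/* Hypothetical helper: returns how many of 'nchildren' children report a
 * wakeup after one byte is written to a FIFO created at 'path' (-1 on error).
 * separate_open != 0: each child open()s the FIFO itself (Scenario 1).
 * separate_open == 0: the children inherit one open file description. */
int count_wakeups(const char *path, int nchildren, int separate_open,
                  int timeout_ms)
{
    int results[2], rfd = -1, wfd, n = 0;
    char c;

    assert(nchildren > 0 && timeout_ms > 0);
    unlink(path);
    if (mkfifo(path, 0600) == -1 || pipe(results) == -1)
        return -1;
    /* O_NONBLOCK: opening a read end must not wait for a writer. */
    if (!separate_open && (rfd = open(path, O_RDONLY | O_NONBLOCK)) == -1)
        return -1;

    for (int i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid == -1)
            return -1;
        if (pid == 0) {
            int fd = separate_open ? open(path, O_RDONLY | O_NONBLOCK) : rfd;
            close(results[0]);
            if (fd == -1)
                _exit(EXIT_FAILURE);
            child_wait(fd, results[1], timeout_ms);
        }
    }
    close(results[1]);

    sleep_ms(100);                  /* crude: let children reach epoll_wait() */
    wfd = open(path, O_WRONLY);     /* blocks until some reader has it open */
    if (wfd == -1 || write(wfd, "x", 1) != 1)
        return -1;

    while (wait(NULL) > 0)          /* reap every child */
        ;
    while (read(results[0], &c, 1) == 1)   /* one byte per woken child */
        n++;

    close(results[0]);
    close(wfd);
    if (rfd != -1)
        close(rfd);
    unlink(path);
    return n;
}
```

Because the flag only promises that one or more waiters are woken, the only safe expectation for, say, count_wakeups("/tmp/epx_fifo", 3, 1, 500) is a result between 1 and 3; comparing the two modes on a given kernel is the point of the exercise.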

Thanks,

-Jason


> [...]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] epoll: add exclusive wakeups flag
  2016-03-14 22:35                             ` Jason Baron
@ 2016-03-14 23:09                               ` Madars Vitolins
  2016-03-14 23:26                                 ` Michael Kerrisk (man-pages)
  1 sibling, 0 replies; 31+ messages in thread
From: Madars Vitolins @ 2016-03-14 23:09 UTC (permalink / raw)
  To: Jason Baron, Michael Kerrisk (man-pages)
  Cc: Andrew Morton, mingo, peterz, viro, normalperson, corbet, luto,
	torvalds, hagen, linux-kernel, linux-fsdevel, linux-api

Hi Jason and Michael,

Hmm... I tried to play with those pipe samples below, but even with
sleep I saw all processes wake up (maybe I am missing something too). I
also tried with EPOLLIN.

On the same basis I created a sample with POSIX message queues using
EPOLLIN | EPOLLEXCLUSIVE, and the good news is that it works correctly.

file q.c:
==================
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/epoll.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <errno.h>
#include <mqueue.h>
#include <unistd.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                         } while (0)

#define usageErr(msg, progName) \
                         do { fprintf(stderr, "Usage: "); \
                              fprintf(stderr, msg, progName); \
                              exit(EXIT_FAILURE); } while (0)

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1 << 28)
#endif

#define MAX_SIZE 10

int
main (int argc, char *argv[])
{
   int epfd, nready;
   struct epoll_event ev, rev;
   mqd_t fd;
   struct mq_attr attr;
   char buffer[MAX_SIZE + 1];
   int cnum;

   /* initialize the queue attributes */
   attr.mq_flags = 0;
   attr.mq_maxmsg = 5;
   attr.mq_msgsize = MAX_SIZE;
   attr.mq_curmsgs = 0;

   /* cleanup for multiple runs... */
   mq_unlink ("/TESTQ");

   /* create the message queue */
   fd =
     mq_open ("/TESTQ", O_CREAT | O_RDWR | O_NONBLOCK, S_IWUSR | S_IRUSR,
	     &attr);
   if (fd == -1)
     errExit ("mq_open");

   for (cnum = 0; cnum < 3; cnum++)
     {
       switch (fork ())
	{
	case -1:
	  errExit ("fork");

	case 0:		/* Child */
	  epfd = epoll_create (2);
	  if (epfd == -1)
	    errExit ("epoll_create");

	  ev.events = EPOLLIN | EPOLLEXCLUSIVE;
	  if (epoll_ctl (epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
	    errExit ("epoll_ctl");

	  printf ("About to wait...\n");
	  nready = epoll_wait (epfd, &rev, 1, -1);
	  if (nready == -1)
	    errExit ("epoll_wait");

	  printf ("Child %d: epoll_wait() returned %d\n", cnum, nready);
	  exit (EXIT_SUCCESS);

	default:
	  break;
	}
     }
   sleep (1);
   /* send a msq to Q */
   memset (buffer, 0, MAX_SIZE);
   if (0 > mq_send (fd, buffer, MAX_SIZE, 0))
     errExit ("mq_send");
   printf ("msg sent ok...\n");

   wait (NULL);
   wait (NULL);
   wait (NULL);

   exit (EXIT_SUCCESS);
}
==================

$ gcc q.c -lrt
$ ./a.out
About to wait...
About to wait...
About to wait...
msg sent ok...
Child 2: epoll_wait() returned 1
^C
$



Best regards,
Madars


Jason Baron @ 2016-03-15 00:35 wrote:
> [...]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] epoll: add exclusive wakeups flag
@ 2016-03-14 23:26                                 ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 31+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-03-14 23:26 UTC (permalink / raw)
  To: Jason Baron, Andrew Morton
  Cc: mtk.manpages, mingo, peterz, viro, normalperson, m, corbet, luto,
	torvalds, hagen, linux-kernel, linux-fsdevel, linux-api

Hi Jason,

On 03/15/2016 11:35 AM, Jason Baron wrote:
> Hi Michael,
> 
> On 03/14/2016 05:03 PM, Michael Kerrisk (man-pages) wrote:
>> Hi Jason,
>>
>> On 03/15/2016 09:01 AM, Michael Kerrisk (man-pages) wrote:
>>> Hi Jason,
>>>
>>> On 03/15/2016 08:32 AM, Jason Baron wrote:
>>>>
>>>>
>>>> On 03/14/2016 01:47 PM, Michael Kerrisk (man-pages) wrote:
>>>>> [Restoring CC, which I see I accidentally dropped, one iteration back.]
>>
>> [...]
>>
>>>>> Returning to the second sentence in this description:
>>>>>
>>>>>               When a wakeup event occurs and multiple epoll file descrip‐
>>>>>               tors are attached to the same target file using EPOLLEXCLU‐
>>>>>               SIVE, one or  more  of  the  epoll  file  descriptors  will
>>>>>               receive  an  event with epoll_wait(2).
>>>>>
>>>>> There is a point that is unclear to me: what does "target file" refer to?
>>>>> Is it an open file description (aka open file table entry) or an inode?
>>>>> I suspect the former, but it was not clear in your original text.
>>>>>
>>>>
>>>> So from epoll's perspective, the wakeups are associated with a 'wait
>>>> queue'. So if the open() and subsequent EPOLL_CTL_ADD (which is done via
>>>> file->poll()) results in adding to the same 'wait queue' then we will
>>>> get 'exclusive' wakeup behavior.
>>>>
>>>> So in general, I think the answer here is that it's associated with the
>>>> inode (I couldn't say with 100% certainty without really looking at all
>>>> file->poll() implementations). Certainly, with the 'FIFO' example below,
>>>> the two scenarios will have the same behavior with respect to
>>>> EPOLLEXCLUSIVE.
>>
>> So, I was actually a little surprised by this, and went away and tested
>> this point. It appears to me that the two scenarios described below
>> do NOT have the same behavior with respect to EPOLLEXCLUSIVE. See below.
>>
>>> So, in both scenarios, *one or more* processes will get a wakeup?
>>> (I'll try to add something to the text to clarify the detail we're 
>>> discussing.)
>>>
>>>> Also, the 'non-exclusive' mode would be subject to the same question of
>>>> which wait queue the epfd is associated with...
>>>
>>> I'm not sure of the point you are trying to make here?
>>>
>>> Cheers,
>>>
>>> Michael
>>>
>>>
>>>>> To make this point even clearer, here are two scenarios I'm thinking of.
>>>>> In each case, we're talking of monitoring the read end of a FIFO.
>>>>>
>>>>> ===
>>>>>
>>>>> Scenario 1:
>>>>>
>>>>> We have three processes each of which
>>>>> 1. Creates an epoll instance
>>>>> 2. Opens the read end of the FIFO
>>>>> 3. Adds the read end of the FIFO to the epoll instance, specifying
>>>>>    EPOLLEXCLUSIVE
>>>>>
>>>>> When input becomes available on the FIFO, how many processes
>>>>> get a wakeup?
>>
>> When I test this scenario, all three processes get a wakeup.
>>
>>>>> ===
>>>>>
>>>>> Scenario 3
>>>>>
>>>>> A parent process opens the read end of a FIFO and then calls
>>>>> fork() three times to create three children. Each child then:
>>>>>
>>>>> 1. Creates an epoll instance
>>>>> 2. Adds the read end of the FIFO to the epoll instance, specifying
>>>>>    EPOLLEXCLUSIVE
>>>>>
>>>>> When input becomes available on the FIFO, how many processes
>>>>> get a wakeup?
>>
>> When I test this scenario, one process gets a wakeup.
>>
>> In other words, "target file" appears to mean open file description
>> (aka open file table entry), not inode.
>>
>> This is actually what I suspected might be the case, but now I am
>> puzzled. Given what I've discovered and what you suggest are the
>> semantics, is the implementation correct? (I suspect that it is,
>> but it is at odds with your statement above.) My test programs are
>> inline below.
>>
>> Cheers,
>>
>> Michael
>>
> 
> Thanks for the test cases. So in your first test case, you are exiting
> immediately after the epoll_wait() returns. So this is actually causing
> the next wakeup. 

Can I just check my understanding of the rationale for the preceding 
point. The next process is getting woken up, because the previous process
did not "consume" the event (that is, the input is still available on the 
FIFO). Right?
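Here is a minimal sketch of the "not consumed" idea. It uses an anonymous pipe as a stand-in for the FIFO and plain EPOLLIN without EPOLLEXCLUSIVE, so it does not depend on kernel version. The helpers ready_now(), drain() and consume_demo() are hypothetical. On a level-triggered epoll instance, readiness keeps being reported until the buffered input is actually read:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Zero-timeout epoll_wait(): number of fds reported ready right now. */
static int ready_now(int epfd)
{
    struct epoll_event rev;
    return epoll_wait(epfd, &rev, 1, 0);
}

/* Read a non-blocking fd until EAGAIN (or EOF); return bytes consumed. */
static long drain(int fd)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += n;
    if (n == -1 && errno != EAGAIN && errno != EWOULDBLOCK)
        return -1;
    return total;
}

/* Write 5 bytes into a pipe and record readiness before and after reading.
   out[0]: ready, out[1]: still ready, out[2]: bytes drained,
   out[3]: ready after drain. Returns 0, or -1 on setup failure. */
static int consume_demo(int out[4])
{
    int pfd[2], epfd;
    struct epoll_event ev = { .events = EPOLLIN };

    if (pipe(pfd) == -1)
        return -1;
    if (fcntl(pfd[0], F_SETFL, fcntl(pfd[0], F_GETFL) | O_NONBLOCK) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        return -1;

    if (write(pfd[1], "hello", 5) != 5)
        return -1;
    out[0] = ready_now(epfd);       /* input buffered: reported ready */
    out[1] = ready_now(epfd);       /* nothing consumed: still ready */
    out[2] = (int) drain(pfd[0]);   /* consume the input */
    out[3] = ready_now(epfd);       /* consumed: no longer ready */

    close(epfd);
    close(pfd[0]);
    close(pfd[1]);
    return 0;
}
```

Draining until EAGAIN on a non-blocking descriptor also suits a worker that may be one of several woken for the same input. If nothing is left to read, the worker simply goes back to epoll_wait().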

> And then the 2nd thread returns from epoll_wait() and
> this causes the 3rd wakeup.

I added the sleep() calls, but still things don't seem to happen
quite as you suggest. In the first scenario, after the first process
terminates, *all* of the remaining processes wake from epoll_wait().
What's happening in this case? (This smells like a possible bug.)

In the second scenario (fork()), after the first process terminates
(without consuming the FIFO input), all of the other processes remain 
blocked in epoll_wait(). (Note: I extended the test program here
to allow the number of child processes to be specified as a command-line
argument.) I think I can make sense of that: it's because the open 
file descriptor for the read end of the FIFO has been duplicated
in all of the child processes, and closing the FD in one child
does not cause the corresponding open file description in other
processes to be torn down because there are other FDs that still
refer to it.
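This sharing can be observed directly, because the file offset belongs to the open file description rather than to the descriptor. The sketch below uses a scratch regular file created with mkstemp() in place of the FIFO; the /tmp template is hypothetical, as is the helper offset_after_child_read(). A child reads 3 bytes through its inherited descriptor and exits. The parent then finds its own offset at 3:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent's offset after a forked child has read 3 bytes
   via the inherited fd and exited, or -1 on error. */
static off_t offset_after_child_read(void)
{
    char path[] = "/tmp/ofd_demoXXXXXX";   /* hypothetical scratch file */
    char buf[3];
    off_t off;
    pid_t pid;
    int fd = mkstemp(path);

    if (fd == -1)
        return -1;
    unlink(path);                          /* file persists while fd is open */
    if (write(fd, "abcdef", 6) != 6 || lseek(fd, 0, SEEK_SET) == -1)
        return -1;

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {                        /* child: read via its copy of fd */
        ssize_t n = read(fd, buf, sizeof(buf));
        _exit(n == 3 ? 0 : 1);             /* exit closes only the child's fd */
    }
    if (waitpid(pid, NULL, 0) == -1)
        return -1;

    off = lseek(fd, 0, SEEK_CUR);          /* shared offset was advanced to 3 */
    close(fd);
    return off;
}
```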

> So the wakeups are actually not happening from the write directly, but
> instead from the readers doing a close(). If you do some sort of sleep
> after the epoll_wait() you can confirm the behavior. So I believe this
> is working as expected.

As noted above, I'm still slightly puzzled.
Revised test programs pasted below.

Cheers,

Michael

==========

/* t_EPOLLEXCLUSIVE_multiopen.c

  Licensed under GNU GPLv2 or later.
*/

#include <sys/epoll.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

#define usageErr(msg, progName) \
                        do { fprintf(stderr, "Usage: "); \
                             fprintf(stderr, msg, progName); \
                             exit(EXIT_FAILURE); } while (0)

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1 << 28)
#endif

int
main(int argc, char *argv[])
{
    int fd, epfd, nready;
    struct epoll_event ev, rev;

    if (argc != 2 || strcmp(argv[1], "--help") == 0)
        usageErr("%s <FIFO>\n", argv[0]);

    epfd = epoll_create(2);
    if (epfd == -1)
        errExit("epoll_create");

    fd = open(argv[1], O_RDONLY);
    if (fd == -1)
        errExit("open");
    printf("Opened %s\n", argv[1]);

    ev.events = EPOLLIN | EPOLLEXCLUSIVE;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
        errExit("epoll_ctl");

    nready = epoll_wait(epfd, &rev, 1, -1);
    if (nready == -1)
        errExit("epoll_wait");
    printf("epoll_wait() returned %d\n", nready);

    printf("sleeping\n");
    sleep(3);
    printf("Terminating\n");
    exit(EXIT_SUCCESS);
}

===================

/* t_EPOLLEXCLUSIVE_fork.c 
 
  Licensed under GNU GPLv2 or later.
*/

#include <sys/epoll.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

#define usageErr(msg, progName) \
                        do { fprintf(stderr, "Usage: "); \
                             fprintf(stderr, msg, progName); \
                             exit(EXIT_FAILURE); } while (0)

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1 << 28)
#endif

int
main(int argc, char *argv[])
{
    int fd, epfd, nready;
    struct epoll_event ev, rev;
    int cnum, cmax;

    if (argc < 2 || strcmp(argv[1], "--help") == 0)
        usageErr("%s <FIFO> [num-children]\n", argv[0]);

    fd = open(argv[1], O_RDONLY);
    if (fd == -1)
        errExit("open");
    printf("Opened %s\n", argv[1]);
    fflush(stdout);     /* Flush before fork() so children don't repeat it */

    cmax = (argc > 2) ? atoi(argv[2]) : 3;

    for (cnum = 0; cnum < cmax; cnum++) {
        switch (fork()) {
        case -1:
            errExit("fork");

        case 0: /* Child */
            epfd = epoll_create(2);
            if (epfd == -1)
                errExit("epoll_create");

            ev.events = EPOLLIN | EPOLLEXCLUSIVE;
            if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
                errExit("epoll_ctl");

            nready = epoll_wait(epfd, &rev, 1, -1);
            if (nready == -1)
                errExit("epoll_wait");
            printf("Child %d: epoll_wait() returned %d\n", cnum, nready);
            printf("sleeping\n");
            sleep(3);
            printf("Child %d terminating\n", cnum);
            exit(EXIT_SUCCESS);

        default:
            break;
        }
    }

    for (cnum = 0; cnum < cmax; cnum++)
        wait(NULL);

    exit(EXIT_SUCCESS);
}

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] epoll: add exclusive wakeups flag
@ 2016-03-15  2:36                                   ` Jason Baron
  0 siblings, 0 replies; 31+ messages in thread
From: Jason Baron @ 2016-03-15  2:36 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages), Andrew Morton
  Cc: mingo, peterz, viro, normalperson, m, corbet, luto, torvalds,
	hagen, linux-kernel, linux-fsdevel, linux-api

Hi Michael,

On 03/14/2016 07:26 PM, Michael Kerrisk (man-pages) wrote:
> Hi Jason,
> 
> On 03/15/2016 11:35 AM, Jason Baron wrote:
>> Hi Michael,
>>
>> On 03/14/2016 05:03 PM, Michael Kerrisk (man-pages) wrote:
>>> Hi Jason,
>>>
>>> On 03/15/2016 09:01 AM, Michael Kerrisk (man-pages) wrote:
>>>> Hi Jason,
>>>>
>>>> On 03/15/2016 08:32 AM, Jason Baron wrote:
>>>>>
>>>>>
>>>>> On 03/14/2016 01:47 PM, Michael Kerrisk (man-pages) wrote:
>>>>>> [Restoring CC, which I see I accidentally dropped, one iteration back.]
>>>
>>> [...]
>>>
>>>>>> Returning to the second sentence in this description:
>>>>>>
>>>>>>               When a wakeup event occurs and multiple epoll file descrip‐
>>>>>>               tors are attached to the same target file using EPOLLEXCLU‐
>>>>>>               SIVE, one or  more  of  the  epoll  file  descriptors  will
>>>>>>               receive  an  event with epoll_wait(2).
>>>>>>
>>>>>> There is a point that is unclear to me: what does "target file" refer to?
>>>>>> Is it an open file description (aka open file table entry) or an inode?
>>>>>> I suspect the former, but it was not clear in your original text.
>>>>>>
>>>>>
>>>>> So from epoll's perspective, the wakeups are associated with a 'wait
>>>>> queue'. So if the open() and subsequent EPOLL_CTL_ADD (which is done via
>>>>> file->poll()) results in adding to the same 'wait queue' then we will
>>>>> get 'exclusive' wakeup behavior.
>>>>>
>>>>> So in general, I think the answer here is that it's associated with the
>>>>> inode (I couldn't say with 100% certainty without really looking at all
>>>>> file->poll() implementations). Certainly, with the 'FIFO' example below,
>>>>> the two scenarios will have the same behavior with respect to
>>>>> EPOLLEXCLUSIVE.
>>>
>>> So, I was actually a little surprised by this, and went away and tested
>>> this point. It appears to me that the two scenarios described below
>>> do NOT have the same behavior with respect to EPOLLEXCLUSIVE. See below.
>>>
>>>> So, in both scenarios, *one or more* processes will get a wakeup?
>>>> (I'll try to add something to the text to clarify the detail we're 
>>>> discussing.)
>>>>
>>>>> Also, the 'non-exclusive' mode would be subject to the same question of
>>>>> which wait queue the epfd is associated with...
>>>>
>>>> I'm not sure of the point you are trying to make here?
>>>>
>>>> Cheers,
>>>>
>>>> Michael
>>>>
>>>>
>>>>>> To make this point even clearer, here are two scenarios I'm thinking of.
>>>>>> In each case, we're talking of monitoring the read end of a FIFO.
>>>>>>
>>>>>> ===
>>>>>>
>>>>>> Scenario 1:
>>>>>>
>>>>>> We have three processes each of which
>>>>>> 1. Creates an epoll instance
>>>>>> 2. Opens the read end of the FIFO
>>>>>> 3. Adds the read end of the FIFO to the epoll instance, specifying
>>>>>>    EPOLLEXCLUSIVE
>>>>>>
>>>>>> When input becomes available on the FIFO, how many processes
>>>>>> get a wakeup?
>>>
>>> When I test this scenario, all three processes get a wakeup.
>>>
>>>>>> ===
>>>>>>
>>>>>> Scenario 3
>>>>>>
>>>>>> A parent process opens the read end of a FIFO and then calls
>>>>>> fork() three times to create three children. Each child then:
>>>>>>
>>>>>> 1. Creates an epoll instance
>>>>>> 2. Adds the read end of the FIFO to the epoll instance, specifying
>>>>>> EPOLLEXCLUSIVE
>>>>>>
>>>>>> When input becomes available on the FIFO, how many processes
>>>>>> get a wakeup?
>>>
>>> When I test this scenario, one process gets a wakeup.
>>>
>>> In other words, "target file" appears to mean open file description
>>> (aka open file table entry), not inode.
>>>
>>> This is actually what I suspected might be the case, but now I am
>>> puzzled. Given what I've discovered and what you suggest are the
>>> semantics, is the implementation correct? (I suspect that it is,
>>> but it is at odds with your statement above.) My test programs are
>>> inline below.
>>>
>>> Cheers,
>>>
>>> Michael
>>>
>>
>> Thanks for the test cases. So in your first test case, you are exiting
>> immediately after the epoll_wait() returns. So this is actually causing
>> the next wakeup. 
> 
> Can I just check my understanding of the rationale for the preceding 
> point. The next process is getting woken up, because the previous process
> did not "consume" the event (that is, the input is still available on the 
> FIFO). Right?
> 
>> And then the 2nd thread returns from epoll_wait() and
>> this causes the 3rd wakeup.
> 
> I added the sleep() calls, but still things don't seem to happen
> quite as you suggest. In the first scenario, after the first process
> terminates, *all* of the remaining processes wake from epoll_wait().
> What's happening in this case? (This smells like a possible bug.)

Yes, you are right. When the first process calls exit() and thus closes
the read side of the pipe, it wakes up all the other processes that
are in epoll_wait(). When the file is closed, since there are no more
fds referencing it, the pipe_release() routine does a wakeup specifying
both POLLIN and POLLOUT. In this case, all of the epoll exclusive
waiters will get a wakeup. The combination of POLLIN and POLLOUT is
not expected to be the typical use case here. Normally, we would have
the threads in event loops, and just POLLIN would be set, resulting
in the exclusive wakeup behavior. So yes, there can be multiple
wakeups in some cases, but the *common* case of only POLLIN or only
POLLOUT set will yield exclusive wakeups. There is no guarantee here
that only one thread wakes up, only that *at least* one does.
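The sketch below is a simplified user-space model of the rule described above. It is not the kernel's actual code. An exclusive waiter ends the wakeup walk only when the wakeup key carries POLLIN alone or POLLOUT alone. A key carrying both, as pipe_release() sends, wakes every waiter. Keys carrying neither bit are outside what is discussed here; the sketch groups them with the mixed case purely for simplicity, which may not match the kernel. Both function names are hypothetical:

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>

/* Model only: does this wakeup key satisfy an exclusive waiter? */
static bool counts_as_exclusive(int key)
{
    int inout = key & (POLLIN | POLLOUT);

    return inout == POLLIN || inout == POLLOUT;
}

/* Walk a queue of nwaiters EPOLLEXCLUSIVE waiters; return how many wake. */
static int waiters_woken(int nwaiters, int key)
{
    int woken = 0;

    for (int i = 0; i < nwaiters; i++) {
        woken++;                        /* this waiter is woken */
        if (counts_as_exclusive(key))
            break;                      /* exclusive wakeup done: stop the walk */
    }
    return woken;
}
```

For example, waiters_woken(3, POLLIN) yields 1, while waiters_woken(3, POLLIN | POLLOUT) yields 3. The second case matches the "all remaining processes wake" result seen in the first scenario.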

> 
> In the second scenario (fork()), after the first process terminates
> (without consuming the FIFO input), all of the other processes remain 
> blocked in epoll_wait(). (Note, I extended the test program here
> to allow the number of child processes to be specified as a command-line
> argument.) I think I can make sense of that: it's because the open 
> file descriptor for the read end of the FIFO has been duplicated
> in all of the child processes, and closing the FD in one child
> does not cause the corresponding open file description in other
> processes to be torn down because there are other FDs that still
> refer to it.
> 

Yes, exactly. The final process is going to invoke the pipe_release(),
but at that point there is nobody left to wake up.

>> So the wakeups are actually not happening from the write directly, but
>> instead from the readers doing a close(). If you do some sort of sleep
>> after the epoll_wait() you can confirm the behavior. So I believe this
>> is working as expected.
> 
> As noted above, I'm still slightly puzzled.
> Revised test programs pasted below.
> 

Ok, hopefully this makes sense.

Thanks,

-Jason

> Cheers,
> 
> Michael
> 
> ==========
> 
> /* t_EPOLLEXCLUSIVE_multiopen.c
> 
>   Licensed under GNU GPLv2 or later.
> */
> 
> #include <sys/epoll.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sys/types.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <string.h>
> 
> #define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
>                         } while (0)
> 
> #define usageErr(msg, progName) \
>                         do { fprintf(stderr, "Usage: "); \
>                              fprintf(stderr, msg, progName); \
>                              exit(EXIT_FAILURE); } while (0)
> 
> #ifndef EPOLLEXCLUSIVE
> #define EPOLLEXCLUSIVE (1 << 28)
> #endif
> 
> int
> main(int argc, char *argv[])
> {
>     int fd, epfd, nready;
>     struct epoll_event ev, rev;
> 
>     if (argc != 2 || strcmp(argv[1], "--help") == 0)
>         usageErr("%s <FIFO>\n", argv[0]);
> 
>     epfd = epoll_create(2);
>     if (epfd == -1)
>         errExit("epoll_create");
> 
>     fd = open(argv[1], O_RDONLY);
>     if (fd == -1)
>         errExit("open");
>     printf("Opened %s\n", argv[1]);
> 
>     ev.events = EPOLLIN | EPOLLEXCLUSIVE;
>     if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
>         errExit("epoll_ctl");
> 
>     nready = epoll_wait(epfd, &rev, 1, -1);
>     if (nready == -1)
>         errExit("epoll_wait");
>     printf("epoll_wait() returned %d\n", nready);
> 
>     printf("sleeping\n");
>     sleep(3);
>     printf("Terminating\n");
>     exit(EXIT_SUCCESS);
> }
> 
> ===================
> 
> /* t_EPOLLEXCLUSIVE_fork.c 
>  
>   Licensed under GNU GPLv2 or later.
> */
> 
> #include <sys/epoll.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <string.h>
> 
> #define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
>                         } while (0)
> 
> #define usageErr(msg, progName) \
>                         do { fprintf(stderr, "Usage: "); \
>                              fprintf(stderr, msg, progName); \
>                              exit(EXIT_FAILURE); } while (0)
> 
> #ifndef EPOLLEXCLUSIVE
> #define EPOLLEXCLUSIVE (1 << 28)
> #endif
> 
> int
> main(int argc, char *argv[])
> {
>     int fd, epfd, nready;
>     struct epoll_event ev, rev;
>     int cnum, cmax;
> 
>     if (argc < 2 || strcmp(argv[1], "--help") == 0)
>         usageErr("%s <FIFO> [num-children]\n", argv[0]);
> 
>     fd = open(argv[1], O_RDONLY);
>     if (fd == -1)
>         errExit("open");
>     printf("Opened %s\n", argv[1]);
> 
>     cmax = (argc > 2) ? atoi(argv[2]) : 3;
> 
>     for (cnum = 0; cnum < cmax; cnum++) {
>         switch (fork()) {
>         case -1:
>             errExit("fork");
> 
>         case 0: /* Child */
>             epfd = epoll_create(2);
>             if (epfd == -1)
>                 errExit("epoll_create");
> 
>             ev.events = EPOLLIN | EPOLLEXCLUSIVE;
>             if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
>                 errExit("epoll_ctl");
> 
>             nready = epoll_wait(epfd, &rev, 1, -1);
>             if (nready == -1)
>                 errExit("epoll_wait");
>             printf("Child %d: epoll_wait() returned %d\n", cnum, nready);
>             printf("sleeping\n");
>             sleep(3);
>             printf("Child %d terminating\n", cnum);
>             exit(EXIT_SUCCESS);
> 
>         default:
>             break;
>         }
>     }
> 
>     for (cnum = 0; cnum < cmax; cnum++)
>         wait(NULL);
> 
>     exit(EXIT_SUCCESS);
> }
> 


end of thread

Thread overview: 31+ messages
2015-12-08  3:23 [PATCH] epoll: add exclusive wakeups flag Jason Baron
2015-12-08  3:23 ` [PATCH] epoll: add EPOLLEXCLUSIVE flag Jason Baron
2015-12-08  3:23   ` Jason Baron
2016-01-28  7:16 ` [PATCH] epoll: add exclusive wakeups flag Michael Kerrisk (man-pages)
2016-01-28  7:16   ` Michael Kerrisk (man-pages)
2016-01-28 17:57   ` Jason Baron
2016-01-29  8:14     ` Michael Kerrisk (man-pages)
2016-02-01 19:42       ` Jason Baron
2016-02-01 19:42         ` Jason Baron
2016-03-10 18:53       ` Jason Baron
2016-03-10 19:47         ` Michael Kerrisk (man-pages)
2016-03-10 19:47           ` Michael Kerrisk (man-pages)
2016-03-10 19:58         ` Michael Kerrisk (man-pages)
2016-03-10 19:58           ` Michael Kerrisk (man-pages)
2016-03-10 20:40           ` Jason Baron
2016-03-10 20:40             ` Jason Baron
2016-03-11 20:30             ` Michael Kerrisk (man-pages)
2016-03-11 20:30               ` Michael Kerrisk (man-pages)
     [not found]               ` <56E32FC5.4030902@akamai.com>
     [not found]                 ` <56E353CF.6050503@gmail.com>
     [not found]                   ` <56E6D0ED.20609@akamai.com>
2016-03-14 17:47                     ` Michael Kerrisk (man-pages)
2016-03-14 19:32                       ` Jason Baron
2016-03-14 19:32                         ` Jason Baron
2016-03-14 20:01                         ` Michael Kerrisk (man-pages)
2016-03-14 20:01                           ` Michael Kerrisk (man-pages)
2016-03-14 21:03                           ` Michael Kerrisk (man-pages)
2016-03-14 21:03                             ` Michael Kerrisk (man-pages)
2016-03-14 22:35                             ` Jason Baron
2016-03-14 23:09                               ` Madars Vitolins
2016-03-14 23:26                               ` Michael Kerrisk (man-pages)
2016-03-14 23:26                                 ` Michael Kerrisk (man-pages)
2016-03-15  2:36                                 ` Jason Baron
2016-03-15  2:36                                   ` Jason Baron
