All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] epoll: Improved support for multi-threaded clients
@ 2012-06-11 22:34 Paton Lewis
  2012-06-13 23:27 ` Andrew Morton
  2012-06-16 18:47 ` Christof Meerwald
  0 siblings, 2 replies; 12+ messages in thread
From: Paton Lewis @ 2012-06-11 22:34 UTC (permalink / raw)
  To: Alexander Viro, Andrew Morton, Jason Baron, linux-fsdevel, linux-kernel
  Cc: Paton Lewis, Paul Holland


It is not currently possible to reliably delete epoll items when using the
same epoll set from multiple threads. After calling epoll_ctl with
EPOLL_CTL_DEL, another thread might still be executing code related to an
event for that epoll item (in response to epoll_wait). Therefore the deleting
thread does not know when it is safe to delete resources pertaining to the
associated epoll item because another thread might be using those resources.

The deleting thread could wait an arbitrary amount of time after calling
epoll_ctl with EPOLL_CTL_DEL and before deleting the item, but this is
inefficient and could result in the destruction of resources before another
thread is done handling an event returned by epoll_wait.

This patch introduces the new epoll_ctl command EPOLL_CTL_DISABLE, which
disables the associated epoll item and returns -EBUSY if the epoll item is not
currently in the epoll ready queue. This allows multiple threads to use a
mutex to determine when it is safe to delete an epoll item and its associated
resources. This allows epoll items to be deleted and closed efficiently and
without error.

This patch has been tested against the following checklist:
http://kernelnewbies.org/UpstreamMerge/SubmitChecklist

The following psuedocode attempts to illustrate the problem as well as the
solution provided by this patch.


Pseudocode for the deleting thread:


void delete_epoll_item( int index )
{
	bool handling_io = false;
	pthread_mutex_lock( items[ index ].stop_mutex );
	items[ index ].stop_work = true;
#ifdef EPOLL_CTL_DISABLE
	if( epoll_ctl( epoll_fd, EPOLL_CTL_DISABLE, items[ index ].fd, NULL ) == 0 )
	{
		if( epoll_ctl( epoll_fd, EPOLL_CTL_DEL, items[ index ].fd, NULL ) < 0 )
			handle_error();
	}
	else if ( errno == EBUSY )
		handling_io = true;
	else
		handle_error();
#else
	if( epoll_ctl( epoll_fd, EPOLL_CTL_DEL, items[ index ].fd, NULL ) < 0 )
		handle_error();
	else
		/* Wait in case another thread is currently doing I/O on this item.
		   Note that we have no idea if this will be long enough or not: */
		sleep( 10 );
#endif
	if( !handling_io )
		delete_item( items[ index ] );
	pthread_mutex_unlock( items[ index ].stop_mutex );
}


Pseudocode for the processing thread:


	while( epoll_wait( epoll_fd, &event, 1, -1 ) == 1 )
	{
		item = item_from_event( event );
		/* Without EPOLL_CTL_DISABLE, the following call can fail because
		   'item' has been deleted: */
		pthread_mutex_lock( item.stop_mutex );
		if( item.stop_work )
		{
			pthread_mutex_unlock( item.stop_mutex );
			/* Without EPOLL_CTL_DISABLE, the following call can fail because
		   	   'item' has already been deleted: */
			delete_item( item );
		}
		else
		{
			perform_io( item );
			event.events = EPOLLONESHOT | EPOLLET | EPOLLIN | EPOLLOUT;
			if( epoll_ctl( epoll_fd, EPOLL_CTL_ADD, item.fd, &event ) < 0 )
				handle_error();
			pthread_mutex_unlock( item.stop_mutex );
		}
	}


Signed-off-by: Paton J. Lewis <palewis@adobe.com>
---
 fs/eventpoll.c            |   35 ++++++++++++++++++++++++++++++++---
 include/linux/eventpoll.h |    1 +
 2 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 739b098..ef924e5 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -339,7 +339,7 @@ static inline struct epitem *ep_item_from_epqueue(poll_table *p)
 /* Tells if the epoll_ctl(2) operation needs an event copy from userspace */
 static inline int ep_op_has_event(int op)
 {
-	return op != EPOLL_CTL_DEL;
+	return op == EPOLL_CTL_ADD || op == EPOLL_CTL_MOD;
 }
 
 /* Initialize the poll safe wake up structure */
@@ -664,6 +664,31 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi)
 	return 0;
 }
 
+/*
+ * Disables a "struct epitem" in the eventpoll set. Returns -EBUSY
+ * if the item was not in the ready list and its event may therefore
+ * be currently getting handled by another thread.
+ * Must be called with "mtx" held.
+ */
+static int ep_disable(struct eventpoll *ep, struct epitem *epi)
+{
+	int result = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ep->lock, flags);
+	if (ep_is_linked(&epi->rdllink))
+		list_del_init(&epi->rdllink);
+	else
+		result = -EBUSY;
+
+	/* Ensure ep_poll_callback will not add epi back onto ready list: */
+	epi->event.events &= EP_PRIVATE_BITS;
+
+	spin_unlock_irqrestore(&ep->lock, flags);
+
+	return result;
+}
+
 static void ep_free(struct eventpoll *ep)
 {
 	struct rb_node *rbp;
@@ -996,8 +1021,6 @@ static void ep_rbtree_insert(struct eventpoll *ep, struct epitem *epi)
 	rb_insert_color(&epi->rbn, &ep->rbr);
 }
 
-
-
 #define PATH_ARR_SIZE 5
 /*
  * These are the number paths of length 1 to 5, that we are allowing to emanate
@@ -1701,6 +1724,12 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 		} else
 			error = -ENOENT;
 		break;
+	case EPOLL_CTL_DISABLE:
+		if (epi)
+			error = ep_disable(ep, epi);
+		else
+			error = -ENOENT;
+		break;
 	}
 	mutex_unlock(&ep->mtx);
 
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 657ab55..e91f7e3 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -25,6 +25,7 @@
 #define EPOLL_CTL_ADD 1
 #define EPOLL_CTL_DEL 2
 #define EPOLL_CTL_MOD 3
+#define EPOLL_CTL_DISABLE 4
 
 /* Set the One Shot behaviour for the target file descriptor */
 #define EPOLLONESHOT (1 << 30)
-- 
1.7.5.4


Signed-off-by: Paton J. Lewis <palewis@adobe.com>
---
 fs/eventpoll.c            |   35 ++++++++++++++++++++++++++++++++---
 include/linux/eventpoll.h |    1 +
 2 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 739b098..ef924e5 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -339,7 +339,7 @@ static inline struct epitem *ep_item_from_epqueue(poll_table *p)
 /* Tells if the epoll_ctl(2) operation needs an event copy from userspace */
 static inline int ep_op_has_event(int op)
 {
-	return op != EPOLL_CTL_DEL;
+	return op == EPOLL_CTL_ADD || op == EPOLL_CTL_MOD;
 }
 
 /* Initialize the poll safe wake up structure */
@@ -664,6 +664,31 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi)
 	return 0;
 }
 
+/*
+ * Disables a "struct epitem" in the eventpoll set. Returns -EBUSY
+ * if the item was not in the ready list and its event may therefore
+ * be currently getting handled by another thread.
+ * Must be called with "mtx" held.
+ */
+static int ep_disable(struct eventpoll *ep, struct epitem *epi)
+{
+	int result = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ep->lock, flags);
+	if (ep_is_linked(&epi->rdllink))
+		list_del_init(&epi->rdllink);
+	else
+		result = -EBUSY;
+
+	/* Ensure ep_poll_callback will not add epi back onto ready list: */
+	epi->event.events &= EP_PRIVATE_BITS;
+
+	spin_unlock_irqrestore(&ep->lock, flags);
+
+	return result;
+}
+
 static void ep_free(struct eventpoll *ep)
 {
 	struct rb_node *rbp;
@@ -996,8 +1021,6 @@ static void ep_rbtree_insert(struct eventpoll *ep, struct epitem *epi)
 	rb_insert_color(&epi->rbn, &ep->rbr);
 }
 
-
-
 #define PATH_ARR_SIZE 5
 /*
  * These are the number paths of length 1 to 5, that we are allowing to emanate
@@ -1701,6 +1724,12 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 		} else
 			error = -ENOENT;
 		break;
+	case EPOLL_CTL_DISABLE:
+		if (epi)
+			error = ep_disable(ep, epi);
+		else
+			error = -ENOENT;
+		break;
 	}
 	mutex_unlock(&ep->mtx);
 
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 657ab55..e91f7e3 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -25,6 +25,7 @@
 #define EPOLL_CTL_ADD 1
 #define EPOLL_CTL_DEL 2
 #define EPOLL_CTL_MOD 3
+#define EPOLL_CTL_DISABLE 4
 
 /* Set the One Shot behaviour for the target file descriptor */
 #define EPOLLONESHOT (1 << 30)
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] epoll: Improved support for multi-threaded clients
  2012-06-11 22:34 [PATCH] epoll: Improved support for multi-threaded clients Paton Lewis
@ 2012-06-13 23:27 ` Andrew Morton
  2012-06-18 21:58   ` Paton J. Lewis
  2012-06-16 18:47 ` Christof Meerwald
  1 sibling, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2012-06-13 23:27 UTC (permalink / raw)
  To: Paton Lewis
  Cc: Alexander Viro, Jason Baron, linux-fsdevel, linux-kernel,
	Paul Holland, Davide Libenzi


Let's cc Davide.

On Mon, 11 Jun 2012 15:34:49 -0700
Paton Lewis <palewis@adobe.com> wrote:

> 
> It is not currently possible to reliably delete epoll items when using the
> same epoll set from multiple threads. After calling epoll_ctl with
> EPOLL_CTL_DEL, another thread might still be executing code related to an
> event for that epoll item (in response to epoll_wait). Therefore the deleting
> thread does not know when it is safe to delete resources pertaining to the
> associated epoll item because another thread might be using those resources.

We often solve this sort of thing with refcounting - the EPOLL_CTL_DEL
will drop the refcount and, if that fell to zero, remove the object. 
So if the object is in use by another thread, that other thread will
hold a refcount and that other thread will do the object removal when
dropping its refcount.

I don't know if that model can be used here?

> The deleting thread could wait an arbitrary amount of time after calling
> epoll_ctl with EPOLL_CTL_DEL and before deleting the item, but this is
> inefficient and could result in the destruction of resources before another
> thread is done handling an event returned by epoll_wait.
> 
> This patch introduces the new epoll_ctl command EPOLL_CTL_DISABLE, which
> disables the associated epoll item and returns -EBUSY if the epoll item is not
> currently in the epoll ready queue. This allows multiple threads to use a
> mutex to determine when it is safe to delete an epoll item and its associated
> resources. This allows epoll items to be deleted and closed efficiently and
> without error.
> 
> This patch has been tested against the following checklist:
> http://kernelnewbies.org/UpstreamMerge/SubmitChecklist
> 
> The following psuedocode attempts to illustrate the problem as well as the
> solution provided by this patch.
> 
> 
> Pseudocode for the deleting thread:

It would be nice to start accumulating epoll test code in
tools/testing/selftests/epoll.  If you have something which can be used
to kick that off, please do consider preparing it.

Also, a user-visible feature wuch as this should be documented in Linux
manpages.  So please do cc Michael Kerrisk <mtk.manpages@gmail.com> as
we work on this.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] epoll: Improved support for multi-threaded clients
  2012-06-11 22:34 [PATCH] epoll: Improved support for multi-threaded clients Paton Lewis
  2012-06-13 23:27 ` Andrew Morton
@ 2012-06-16 18:47 ` Christof Meerwald
  2012-06-18 23:24   ` Paton J. Lewis
  1 sibling, 1 reply; 12+ messages in thread
From: Christof Meerwald @ 2012-06-16 18:47 UTC (permalink / raw)
  To: Paton Lewis
  Cc: Alexander Viro, Jason Baron, linux-fsdevel, linux-kernel,
	Paul Holland, Davide Libenzi

On Mon, 11 Jun 2012 15:34:49 -0700, Paton Lewis wrote:
> This patch introduces the new epoll_ctl command EPOLL_CTL_DISABLE, which
> disables the associated epoll item and returns -EBUSY if the epoll item is not
> currently in the epoll ready queue. This allows multiple threads to use a
> mutex to determine when it is safe to delete an epoll item and its associated
> resources. This allows epoll items to be deleted and closed efficiently and
> without error.

Do you assume that EPOLLONESHOT is being used for this to work or
would you expect your patch to also address the case where
EPOLLONESHOT is not used?


Christof

-- 

http://cmeerw.org                              sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] epoll: Improved support for multi-threaded clients
  2012-06-13 23:27 ` Andrew Morton
@ 2012-06-18 21:58   ` Paton J. Lewis
  2012-06-19 23:42     ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Paton J. Lewis @ 2012-06-18 21:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Jason Baron, linux-fsdevel, linux-kernel,
	Paul Holland, Davide Libenzi, Michael Kerrisk

cc Michael Kerrisk as recommended below.

At 6/13/2012 04:27 PM, Andrew Morton wrote:

>Let's cc Davide.
>
>On Mon, 11 Jun 2012 15:34:49 -0700
>Paton Lewis <palewis@adobe.com> wrote:
>
> >
> > It is not currently possible to reliably delete epoll items when using the
> > same epoll set from multiple threads. After calling epoll_ctl with
> > EPOLL_CTL_DEL, another thread might still be executing code related to an
> > event for that epoll item (in response to epoll_wait). Therefore 
> the deleting
> > thread does not know when it is safe to delete resources pertaining to the
> > associated epoll item because another thread might be using those 
> resources.
>
>We often solve this sort of thing with refcounting - the EPOLL_CTL_DEL
>will drop the refcount and, if that fell to zero, remove the object.
>So if the object is in use by another thread, that other thread will
>hold a refcount and that other thread will do the object removal when
>dropping its refcount.
>
>I don't know if that model can be used here?

Andrew, thank you for your response.

First, I should point out that the fundamental problem we are trying 
to solve is that userspace code needs to know whether or not an epoll 
item was in the ready queue at the time EPOLL_CTL_DEL was executed. 
Furthermore, we want to transfer that information back to userspace 
in a way that won't break existing code that uses epoll.

We could use a refcount to track whether the epoll item was in the 
ready list or not at the time EPOLL_CTL_DEL was received, but we 
still need a way to transfer that information back to userspace. 
Since the refcount would only ever be zero or one, epoll_ctl could 
set a bit in epoll_event.events (perhaps called EPOLLNOTREADY). 
However, this could cause a problem with any legacy code that relies 
on the fact that the epoll_ctl epoll_event parameter is ignored for 
EPOLL_CTL_DEL. Any such code which passed an invalid pointer for that 
parameter would suddenly generate a fault when running on the new 
kernel code, even though it worked fine in the past.

It seems that there are two ways to send information from the kernel 
back to userspace via the epoll_ctl API: via the return value or by 
modifying the epoll_event structure. In either case we are wary of 
changing the behavior of epoll_ctl in response to the existing 
control commands for fear of breaking legacy code. Therefore we 
recommended adding a new control word (EPOLL_CTL_DISABLE) to ensure 
that legacy code would not be affected.

I should point out that an alternative possibility would be to add a 
new control word, perhaps called EPOLL_CTL_DEL_AND_REPORT, that would 
function exactly like EPOLL_CTL_DEL except that it could return 
-EBUSY if the deleted item was not originally in the ready queue.

> > The deleting thread could wait an arbitrary amount of time after calling
> > epoll_ctl with EPOLL_CTL_DEL and before deleting the item, but this is
> > inefficient and could result in the destruction of resources before another
> > thread is done handling an event returned by epoll_wait.
> >
> > This patch introduces the new epoll_ctl command EPOLL_CTL_DISABLE, which
> > disables the associated epoll item and returns -EBUSY if the 
> epoll item is not
> > currently in the epoll ready queue. This allows multiple threads to use a
> > mutex to determine when it is safe to delete an epoll item and 
> its associated
> > resources. This allows epoll items to be deleted and closed efficiently and
> > without error.
> >
> > This patch has been tested against the following checklist:
> > http://kernelnewbies.org/UpstreamMerge/SubmitChecklist
> >
> > The following psuedocode attempts to illustrate the problem as well as the
> > solution provided by this patch.
> >
> >
> > Pseudocode for the deleting thread:
>
>It would be nice to start accumulating epoll test code in
>tools/testing/selftests/epoll.  If you have something which can be used
>to kick that off, please do consider preparing it.

I'm happy to contribute; thanks for suggesting that.

>Also, a user-visible feature wuch as this should be documented in Linux
>manpages.  So please do cc Michael Kerrisk <mtk.manpages@gmail.com> as
>we work on this.

Added, thank you.

Since we're discussing a possible userspace kernel API change, is 
there someone on the GCC team who we should also cc?

Thank you,
Pat


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] epoll: Improved support for multi-threaded clients
  2012-06-16 18:47 ` Christof Meerwald
@ 2012-06-18 23:24   ` Paton J. Lewis
  2012-06-19 18:17     ` Christof Meerwald
  0 siblings, 1 reply; 12+ messages in thread
From: Paton J. Lewis @ 2012-06-18 23:24 UTC (permalink / raw)
  To: Christof Meerwald
  Cc: Alexander Viro, Jason Baron, linux-fsdevel, linux-kernel,
	Paul Holland, Davide Libenzi

Christof,

We believe that EPOLLONESHOT is required in order to make any 
sensible use of calling epoll_wait on a single epoll set concurrently 
in multiple threads. So in that sense the new functionality is really 
only useful for coordinating resource deletion between threads when 
using EPOLLONESHOT. However, when EPOLLONESHOT is not used, 
EPOLL_CTL_DISABLE still disables the item, and can be used solely for 
that purpose if the caller intends to re-enable the item via 
EPOLL_CTL_MOD and doesn't wish to incur the additional overhead 
associated with deleting and re-adding an item.

Thank you,
Pat

At 6/16/2012 11:47 AM, Christof Meerwald wrote:
>On Mon, 11 Jun 2012 15:34:49 -0700, Paton Lewis wrote:
> > This patch introduces the new epoll_ctl command EPOLL_CTL_DISABLE, which
> > disables the associated epoll item and returns -EBUSY if the 
> epoll item is not
> > currently in the epoll ready queue. This allows multiple threads to use a
> > mutex to determine when it is safe to delete an epoll item and 
> its associated
> > resources. This allows epoll items to be deleted and closed efficiently and
> > without error.
>
>Do you assume that EPOLLONESHOT is being used for this to work or
>would you expect your patch to also address the case where
>EPOLLONESHOT is not used?
>
>
>Christof
>
>--
>
>http://cmeerw.org                             sip:cmeerw at cmeerw.org
>mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] epoll: Improved support for multi-threaded clients
  2012-06-18 23:24   ` Paton J. Lewis
@ 2012-06-19 18:17     ` Christof Meerwald
  2012-06-29 21:43       ` Paton J. Lewis
  0 siblings, 1 reply; 12+ messages in thread
From: Christof Meerwald @ 2012-06-19 18:17 UTC (permalink / raw)
  To: Paton J. Lewis
  Cc: Alexander Viro, Jason Baron, linux-fsdevel, linux-kernel,
	Paul Holland, Davide Libenzi

Hi Paton,

On Mon, Jun 18, 2012 at 04:24:35PM -0700, Paton J. Lewis wrote:
> We believe that EPOLLONESHOT is required in order to make any
> sensible use of calling epoll_wait on a single epoll set
> concurrently in multiple threads.

I guess we have to disagree here - though it might be more difficult.


> >On Mon, 11 Jun 2012 15:34:49 -0700, Paton Lewis wrote:
> >> This patch introduces the new epoll_ctl command EPOLL_CTL_DISABLE, which
> >> disables the associated epoll item and returns -EBUSY if the
> >epoll item is not
> >> currently in the epoll ready queue. This allows multiple threads to use a
> >> mutex to determine when it is safe to delete an epoll item and
> >its associated
> >> resources. This allows epoll items to be deleted and closed efficiently and
> >> without error.

Maybe I am missing something here (as I am not really familiar with
the kernel internals), but I don't really understand the logic behind
your patch. Isn't the "expected" case that the item is not on the
ready list and no I/O is being processed for that item?

So I think instead of checking for the item being on the ready list,
checking for the event mask would make more sense for me, e.g.

  if (!(epi->event.events & ~EP_PRIVATE_BITS))


But, taking one step back - wouldn't an alternative approach be to add
some mechanism to allow a thread to post a user-event for an fd? So in
delete_epoll_item you would post a user event (e.g. EPOLLUSER) for the
fd which you can then handle in your epoll_wait processing thread -
with no additional synchronisation necessary.

However, this would still require EPOLLONESHOT to be useful for memory
management.


Christof

-- 

http://cmeerw.org                              sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] epoll: Improved support for multi-threaded clients
  2012-06-18 21:58   ` Paton J. Lewis
@ 2012-06-19 23:42     ` Andrew Morton
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2012-06-19 23:42 UTC (permalink / raw)
  To: Paton J. Lewis
  Cc: Alexander Viro, Jason Baron, linux-fsdevel, linux-kernel,
	Paul Holland, Davide Libenzi, Michael Kerrisk

On Mon, 18 Jun 2012 14:58:44 -0700
"Paton J. Lewis" <palewis@adobe.com> wrote:

> cc Michael Kerrisk as recommended below.
> 
> At 6/13/2012 04:27 PM, Andrew Morton wrote:
> 
> >Let's cc Davide.
> >
> >On Mon, 11 Jun 2012 15:34:49 -0700
> >Paton Lewis <palewis@adobe.com> wrote:
> >
> > >
> > > It is not currently possible to reliably delete epoll items when using the
> > > same epoll set from multiple threads. After calling epoll_ctl with
> > > EPOLL_CTL_DEL, another thread might still be executing code related to an
> > > event for that epoll item (in response to epoll_wait). Therefore 
> > the deleting
> > > thread does not know when it is safe to delete resources pertaining to the
> > > associated epoll item because another thread might be using those 
> > resources.
> >
> >We often solve this sort of thing with refcounting - the EPOLL_CTL_DEL
> >will drop the refcount and, if that fell to zero, remove the object.
> >So if the object is in use by another thread, that other thread will
> >hold a refcount and that other thread will do the object removal when
> >dropping its refcount.
> >
> >I don't know if that model can be used here?
> 
> Andrew, thank you for your response.
> 
> First, I should point out that the fundamental problem we are trying 
> to solve is that userspace code needs to know whether or not an epoll 
> item was in the ready queue at the time EPOLL_CTL_DEL was executed. 
> Furthermore, we want to transfer that information back to userspace 
> in a way that won't break existing code that uses epoll.
> 
> We could use a refcount to track whether the epoll item was in the 
> ready list or not at the time EPOLL_CTL_DEL was received, but we 
> still need a way to transfer that information back to userspace. 
> Since the refcount would only ever be zero or one, epoll_ctl could 
> set a bit in epoll_event.events (perhaps called EPOLLNOTREADY). 
> However, this could cause a problem with any legacy code that relies 
> on the fact that the epoll_ctl epoll_event parameter is ignored for 
> EPOLL_CTL_DEL. Any such code which passed an invalid pointer for that 
> parameter would suddenly generate a fault when running on the new 
> kernel code, even though it worked fine in the past.
> 
> It seems that there are two ways to send information from the kernel 
> back to userspace via the epoll_ctl API: via the return value or by 
> modifying the epoll_event structure. In either case we are wary of 
> changing the behavior of epoll_ctl in response to the existing 
> control commands for fear of breaking legacy code. Therefore we 
> recommended adding a new control word (EPOLL_CTL_DISABLE) to ensure 
> that legacy code would not be affected.
> 
> I should point out that an alternative possibility would be to add a 
> new control word, perhaps called EPOLL_CTL_DEL_AND_REPORT, that would 
> function exactly like EPOLL_CTL_DEL except that it could return 
> -EBUSY if the deleted item was not originally in the ready queue.

hm, OK, thanks.

I need to take a closer look at this - Davide appears to have gone awol
and nobody else regularly works on epoll.  Please resend when covenient.

> >Also, a user-visible feature wuch as this should be documented in Linux
> >manpages.  So please do cc Michael Kerrisk <mtk.manpages@gmail.com> as
> >we work on this.
> 
> Added, thank you.
> 
> Since we're discussing a possible userspace kernel API change, is 
> there someone on the GCC team who we should also cc?

That would be glibc.  glibc does have epoll support and getting
feedback from the glibc developers would be valuable.  Please cc
libc-alpha@sourceware.org for this.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] epoll: Improved support for multi-threaded clients
  2012-06-19 18:17     ` Christof Meerwald
@ 2012-06-29 21:43       ` Paton J. Lewis
  2012-07-09 18:45         ` Christof Meerwald
  2012-08-03  1:37         ` Paton J. Lewis
  0 siblings, 2 replies; 12+ messages in thread
From: Paton J. Lewis @ 2012-06-29 21:43 UTC (permalink / raw)
  To: Christof Meerwald
  Cc: Alexander Viro, Jason Baron, linux-fsdevel, linux-kernel,
	Paul Holland, Davide Libenzi

At 6/19/2012 11:17 AM, Christof Meerwald wrote:
>Hi Paton,
>
>On Mon, Jun 18, 2012 at 04:24:35PM -0700, Paton J. Lewis wrote:
> > We believe that EPOLLONESHOT is required in order to make any
> > sensible use of calling epoll_wait on a single epoll set
> > concurrently in multiple threads.
>
>I guess we have to disagree here - though it might be more difficult.
>
>
> > >On Mon, 11 Jun 2012 15:34:49 -0700, Paton Lewis wrote:
> > >> This patch introduces the new epoll_ctl command EPOLL_CTL_DISABLE, which
> > >> disables the associated epoll item and returns -EBUSY if the
> > >epoll item is not
> > >> currently in the epoll ready queue. This allows multiple 
> threads to use a
> > >> mutex to determine when it is safe to delete an epoll item and
> > >its associated
> > >> resources. This allows epoll items to be deleted and closed 
> efficiently and
> > >> without error.
>
>Maybe I am missing something here (as I am not really familiar with
>the kernel internals), but I don't really understand the logic behind
>your patch. Isn't the "expected" case that the item is not on the
>ready list and no I/O is being processed for that item?

Consider the case where we want to have a set of threads waiting for 
'write' events on a set of pipes or sockets. We have no control over 
when code on the other side of a pipe or socket might write into it, 
and so have no control over when one of the threads calling 
epoll_wait will receive events relative to the timing of the thread 
that is attempting to cancel the pending read operation.

I believe there is no "expected" case, because the probability for an 
item to be on the ready list is a complex function of the number of 
file descriptors being monitored, the frequency at which those 
descriptors receive events, the number of threads calling epoll_wait, 
and the complexity of the code responding to events.

Therefore for some clients of epoll it will be the case that items 
will often be on the ready list, and for others it will not.

>So I think instead of checking for the item being on the ready list,
>checking for the event mask would make more sense for me, e.g.
>
>   if (!(epi->event.events & ~EP_PRIVATE_BITS))

I think that would be fine. However, the inlined function 
ep_is_linked boils down to a call to the inline function list_empty, 
which is just a single comparison. I haven't compared the 
disassembly, but I would expect the two methods to be roughly 
equivalent in terms of performance. Given that the operation of 
deleting an epoll item is likely to be an exceptional circumstance 
and therefore not performance-critical, would it be better to test 
against ep_is_linked for clarity's sake in the code?

However, I believe these discussions are rendered moot by your 
suggestion below:


>But, taking one step back - wouldn't an alternative approach be to add
>some mechanism to allow a thread to post a user-event for an fd? So in
>delete_epoll_item you would post a user event (e.g. EPOLLUSER) for the
>fd which you can then handle in your epoll_wait processing thread -
>with no additional synchronisation necessary.

I think this is an excellent suggestion, and in fact your proposal is 
more similar to what Windows provides when solving this problem. I'll 
test this idea out with our code and get back to you. Is there an 
existing kernel technique that you would recommend for posting a user 
event for an fd, or should I explore using epoll_ctl with EPOLL_CTL_MOD?

I apologize in advance for any delay in responding to you; our office 
is closed next week.

Thank you,
Pat

>However, this would still require EPOLLONESHOT to be useful for memory
>management.
>
>
>Christof
>
>--
>
>http://cmeerw.org                              sip:cmeerw at cmeerw.org
>mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] epoll: Improved support for multi-threaded clients
  2012-06-29 21:43       ` Paton J. Lewis
@ 2012-07-09 18:45         ` Christof Meerwald
  2012-08-03  1:37         ` Paton J. Lewis
  1 sibling, 0 replies; 12+ messages in thread
From: Christof Meerwald @ 2012-07-09 18:45 UTC (permalink / raw)
  To: Paton J. Lewis
  Cc: Alexander Viro, Jason Baron, linux-fsdevel, linux-kernel,
	Paul Holland, Davide Libenzi

On Fri, Jun 29, 2012 at 02:43:06PM -0700, Paton J. Lewis wrote:
> At 6/19/2012 11:17 AM, Christof Meerwald wrote:
> >But, taking one step back - wouldn't an alternative approach be to add
> >some mechanism to allow a thread to post a user-event for an fd? So in
> >delete_epoll_item you would post a user event (e.g. EPOLLUSER) for the
> >fd which you can then handle in your epoll_wait processing thread -
> >with no additional synchronisation necessary.
> I think this is an excellent suggestion, and in fact your proposal
> is more similar to what Windows provides when solving this problem.
> I'll test this idea out with our code and get back to you. Is there
> an existing kernel technique that you would recommend for posting a
> user event for an fd, or should I explore using epoll_ctl with
> EPOLL_CTL_MOD?

I don't know about any existing kernel technique for this, but my gut
feeling would be a new op value for epoll_ctl, maybe something like
EPOLL_CTL_TRIGGER.


Christof

-- 

http://cmeerw.org                              sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] epoll: Improved support for multi-threaded clients
  2012-06-29 21:43       ` Paton J. Lewis
  2012-07-09 18:45         ` Christof Meerwald
@ 2012-08-03  1:37         ` Paton J. Lewis
  2012-08-14 20:21           ` Christof Meerwald
  1 sibling, 1 reply; 12+ messages in thread
From: Paton J. Lewis @ 2012-08-03  1:37 UTC (permalink / raw)
  To: Christof Meerwald
  Cc: Alexander Viro, Jason Baron, linux-fsdevel, linux-kernel,
	Paul Holland, Davide Libenzi

Christof,

I notice that Windows (via I/O Completion Ports) and both BSD and 
OS/X (via kqueue) all appear to have support for both of the concepts 
we have been discussing: 1) the ability to disable epoll items, and 
2) the ability to send custom events. This suggests that either 
solution may be acceptable to a broader community.

I implemented and tested the method of using a new custom epoll event 
as a way to signal that an epoll item should be deleted. Although 
this alternative technique works, the approach has several drawbacks 
that I feel make it less desirable than the original proposal.

My first concern is about code clarity. Using a custom event to 
delete an event type (either EPOLLIN or EPOLLOUT) from an epoll item 
requires that functionality to be split across two areas of code: the 
code that requests the deletion (via the call to epoll_ctl), and the 
code that responds to it (via epoll_wait).

However, my main concern is about performance. Handling a custom 
event means that each return from epoll_wait requires the responding 
thread to check for possible custom events, which in the case of 
deletion is going to be relatively rare. Thus code which was once 
purely concerned with responding to I/O events must now spend a 
fraction of its time testing for exceptional conditions. In addition, 
handling deletion in this manner now requires a thread or context switch.

Given the drawbacks listed above, and the kernel design philosophy of 
only implementing what is actually needed, I would argue for sticking 
with the original EPOLL_CTL_DISABLE proposal for now.

Pat

At 6/29/2012 02:43 PM, Paton J. Lewis wrote:
>At 6/19/2012 11:17 AM, Christof Meerwald wrote:
>>Hi Paton,
>>
>>On Mon, Jun 18, 2012 at 04:24:35PM -0700, Paton J. Lewis wrote:
>> > We believe that EPOLLONESHOT is required in order to make any
>> > sensible use of calling epoll_wait on a single epoll set
>> > concurrently in multiple threads.
>>
>>I guess we have to disagree here - though it might be more difficult.
>>
>>
>> > >On Mon, 11 Jun 2012 15:34:49 -0700, Paton Lewis wrote:
>> > >> This patch introduces the new epoll_ctl command 
>> EPOLL_CTL_DISABLE, which
>> > >> disables the associated epoll item and returns -EBUSY if the
>> > >epoll item is not
>> > >> currently in the epoll ready queue. This allows multiple 
>> threads to use a
>> > >> mutex to determine when it is safe to delete an epoll item and
>> > >its associated
>> > >> resources. This allows epoll items to be deleted and closed 
>> efficiently and
>> > >> without error.
>>
>>Maybe I am missing something here (as I am not really familiar with
>>the kernel internals), but I don't really understand the logic behind
>>your patch. Isn't the "expected" case that the item is not on the
>>ready list and no I/O is being processed for that item?
>
>Consider the case where we want to have a set of threads waiting for 
>'write' events on a set of pipes or sockets. We have no control over 
>when code on the other side of a pipe or socket might write into it, 
>and so have no control over when one of the threads calling 
>epoll_wait will receive events relative to the timing of the thread 
>that is attempting to cancel the pending read operation.
>
>I believe there is no "expected" case, because the probability for 
>an item to be on the ready list is a complex function of the number 
>of file descriptors being monitored, the frequency at which those 
>descriptors receive events, the number of threads calling 
>epoll_wait, and the complexity of the code responding to events.
>
>Therefore for some clients of epoll it will be the case that items 
>will often be on the ready list, and for others it will not.
>
>>So I think instead of checking for the item being on the ready list,
>>checking for the event mask would make more sense for me, e.g.
>>
>>   if (!(epi->event.events & ~EP_PRIVATE_BITS))
>
>I think that would be fine. However, the inlined function 
>ep_is_linked boils down to a call to the inline function list_empty, 
>which is just a single comparison. I haven't compared the 
>disassembly, but I would expect the two methods to be roughly 
>equivalent in terms of performance. Given that the operation of 
>deleting an epoll item is likely to be an exceptional circumstance 
>and therefore not performance-critical, would it be better to test 
>against ep_is_linked for clarity's sake in the code?
>
>However, I believe these discussions are rendered moot by your 
>suggestion below:
>
>
>>But, taking one step back - wouldn't an alternative approach be to add
>>some mechanism to allow a thread to post a user-event for an fd? So in
>>delete_epoll_item you would post a user event (e.g. EPOLLUSER) for the
>>fd which you can then handle in your epoll_wait processing thread -
>>with no additional synchronisation necessary.
>
>I think this is an excellent suggestion, and in fact your proposal 
>is more similar to what Windows provides when solving this problem. 
>I'll test this idea out with our code and get back to you. Is there 
>an existing kernel technique that you would recommend for posting a 
>user event for an fd, or should I explore using epoll_ctl with EPOLL_CTL_MOD?
>
>I apologize in advance for any delay in responding to you; our 
>office is closed next week.
>
>Thank you,
>Pat
>
>>However, this would still require EPOLLONESHOT to be useful for memory
>>management.
>>
>>
>>Christof
>>
>>--
>>
>>http://cmeerw.org                             sip:cmeerw at cmeerw.org
>>mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] epoll: Improved support for multi-threaded clients
  2012-08-03  1:37         ` Paton J. Lewis
@ 2012-08-14 20:21           ` Christof Meerwald
  2012-08-14 22:13             ` Paton J. Lewis
  0 siblings, 1 reply; 12+ messages in thread
From: Christof Meerwald @ 2012-08-14 20:21 UTC (permalink / raw)
  To: Paton J. Lewis
  Cc: Alexander Viro, Jason Baron, linux-fsdevel, linux-kernel,
	Paul Holland, Davide Libenzi

Hi Paton,

On Thu, Aug 02, 2012 at 06:37:06PM -0700, Paton J. Lewis wrote:
[...]
> My first concern is about code clarity. Using a custom event to
> delete an event type (either EPOLLIN or EPOLLOUT) from an epoll item
> requires that functionality to be split across two areas of code:
> the code that requests the deletion (via the call to epoll_ctl), and
> the code that responds to it (via epoll_wait).

But don't you have a similar problem in your proposal as well as you
might get an EBUSY when trying to disabling the item - in which case
you would have to do the deletion in the epoll_wait loop.

> However, my main concern is about performance. Handling a custom
> event means that each return from epoll_wait requires the responding
> thread to check for possible custom events, which in the case of
> deletion is going to be relatively rare. Thus code which was once
> purely concerned with responding to I/O events must now spend a
> fraction of its time testing for exceptional conditions. In
> addition, handling deletion in this manner now requires a thread or
> context switch.

But in your initial proposal you also had the code checking for
deletion in the epoll_wait loop.


> Given the drawbacks listed above, and the kernel design philosophy
> of only implementing what is actually needed, I would argue for
> sticking with the original EPOLL_CTL_DISABLE proposal for now.

I have finally had some chance to play around with your patch a bit
and I really think that you don't want to check for
ep_is_linked(&epi->rdllink) in ep_disable as I don't see that this
would provide any useful semantics with respect to race-conditions.
I.e. consider the point in the epoll_wait loop just after you have
re-enabled to item - in this case ep_disable would (almost certainly)
return EBUSY, but there is no guarantee that epoll_wait will be woken
up on the next iteration.

As I mentioned, I think it would be much more useful to check for
"epi->event.events & ~EP_PRIVATE_BITS" instead which I believe would
provide more useful semantics.


Christof

-- 

http://cmeerw.org                              sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] epoll: Improved support for multi-threaded clients
  2012-08-14 20:21           ` Christof Meerwald
@ 2012-08-14 22:13             ` Paton J. Lewis
  0 siblings, 0 replies; 12+ messages in thread
From: Paton J. Lewis @ 2012-08-14 22:13 UTC (permalink / raw)
  To: Christof Meerwald
  Cc: Alexander Viro, Jason Baron, linux-fsdevel, linux-kernel,
	Paul Holland, Davide Libenzi

At 8/14/2012 01:21 PM, Christof Meerwald wrote:
>Hi Paton,
>
>On Thu, Aug 02, 2012 at 06:37:06PM -0700, Paton J. Lewis wrote:
>[...]
> > My first concern is about code clarity. Using a custom event to
> > delete an event type (either EPOLLIN or EPOLLOUT) from an epoll item
> > requires that functionality to be split across two areas of code:
> > the code that requests the deletion (via the call to epoll_ctl), and
> > the code that responds to it (via epoll_wait).
>
>But don't you have a similar problem in your proposal as well as you
>might get an EBUSY when trying to disabling the item - in which case
>you would have to do the deletion in the epoll_wait loop.

Good point.

> > However, my main concern is about performance. Handling a custom
> > event means that each return from epoll_wait requires the responding
> > thread to check for possible custom events, which in the case of
> > deletion is going to be relatively rare. Thus code which was once
> > purely concerned with responding to I/O events must now spend a
> > fraction of its time testing for exceptional conditions. In
> > addition, handling deletion in this manner now requires a thread or
> > context switch.
>
>But in your initial proposal you also had the code checking for
>deletion in the epoll_wait loop.

Also true. However, I believe the context switch is always required 
by the custom message passing technique, but will not always happen 
when using the event disabling technique.


> > Given the drawbacks listed above, and the kernel design philosophy
> > of only implementing what is actually needed, I would argue for
> > sticking with the original EPOLL_CTL_DISABLE proposal for now.
>
>I have finally had some chance to play around with your patch a bit
>and I really think that you don't want to check for
>ep_is_linked(&epi->rdllink) in ep_disable as I don't see that this
>would provide any useful semantics with respect to race-conditions.
>I.e. consider the point in the epoll_wait loop just after you have
>re-enabled to item - in this case ep_disable would (almost certainly)
>return EBUSY, but there is no guarantee that epoll_wait will be woken
>up on the next iteration.
>
>As I mentioned, I think it would be much more useful to check for
>"epi->event.events & ~EP_PRIVATE_BITS" instead which I believe would
>provide more useful semantics.

You are correct. Thanks for being patient and persistent here. I 
discovered this problem myself last week during testing, and I am 
planning to post a v2 patch proposal that includes this fix.

I am also working on an epoll self-test as Andrew Morton suggested. 
I'm going to finish that before re-submitting the EPOLL_CTL_DISABLE 
patch to help reduce the possibility that the v2 patch still contains bugs.

Pat


>Christof
>
>--
>
>http://cmeerw.org                              sip:cmeerw at cmeerw.org
>mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-08-14 22:15 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-11 22:34 [PATCH] epoll: Improved support for multi-threaded clients Paton Lewis
2012-06-13 23:27 ` Andrew Morton
2012-06-18 21:58   ` Paton J. Lewis
2012-06-19 23:42     ` Andrew Morton
2012-06-16 18:47 ` Christof Meerwald
2012-06-18 23:24   ` Paton J. Lewis
2012-06-19 18:17     ` Christof Meerwald
2012-06-29 21:43       ` Paton J. Lewis
2012-07-09 18:45         ` Christof Meerwald
2012-08-03  1:37         ` Paton J. Lewis
2012-08-14 20:21           ` Christof Meerwald
2012-08-14 22:13             ` Paton J. Lewis

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.