linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFCv7 PATCH 0/4] Add poll_requested_events() function.
@ 2012-02-02 10:26 Hans Verkuil
  2012-02-02 10:26 ` [RFCv7 PATCH 1/4] eventpoll: set key to the correct events mask Hans Verkuil
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Hans Verkuil @ 2012-02-02 10:26 UTC (permalink / raw)
  To: linux-media
  Cc: Al Viro, Jonathan Corbet, Andrew Morton, Davide Libenzi,
	linux-kernel, linux-fsdevel, David S. Miller, Enke Chen

Hi all,

This is the seventh version of this patch series (the fifth and sixth where
never posted and where internal iterations only).

Al Viro had concerns about silent API changes. I have made an extensive
analysis of that in my comments in patch 2/4.

This patch series is rebased to v3.3-rc2. The changes compared to the
previously posted version are:

- I have renamed the qproc field to pq_proc to prevent any driver that tries
  to access that directly to fail. No kernel driver does this, BTW.

- I added a new poll_does_not_wait() inline that returns true if it is known
  that poll() will not wait on return. This removes the last reason for
  looking inside the poll_table struct. include/net/sock.h has been adapted
  to use this new inline (and it is the only place inside the kernel that
  need this).

I hope that the analysis I made answers any remaining concerns about possible
silent API changes.

This patch series is also available here:

http://git.linuxtv.org/hverkuil/media_tree.git/shortlog/refs/heads/pollv7

It was suggested to me that creating a new poll system call might be an option
as well. I've attempted that as well and code implementing that can be found
here:

http://git.linuxtv.org/hverkuil/media_tree.git/shortlog/refs/heads/pollwithkey

However, I think this turned out to be very messy. And because some drivers
call the poll fop directly or through some framework I could not be certain I
was not introducing any errors.

If it is really required to change the API in some way, then I would suggest
changing this:

typedef struct poll_table_struct {
        poll_queue_proc pq_proc;
        unsigned long key;
} poll_table;

to this:

struct poll_table {
        poll_queue_proc pq_proc;
        unsigned long key;
};

and adapting all users.

However, I honestly do not think this is necessary at all. But if it is the
only way to get this in, then I'll do the work. The media/video subsystem really
needs this functionality. Also note that previous versions of this patch have
been in linux-next for months now.

The first version of this patch was posted July 1st, 2011. I really hope that
it won't take another six months to get a review from a fs developer. As this
LWN article (http://lwn.net/Articles/450658/) said: 'There has been little
discussion of the patch; it doesn't seem like there is any real reason for it
not to go in for 3.1.'

The earliest this can go in now is 3.4. The only reason it takes so long is
that it has been almost impossible to get a Ack or comments or even just a
simple reply from the fs developers. That is really frustrating, I'm sorry
to say.

Anyway, comments, reviews, etc. are very welcome.

Regards,

	Hans


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFCv7 PATCH 1/4] eventpoll: set key to the correct events mask.
  2012-02-02 10:26 [RFCv7 PATCH 0/4] Add poll_requested_events() function Hans Verkuil
@ 2012-02-02 10:26 ` Hans Verkuil
  2012-02-02 10:26   ` [RFCv7 PATCH 2/4] poll: add poll_requested_events() and poll_does_not_wait() functions Hans Verkuil
                     ` (2 more replies)
  2012-02-02 22:48 ` [RFCv7 PATCH 0/4] Add poll_requested_events() function Andrew Morton
  2012-02-03  1:58 ` Enke Chen
  2 siblings, 3 replies; 9+ messages in thread
From: Hans Verkuil @ 2012-02-02 10:26 UTC (permalink / raw)
  To: linux-media
  Cc: Al Viro, Jonathan Corbet, Andrew Morton, Davide Libenzi,
	linux-kernel, linux-fsdevel, David S. Miller, Enke Chen,
	Hans Verkuil

From: Hans Verkuil <hans.verkuil@cisco.com>

The eventpoll implementation always left the key field at ~0 instead of
using the requested events mask.

This was changed so the key field now contains the actual events that
should be polled for as set by the caller.

This information is needed by drivers that need to e.g. start DMA if the
caller wants to poll for incoming data as opposed to, say, exceptions.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
---
 fs/eventpoll.c |   18 +++++++++++++++---
 1 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index aabdfc3..ca47608 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -682,9 +682,12 @@ static int ep_read_events_proc(struct eventpoll *ep, struct list_head *head,
 			       void *priv)
 {
 	struct epitem *epi, *tmp;
+	poll_table pt;
 
+	init_poll_funcptr(&pt, NULL);
 	list_for_each_entry_safe(epi, tmp, head, rdllink) {
-		if (epi->ffd.file->f_op->poll(epi->ffd.file, NULL) &
+		pt.key = epi->event.events;
+		if (epi->ffd.file->f_op->poll(epi->ffd.file, &pt) &
 		    epi->event.events)
 			return POLLIN | POLLRDNORM;
 		else {
@@ -1065,6 +1068,7 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
 	/* Initialize the poll table using the queue callback */
 	epq.epi = epi;
 	init_poll_funcptr(&epq.pt, ep_ptable_queue_proc);
+	epq.pt.key = event->events;
 
 	/*
 	 * Attach the item to the poll hooks and get current event bits.
@@ -1159,6 +1163,9 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
 {
 	int pwake = 0;
 	unsigned int revents;
+	poll_table pt;
+
+	init_poll_funcptr(&pt, NULL);
 
 	/*
 	 * Set the new event interest mask before calling f_op->poll();
@@ -1166,13 +1173,14 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
 	 * f_op->poll() call and the new event set registering.
 	 */
 	epi->event.events = event->events;
+	pt.key = event->events;
 	epi->event.data = event->data; /* protected by mtx */
 
 	/*
 	 * Get current event bits. We can safely use the file* here because
 	 * its usage count has been increased by the caller of this function.
 	 */
-	revents = epi->ffd.file->f_op->poll(epi->ffd.file, NULL);
+	revents = epi->ffd.file->f_op->poll(epi->ffd.file, &pt);
 
 	/*
 	 * If the item is "hot" and it is not registered inside the ready
@@ -1207,6 +1215,9 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head,
 	unsigned int revents;
 	struct epitem *epi;
 	struct epoll_event __user *uevent;
+	poll_table pt;
+
+	init_poll_funcptr(&pt, NULL);
 
 	/*
 	 * We can loop without lock because we are passed a task private list.
@@ -1219,7 +1230,8 @@ static int ep_send_events_proc(struct eventpoll *ep, struct list_head *head,
 
 		list_del_init(&epi->rdllink);
 
-		revents = epi->ffd.file->f_op->poll(epi->ffd.file, NULL) &
+		pt.key = epi->event.events;
+		revents = epi->ffd.file->f_op->poll(epi->ffd.file, &pt) &
 			epi->event.events;
 
 		/*
-- 
1.7.8.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFCv7 PATCH 2/4] poll: add poll_requested_events() and poll_does_not_wait() functions
  2012-02-02 10:26 ` [RFCv7 PATCH 1/4] eventpoll: set key to the correct events mask Hans Verkuil
@ 2012-02-02 10:26   ` Hans Verkuil
  2012-02-02 22:48     ` Andrew Morton
  2012-02-02 10:26   ` [RFCv7 PATCH 3/4] net/sock.h: use poll_does_not_wait() in sock_poll_wait() Hans Verkuil
  2012-02-02 10:26   ` [RFCv7 PATCH 4/4] unix/af_unix.c: use poll_requested_events() in unix_dgram_poll() Hans Verkuil
  2 siblings, 1 reply; 9+ messages in thread
From: Hans Verkuil @ 2012-02-02 10:26 UTC (permalink / raw)
  To: linux-media
  Cc: Al Viro, Jonathan Corbet, Andrew Morton, Davide Libenzi,
	linux-kernel, linux-fsdevel, David S. Miller, Enke Chen,
	Hans Verkuil

From: Hans Verkuil <hans.verkuil@cisco.com>

In some cases the poll() implementation in a driver has to do different
things depending on the events the caller wants to poll for. An example is
when a driver needs to start a DMA engine if the caller polls for POLLIN,
but doesn't want to do that if POLLIN is not requested but instead only
POLLOUT or POLLPRI is requested. This is something that can happen in the
video4linux subsystem.

Unfortunately, the current epoll/poll/select implementation doesn't provide
that information reliably. The poll_table_struct does have it: it has a key
field with the event mask. But once a poll() call matches one or more bits
of that mask any following poll() calls are passed a NULL poll_table_struct
pointer.

The solution is to set the qproc field to NULL in poll_table_struct once
poll() matches the events, not the poll_table_struct pointer itself. That
way drivers can obtain the mask through a new poll_requested_events inline.

The poll_table_struct can still be NULL since some kernel code calls it
internally (netfs_state_poll() in ./drivers/staging/pohmelfs/netfs.h). In
that case poll_requested_events() returns ~0 (i.e. all events).

Very rarely drivers might want to know whether poll_wait will actually wait.
If another earlier file descriptor in the set already matched the events the
caller wanted to wait for, then the kernel will return from the select() call
without waiting.

A new helper function poll_does_not_wait() is added that drivers can use to
detect this situation.

Drivers should no longer access any of the poll_table internals, but use the
poll_requested_events() and poll_does_not_wait() access functions instead.

Since the behavior of the qproc field changes with this patch (since this
function pointer can now be NULL when that wasn't possible in the past) I
have renamed that field from qproc to pq_proc. Any out-of-tree driver that
uses it will now fail to compile.

Some notes regarding the correctness of this patch: the driver's poll()
function is called with a 'struct poll_table_struct *wait' argument. This
pointer may or may not be NULL, drivers can never rely on it being one or
the other as that depends on whether or not an earlier file descriptor in
the select()'s fdset matched the requested events.

There are only three things a driver can do with the wait argument:

1) obtain the key field:

	events = wait ? wait->key : ~0;

   This will still work although it should be replaced with the new
   poll_requested_events() function (which does exactly the same).
   This will now even work better, since wait is no longer set to NULL
   unnecessarily.

2) use the qproc callback. This could be deadly since qproc can now be
   NULL. Renaming qproc should prevent this from happening. There are no
   kernel drivers that actually access this callback directly, BTW.

3) test whether wait == NULL to determine whether poll would return without
   waiting. This is no longer sufficient as the correct test is now
   wait == NULL || wait->pq_proc == NULL.

   However, the worst that can happen here is a slight performance hit in
   the case where wait != NULL and wait->pq_proc == NULL. In that case the
   driver will assume that poll_wait() will actually add the fd to the set
   of waiting file descriptors. Of course, poll_wait() will not do that
   since it tests for wait->pq_proc. This will not break anything, though.

   There is only one place in the whole kernel where this happens
   (sock_poll_wait() in include/net/sock.h) and that code will be replaced
   by a call to poll_does_not_wait() in the next patch.

   Note that even if wait->pq_proc != NULL drivers cannot rely on poll_wait()
   actually waiting. The next file descriptor from the set might match the
   event mask and thus any possible waits will never happen.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Reviewed-by: Jonathan Corbet <corbet@lwn.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 fs/select.c          |   38 +++++++++++++++++---------------------
 include/linux/poll.h |   35 ++++++++++++++++++++++++++++++-----
 2 files changed, 47 insertions(+), 26 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index d33418f..4bcc3a4 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -386,13 +386,11 @@ get_max:
 static inline void wait_key_set(poll_table *wait, unsigned long in,
 				unsigned long out, unsigned long bit)
 {
-	if (wait) {
-		wait->key = POLLEX_SET;
-		if (in & bit)
-			wait->key |= POLLIN_SET;
-		if (out & bit)
-			wait->key |= POLLOUT_SET;
-	}
+	wait->key = POLLEX_SET;
+	if (in & bit)
+		wait->key |= POLLIN_SET;
+	if (out & bit)
+		wait->key |= POLLOUT_SET;
 }
 
 int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
@@ -414,7 +412,7 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
 	poll_initwait(&table);
 	wait = &table.pt;
 	if (end_time && !end_time->tv_sec && !end_time->tv_nsec) {
-		wait = NULL;
+		wait->pq_proc = NULL;
 		timed_out = 1;
 	}
 
@@ -459,17 +457,17 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
 					if ((mask & POLLIN_SET) && (in & bit)) {
 						res_in |= bit;
 						retval++;
-						wait = NULL;
+						wait->pq_proc = NULL;
 					}
 					if ((mask & POLLOUT_SET) && (out & bit)) {
 						res_out |= bit;
 						retval++;
-						wait = NULL;
+						wait->pq_proc = NULL;
 					}
 					if ((mask & POLLEX_SET) && (ex & bit)) {
 						res_ex |= bit;
 						retval++;
-						wait = NULL;
+						wait->pq_proc = NULL;
 					}
 				}
 			}
@@ -481,7 +479,7 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
 				*rexp = res_ex;
 			cond_resched();
 		}
-		wait = NULL;
+		wait->pq_proc = NULL;
 		if (retval || timed_out || signal_pending(current))
 			break;
 		if (table.error) {
@@ -720,7 +718,7 @@ struct poll_list {
  * interested in events matching the pollfd->events mask, and the result
  * matching that mask is both recorded in pollfd->revents and returned. The
  * pwait poll_table will be used by the fd-provided poll handler for waiting,
- * if non-NULL.
+ * if pwait->pq_proc is non-NULL.
  */
 static inline unsigned int do_pollfd(struct pollfd *pollfd, poll_table *pwait)
 {
@@ -738,9 +736,7 @@ static inline unsigned int do_pollfd(struct pollfd *pollfd, poll_table *pwait)
 		if (file != NULL) {
 			mask = DEFAULT_POLLMASK;
 			if (file->f_op && file->f_op->poll) {
-				if (pwait)
-					pwait->key = pollfd->events |
-							POLLERR | POLLHUP;
+				pwait->key = pollfd->events | POLLERR | POLLHUP;
 				mask = file->f_op->poll(file, pwait);
 			}
 			/* Mask out unneeded events. */
@@ -763,7 +759,7 @@ static int do_poll(unsigned int nfds,  struct poll_list *list,
 
 	/* Optimise the no-wait case */
 	if (end_time && !end_time->tv_sec && !end_time->tv_nsec) {
-		pt = NULL;
+		pt->pq_proc = NULL;
 		timed_out = 1;
 	}
 
@@ -781,22 +777,22 @@ static int do_poll(unsigned int nfds,  struct poll_list *list,
 			for (; pfd != pfd_end; pfd++) {
 				/*
 				 * Fish for events. If we found one, record it
-				 * and kill the poll_table, so we don't
+				 * and kill poll_table->pq_proc, so we don't
 				 * needlessly register any other waiters after
 				 * this. They'll get immediately deregistered
 				 * when we break out and return.
 				 */
 				if (do_pollfd(pfd, pt)) {
 					count++;
-					pt = NULL;
+					pt->pq_proc = NULL;
 				}
 			}
 		}
 		/*
 		 * All waiters have already been registered, so don't provide
-		 * a poll_table to them on the next loop iteration.
+		 * a poll_table->pq_proc to them on the next loop iteration.
 		 */
-		pt = NULL;
+		pt->pq_proc = NULL;
 		if (!count) {
 			count = wait->error;
 			if (signal_pending(current))
diff --git a/include/linux/poll.h b/include/linux/poll.h
index cf40010..08b7ea5 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -32,20 +32,45 @@ struct poll_table_struct;
  */
 typedef void (*poll_queue_proc)(struct file *, wait_queue_head_t *, struct poll_table_struct *);
 
+/*
+ * Do not touch the structure directly, use the access functions
+ * poll_does_not_wait() and poll_requested_events() instead.
+ */
 typedef struct poll_table_struct {
-	poll_queue_proc qproc;
+	poll_queue_proc pq_proc;
 	unsigned long key;
 } poll_table;
 
 static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_address, poll_table *p)
 {
-	if (p && wait_address)
-		p->qproc(filp, wait_address, p);
+	if (p && p->pq_proc && wait_address)
+		p->pq_proc(filp, wait_address, p);
+}
+
+/*
+ * Return true if it is guaranteed that poll will not wait. This is the case
+ * if the poll() of another file descriptor in the set got an event, so there
+ * is no need for waiting.
+ */
+static inline bool poll_does_not_wait(const poll_table *p)
+{
+	return p == NULL || p->pq_proc == NULL;
+}
+
+/*
+ * Return the set of events that the application wants to poll for.
+ * This is useful for drivers that need to know whether a DMA transfer has
+ * to be started implicitly on poll(). You typically only want to do that
+ * if the application is actually polling for POLLIN and/or POLLOUT.
+ */
+static inline unsigned long poll_requested_events(const poll_table *p)
+{
+	return p ? p->key : ~0UL;
 }
 
-static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
+static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc pq_proc)
 {
-	pt->qproc = qproc;
+	pt->pq_proc = pq_proc;
 	pt->key   = ~0UL; /* all events enabled */
 }
 
-- 
1.7.8.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFCv7 PATCH 3/4] net/sock.h: use poll_does_not_wait() in sock_poll_wait()
  2012-02-02 10:26 ` [RFCv7 PATCH 1/4] eventpoll: set key to the correct events mask Hans Verkuil
  2012-02-02 10:26   ` [RFCv7 PATCH 2/4] poll: add poll_requested_events() and poll_does_not_wait() functions Hans Verkuil
@ 2012-02-02 10:26   ` Hans Verkuil
  2012-02-02 10:26   ` [RFCv7 PATCH 4/4] unix/af_unix.c: use poll_requested_events() in unix_dgram_poll() Hans Verkuil
  2 siblings, 0 replies; 9+ messages in thread
From: Hans Verkuil @ 2012-02-02 10:26 UTC (permalink / raw)
  To: linux-media
  Cc: Al Viro, Jonathan Corbet, Andrew Morton, Davide Libenzi,
	linux-kernel, linux-fsdevel, David S. Miller, Enke Chen,
	Hans Verkuil

From: Hans Verkuil <hans.verkuil@cisco.com>

In order to determine whether poll_wait() might actually wait the
poll_table pointer was tested in sock_poll_wait(). This is no longer
sufficient, instead poll_does_not_wait() should be called. That function
also tests whether pt->pq_proc is non-NULL.

Without this change smp_mb() could be called unnecessarily in some
circumstances, causing a performance hit.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
---
 include/net/sock.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 91c1c8b..da7f2ec 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1824,7 +1824,7 @@ static inline bool wq_has_sleeper(struct socket_wq *wq)
 static inline void sock_poll_wait(struct file *filp,
 		wait_queue_head_t *wait_address, poll_table *p)
 {
-	if (p && wait_address) {
+	if (!poll_does_not_wait(p) && wait_address) {
 		poll_wait(filp, wait_address, p);
 		/*
 		 * We need to be sure we are in sync with the
-- 
1.7.8.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFCv7 PATCH 4/4] unix/af_unix.c: use poll_requested_events() in unix_dgram_poll()
  2012-02-02 10:26 ` [RFCv7 PATCH 1/4] eventpoll: set key to the correct events mask Hans Verkuil
  2012-02-02 10:26   ` [RFCv7 PATCH 2/4] poll: add poll_requested_events() and poll_does_not_wait() functions Hans Verkuil
  2012-02-02 10:26   ` [RFCv7 PATCH 3/4] net/sock.h: use poll_does_not_wait() in sock_poll_wait() Hans Verkuil
@ 2012-02-02 10:26   ` Hans Verkuil
  2 siblings, 0 replies; 9+ messages in thread
From: Hans Verkuil @ 2012-02-02 10:26 UTC (permalink / raw)
  To: linux-media
  Cc: Al Viro, Jonathan Corbet, Andrew Morton, Davide Libenzi,
	linux-kernel, linux-fsdevel, David S. Miller, Enke Chen,
	Hans Verkuil

From: Hans Verkuil <hans.verkuil@cisco.com>

Rather than accessing the poll_table internals use the new
poll_requested_events() function.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
---
 net/unix/af_unix.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 85d3bb7..59f5202 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2175,7 +2175,7 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
 	}
 
 	/* No write status requested, avoid expensive OUT tests. */
-	if (wait && !(wait->key & (POLLWRBAND | POLLWRNORM | POLLOUT)))
+	if (!(poll_requested_events(wait) & (POLLWRBAND | POLLWRNORM | POLLOUT)))
 		return mask;
 
 	writable = unix_writable(sk);
-- 
1.7.8.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFCv7 PATCH 0/4] Add poll_requested_events() function.
  2012-02-02 10:26 [RFCv7 PATCH 0/4] Add poll_requested_events() function Hans Verkuil
  2012-02-02 10:26 ` [RFCv7 PATCH 1/4] eventpoll: set key to the correct events mask Hans Verkuil
@ 2012-02-02 22:48 ` Andrew Morton
  2012-02-03  9:30   ` Hans Verkuil
  2012-02-03  1:58 ` Enke Chen
  2 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2012-02-02 22:48 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: linux-media, Al Viro, Jonathan Corbet, Davide Libenzi,
	linux-kernel, linux-fsdevel, David S. Miller, Enke Chen

On Thu,  2 Feb 2012 11:26:53 +0100
Hans Verkuil <hverkuil@xs4all.nl> wrote:

> The first version of this patch was posted July 1st, 2011. I really hope that
> it won't take another six months to get a review from a fs developer. As this
> LWN article (http://lwn.net/Articles/450658/) said: 'There has been little
> discussion of the patch; it doesn't seem like there is any real reason for it
> not to go in for 3.1.'
> 
> The earliest this can go in now is 3.4. The only reason it takes so long is
> that it has been almost impossible to get a Ack or comments or even just a
> simple reply from the fs developers. That is really frustrating, I'm sorry
> to say.

Yup.  Nobody really maintains the poll/select code.  It happens to sit
under fs/ so nominally belongs to the "fs maintainers".  The logs for
fs/select.c seem to show me as the usual committer, but I wouldn't
claim particular expertise in this area - I'm more a tube-unclogger
here.  Probably Al knows the code as well or better than anyone else. 
It's good that he looked at an earlier version of the patches.

fs/eventpoll.c has an identified maintainer, but he has been vigorously
hiding from us for a year or so.  I'm the commit monkey for eventpoll,
in a similar state to fs/select.c.

So ho hum, all we can do is our best.  You're an experienced kernel
developer who has put a lot of work into the code.  I suggest that you
get your preferred version into linux-next ASAP then send Linus a pull
request for 3.4-rc1, explaining the situation.  If the code wasn't
already in linux-next I would put it in -mm today, for 3.4-rc1.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFCv7 PATCH 2/4] poll: add poll_requested_events() and poll_does_not_wait() functions
  2012-02-02 10:26   ` [RFCv7 PATCH 2/4] poll: add poll_requested_events() and poll_does_not_wait() functions Hans Verkuil
@ 2012-02-02 22:48     ` Andrew Morton
  0 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2012-02-02 22:48 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: linux-media, Al Viro, Jonathan Corbet, Davide Libenzi,
	linux-kernel, linux-fsdevel, David S. Miller, Enke Chen,
	Hans Verkuil

On Thu,  2 Feb 2012 11:26:55 +0100
Hans Verkuil <hverkuil@xs4all.nl> wrote:

> From: Hans Verkuil <hans.verkuil@cisco.com>
> 
> In some cases the poll() implementation in a driver has to do different
> things depending on the events the caller wants to poll for. An example is
> when a driver needs to start a DMA engine if the caller polls for POLLIN,
> but doesn't want to do that if POLLIN is not requested but instead only
> POLLOUT or POLLPRI is requested. This is something that can happen in the
> video4linux subsystem.
> 
> Unfortunately, the current epoll/poll/select implementation doesn't provide
> that information reliably. The poll_table_struct does have it: it has a key
> field with the event mask. But once a poll() call matches one or more bits
> of that mask any following poll() calls are passed a NULL poll_table_struct
> pointer.
> 
> The solution is to set the qproc field to NULL in poll_table_struct once
> poll() matches the events, not the poll_table_struct pointer itself. That
> way drivers can obtain the mask through a new poll_requested_events inline.
> 
> The poll_table_struct can still be NULL since some kernel code calls it
> internally (netfs_state_poll() in ./drivers/staging/pohmelfs/netfs.h). In
> that case poll_requested_events() returns ~0 (i.e. all events).
> 
> Very rarely drivers might want to know whether poll_wait will actually wait.
> If another earlier file descriptor in the set already matched the events the
> caller wanted to wait for, then the kernel will return from the select() call
> without waiting.
> 
> A new helper function poll_does_not_wait() is added that drivers can use to
> detect this situation.
> 
> Drivers should no longer access any of the poll_table internals, but use the
> poll_requested_events() and poll_does_not_wait() access functions instead.

A way to communicate and enforce this is to rename the relevant fields.  Prepend
a "_" to them and add a stern comment.


> Since the behavior of the qproc field changes with this patch (since this
> function pointer can now be NULL when that wasn't possible in the past) I
> have renamed that field from qproc to pq_proc. Any out-of-tree driver that
> uses it will now fail to compile.
> 
> Some notes regarding the correctness of this patch: the driver's poll()
> function is called with a 'struct poll_table_struct *wait' argument. This
> pointer may or may not be NULL, drivers can never rely on it being one or
> the other as that depends on whether or not an earlier file descriptor in
> the select()'s fdset matched the requested events.
> 
> ...
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFCv7 PATCH 0/4] Add poll_requested_events() function.
  2012-02-02 10:26 [RFCv7 PATCH 0/4] Add poll_requested_events() function Hans Verkuil
  2012-02-02 10:26 ` [RFCv7 PATCH 1/4] eventpoll: set key to the correct events mask Hans Verkuil
  2012-02-02 22:48 ` [RFCv7 PATCH 0/4] Add poll_requested_events() function Andrew Morton
@ 2012-02-03  1:58 ` Enke Chen
  2 siblings, 0 replies; 9+ messages in thread
From: Enke Chen @ 2012-02-03  1:58 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: linux-media, Al Viro, Jonathan Corbet, Andrew Morton,
	Davide Libenzi, linux-kernel, linux-fsdevel, David S. Miller,
	Enke Chen

Hi, folks:

I would like to voice my support for Hans' patch.

1) The functionality provided by this patch is needed.  I have been 
involved in an app that implements a use-land sockets (using FUSE). 
Passing the accurate poll events is essential in the app.  We have been 
using a local patch that is similar (but not identical) for more than a 
year.  IMO the functionality of passing accurate and consistent events 
to a driver is basic, and should be provided by Linux.

2) The patch is safe as far as I can tell.  Without the patch, unwanted 
events may be passed to a driver. However once the poll returns to the 
kernel, the unwanted events would be masked out by the kernel anyway and 
would not be passed to an app.  Thus a driver that relies on the 
unwanted events would not work anyway.

Thanks.   -- Enke

On 2/2/12 2:26 AM, Hans Verkuil wrote:
> Hi all,
>
> This is the seventh version of this patch series (the fifth and sixth where
> never posted and where internal iterations only).
>
> Al Viro had concerns about silent API changes. I have made an extensive
> analysis of that in my comments in patch 2/4.
>
> This patch series is rebased to v3.3-rc2. The changes compared to the
> previously posted version are:
>
> - I have renamed the qproc field to pq_proc to prevent any driver that tries
>    to access that directly to fail. No kernel driver does this, BTW.
>
> - I added a new poll_does_not_wait() inline that returns true if it is known
>    that poll() will not wait on return. This removes the last reason for
>    looking inside the poll_table struct. include/net/sock.h has been adapted
>    to use this new inline (and it is the only place inside the kernel that
>    need this).
>
> I hope that the analysis I made answers any remaining concerns about possible
> silent API changes.
>
> This patch series is also available here:
>
> http://git.linuxtv.org/hverkuil/media_tree.git/shortlog/refs/heads/pollv7
>
> It was suggested to me that creating a new poll system call might be an option
> as well. I've attempted that as well and code implementing that can be found
> here:
>
> http://git.linuxtv.org/hverkuil/media_tree.git/shortlog/refs/heads/pollwithkey
>
> However, I think this turned out to be very messy. And because some drivers
> call the poll fop directly or through some framework I could not be certain I
> was not introducing any errors.
>
> If it is really required to change the API in some way, then I would suggest
> changing this:
>
> typedef struct poll_table_struct {
>          poll_queue_proc pq_proc;
>          unsigned long key;
> } poll_table;
>
> to this:
>
> struct poll_table {
>          poll_queue_proc pq_proc;
>          unsigned long key;
> };
>
> and adapting all users.
>
> However, I honestly do not think this is necessary at all. But if it is the
> only way to get this in, then I'll do the work. The media/video subsystem really
> needs this functionality. Also note that previous versions of this patch have
> been in linux-next for months now.
>
> The first version of this patch was posted July 1st, 2011. I really hope that
> it won't take another six months to get a review from a fs developer. As this
> LWN article (http://lwn.net/Articles/450658/) said: 'There has been little
> discussion of the patch; it doesn't seem like there is any real reason for it
> not to go in for 3.1.'
>
> The earliest this can go in now is 3.4. The only reason it takes so long is
> that it has been almost impossible to get a Ack or comments or even just a
> simple reply from the fs developers. That is really frustrating, I'm sorry
> to say.
>
> Anyway, comments, reviews, etc. are very welcome.
>
> Regards,
>
> 	Hans
>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFCv7 PATCH 0/4] Add poll_requested_events() function.
  2012-02-02 22:48 ` [RFCv7 PATCH 0/4] Add poll_requested_events() function Andrew Morton
@ 2012-02-03  9:30   ` Hans Verkuil
  0 siblings, 0 replies; 9+ messages in thread
From: Hans Verkuil @ 2012-02-03  9:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-media, Al Viro, Jonathan Corbet, Davide Libenzi,
	linux-kernel, linux-fsdevel, David S. Miller, Enke Chen

On Thursday, February 02, 2012 23:48:23 Andrew Morton wrote:
> On Thu,  2 Feb 2012 11:26:53 +0100
> Hans Verkuil <hverkuil@xs4all.nl> wrote:
> 
> > The first version of this patch was posted July 1st, 2011. I really hope that
> > it won't take another six months to get a review from a fs developer. As this
> > LWN article (http://lwn.net/Articles/450658/) said: 'There has been little
> > discussion of the patch; it doesn't seem like there is any real reason for it
> > not to go in for 3.1.'
> > 
> > The earliest this can go in now is 3.4. The only reason it takes so long is
> > that it has been almost impossible to get a Ack or comments or even just a
> > simple reply from the fs developers. That is really frustrating, I'm sorry
> > to say.
> 
> Yup.  Nobody really maintains the poll/select code.  It happens to sit
> under fs/ so nominally belongs to the "fs maintainers".  The logs for
> fs/select.c seem to show me as the usual committer, but I wouldn't
> claim particular expertise in this area - I'm more a tube-unclogger
> here.  Probably Al knows the code as well or better than anyone else. 
> It's good that he looked at an earlier version of the patches.
> 
> fs/eventpoll.c has an identified maintainer, but he has been vigorously
> hiding from us for a year or so.  I'm the commit monkey for eventpoll,
> in a similar state to fs/select.c.
> 
> So ho hum, all we can do is our best.  You're an experienced kernel
> developer who has put a lot of work into the code.  I suggest that you
> get your preferred version into linux-next ASAP then send Linus a pull
> request for 3.4-rc1, explaining the situation.  If the code wasn't
> already in linux-next I would put it in -mm today, for 3.4-rc1.

Thank you very much for your reply. This was very helpful!

Regards,

	Hans

> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-media" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2012-02-03  9:33 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-02 10:26 [RFCv7 PATCH 0/4] Add poll_requested_events() function Hans Verkuil
2012-02-02 10:26 ` [RFCv7 PATCH 1/4] eventpoll: set key to the correct events mask Hans Verkuil
2012-02-02 10:26   ` [RFCv7 PATCH 2/4] poll: add poll_requested_events() and poll_does_not_wait() functions Hans Verkuil
2012-02-02 22:48     ` Andrew Morton
2012-02-02 10:26   ` [RFCv7 PATCH 3/4] net/sock.h: use poll_does_not_wait() in sock_poll_wait() Hans Verkuil
2012-02-02 10:26   ` [RFCv7 PATCH 4/4] unix/af_unix.c: use poll_requested_events() in unix_dgram_poll() Hans Verkuil
2012-02-02 22:48 ` [RFCv7 PATCH 0/4] Add poll_requested_events() function Andrew Morton
2012-02-03  9:30   ` Hans Verkuil
2012-02-03  1:58 ` Enke Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).