* aio poll, io_pgetevents and a new in-kernel poll API V5
@ 2018-03-05 21:27 ` Christoph Hellwig
  0 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Hi all,

this series adds support for the IOCB_CMD_POLL operation to poll for the
readiness of file descriptors using the aio subsystem.  The API is based
on patches that existed in RHAS2.1 and RHEL3, which means it is already
supported by libaio.  To implement the poll support efficiently, new
methods to poll are introduced in struct file_operations:  get_poll_head
and poll_mask.  The first one returns a wait_queue_head to wait on
(lifetime is bound by the file), and the second does a non-blocking
check for the POLL* events.  This allows aio poll to work without
any additional context switches, unlike epoll.

To make the interface fully useful, a new io_pgetevents system call is
added, which atomically saves and restores the signal mask over the
system call.  It is the logical equivalent of pselect and ppoll for
io_getevents.

The corresponding libaio changes for io_pgetevents support and
documentation, as well as a test case, will be posted in a separate
series.

The changes were sponsored by ScyllaDB, and improve performance
of the Seastar framework by up to 10%, while also removing the need
for a privileged SCHED_FIFO epoll listener thread.

    git://git.infradead.org/users/hch/vfs.git aio-poll.5

Gitweb:

    http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/aio-poll.5

Libaio changes:

    https://pagure.io/libaio.git io-poll

Seastar changes (not updated for the new io_pgetevents ABI yet):

    https://github.com/avikivity/seastar/commits/aio

Changes since V4:
 - rebased on top of Linux 4.16-rc4

Changes since V3:
 - remove the pre-sleep ->poll_mask call in vfs_poll,
   allow ->get_poll_head to return POLL* values.

Changes since V2:
 - removed a double initialization
 - new vfs_get_poll_head helper
 - document that ->get_poll_head can return NULL
 - call ->poll_mask before sleeping
 - various ACKs
 - add conversion of random to ->poll_mask
 - add conversion of af_alg to ->poll_mask
 - lacking ->poll_mask support now returns -EINVAL for IOCB_CMD_POLL
 - reshuffled the series so that prep patches and everything not
   requiring the new in-kernel poll API come at the beginning

Changes since V1:
 - handle the NULL ->poll case in vfs_poll
 - dropped the file argument to the ->poll_mask socket operation
 - replace the ->pre_poll socket operation with ->get_poll_head as
   in the file operations

* [PATCH 01/36] aio: don't print the page size at boot time
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

The page size is in no way related to the aio code, and printing it in
the (debug) dmesg at every boot serves no purpose.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
---
 fs/aio.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index a062d75109cb..03d59593912d 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -264,9 +264,6 @@ static int __init aio_setup(void)
 
 	kiocb_cachep = KMEM_CACHE(aio_kiocb, SLAB_HWCACHE_ALIGN|SLAB_PANIC);
 	kioctx_cachep = KMEM_CACHE(kioctx,SLAB_HWCACHE_ALIGN|SLAB_PANIC);
-
-	pr_debug("sizeof(struct page) = %zu\n", sizeof(struct page));
-
 	return 0;
 }
 __initcall(aio_setup);
-- 
2.14.2


* [PATCH 02/36] aio: remove an outdated comment in aio_complete
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

These days we don't treat sync iocbs special in the aio completion code as
they never use it.  Remove the old comment, and move the BUG_ON for a sync
iocb to the top of the function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
---
 fs/aio.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 03d59593912d..41fc8ce6bc7f 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1088,6 +1088,8 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2)
 	unsigned tail, pos, head;
 	unsigned long	flags;
 
+	BUG_ON(is_sync_kiocb(kiocb));
+
 	if (kiocb->ki_flags & IOCB_WRITE) {
 		struct file *file = kiocb->ki_filp;
 
@@ -1100,15 +1102,6 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2)
 		file_end_write(file);
 	}
 
-	/*
-	 * Special case handling for sync iocbs:
-	 *  - events go directly into the iocb for fast handling
-	 *  - the sync task with the iocb in its stack holds the single iocb
-	 *    ref, no other paths have a way to get another ref
-	 *  - the sync task helpfully left a reference to itself in the iocb
-	 */
-	BUG_ON(is_sync_kiocb(kiocb));
-
 	if (iocb->ki_list.next) {
 		unsigned long flags;
 
-- 
2.14.2


* [PATCH 03/36] aio: refactor read/write iocb setup
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Don't reference the kiocb structure from the common aio code, and move
any use of it into helpers specific to the read/write path.  This is in
preparation for aio_poll support, which wants to use the space for
different fields.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
---
 fs/aio.c | 171 ++++++++++++++++++++++++++++++++++++---------------------------
 1 file changed, 97 insertions(+), 74 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 41fc8ce6bc7f..6295fc00f104 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -170,7 +170,9 @@ struct kioctx {
 #define KIOCB_CANCELLED		((void *) (~0ULL))
 
 struct aio_kiocb {
-	struct kiocb		common;
+	union {
+		struct kiocb		rw;
+	};
 
 	struct kioctx		*ki_ctx;
 	kiocb_cancel_fn		*ki_cancel;
@@ -549,7 +551,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 
 void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
 {
-	struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, common);
+	struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, rw);
 	struct kioctx *ctx = req->ki_ctx;
 	unsigned long flags;
 
@@ -582,7 +584,7 @@ static int kiocb_cancel(struct aio_kiocb *kiocb)
 		cancel = cmpxchg(&kiocb->ki_cancel, old, KIOCB_CANCELLED);
 	} while (cancel != old);
 
-	return cancel(&kiocb->common);
+	return cancel(&kiocb->rw);
 }
 
 static void free_ioctx(struct work_struct *work)
@@ -1040,15 +1042,6 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
 	return NULL;
 }
 
-static void kiocb_free(struct aio_kiocb *req)
-{
-	if (req->common.ki_filp)
-		fput(req->common.ki_filp);
-	if (req->ki_eventfd != NULL)
-		eventfd_ctx_put(req->ki_eventfd);
-	kmem_cache_free(kiocb_cachep, req);
-}
-
 static struct kioctx *lookup_ioctx(unsigned long ctx_id)
 {
 	struct aio_ring __user *ring  = (void __user *)ctx_id;
@@ -1079,29 +1072,14 @@ static struct kioctx *lookup_ioctx(unsigned long ctx_id)
 /* aio_complete
  *	Called when the io request on the given iocb is complete.
  */
-static void aio_complete(struct kiocb *kiocb, long res, long res2)
+static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
 {
-	struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, common);
 	struct kioctx	*ctx = iocb->ki_ctx;
 	struct aio_ring	*ring;
 	struct io_event	*ev_page, *event;
 	unsigned tail, pos, head;
 	unsigned long	flags;
 
-	BUG_ON(is_sync_kiocb(kiocb));
-
-	if (kiocb->ki_flags & IOCB_WRITE) {
-		struct file *file = kiocb->ki_filp;
-
-		/*
-		 * Tell lockdep we inherited freeze protection from submission
-		 * thread.
-		 */
-		if (S_ISREG(file_inode(file)->i_mode))
-			__sb_writers_acquired(file_inode(file)->i_sb, SB_FREEZE_WRITE);
-		file_end_write(file);
-	}
-
 	if (iocb->ki_list.next) {
 		unsigned long flags;
 
@@ -1163,11 +1141,12 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2)
 	 * eventfd. The eventfd_signal() function is safe to be called
 	 * from IRQ context.
 	 */
-	if (iocb->ki_eventfd != NULL)
+	if (iocb->ki_eventfd) {
 		eventfd_signal(iocb->ki_eventfd, 1);
+		eventfd_ctx_put(iocb->ki_eventfd);
+	}
 
-	/* everything turned out well, dispose of the aiocb. */
-	kiocb_free(iocb);
+	kmem_cache_free(kiocb_cachep, iocb);
 
 	/*
 	 * We have to order our ring_info tail store above and test
@@ -1430,6 +1409,47 @@ SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
 	return -EINVAL;
 }
 
+static void aio_complete_rw(struct kiocb *kiocb, long res, long res2)
+{
+	struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, rw);
+
+	WARN_ON_ONCE(is_sync_kiocb(kiocb));
+
+	if (kiocb->ki_flags & IOCB_WRITE) {
+		struct inode *inode = file_inode(kiocb->ki_filp);
+
+		/*
+		 * Tell lockdep we inherited freeze protection from submission
+		 * thread.
+		 */
+		if (S_ISREG(inode->i_mode))
+			__sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
+		file_end_write(kiocb->ki_filp);
+	}
+
+	fput(kiocb->ki_filp);
+	aio_complete(iocb, res, res2);
+}
+
+static int aio_prep_rw(struct kiocb *req, struct iocb *iocb)
+{
+	int ret;
+
+	req->ki_filp = fget(iocb->aio_fildes);
+	if (unlikely(!req->ki_filp))
+		return -EBADF;
+	req->ki_complete = aio_complete_rw;
+	req->ki_pos = iocb->aio_offset;
+	req->ki_flags = iocb_flags(req->ki_filp);
+	if (iocb->aio_flags & IOCB_FLAG_RESFD)
+		req->ki_flags |= IOCB_EVENTFD;
+	req->ki_hint = file_write_hint(req->ki_filp);
+	ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags);
+	if (unlikely(ret))
+		fput(req->ki_filp);
+	return ret;
+}
+
 static int aio_setup_rw(int rw, struct iocb *iocb, struct iovec **iovec,
 		bool vectored, bool compat, struct iov_iter *iter)
 {
@@ -1449,7 +1469,7 @@ static int aio_setup_rw(int rw, struct iocb *iocb, struct iovec **iovec,
 	return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter);
 }
 
-static inline ssize_t aio_ret(struct kiocb *req, ssize_t ret)
+static inline ssize_t aio_rw_ret(struct kiocb *req, ssize_t ret)
 {
 	switch (ret) {
 	case -EIOCBQUEUED:
@@ -1465,7 +1485,7 @@ static inline ssize_t aio_ret(struct kiocb *req, ssize_t ret)
 		ret = -EINTR;
 		/*FALLTHRU*/
 	default:
-		aio_complete(req, ret, 0);
+		aio_complete_rw(req, ret, 0);
 		return 0;
 	}
 }
@@ -1473,56 +1493,78 @@ static inline ssize_t aio_ret(struct kiocb *req, ssize_t ret)
 static ssize_t aio_read(struct kiocb *req, struct iocb *iocb, bool vectored,
 		bool compat)
 {
-	struct file *file = req->ki_filp;
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct iov_iter iter;
+	struct file *file;
 	ssize_t ret;
 
+	ret = aio_prep_rw(req, iocb);
+	if (ret)
+		return ret;
+	file = req->ki_filp;
+
+	ret = -EBADF;
 	if (unlikely(!(file->f_mode & FMODE_READ)))
-		return -EBADF;
+		goto out_fput;
+	ret = -EINVAL;
 	if (unlikely(!file->f_op->read_iter))
-		return -EINVAL;
+		goto out_fput;
 
 	ret = aio_setup_rw(READ, iocb, &iovec, vectored, compat, &iter);
 	if (ret)
-		return ret;
+		goto out_fput;
 	ret = rw_verify_area(READ, file, &req->ki_pos, iov_iter_count(&iter));
 	if (!ret)
-		ret = aio_ret(req, call_read_iter(file, req, &iter));
+		ret = aio_rw_ret(req, call_read_iter(file, req, &iter));
 	kfree(iovec);
+out_fput:
+	if (unlikely(ret && ret != -EIOCBQUEUED))
+		fput(file);
 	return ret;
 }
 
 static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored,
 		bool compat)
 {
-	struct file *file = req->ki_filp;
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct iov_iter iter;
+	struct file *file;
 	ssize_t ret;
 
+	ret = aio_prep_rw(req, iocb);
+	if (ret)
+		return ret;
+	file = req->ki_filp;
+
+	ret = -EBADF;
 	if (unlikely(!(file->f_mode & FMODE_WRITE)))
-		return -EBADF;
+		goto out_fput;
+	ret = -EINVAL;
 	if (unlikely(!file->f_op->write_iter))
-		return -EINVAL;
+		goto out_fput;
 
 	ret = aio_setup_rw(WRITE, iocb, &iovec, vectored, compat, &iter);
 	if (ret)
-		return ret;
+		goto out_fput;
 	ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter));
 	if (!ret) {
+		struct inode *inode = file_inode(file);
+
 		req->ki_flags |= IOCB_WRITE;
 		file_start_write(file);
-		ret = aio_ret(req, call_write_iter(file, req, &iter));
+		ret = aio_rw_ret(req, call_write_iter(file, req, &iter));
 		/*
-		 * We release freeze protection in aio_complete().  Fool lockdep
-		 * by telling it the lock got released so that it doesn't
-		 * complain about held lock when we return to userspace.
+		 * We release freeze protection in aio_complete_rw().  Fool
+		 * lockdep by telling it the lock got released so that it
+		 * doesn't complain about held lock when we return to userspace.
 		 */
-		if (S_ISREG(file_inode(file)->i_mode))
-			__sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE);
+		if (S_ISREG(inode->i_mode))
+			__sb_writers_release(inode->i_sb, SB_FREEZE_WRITE);
 	}
 	kfree(iovec);
+out_fput:
+	if (unlikely(ret && ret != -EIOCBQUEUED))
+		fput(file);
 	return ret;
 }
 
@@ -1530,7 +1572,6 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 			 struct iocb *iocb, bool compat)
 {
 	struct aio_kiocb *req;
-	struct file *file;
 	ssize_t ret;
 
 	/* enforce forwards compatibility on users */
@@ -1553,16 +1594,6 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	if (unlikely(!req))
 		return -EAGAIN;
 
-	req->common.ki_filp = file = fget(iocb->aio_fildes);
-	if (unlikely(!req->common.ki_filp)) {
-		ret = -EBADF;
-		goto out_put_req;
-	}
-	req->common.ki_pos = iocb->aio_offset;
-	req->common.ki_complete = aio_complete;
-	req->common.ki_flags = iocb_flags(req->common.ki_filp);
-	req->common.ki_hint = file_write_hint(file);
-
 	if (iocb->aio_flags & IOCB_FLAG_RESFD) {
 		/*
 		 * If the IOCB_FLAG_RESFD flag of aio_flags is set, get an
@@ -1576,14 +1607,6 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 			req->ki_eventfd = NULL;
 			goto out_put_req;
 		}
-
-		req->common.ki_flags |= IOCB_EVENTFD;
-	}
-
-	ret = kiocb_set_rw_flags(&req->common, iocb->aio_rw_flags);
-	if (unlikely(ret)) {
-		pr_debug("EINVAL: aio_rw_flags\n");
-		goto out_put_req;
 	}
 
 	ret = put_user(KIOCB_KEY, &user_iocb->aio_key);
@@ -1595,26 +1618,24 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	req->ki_user_iocb = user_iocb;
 	req->ki_user_data = iocb->aio_data;
 
-	get_file(file);
 	switch (iocb->aio_lio_opcode) {
 	case IOCB_CMD_PREAD:
-		ret = aio_read(&req->common, iocb, false, compat);
+		ret = aio_read(&req->rw, iocb, false, compat);
 		break;
 	case IOCB_CMD_PWRITE:
-		ret = aio_write(&req->common, iocb, false, compat);
+		ret = aio_write(&req->rw, iocb, false, compat);
 		break;
 	case IOCB_CMD_PREADV:
-		ret = aio_read(&req->common, iocb, true, compat);
+		ret = aio_read(&req->rw, iocb, true, compat);
 		break;
 	case IOCB_CMD_PWRITEV:
-		ret = aio_write(&req->common, iocb, true, compat);
+		ret = aio_write(&req->rw, iocb, true, compat);
 		break;
 	default:
 		pr_debug("invalid aio operation %d\n", iocb->aio_lio_opcode);
 		ret = -EINVAL;
 		break;
 	}
-	fput(file);
 
 	if (ret && ret != -EIOCBQUEUED)
 		goto out_put_req;
@@ -1622,7 +1643,9 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 out_put_req:
 	put_reqs_available(ctx, 1);
 	percpu_ref_put(&ctx->reqs);
-	kiocb_free(req);
+	if (req->ki_eventfd)
+		eventfd_ctx_put(req->ki_eventfd);
+	kmem_cache_free(kiocb_cachep, req);
 	return ret;
 }
 
-- 
2.14.2


* [PATCH 04/36] aio: sanitize ki_list handling
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Instead of handcoded non-null checks always initialize ki_list to an
empty list and use list_empty / list_empty_careful on it.  While we're
at it also error out on a double call to kiocb_set_cancel_fn instead
of ignoring it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
---
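Note for readers: the self-linked sentinel idea can be sketched in
plain userspace C.  This is only an illustration under assumptions, not
the kernel code: the helper names (init_list_head, list_is_empty,
list_add_tail_entry, list_del_init_entry, register_cancel) and the -1
return value are made up for the sketch, and no locking is shown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal doubly linked list node, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* An initialized but unlinked node points at itself. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* "Empty" means the node is still self-linked; no NULL checks needed. */
static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail_entry(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink and re-initialize, so list_is_empty() is true again afterwards. */
static void list_del_init_entry(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	init_list_head(entry);
}

/* Hypothetical request type; only the list linkage matters here. */
struct request {
	struct list_head ki_list;
};

/* Returns 0 on success, -1 if the request is already registered. */
static int register_cancel(struct request *req, struct list_head *active)
{
	if (!list_is_empty(&req->ki_list))
		return -1;	/* double registration is an error */
	list_add_tail_entry(&req->ki_list, active);
	return 0;
}
```

The emptiness test takes the address of the embedded node itself, which
is why the node has to be initialized when the request is allocated.
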
 fs/aio.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 6295fc00f104..c32c315f05b5 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -555,13 +555,12 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
 	struct kioctx *ctx = req->ki_ctx;
 	unsigned long flags;
 
-	spin_lock_irqsave(&ctx->ctx_lock, flags);
-
-	if (!req->ki_list.next)
-		list_add(&req->ki_list, &ctx->active_reqs);
+	if (WARN_ON_ONCE(!list_empty(&req->ki_list)))
+		return;
 
+	spin_lock_irqsave(&ctx->ctx_lock, flags);
+	list_add_tail(&req->ki_list, &ctx->active_reqs);
 	req->ki_cancel = cancel;
-
 	spin_unlock_irqrestore(&ctx->ctx_lock, flags);
 }
 EXPORT_SYMBOL(kiocb_set_cancel_fn);
@@ -1034,7 +1033,7 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
 		goto out_put;
 
 	percpu_ref_get(&ctx->reqs);
-
+	INIT_LIST_HEAD(&req->ki_list);
 	req->ki_ctx = ctx;
 	return req;
 out_put:
@@ -1080,7 +1079,7 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
 	unsigned tail, pos, head;
 	unsigned long	flags;
 
-	if (iocb->ki_list.next) {
+	if (!list_empty_careful(&iocb->ki_list)) {
 		unsigned long flags;
 
 		spin_lock_irqsave(&ctx->ctx_lock, flags);
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 05/36] aio: simplify cancellation
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

With the current aio code there is no need for the magic KIOCB_CANCELLED
value, as a cancellation just kicks the driver to queue the completion
ASAP, with all actual completion handling done in another thread. Given
that both the completion path and cancellation take the context lock there
is no need for magic cmpxchg loops either.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
---
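Note for readers: a rough userspace sketch of the "clear the callback
once" pattern follows.  It assumes the caller already holds whatever
lock serializes cancellation against completion; struct request,
cancel_fn, count_cancel and request_cancel are hypothetical names used
only for this illustration.

```c
#include <errno.h>
#include <stddef.h>

struct request;
typedef int (*cancel_fn)(struct request *);

/* Hypothetical request carrying an optional cancel callback. */
struct request {
	cancel_fn ki_cancel;
	int cancel_calls;	/* how often the callback actually ran */
};

/* Example callback: just count the invocation. */
static int count_cancel(struct request *req)
{
	req->cancel_calls++;
	return 0;
}

/*
 * Caller is assumed to hold the lock that serializes cancellation.
 * Because of that, a plain load and store are enough: the first caller
 * sees the callback and clears it, any later caller sees NULL.
 */
static int request_cancel(struct request *req)
{
	cancel_fn cancel = req->ki_cancel;

	if (!cancel)
		return -EINVAL;
	req->ki_cancel = NULL;
	return cancel(req);
}
```

A second cancel attempt on the same request returns -EINVAL without
calling the callback again.
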
 fs/aio.c | 37 +++++++++----------------------------
 1 file changed, 9 insertions(+), 28 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index c32c315f05b5..2d40cf5dd4ec 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -156,19 +156,6 @@ struct kioctx {
 	unsigned		id;
 };
 
-/*
- * We use ki_cancel == KIOCB_CANCELLED to indicate that a kiocb has been either
- * cancelled or completed (this makes a certain amount of sense because
- * successful cancellation - io_cancel() - does deliver the completion to
- * userspace).
- *
- * And since most things don't implement kiocb cancellation and we'd really like
- * kiocb completion to be lockless when possible, we use ki_cancel to
- * synchronize cancellation and completion - we only set it to KIOCB_CANCELLED
- * with xchg() or cmpxchg(), see batch_complete_aio() and kiocb_cancel().
- */
-#define KIOCB_CANCELLED		((void *) (~0ULL))
-
 struct aio_kiocb {
 	union {
 		struct kiocb		rw;
@@ -565,24 +552,18 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
 }
 EXPORT_SYMBOL(kiocb_set_cancel_fn);
 
+/*
+ * Only cancel if there was a ki_cancel function to start with, and we
+ * are the one who managed to clear it (to protect against simultaneous
+ * cancel calls).
+ */
 static int kiocb_cancel(struct aio_kiocb *kiocb)
 {
-	kiocb_cancel_fn *old, *cancel;
-
-	/*
-	 * Don't want to set kiocb->ki_cancel = KIOCB_CANCELLED unless it
-	 * actually has a cancel function, hence the cmpxchg()
-	 */
-
-	cancel = READ_ONCE(kiocb->ki_cancel);
-	do {
-		if (!cancel || cancel == KIOCB_CANCELLED)
-			return -EINVAL;
-
-		old = cancel;
-		cancel = cmpxchg(&kiocb->ki_cancel, old, KIOCB_CANCELLED);
-	} while (cancel != old);
+	kiocb_cancel_fn *cancel = kiocb->ki_cancel;
 
+	if (!cancel)
+		return -EINVAL;
+	kiocb->ki_cancel = NULL;
 	return cancel(&kiocb->rw);
 }
 
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 06/36] aio: delete iocbs from the active_reqs list in kiocb_cancel
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Once we cancel an iocb there is no reason to keep it on the active_reqs
list, given that the list is only used to look for cancellation candidates.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
---
 fs/aio.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 2d40cf5dd4ec..0b6394b4e528 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -561,6 +561,8 @@ static int kiocb_cancel(struct aio_kiocb *kiocb)
 {
 	kiocb_cancel_fn *cancel = kiocb->ki_cancel;
 
+	list_del_init(&kiocb->ki_list);
+
 	if (!cancel)
 		return -EINVAL;
 	kiocb->ki_cancel = NULL;
@@ -607,8 +609,6 @@ static void free_ioctx_users(struct percpu_ref *ref)
 	while (!list_empty(&ctx->active_reqs)) {
 		req = list_first_entry(&ctx->active_reqs,
 				       struct aio_kiocb, ki_list);
-
-		list_del_init(&req->ki_list);
 		kiocb_cancel(req);
 	}
 
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 07/36] aio: add delayed cancel support
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

The upcoming aio poll support would like to be able to complete the
iocb inline from the cancellation context, but that would cause
a lock order reversal.  Add support for optionally moving the cancellation
outside the context lock to avoid this reversal.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
---
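Note for readers: the two-phase scheme can be sketched in userspace as
below.  Everything here is an assumption made for illustration: an
integer lock_held stands in for the context lock, fixed-size arrays
replace the kernel lists, and the REQ_* flags, cancel_cb and cancel_all
are hypothetical names.

```c
#include <stddef.h>

#define REQ_DELAYED_CANCEL	(1u << 0)	/* cancel outside the lock */
#define REQ_CANCELLED		(1u << 1)	/* marked while the lock was held */
#define MAX_REQS		8

struct context;

struct request {
	unsigned int flags;
	int saw_lock_held;	/* set by the callback; -1 means "not called" */
	struct context *ctx;
};

struct context {
	int lock_held;		/* stand-in for the real context lock */
	struct request *active[MAX_REQS];
	size_t nr_active;
};

/* Example callback: record whether the "lock" was held when it ran. */
static void cancel_cb(struct request *req)
{
	req->saw_lock_held = req->ctx->lock_held;
}

static void cancel_all(struct context *ctx)
{
	struct request *deferred[MAX_REQS];
	size_t nr_deferred = 0, i;

	/* Phase 1: under the lock, cancel directly or mark and defer. */
	ctx->lock_held = 1;
	for (i = 0; i < ctx->nr_active; i++) {
		struct request *req = ctx->active[i];

		if (req->flags & REQ_DELAYED_CANCEL) {
			req->flags |= REQ_CANCELLED;
			deferred[nr_deferred++] = req;
		} else {
			cancel_cb(req);
		}
	}
	ctx->nr_active = 0;
	ctx->lock_held = 0;

	/* Phase 2: lock dropped, deferred callbacks may now take it. */
	for (i = 0; i < nr_deferred; i++)
		cancel_cb(deferred[i]);
}
```

Requests without the delayed flag see the lock held in their callback,
while flagged requests run after it has been dropped.
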
 fs/aio.c | 49 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 38 insertions(+), 11 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 0b6394b4e528..9d7d6e4cde87 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -170,6 +170,10 @@ struct aio_kiocb {
 	struct list_head	ki_list;	/* the aio core uses this
 						 * for cancellation */
 
+	unsigned int		flags;		/* protected by ctx->ctx_lock */
+#define AIO_IOCB_DELAYED_CANCEL	(1 << 0)
+#define AIO_IOCB_CANCELLED	(1 << 1)
+
 	/*
 	 * If the aio_resfd field of the userspace iocb is not zero,
 	 * this is the underlying eventfd context to deliver events to.
@@ -536,9 +540,9 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 #define AIO_EVENTS_FIRST_PAGE	((PAGE_SIZE - sizeof(struct aio_ring)) / sizeof(struct io_event))
 #define AIO_EVENTS_OFFSET	(AIO_EVENTS_PER_PAGE - AIO_EVENTS_FIRST_PAGE)
 
-void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
+static void __kiocb_set_cancel_fn(struct aio_kiocb *req,
+		kiocb_cancel_fn *cancel, unsigned int iocb_flags)
 {
-	struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, rw);
 	struct kioctx *ctx = req->ki_ctx;
 	unsigned long flags;
 
@@ -548,8 +552,15 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
 	spin_lock_irqsave(&ctx->ctx_lock, flags);
 	list_add_tail(&req->ki_list, &ctx->active_reqs);
 	req->ki_cancel = cancel;
+	req->flags |= iocb_flags;
 	spin_unlock_irqrestore(&ctx->ctx_lock, flags);
 }
+
+void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
+{
+	return __kiocb_set_cancel_fn(container_of(iocb, struct aio_kiocb, rw),
+			cancel, 0);
+}
 EXPORT_SYMBOL(kiocb_set_cancel_fn);
 
 /*
@@ -603,17 +614,27 @@ static void free_ioctx_users(struct percpu_ref *ref)
 {
 	struct kioctx *ctx = container_of(ref, struct kioctx, users);
 	struct aio_kiocb *req;
+	LIST_HEAD(list);
 
 	spin_lock_irq(&ctx->ctx_lock);
-
 	while (!list_empty(&ctx->active_reqs)) {
 		req = list_first_entry(&ctx->active_reqs,
 				       struct aio_kiocb, ki_list);
-		kiocb_cancel(req);
-	}
 
+		if (req->flags & AIO_IOCB_DELAYED_CANCEL) {
+			req->flags |= AIO_IOCB_CANCELLED;
+			list_move_tail(&req->ki_list, &list);
+		} else {
+			kiocb_cancel(req);
+		}
+	}
 	spin_unlock_irq(&ctx->ctx_lock);
 
+	while (!list_empty(&list)) {
+		req = list_first_entry(&list, struct aio_kiocb, ki_list);
+		kiocb_cancel(req);
+	}
+
 	percpu_ref_kill(&ctx->reqs);
 	percpu_ref_put(&ctx->reqs);
 }
@@ -1785,15 +1806,22 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
 	if (unlikely(!ctx))
 		return -EINVAL;
 
-	spin_lock_irq(&ctx->ctx_lock);
+	ret = -EINVAL;
 
+	spin_lock_irq(&ctx->ctx_lock);
 	kiocb = lookup_kiocb(ctx, iocb, key);
+	if (kiocb) {
+		if (kiocb->flags & AIO_IOCB_DELAYED_CANCEL) {
+			kiocb->flags |= AIO_IOCB_CANCELLED;
+		} else {
+			ret = kiocb_cancel(kiocb);
+			kiocb = NULL;
+		}
+	}
+	spin_unlock_irq(&ctx->ctx_lock);
+
 	if (kiocb)
 		ret = kiocb_cancel(kiocb);
-	else
-		ret = -EINVAL;
-
-	spin_unlock_irq(&ctx->ctx_lock);
 
 	if (!ret) {
 		/*
@@ -1805,7 +1833,6 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
 	}
 
 	percpu_ref_put(&ctx->users);
-
 	return ret;
 }
 
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 08/36] aio: implement io_pgetevents
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

This is the io_getevents equivalent of ppoll/pselect.  It allows signals
and aio completions (especially with IOCB_CMD_POLL) to be mixed properly
by atomically executing the following sequence:

	sigset_t origmask;

	pthread_sigmask(SIG_SETMASK, &sigmask, &origmask);
	ret = io_getevents(ctx, min_nr, nr, events, timeout);
	pthread_sigmask(SIG_SETMASK, &origmask, NULL);

Note that unlike many other signal related calls we do not pass a sigmask
size, as that would get us to 7 arguments, which aren't easily supported
by the syscall infrastructure.  It seems a lot less painful to just add a
new syscall variant in the unlikely case we're going to increase the
sigset size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
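Note for readers: a possible userspace wrapper is sketched below, under
several assumptions.  struct example_aio_sigset is a hypothetical mirror
of the struct __aio_sigset added by this patch, example_make_sigset and
example_io_pgetevents are made-up names, the size passed is _NSIG / 8
on the assumption that this matches the kernel's sigset_t (glibc's
sigset_t is larger), and a 64-bit Linux target is assumed so that
struct timespec matches what the kernel expects.  If the headers do not
define SYS_io_pgetevents the wrapper simply fails with ENOSYS.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical userspace mirror of the kernel's struct __aio_sigset. */
struct example_aio_sigset {
	const sigset_t	*sigmask;
	size_t		sigsetsize;
};

/* Bundle the mask pointer with the size the kernel is assumed to expect. */
struct example_aio_sigset example_make_sigset(const sigset_t *mask)
{
	struct example_aio_sigset s;

	s.sigmask = mask;
	s.sigsetsize = _NSIG / 8;	/* kernel sigset size, not sizeof(sigset_t) */
	return s;
}

/* Raw syscall wrapper; ctx is the aio_context_t value from io_setup(). */
long example_io_pgetevents(unsigned long ctx, long min_nr, long nr,
			   void *events, struct timespec *timeout,
			   const sigset_t *mask)
{
#ifdef SYS_io_pgetevents
	struct example_aio_sigset s = example_make_sigset(mask);

	/* A NULL sixth argument means "leave the signal mask alone". */
	return syscall(SYS_io_pgetevents, ctx, min_nr, nr, events, timeout,
		       mask ? &s : NULL);
#else
	(void)ctx; (void)min_nr; (void)nr;
	(void)events; (void)timeout; (void)mask;
	errno = ENOSYS;
	return -1;
#endif
}
```

A caller would typically block the signals of interest, then pass the
mask that should be in effect while waiting for events.
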
 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 fs/aio.c                               | 114 ++++++++++++++++++++++++++++++---
 include/linux/compat.h                 |   7 ++
 include/linux/syscalls.h               |   6 ++
 include/uapi/asm-generic/unistd.h      |   4 +-
 include/uapi/linux/aio_abi.h           |   6 ++
 kernel/sys_ni.c                        |   2 +
 8 files changed, 130 insertions(+), 11 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 448ac2161112..5997c3e9ac3e 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -391,3 +391,4 @@
 382	i386	pkey_free		sys_pkey_free
 383	i386	statx			sys_statx
 384	i386	arch_prctl		sys_arch_prctl			compat_sys_arch_prctl
+385	i386	io_pgetevents		sys_io_pgetevents		compat_sys_io_pgetevents
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 5aef183e2f85..e995cd2b4e65 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -339,6 +339,7 @@
 330	common	pkey_alloc		sys_pkey_alloc
 331	common	pkey_free		sys_pkey_free
 332	common	statx			sys_statx
+333	common	io_pgetevents		sys_io_pgetevents
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/aio.c b/fs/aio.c
index 9d7d6e4cde87..da87cbf7c67a 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1291,10 +1291,6 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr,
 		wait_event_interruptible_hrtimeout(ctx->wait,
 				aio_read_events(ctx, min_nr, nr, event, &ret),
 				until);
-
-	if (!ret && signal_pending(current))
-		ret = -EINTR;
-
 	return ret;
 }
 
@@ -1874,13 +1870,60 @@ SYSCALL_DEFINE5(io_getevents, aio_context_t, ctx_id,
 		struct timespec __user *, timeout)
 {
 	struct timespec64	ts;
+	int			ret;
+
+	if (timeout && unlikely(get_timespec64(&ts, timeout)))
+		return -EFAULT;
+
+	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
+	if (!ret && signal_pending(current))
+		ret = -EINTR;
+	return ret;
+}
+
+SYSCALL_DEFINE6(io_pgetevents,
+		aio_context_t, ctx_id,
+		long, min_nr,
+		long, nr,
+		struct io_event __user *, events,
+		struct timespec __user *, timeout,
+		const struct __aio_sigset __user *, usig)
+{
+	struct __aio_sigset	ksig = { NULL, };
+	sigset_t		ksigmask, sigsaved;
+	struct timespec64	ts;
+	int ret;
+
+	if (timeout && unlikely(get_timespec64(&ts, timeout)))
+		return -EFAULT;
 
-	if (timeout) {
-		if (unlikely(get_timespec64(&ts, timeout)))
+	if (usig && copy_from_user(&ksig, usig, sizeof(ksig)))
+		return -EFAULT;
+
+	if (ksig.sigmask) {
+		if (ksig.sigsetsize != sizeof(sigset_t))
+			return -EINVAL;
+		if (copy_from_user(&ksigmask, ksig.sigmask, sizeof(ksigmask)))
 			return -EFAULT;
+		sigdelsetmask(&ksigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
+		sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved);
+	}
+
+	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
+	if (signal_pending(current)) {
+		if (ksig.sigmask) {
+			current->saved_sigmask = sigsaved;
+			set_restore_sigmask();
+		}
+
+		if (!ret)
+			ret = -ERESTARTNOHAND;
+	} else {
+		if (ksig.sigmask)
+			sigprocmask(SIG_SETMASK, &sigsaved, NULL);
 	}
 
-	return do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
+	return ret;
 }
 
 #ifdef CONFIG_COMPAT
@@ -1891,13 +1934,64 @@ COMPAT_SYSCALL_DEFINE5(io_getevents, compat_aio_context_t, ctx_id,
 		       struct compat_timespec __user *, timeout)
 {
 	struct timespec64 t;
+	int ret;
+
+	if (timeout && compat_get_timespec64(&t, timeout))
+		return -EFAULT;
+
+	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
+	if (!ret && signal_pending(current))
+		ret = -EINTR;
+	return ret;
+}
+
+
+struct __compat_aio_sigset {
+	compat_sigset_t __user	*sigmask;
+	compat_size_t		sigsetsize;
+};
+
+COMPAT_SYSCALL_DEFINE6(io_pgetevents,
+		compat_aio_context_t, ctx_id,
+		compat_long_t, min_nr,
+		compat_long_t, nr,
+		struct io_event __user *, events,
+		struct compat_timespec __user *, timeout,
+		const struct __compat_aio_sigset __user *, usig)
+{
+	struct __compat_aio_sigset ksig = { NULL, };
+	sigset_t ksigmask, sigsaved;
+	struct timespec64 t;
+	int ret;
+
+	if (timeout && compat_get_timespec64(&t, timeout))
+		return -EFAULT;
 
-	if (timeout) {
-		if (compat_get_timespec64(&t, timeout))
+	if (usig && copy_from_user(&ksig, usig, sizeof(ksig)))
+		return -EFAULT;
+
+	if (ksig.sigmask) {
+		if (ksig.sigsetsize != sizeof(compat_sigset_t))
+			return -EINVAL;
+		if (get_compat_sigset(&ksigmask, ksig.sigmask))
 			return -EFAULT;
+		sigdelsetmask(&ksigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
+		sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved);
+	}
 
+	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
+	if (signal_pending(current)) {
+		if (ksig.sigmask) {
+			current->saved_sigmask = sigsaved;
+			set_restore_sigmask();
+		}
+		if (!ret)
+			ret = -ERESTARTNOHAND;
+	} else {
+		if (ksig.sigmask)
+			sigprocmask(SIG_SETMASK, &sigsaved, NULL);
 	}
 
-	return do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
+	return ret;
 }
 #endif
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 8a9643857c4a..bfb8a94fbabd 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -303,6 +303,7 @@ extern int put_compat_rusage(const struct rusage *,
 			     struct compat_rusage __user *);
 
 struct compat_siginfo;
+struct __compat_aio_sigset;
 
 extern asmlinkage long compat_sys_waitid(int, compat_pid_t,
 		struct compat_siginfo __user *, int,
@@ -634,6 +635,12 @@ asmlinkage long compat_sys_io_getevents(compat_aio_context_t ctx_id,
 					compat_long_t nr,
 					struct io_event __user *events,
 					struct compat_timespec __user *timeout);
+asmlinkage long compat_sys_io_pgetevents(compat_aio_context_t ctx_id,
+					compat_long_t min_nr,
+					compat_long_t nr,
+					struct io_event __user *events,
+					struct compat_timespec __user *timeout,
+					const struct __compat_aio_sigset __user *usig);
 asmlinkage long compat_sys_io_submit(compat_aio_context_t ctx_id, int nr,
 				     u32 __user *iocb);
 asmlinkage long compat_sys_mount(const char __user *dev_name,
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a78186d826d7..8515ec53c81b 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -539,6 +539,12 @@ asmlinkage long sys_io_getevents(aio_context_t ctx_id,
 				long nr,
 				struct io_event __user *events,
 				struct timespec __user *timeout);
+asmlinkage long sys_io_pgetevents(aio_context_t ctx_id,
+				long min_nr,
+				long nr,
+				struct io_event __user *events,
+				struct timespec __user *timeout,
+				const struct __aio_sigset *sig);
 asmlinkage long sys_io_submit(aio_context_t, long,
 				struct iocb __user * __user *);
 asmlinkage long sys_io_cancel(aio_context_t ctx_id, struct iocb __user *iocb,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 8b87de067bc7..ce2ebbeece10 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -732,9 +732,11 @@ __SYSCALL(__NR_pkey_alloc,    sys_pkey_alloc)
 __SYSCALL(__NR_pkey_free,     sys_pkey_free)
 #define __NR_statx 291
 __SYSCALL(__NR_statx,     sys_statx)
+#define __NR_io_pgetevents 292
+__SC_COMP(__NR_io_pgetevents, sys_io_pgetevents, compat_sys_io_pgetevents)
 
 #undef __NR_syscalls
-#define __NR_syscalls 292
+#define __NR_syscalls 293
 
 /*
  * All syscalls below here should go away really,
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index a04adbc70ddf..2c0a3415beee 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -29,6 +29,7 @@
 
 #include <linux/types.h>
 #include <linux/fs.h>
+#include <linux/signal.h>
 #include <asm/byteorder.h>
 
 typedef __kernel_ulong_t aio_context_t;
@@ -108,5 +109,10 @@ struct iocb {
 #undef IFBIG
 #undef IFLITTLE
 
+struct __aio_sigset {
+	sigset_t __user	*sigmask;
+	size_t		sigsetsize;
+};
+
 #endif /* __LINUX__AIO_ABI_H */
 
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index b5189762d275..8f7705559b38 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -151,9 +151,11 @@ cond_syscall(sys_io_destroy);
 cond_syscall(sys_io_submit);
 cond_syscall(sys_io_cancel);
 cond_syscall(sys_io_getevents);
+cond_syscall(sys_io_pgetevents);
 cond_syscall(compat_sys_io_setup);
 cond_syscall(compat_sys_io_submit);
 cond_syscall(compat_sys_io_getevents);
+cond_syscall(compat_sys_io_pgetevents);
 cond_syscall(sys_sysfs);
 cond_syscall(sys_syslog);
 cond_syscall(sys_process_vm_readv);
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 08/36] aio: implement io_pgetevents
@ 2018-03-05 21:27   ` Christoph Hellwig
  0 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

This is the io_getevents equivalent of ppoll/pselect.  It allows signals
and aio completions (especially with IOCB_CMD_POLL) to be mixed properly
by atomically executing the following sequence:

	sigset_t origmask;

	pthread_sigmask(SIG_SETMASK, &sigmask, &origmask);
	ret = io_getevents(ctx, min_nr, nr, events, timeout);
	pthread_sigmask(SIG_SETMASK, &origmask, NULL);

Note that unlike many other signal related calls we do not pass the
sigmask size as a separate argument, as that would get us to 7 arguments,
which aren't easily supported by the syscall infrastructure.  Instead the
sigmask pointer and its size are passed together in struct __aio_sigset,
which the sixth argument points to.
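
For readers who want to see the semantics in action: the sketch below is
not part of this patch and does not call io_pgetevents.  It uses pselect,
one of the two calls this syscall is modelled on, to show what the atomic
mask swap provides.  The helper name wait_with_temporary_mask and the
choice of SIGUSR1 are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t got_signal;

static void on_sigusr1(int signo)
{
	(void)signo;
	got_signal = 1;
}

/*
 * Illustrative helper (not from the patch).  Returns 1 if a SIGUSR1 that
 * was left pending is delivered only inside the pselect() call and the
 * caller's mask is back in place afterwards, 0 if any check fails, and
 * -1 on a setup error.
 */
static int wait_with_temporary_mask(void)
{
	struct timespec timeout = { .tv_sec = 1, .tv_nsec = 0 };
	sigset_t blocked, during_wait, after;
	struct sigaction sa;
	int ret;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigusr1;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) < 0)
		return -1;

	/* Run with SIGUSR1 blocked, as an event loop normally would. */
	sigemptyset(&blocked);
	sigaddset(&blocked, SIGUSR1);
	if (sigprocmask(SIG_BLOCK, &blocked, NULL) < 0)
		return -1;

	got_signal = 0;
	if (raise(SIGUSR1) != 0)
		return -1;
	if (got_signal)
		return 0;	/* the signal must stay pending while blocked */

	/* Wait with an empty mask; the kernel swaps the masks atomically. */
	sigemptyset(&during_wait);
	ret = pselect(0, NULL, NULL, NULL, &timeout, &during_wait);
	if (ret != -1 || errno != EINTR || !got_signal)
		return 0;

	/* The original mask is back in place once the call has returned. */
	if (sigprocmask(SIG_SETMASK, NULL, &after) < 0)
		return -1;
	return sigismember(&after, SIGUSR1) == 1;
}
```

The same pattern is what io_pgetevents is meant to enable for aio
completions, with the temporary mask handed over through struct
__aio_sigset rather than as a direct argument.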

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 fs/aio.c                               | 114 ++++++++++++++++++++++++++++++---
 include/linux/compat.h                 |   7 ++
 include/linux/syscalls.h               |   6 ++
 include/uapi/asm-generic/unistd.h      |   4 +-
 include/uapi/linux/aio_abi.h           |   6 ++
 kernel/sys_ni.c                        |   2 +
 8 files changed, 130 insertions(+), 11 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 448ac2161112..5997c3e9ac3e 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -391,3 +391,4 @@
 382	i386	pkey_free		sys_pkey_free
 383	i386	statx			sys_statx
 384	i386	arch_prctl		sys_arch_prctl			compat_sys_arch_prctl
+385	i386	io_pgetevents		sys_io_pgetevents		compat_sys_io_pgetevents
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 5aef183e2f85..e995cd2b4e65 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -339,6 +339,7 @@
 330	common	pkey_alloc		sys_pkey_alloc
 331	common	pkey_free		sys_pkey_free
 332	common	statx			sys_statx
+333	common	io_pgetevents		sys_io_pgetevents
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/aio.c b/fs/aio.c
index 9d7d6e4cde87..da87cbf7c67a 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1291,10 +1291,6 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr,
 		wait_event_interruptible_hrtimeout(ctx->wait,
 				aio_read_events(ctx, min_nr, nr, event, &ret),
 				until);
-
-	if (!ret && signal_pending(current))
-		ret = -EINTR;
-
 	return ret;
 }
 
@@ -1874,13 +1870,60 @@ SYSCALL_DEFINE5(io_getevents, aio_context_t, ctx_id,
 		struct timespec __user *, timeout)
 {
 	struct timespec64	ts;
+	int			ret;
+
+	if (timeout && unlikely(get_timespec64(&ts, timeout)))
+		return -EFAULT;
+
+	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
+	if (!ret && signal_pending(current))
+		ret = -EINTR;
+	return ret;
+}
+
+SYSCALL_DEFINE6(io_pgetevents,
+		aio_context_t, ctx_id,
+		long, min_nr,
+		long, nr,
+		struct io_event __user *, events,
+		struct timespec __user *, timeout,
+		const struct __aio_sigset __user *, usig)
+{
+	struct __aio_sigset	ksig = { NULL, };
+	sigset_t		ksigmask, sigsaved;
+	struct timespec64	ts;
+	int ret;
+
+	if (timeout && unlikely(get_timespec64(&ts, timeout)))
+		return -EFAULT;
 
-	if (timeout) {
-		if (unlikely(get_timespec64(&ts, timeout)))
+	if (usig && copy_from_user(&ksig, usig, sizeof(ksig)))
+		return -EFAULT;
+
+	if (ksig.sigmask) {
+		if (ksig.sigsetsize != sizeof(sigset_t))
+			return -EINVAL;
+		if (copy_from_user(&ksigmask, ksig.sigmask, sizeof(ksigmask)))
 			return -EFAULT;
+		sigdelsetmask(&ksigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
+		sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved);
+	}
+
+	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
+	if (signal_pending(current)) {
+		if (ksig.sigmask) {
+			current->saved_sigmask = sigsaved;
+			set_restore_sigmask();
+		}
+
+		if (!ret)
+			ret = -ERESTARTNOHAND;
+	} else {
+		if (ksig.sigmask)
+			sigprocmask(SIG_SETMASK, &sigsaved, NULL);
 	}
 
-	return do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
+	return ret;
 }
 
 #ifdef CONFIG_COMPAT
@@ -1891,13 +1934,64 @@ COMPAT_SYSCALL_DEFINE5(io_getevents, compat_aio_context_t, ctx_id,
 		       struct compat_timespec __user *, timeout)
 {
 	struct timespec64 t;
+	int ret;
+
+	if (timeout && compat_get_timespec64(&t, timeout))
+		return -EFAULT;
+
+	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
+	if (!ret && signal_pending(current))
+		ret = -EINTR;
+	return ret;
+}
+
+
+struct __compat_aio_sigset {
+	compat_sigset_t __user	*sigmask;
+	compat_size_t		sigsetsize;
+};
+
+COMPAT_SYSCALL_DEFINE6(io_pgetevents,
+		compat_aio_context_t, ctx_id,
+		compat_long_t, min_nr,
+		compat_long_t, nr,
+		struct io_event __user *, events,
+		struct compat_timespec __user *, timeout,
+		const struct __compat_aio_sigset __user *, usig)
+{
+	struct __compat_aio_sigset ksig = { NULL, };
+	sigset_t ksigmask, sigsaved;
+	struct timespec64 t;
+	int ret;
+
+	if (timeout && compat_get_timespec64(&t, timeout))
+		return -EFAULT;
 
-	if (timeout) {
-		if (compat_get_timespec64(&t, timeout))
+	if (usig && copy_from_user(&ksig, usig, sizeof(ksig)))
+		return -EFAULT;
+
+	if (ksig.sigmask) {
+		if (ksig.sigsetsize != sizeof(compat_sigset_t))
+			return -EINVAL;
+		if (get_compat_sigset(&ksigmask, ksig.sigmask))
 			return -EFAULT;
+		sigdelsetmask(&ksigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
+		sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved);
+	}
 
+	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
+	if (signal_pending(current)) {
+		if (ksig.sigmask) {
+			current->saved_sigmask = sigsaved;
+			set_restore_sigmask();
+		}
+		if (!ret)
+			ret = -ERESTARTNOHAND;
+	} else {
+		if (ksig.sigmask)
+			sigprocmask(SIG_SETMASK, &sigsaved, NULL);
 	}
 
-	return do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
+	return ret;
 }
 #endif
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 8a9643857c4a..bfb8a94fbabd 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -303,6 +303,7 @@ extern int put_compat_rusage(const struct rusage *,
 			     struct compat_rusage __user *);
 
 struct compat_siginfo;
+struct __compat_aio_sigset;
 
 extern asmlinkage long compat_sys_waitid(int, compat_pid_t,
 		struct compat_siginfo __user *, int,
@@ -634,6 +635,12 @@ asmlinkage long compat_sys_io_getevents(compat_aio_context_t ctx_id,
 					compat_long_t nr,
 					struct io_event __user *events,
 					struct compat_timespec __user *timeout);
+asmlinkage long compat_sys_io_pgetevents(compat_aio_context_t ctx_id,
+					compat_long_t min_nr,
+					compat_long_t nr,
+					struct io_event __user *events,
+					struct compat_timespec __user *timeout,
+					const struct __compat_aio_sigset __user *usig);
 asmlinkage long compat_sys_io_submit(compat_aio_context_t ctx_id, int nr,
 				     u32 __user *iocb);
 asmlinkage long compat_sys_mount(const char __user *dev_name,
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a78186d826d7..8515ec53c81b 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -539,6 +539,12 @@ asmlinkage long sys_io_getevents(aio_context_t ctx_id,
 				long nr,
 				struct io_event __user *events,
 				struct timespec __user *timeout);
+asmlinkage long sys_io_pgetevents(aio_context_t ctx_id,
+				long min_nr,
+				long nr,
+				struct io_event __user *events,
+				struct timespec __user *timeout,
+				const struct __aio_sigset *sig);
 asmlinkage long sys_io_submit(aio_context_t, long,
 				struct iocb __user * __user *);
 asmlinkage long sys_io_cancel(aio_context_t ctx_id, struct iocb __user *iocb,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 8b87de067bc7..ce2ebbeece10 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -732,9 +732,11 @@ __SYSCALL(__NR_pkey_alloc,    sys_pkey_alloc)
 __SYSCALL(__NR_pkey_free,     sys_pkey_free)
 #define __NR_statx 291
 __SYSCALL(__NR_statx,     sys_statx)
+#define __NR_io_pgetevents 292
+__SC_COMP(__NR_io_pgetevents, sys_io_pgetevents, compat_sys_io_pgetevents)
 
 #undef __NR_syscalls
-#define __NR_syscalls 292
+#define __NR_syscalls 293
 
 /*
  * All syscalls below here should go away really,
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index a04adbc70ddf..2c0a3415beee 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -29,6 +29,7 @@
 
 #include <linux/types.h>
 #include <linux/fs.h>
+#include <linux/signal.h>
 #include <asm/byteorder.h>
 
 typedef __kernel_ulong_t aio_context_t;
@@ -108,5 +109,10 @@ struct iocb {
 #undef IFBIG
 #undef IFLITTLE
 
+struct __aio_sigset {
+	sigset_t __user	*sigmask;
+	size_t		sigsetsize;
+};
+
 #endif /* __LINUX__AIO_ABI_H */
 
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index b5189762d275..8f7705559b38 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -151,9 +151,11 @@ cond_syscall(sys_io_destroy);
 cond_syscall(sys_io_submit);
 cond_syscall(sys_io_cancel);
 cond_syscall(sys_io_getevents);
+cond_syscall(sys_io_pgetevents);
 cond_syscall(compat_sys_io_setup);
 cond_syscall(compat_sys_io_submit);
 cond_syscall(compat_sys_io_getevents);
+cond_syscall(compat_sys_io_pgetevents);
 cond_syscall(sys_sysfs);
 cond_syscall(sys_syslog);
 cond_syscall(sys_process_vm_readv);
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 09/36] fs: unexport poll_schedule_timeout
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

No users outside of select.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/select.c          | 3 +--
 include/linux/poll.h | 2 --
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index b6c36254028a..686de7b3a1db 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -233,7 +233,7 @@ static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
 	add_wait_queue(wait_address, &entry->wait);
 }
 
-int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
+static int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
 			  ktime_t *expires, unsigned long slack)
 {
 	int rc = -EINTR;
@@ -258,7 +258,6 @@ int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
 
 	return rc;
 }
-EXPORT_SYMBOL(poll_schedule_timeout);
 
 /**
  * poll_select_set_timeout - helper function to setup the timeout value
diff --git a/include/linux/poll.h b/include/linux/poll.h
index f45ebd017eaa..a3576da63377 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -96,8 +96,6 @@ struct poll_wqueues {
 
 extern void poll_initwait(struct poll_wqueues *pwq);
 extern void poll_freewait(struct poll_wqueues *pwq);
-extern int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
-				 ktime_t *expires, unsigned long slack);
 extern u64 select_estimate_accuracy(struct timespec64 *tv);
 
 #define MAX_INT64_SECONDS (((s64)(~((u64)0)>>1)/HZ)-1)
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 10/36] fs: cleanup do_pollfd
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Use straight-line code with failure handling gotos instead of a lot
of nested conditionals.
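
As a rough userspace illustration of the shape this cleanup aims for, the
sketch below mirrors the control flow of the reworked function with toy
types.  Everything prefixed toy_ or EV_ is a hypothetical stand-in
invented for this example, not a kernel interface.

```c
#include <stddef.h>

#define EV_IN	0x01u
#define EV_OUT	0x04u
#define EV_ERR	0x08u
#define EV_HUP	0x10u
#define EV_NVAL	0x20u
#define TOY_DEFAULT_MASK (EV_IN | EV_OUT)

/* Toy stand-in for an open file; a NULL poll hook means "always ready". */
struct toy_file {
	unsigned int (*poll)(const struct toy_file *file);
};

static unsigned int toy_hup_poll(const struct toy_file *file)
{
	(void)file;
	return EV_IN | EV_HUP;
}

static const struct toy_file toy_plain_file = { NULL };
static const struct toy_file toy_hup_file = { toy_hup_poll };
static const struct toy_file *const toy_fd_table[] = {
	&toy_plain_file,	/* fd 0 */
	&toy_hup_file,		/* fd 1 */
};

static const struct toy_file *toy_lookup(int fd)
{
	size_t nr = sizeof(toy_fd_table) / sizeof(toy_fd_table[0]);

	if (fd < 0 || (size_t)fd >= nr)
		return NULL;
	return toy_fd_table[fd];
}

/* Early exits jump to one label instead of nesting the success path. */
static unsigned int toy_do_pollfd(int fd, unsigned int events)
{
	unsigned int mask = 0, filter;
	const struct toy_file *file;

	if (fd < 0)
		goto out;	/* negative fds are skipped: report nothing */
	mask = EV_NVAL;
	file = toy_lookup(fd);
	if (!file)
		goto out;	/* not an open fd: report EV_NVAL */

	filter = events | EV_ERR | EV_HUP;
	mask = TOY_DEFAULT_MASK;
	if (file->poll)
		mask = file->poll(file);
	mask &= filter;		/* mask out events the caller did not ask for */
out:
	return mask;
}
```

Each failure case sets the result it wants and jumps to the single exit,
so the common path stays at one indentation level.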

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/select.c | 48 +++++++++++++++++++++++-------------------------
 1 file changed, 23 insertions(+), 25 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index 686de7b3a1db..c6c504a814f9 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -806,34 +806,32 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, poll_table *pwait,
 				     bool *can_busy_poll,
 				     __poll_t busy_flag)
 {
-	__poll_t mask;
-	int fd;
-
-	mask = 0;
-	fd = pollfd->fd;
-	if (fd >= 0) {
-		struct fd f = fdget(fd);
-		mask = EPOLLNVAL;
-		if (f.file) {
-			/* userland u16 ->events contains POLL... bitmap */
-			__poll_t filter = demangle_poll(pollfd->events) |
-						EPOLLERR | EPOLLHUP;
-			mask = DEFAULT_POLLMASK;
-			if (f.file->f_op->poll) {
-				pwait->_key = filter;
-				pwait->_key |= busy_flag;
-				mask = f.file->f_op->poll(f.file, pwait);
-				if (mask & busy_flag)
-					*can_busy_poll = true;
-			}
-			/* Mask out unneeded events. */
-			mask &= filter;
-			fdput(f);
-		}
+	int fd = pollfd->fd;
+	__poll_t mask = 0, filter;
+	struct fd f;
+
+	if (fd < 0)
+		goto out;
+	mask = EPOLLNVAL;
+	f = fdget(fd);
+	if (!f.file)
+		goto out;
+
+	/* userland u16 ->events contains POLL... bitmap */
+	filter = demangle_poll(pollfd->events) | EPOLLERR | EPOLLHUP;
+	mask = DEFAULT_POLLMASK;
+	if (f.file->f_op->poll) {
+		pwait->_key = filter | busy_flag;
+		mask = f.file->f_op->poll(f.file, pwait);
+		if (mask & busy_flag)
+			*can_busy_poll = true;
 	}
+	mask &= filter;		/* Mask out unneeded events. */
+	fdput(f);
+
+out:
 	/* ... and so does ->revents */
 	pollfd->revents = mangle_poll(mask);
-
 	return mask;
 }
 
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 11/36] fs: update documentation for __poll_t
  2018-03-05 21:27 ` Christoph Hellwig
                   ` (10 preceding siblings ...)
  (?)
@ 2018-03-05 21:27 ` Christoph Hellwig
  2018-03-20  2:19     ` Darrick J. Wong
  -1 siblings, 1 reply; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/filesystems/Locking | 2 +-
 Documentation/filesystems/vfs.txt | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 75d2d57e2c44..220bba28f72b 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -439,7 +439,7 @@ prototypes:
 	ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
 	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
 	int (*iterate) (struct file *, struct dir_context *);
-	unsigned int (*poll) (struct file *, struct poll_table_struct *);
+	__poll_t (*poll) (struct file *, struct poll_table_struct *);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 5fd325df59e2..f608180ad59d 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -856,7 +856,7 @@ struct file_operations {
 	ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
 	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
 	int (*iterate) (struct file *, struct dir_context *);
-	unsigned int (*poll) (struct file *, struct poll_table_struct *);
+	__poll_t (*poll) (struct file *, struct poll_table_struct *);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 12/36] fs: add new vfs_poll and file_can_poll helpers
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

These abstract out calls to the poll method in preparation for changes
in how we poll.
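
To make the intent concrete, here is a small userspace sketch of the same
idea: callers stop looking at the method pointer themselves and go
through two helpers, one of which supplies a default mask when the method
is missing.  All toy_ and EV_ names are hypothetical stand-ins for this
example; only the structure follows the helpers added to
include/linux/poll.h.

```c
#include <stdbool.h>
#include <stddef.h>

#define EV_IN		0x01u
#define EV_OUT		0x04u
#define EV_DEFAULT	(EV_IN | EV_OUT)	/* toy default mask */

struct toy_file;

struct toy_file_ops {
	unsigned int (*poll)(struct toy_file *file);
};

struct toy_file {
	const struct toy_file_ops *f_op;
	unsigned int ready;	/* events this toy file reports */
};

/* True if the file provides its own poll method. */
static inline bool toy_file_can_poll(const struct toy_file *file)
{
	return file->f_op->poll != NULL;
}

/* Single entry point: fall back to the default mask without a method. */
static inline unsigned int toy_vfs_poll(struct toy_file *file)
{
	if (!file->f_op->poll)
		return EV_DEFAULT;
	return file->f_op->poll(file);
}

/* Caller in the style of the p9_fd_poll change: no open-coded NULL checks. */
static unsigned int toy_pair_poll(struct toy_file *rd, struct toy_file *wr)
{
	unsigned int ret = toy_vfs_poll(rd);

	if (rd != wr)
		ret = (ret & ~EV_OUT) | (toy_vfs_poll(wr) & ~EV_IN);
	return ret;
}

static unsigned int toy_report_ready(struct toy_file *file)
{
	return file->ready;
}

static const struct toy_file_ops toy_pollable_ops = { toy_report_ready };
static const struct toy_file_ops toy_plain_ops = { NULL };

static struct toy_file toy_rd = { &toy_pollable_ops, EV_IN };
static struct toy_file toy_wr = { &toy_pollable_ops, EV_OUT };
static struct toy_file toy_plain = { &toy_plain_ops, 0 };
```

Because every caller goes through the two helpers, a later change to how
polling is dispatched only has to touch the helpers.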

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/staging/comedi/drivers/serial2002.c |  4 ++--
 drivers/vfio/virqfd.c                       |  2 +-
 drivers/vhost/vhost.c                       |  2 +-
 fs/eventpoll.c                              |  5 ++---
 fs/select.c                                 | 23 ++++++++---------------
 include/linux/poll.h                        | 12 ++++++++++++
 mm/memcontrol.c                             |  2 +-
 net/9p/trans_fd.c                           | 18 ++++--------------
 virt/kvm/eventfd.c                          |  2 +-
 9 files changed, 32 insertions(+), 38 deletions(-)

diff --git a/drivers/staging/comedi/drivers/serial2002.c b/drivers/staging/comedi/drivers/serial2002.c
index b3f3b4a201af..5471b2212a62 100644
--- a/drivers/staging/comedi/drivers/serial2002.c
+++ b/drivers/staging/comedi/drivers/serial2002.c
@@ -113,7 +113,7 @@ static void serial2002_tty_read_poll_wait(struct file *f, int timeout)
 		long elapsed;
 		__poll_t mask;
 
-		mask = f->f_op->poll(f, &table.pt);
+		mask = vfs_poll(f, &table.pt);
 		if (mask & (EPOLLRDNORM | EPOLLRDBAND | EPOLLIN |
 			    EPOLLHUP | EPOLLERR)) {
 			break;
@@ -136,7 +136,7 @@ static int serial2002_tty_read(struct file *f, int timeout)
 
 	result = -1;
 	if (!IS_ERR(f)) {
-		if (f->f_op->poll) {
+		if (file_can_poll(f)) {
 			serial2002_tty_read_poll_wait(f, timeout);
 
 			if (kernel_read(f, &ch, 1, &pos) == 1)
diff --git a/drivers/vfio/virqfd.c b/drivers/vfio/virqfd.c
index 085700f1be10..2a1be859ee71 100644
--- a/drivers/vfio/virqfd.c
+++ b/drivers/vfio/virqfd.c
@@ -166,7 +166,7 @@ int vfio_virqfd_enable(void *opaque,
 	init_waitqueue_func_entry(&virqfd->wait, virqfd_wakeup);
 	init_poll_funcptr(&virqfd->pt, virqfd_ptable_queue_proc);
 
-	events = irqfd.file->f_op->poll(irqfd.file, &virqfd->pt);
+	events = vfs_poll(irqfd.file, &virqfd->pt);
 
 	/*
 	 * Check if there was an event already pending on the eventfd
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 1b3e8d2d5c8b..4d27e288bb1d 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -208,7 +208,7 @@ int vhost_poll_start(struct vhost_poll *poll, struct file *file)
 	if (poll->wqh)
 		return 0;
 
-	mask = file->f_op->poll(file, &poll->table);
+	mask = vfs_poll(file, &poll->table);
 	if (mask)
 		vhost_poll_wakeup(&poll->wait, 0, 0, poll_to_key(mask));
 	if (mask & EPOLLERR) {
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 0f3494ed3ed0..2bebae5a38cf 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -884,8 +884,7 @@ static __poll_t ep_item_poll(const struct epitem *epi, poll_table *pt,
 
 	pt->_key = epi->event.events;
 	if (!is_file_epoll(epi->ffd.file))
-		return epi->ffd.file->f_op->poll(epi->ffd.file, pt) &
-		       epi->event.events;
+		return vfs_poll(epi->ffd.file, pt) & epi->event.events;
 
 	ep = epi->ffd.file->private_data;
 	poll_wait(epi->ffd.file, &ep->poll_wait, pt);
@@ -2020,7 +2019,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 
 	/* The target file descriptor must support poll */
 	error = -EPERM;
-	if (!tf.file->f_op->poll)
+	if (!file_can_poll(tf.file))
 		goto error_tgt_fput;
 
 	/* Check if EPOLLWAKEUP is allowed */
diff --git a/fs/select.c b/fs/select.c
index c6c504a814f9..ba91103707ea 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -502,14 +502,10 @@ static int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)
 					continue;
 				f = fdget(i);
 				if (f.file) {
-					const struct file_operations *f_op;
-					f_op = f.file->f_op;
-					mask = DEFAULT_POLLMASK;
-					if (f_op->poll) {
-						wait_key_set(wait, in, out,
-							     bit, busy_flag);
-						mask = (*f_op->poll)(f.file, wait);
-					}
+					wait_key_set(wait, in, out, bit,
+						     busy_flag);
+					mask = vfs_poll(f.file, wait);
+
 					fdput(f);
 					if ((mask & POLLIN_SET) && (in & bit)) {
 						res_in |= bit;
@@ -819,13 +815,10 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, poll_table *pwait,
 
 	/* userland u16 ->events contains POLL... bitmap */
 	filter = demangle_poll(pollfd->events) | EPOLLERR | EPOLLHUP;
-	mask = DEFAULT_POLLMASK;
-	if (f.file->f_op->poll) {
-		pwait->_key = filter | busy_flag;
-		mask = f.file->f_op->poll(f.file, pwait);
-		if (mask & busy_flag)
-			*can_busy_poll = true;
-	}
+	pwait->_key = filter | busy_flag;
+	mask = vfs_poll(f.file, pwait);
+	if (mask & busy_flag)
+		*can_busy_poll = true;
 	mask &= filter;		/* Mask out unneeded events. */
 	fdput(f);
 
diff --git a/include/linux/poll.h b/include/linux/poll.h
index a3576da63377..7e0fdcf905d2 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -74,6 +74,18 @@ static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
 	pt->_key   = ~(__poll_t)0; /* all events enabled */
 }
 
+static inline bool file_can_poll(struct file *file)
+{
+	return file->f_op->poll;
+}
+
+static inline __poll_t vfs_poll(struct file *file, struct poll_table_struct *pt)
+{
+	if (unlikely(!file->f_op->poll))
+		return DEFAULT_POLLMASK;
+	return file->f_op->poll(file, pt);
+}
+
 struct poll_table_entry {
 	struct file *filp;
 	__poll_t key;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 670e99b68aa6..8774ece5c3c3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3849,7 +3849,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	if (ret)
 		goto out_put_css;
 
-	efile.file->f_op->poll(efile.file, &event->pt);
+	vfs_poll(efile.file, &event->pt);
 
 	spin_lock(&memcg->event_list_lock);
 	list_add(&event->list, &memcg->event_list);
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 0cfba919d167..3811775692d0 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -231,7 +231,7 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
 static __poll_t
 p9_fd_poll(struct p9_client *client, struct poll_table_struct *pt, int *err)
 {
-	__poll_t ret, n;
+	__poll_t ret;
 	struct p9_trans_fd *ts = NULL;
 
 	if (client && client->status == Connected)
@@ -243,19 +243,9 @@ p9_fd_poll(struct p9_client *client, struct poll_table_struct *pt, int *err)
 		return EPOLLERR;
 	}
 
-	if (!ts->rd->f_op->poll)
-		ret = DEFAULT_POLLMASK;
-	else
-		ret = ts->rd->f_op->poll(ts->rd, pt);
-
-	if (ts->rd != ts->wr) {
-		if (!ts->wr->f_op->poll)
-			n = DEFAULT_POLLMASK;
-		else
-			n = ts->wr->f_op->poll(ts->wr, pt);
-		ret = (ret & ~EPOLLOUT) | (n & ~EPOLLIN);
-	}
-
+	ret = vfs_poll(ts->rd, pt);
+	if (ts->rd != ts->wr)
+		ret = (ret & ~EPOLLOUT) | (vfs_poll(ts->wr, pt) & ~EPOLLIN);
 	return ret;
 }
 
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 6e865e8b5b10..90d30fbe95ae 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -397,7 +397,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	 * Check if there was an event already pending on the eventfd
 	 * before we registered, and trigger it as if we didn't miss it.
 	 */
-	events = f.file->f_op->poll(f.file, &irqfd->pt);
+	events = vfs_poll(f.file, &irqfd->pt);
 
 	if (events & EPOLLIN)
 		schedule_work(&irqfd->inject);
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 12/36] fs: add new vfs_poll and file_can_poll helpers
@ 2018-03-05 21:27   ` Christoph Hellwig
  0 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

These abstract out calls to the poll method in preparation for changes
in how we poll.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/staging/comedi/drivers/serial2002.c |  4 ++--
 drivers/vfio/virqfd.c                       |  2 +-
 drivers/vhost/vhost.c                       |  2 +-
 fs/eventpoll.c                              |  5 ++---
 fs/select.c                                 | 23 ++++++++---------------
 include/linux/poll.h                        | 12 ++++++++++++
 mm/memcontrol.c                             |  2 +-
 net/9p/trans_fd.c                           | 18 ++++--------------
 virt/kvm/eventfd.c                          |  2 +-
 9 files changed, 32 insertions(+), 38 deletions(-)

diff --git a/drivers/staging/comedi/drivers/serial2002.c b/drivers/staging/comedi/drivers/serial2002.c
index b3f3b4a201af..5471b2212a62 100644
--- a/drivers/staging/comedi/drivers/serial2002.c
+++ b/drivers/staging/comedi/drivers/serial2002.c
@@ -113,7 +113,7 @@ static void serial2002_tty_read_poll_wait(struct file *f, int timeout)
 		long elapsed;
 		__poll_t mask;
 
-		mask = f->f_op->poll(f, &table.pt);
+		mask = vfs_poll(f, &table.pt);
 		if (mask & (EPOLLRDNORM | EPOLLRDBAND | EPOLLIN |
 			    EPOLLHUP | EPOLLERR)) {
 			break;
@@ -136,7 +136,7 @@ static int serial2002_tty_read(struct file *f, int timeout)
 
 	result = -1;
 	if (!IS_ERR(f)) {
-		if (f->f_op->poll) {
+		if (file_can_poll(f)) {
 			serial2002_tty_read_poll_wait(f, timeout);
 
 			if (kernel_read(f, &ch, 1, &pos) == 1)
diff --git a/drivers/vfio/virqfd.c b/drivers/vfio/virqfd.c
index 085700f1be10..2a1be859ee71 100644
--- a/drivers/vfio/virqfd.c
+++ b/drivers/vfio/virqfd.c
@@ -166,7 +166,7 @@ int vfio_virqfd_enable(void *opaque,
 	init_waitqueue_func_entry(&virqfd->wait, virqfd_wakeup);
 	init_poll_funcptr(&virqfd->pt, virqfd_ptable_queue_proc);
 
-	events = irqfd.file->f_op->poll(irqfd.file, &virqfd->pt);
+	events = vfs_poll(irqfd.file, &virqfd->pt);
 
 	/*
 	 * Check if there was an event already pending on the eventfd
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 1b3e8d2d5c8b..4d27e288bb1d 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -208,7 +208,7 @@ int vhost_poll_start(struct vhost_poll *poll, struct file *file)
 	if (poll->wqh)
 		return 0;
 
-	mask = file->f_op->poll(file, &poll->table);
+	mask = vfs_poll(file, &poll->table);
 	if (mask)
 		vhost_poll_wakeup(&poll->wait, 0, 0, poll_to_key(mask));
 	if (mask & EPOLLERR) {
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 0f3494ed3ed0..2bebae5a38cf 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -884,8 +884,7 @@ static __poll_t ep_item_poll(const struct epitem *epi, poll_table *pt,
 
 	pt->_key = epi->event.events;
 	if (!is_file_epoll(epi->ffd.file))
-		return epi->ffd.file->f_op->poll(epi->ffd.file, pt) &
-		       epi->event.events;
+		return vfs_poll(epi->ffd.file, pt) & epi->event.events;
 
 	ep = epi->ffd.file->private_data;
 	poll_wait(epi->ffd.file, &ep->poll_wait, pt);
@@ -2020,7 +2019,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 
 	/* The target file descriptor must support poll */
 	error = -EPERM;
-	if (!tf.file->f_op->poll)
+	if (!file_can_poll(tf.file))
 		goto error_tgt_fput;
 
 	/* Check if EPOLLWAKEUP is allowed */
diff --git a/fs/select.c b/fs/select.c
index c6c504a814f9..ba91103707ea 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -502,14 +502,10 @@ static int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)
 					continue;
 				f = fdget(i);
 				if (f.file) {
-					const struct file_operations *f_op;
-					f_op = f.file->f_op;
-					mask = DEFAULT_POLLMASK;
-					if (f_op->poll) {
-						wait_key_set(wait, in, out,
-							     bit, busy_flag);
-						mask = (*f_op->poll)(f.file, wait);
-					}
+					wait_key_set(wait, in, out, bit,
+						     busy_flag);
+					mask = vfs_poll(f.file, wait);
+
 					fdput(f);
 					if ((mask & POLLIN_SET) && (in & bit)) {
 						res_in |= bit;
@@ -819,13 +815,10 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, poll_table *pwait,
 
 	/* userland u16 ->events contains POLL... bitmap */
 	filter = demangle_poll(pollfd->events) | EPOLLERR | EPOLLHUP;
-	mask = DEFAULT_POLLMASK;
-	if (f.file->f_op->poll) {
-		pwait->_key = filter | busy_flag;
-		mask = f.file->f_op->poll(f.file, pwait);
-		if (mask & busy_flag)
-			*can_busy_poll = true;
-	}
+	pwait->_key = filter | busy_flag;
+	mask = vfs_poll(f.file, pwait);
+	if (mask & busy_flag)
+		*can_busy_poll = true;
 	mask &= filter;		/* Mask out unneeded events. */
 	fdput(f);
 
diff --git a/include/linux/poll.h b/include/linux/poll.h
index a3576da63377..7e0fdcf905d2 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -74,6 +74,18 @@ static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
 	pt->_key   = ~(__poll_t)0; /* all events enabled */
 }
 
+static inline bool file_can_poll(struct file *file)
+{
+	return file->f_op->poll;
+}
+
+static inline __poll_t vfs_poll(struct file *file, struct poll_table_struct *pt)
+{
+	if (unlikely(!file->f_op->poll))
+		return DEFAULT_POLLMASK;
+	return file->f_op->poll(file, pt);
+}
+
 struct poll_table_entry {
 	struct file *filp;
 	__poll_t key;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 670e99b68aa6..8774ece5c3c3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3849,7 +3849,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	if (ret)
 		goto out_put_css;
 
-	efile.file->f_op->poll(efile.file, &event->pt);
+	vfs_poll(efile.file, &event->pt);
 
 	spin_lock(&memcg->event_list_lock);
 	list_add(&event->list, &memcg->event_list);
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 0cfba919d167..3811775692d0 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -231,7 +231,7 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
 static __poll_t
 p9_fd_poll(struct p9_client *client, struct poll_table_struct *pt, int *err)
 {
-	__poll_t ret, n;
+	__poll_t ret;
 	struct p9_trans_fd *ts = NULL;
 
 	if (client && client->status == Connected)
@@ -243,19 +243,9 @@ p9_fd_poll(struct p9_client *client, struct poll_table_struct *pt, int *err)
 		return EPOLLERR;
 	}
 
-	if (!ts->rd->f_op->poll)
-		ret = DEFAULT_POLLMASK;
-	else
-		ret = ts->rd->f_op->poll(ts->rd, pt);
-
-	if (ts->rd != ts->wr) {
-		if (!ts->wr->f_op->poll)
-			n = DEFAULT_POLLMASK;
-		else
-			n = ts->wr->f_op->poll(ts->wr, pt);
-		ret = (ret & ~EPOLLOUT) | (n & ~EPOLLIN);
-	}
-
+	ret = vfs_poll(ts->rd, pt);
+	if (ts->rd != ts->wr)
+		ret = (ret & ~EPOLLOUT) | (vfs_poll(ts->wr, pt) & ~EPOLLIN);
 	return ret;
 }
 
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 6e865e8b5b10..90d30fbe95ae 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -397,7 +397,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	 * Check if there was an event already pending on the eventfd
 	 * before we registered, and trigger it as if we didn't miss it.
 	 */
-	events = f.file->f_op->poll(f.file, &irqfd->pt);
+	events = vfs_poll(f.file, &irqfd->pt);
 
 	if (events & EPOLLIN)
 		schedule_work(&irqfd->inject);
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 13/36] fs: introduce new ->get_poll_head and ->poll_mask methods
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

->get_poll_head returns the waitqueue that the poll operation is going
to sleep on, and ->poll_mask does a non-blocking check for the POLL*
events.  Note that this means we can only use a single waitqueue
for the poll, unlike some current drivers that use two waitqueues for
different events.  But now that we have keyed wakeups and heavily use
those for poll there aren't that many good reasons left to keep
multiple waitqueues.  If there are any, ->poll is still around; the
driver just won't support aio poll.
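
To make the control flow easier to follow, here is a rough userspace
model of the dispatch that vfs_poll() performs in this patch, including
the POLL_TO_PTR/PTR_TO_POLL style encoding.  It is only a sketch under
stated assumptions: every model_ and demo_ name is invented for
illustration, the poll table is reduced to a "queued" flag, and the
constants are stand-ins rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins; none of these are the kernel's own definitions. */
typedef unsigned int model_poll_t;

#define MODEL_POLLIN		0x0001u
#define MODEL_POLLOUT		0x0004u
#define MODEL_POLLERR		0x0008u
#define MODEL_DEFAULT_POLLMASK	(MODEL_POLLIN | MODEL_POLLOUT)
#define MODEL_MAX_ERRNO		4095

struct model_wait_queue_head { int dummy; };

/* Same idea as IS_ERR(): the top MAX_ERRNO addresses encode small values. */
static inline int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hide a poll mask in a pointer by negating it, and recover it again. */
static inline struct model_wait_queue_head *model_poll_to_ptr(model_poll_t mask)
{
	return (struct model_wait_queue_head *)(intptr_t)-(long)mask;
}

static inline model_poll_t model_ptr_to_poll(const void *p)
{
	return (model_poll_t)-(intptr_t)p;
}

struct model_file;

struct model_file_ops {
	model_poll_t (*poll)(struct model_file *f);
	struct model_wait_queue_head *(*get_poll_head)(struct model_file *f,
						       model_poll_t events);
	model_poll_t (*poll_mask)(struct model_file *f, model_poll_t events);
};

struct model_file {
	const struct model_file_ops *ops;
	struct model_wait_queue_head wq;
	model_poll_t ready;	/* events currently pending */
	int broken;		/* simulate a "grave error" */
};

/*
 * Mirrors the order of checks in the patch: legacy ->poll wins, then the
 * waitqueue is fetched only if the caller wants to sleep (queued != NULL),
 * and finally ->poll_mask reports the current state.
 */
static model_poll_t model_vfs_poll(struct model_file *f, model_poll_t events,
				   int *queued)
{
	const struct model_file_ops *ops = f->ops;
	struct model_wait_queue_head *head;

	if (ops->poll)
		return ops->poll(f);
	if (!ops->get_poll_head || !ops->poll_mask)
		return MODEL_DEFAULT_POLLMASK;

	if (queued) {
		head = ops->get_poll_head(f, events);
		if (!head)
			return MODEL_DEFAULT_POLLMASK;
		if (model_is_err(head))
			return model_ptr_to_poll(head);
		*queued = 1;	/* stands in for pt->_qproc(file, head, pt) */
	}
	return ops->poll_mask(f, events);
}

/* A hypothetical driver that only implements the two new methods. */
static struct model_wait_queue_head *demo_get_poll_head(struct model_file *f,
							model_poll_t events)
{
	(void)events;
	return f->broken ? model_poll_to_ptr(MODEL_POLLERR) : &f->wq;
}

static model_poll_t demo_poll_mask(struct model_file *f, model_poll_t events)
{
	return f->ready & events;
}

static const struct model_file_ops demo_ops = {
	.get_poll_head	= demo_get_poll_head,
	.poll_mask	= demo_poll_mask,
};
```

In this model a driver with a single waitqueue never sees the poll
table at all, which is what allows aio poll to queue its own wait entry
on the returned head.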

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/filesystems/Locking |  7 ++++++-
 Documentation/filesystems/vfs.txt | 13 +++++++++++++
 fs/select.c                       | 28 ++++++++++++++++++++++++++++
 include/linux/fs.h                |  2 ++
 include/linux/poll.h              | 27 +++++++++++++++++++++++----
 5 files changed, 72 insertions(+), 5 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 220bba28f72b..6d227f9d7bd9 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -440,6 +440,8 @@ prototypes:
 	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
 	int (*iterate) (struct file *, struct dir_context *);
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);
+	struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
+	__poll_t (*poll_mask) (struct file *, __poll_t);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
@@ -470,7 +472,7 @@ prototypes:
 };
 
 locking rules:
-	All may block.
+	All except for ->poll_mask may block.
 
 ->llseek() locking has moved from llseek to the individual llseek
 implementations.  If your fs is not using generic_file_llseek, you
@@ -498,6 +500,9 @@ in sys_read() and friends.
 the lease within the individual filesystem to record the result of the
 operation
 
+->poll_mask can be called with or without the waitqueue lock for the waitqueue
+returned from ->get_poll_head.
+
 --------------------------- dquot_operations -------------------------------
 prototypes:
 	int (*write_dquot) (struct dquot *);
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index f608180ad59d..50ee13563271 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -857,6 +857,8 @@ struct file_operations {
 	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
 	int (*iterate) (struct file *, struct dir_context *);
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);
+	struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
+	__poll_t (*poll_mask) (struct file *, __poll_t);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
@@ -901,6 +903,17 @@ otherwise noted.
 	activity on this file and (optionally) go to sleep until there
 	is activity. Called by the select(2) and poll(2) system calls
 
+  get_poll_head: Returns the struct wait_queue_head that poll, select,
+  epoll or aio poll should wait on in case this instance only has a single
+  waitqueue.  Can return NULL to indicate polling is not supported,
+  or a POLL* value using the POLL_TO_PTR helper in case a grave error
+  occurred and ->poll_mask shall not be called.
+
+  poll_mask: return the mask of POLL* values describing the file descriptor
+  state.  Called either before going to sleep on the waitqueue returned by
+  get_poll_head, or after it has been woken.  If ->get_poll_head and
+  ->poll_mask are implemented ->poll does not need to be implemented.
+
   unlocked_ioctl: called by the ioctl(2) system call.
 
   compat_ioctl: called by the ioctl(2) system call when 32 bit system calls
diff --git a/fs/select.c b/fs/select.c
index ba91103707ea..cc270d7f6192 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -34,6 +34,34 @@
 
 #include <linux/uaccess.h>
 
+__poll_t vfs_poll(struct file *file, struct poll_table_struct *pt)
+{
+	__poll_t events = poll_requested_events(pt);
+	struct wait_queue_head *head;
+
+	if (unlikely(!file_can_poll(file)))
+		return DEFAULT_POLLMASK;
+
+	if (file->f_op->poll)
+		return file->f_op->poll(file, pt);
+
+	/*
+	 * Only get the poll head and do the first mask check if we are actually
+	 * going to sleep on this file:
+	 */
+	if (pt && pt->_qproc) {
+		head = vfs_get_poll_head(file, events);
+		if (!head)
+			return DEFAULT_POLLMASK;
+		if (IS_ERR(head))
+			return PTR_TO_POLL(head);
+
+		pt->_qproc(file, head, pt);
+	}
+
+	return file->f_op->poll_mask(file, events);
+}
+EXPORT_SYMBOL_GPL(vfs_poll);
 
 /*
  * Estimate expected accuracy in ns from a timeval.
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 79c413985305..6ea2c0843bb1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1708,6 +1708,8 @@ struct file_operations {
 	int (*iterate) (struct file *, struct dir_context *);
 	int (*iterate_shared) (struct file *, struct dir_context *);
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);
+	struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
+	__poll_t (*poll_mask) (struct file *, __poll_t);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
diff --git a/include/linux/poll.h b/include/linux/poll.h
index 7e0fdcf905d2..42e8e8665fb0 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -74,18 +74,37 @@ static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
 	pt->_key   = ~(__poll_t)0; /* all events enabled */
 }
 
+/*
+ * ->get_poll_head can return a __poll_t in the PTR_ERR, use these macros
+ * to return the value and recover it.  It takes care of the negation as
+ * well as of the annotations.
+ */
+#define POLL_TO_PTR(mask)	(ERR_PTR(-(__force int)(mask)))
+#define PTR_TO_POLL(ptr)	((__force __poll_t)-PTR_ERR((ptr)))
+
 static inline bool file_can_poll(struct file *file)
 {
-	return file->f_op->poll;
+	return file->f_op->poll ||
+		(file->f_op->get_poll_head && file->f_op->poll_mask);
 }
 
-static inline __poll_t vfs_poll(struct file *file, struct poll_table_struct *pt)
+static inline struct wait_queue_head *vfs_get_poll_head(struct file *file,
+		__poll_t events)
 {
-	if (unlikely(!file->f_op->poll))
+	if (unlikely(!file->f_op->get_poll_head || !file->f_op->poll_mask))
+		return NULL;
+	return file->f_op->get_poll_head(file, events);
+}
+
+static inline __poll_t vfs_poll_mask(struct file *file, __poll_t events)
+{
+	if (unlikely(!file->f_op->poll_mask))
 		return DEFAULT_POLLMASK;
-	return file->f_op->poll(file, pt);
+	return file->f_op->poll_mask(file, events) & events;
 }
 
+__poll_t vfs_poll(struct file *file, struct poll_table_struct *pt);
+
 struct poll_table_entry {
 	struct file *filp;
 	__poll_t key;
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 14/36] aio: implement IOCB_CMD_POLL
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Simple one-shot poll through the io_submit() interface.  To poll for
a file descriptor the application should submit an iocb of type
IOCB_CMD_POLL.  It will poll the fd for the events specified in the
first 32 bits of the aio_buf field of the iocb.

Unlike poll or epoll without EPOLLONESHOT this interface always works
in one-shot mode: once the iocb is completed, it has to be
resubmitted.
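
From userspace, a single submission could look roughly like the sketch
below.  It uses raw syscalls rather than libaio, and it assumes a kernel
that accepts opcode 5 (the value this patch assigns to IOCB_CMD_POLL);
on other kernels io_submit() is expected to fail, typically with
EINVAL.  The demo_ names are made up for illustration.  Note that
aio_poll() in the diff below rejects aio_buf values that do not fit in
16 bits, so only the usual poll(2) event bits are passed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>

/* Opcode value taken from this patch; older uapi headers lack the enum. */
#define DEMO_IOCB_CMD_POLL 5

/*
 * Submit one poll request for fd and wait for its completion.
 * Returns 0 and fills *revents on success, or a negative errno.
 */
static int demo_aio_poll_once(int fd, short events, short *revents)
{
	aio_context_t ctx = 0;
	struct iocb cb;
	struct iocb *list[1] = { &cb };
	struct io_event ev;
	int err = 0;

	if (syscall(SYS_io_setup, 1, &ctx) < 0)
		return -errno;

	/* aio_offset, aio_nbytes and aio_rw_flags must stay zero for poll */
	memset(&cb, 0, sizeof(cb));
	cb.aio_lio_opcode = DEMO_IOCB_CMD_POLL;
	cb.aio_fildes = fd;
	cb.aio_buf = (unsigned short)events;

	if (syscall(SYS_io_submit, ctx, 1L, list) != 1)
		err = -errno;
	else if (syscall(SYS_io_getevents, ctx, 1L, 1L, &ev, NULL) != 1)
		err = -errno;
	else
		*revents = (short)ev.res;	/* completed: one shot only */

	syscall(SYS_io_destroy, ctx);
	return err;
}
```

Because the request is one-shot, a caller that wants level-triggered
behaviour has to call something like demo_aio_poll_once() again, or
resubmit the same iocb, after each completion.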

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/aio.c                     | 102 +++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/aio_abi.h |   6 +--
 2 files changed, 104 insertions(+), 4 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index da87cbf7c67a..0bafc4975d51 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -5,6 +5,7 @@
  *	Implements an efficient asynchronous io interface.
  *
  *	Copyright 2000, 2001, 2002 Red Hat, Inc.  All Rights Reserved.
+ *	Copyright 2018 Christoph Hellwig.
  *
  *	See ../COPYING for licensing terms.
  */
@@ -156,9 +157,17 @@ struct kioctx {
 	unsigned		id;
 };
 
+struct poll_iocb {
+	struct file		*file;
+	__poll_t		events;
+	struct wait_queue_head	*head;
+	struct wait_queue_entry	wait;
+};
+
 struct aio_kiocb {
 	union {
 		struct kiocb		rw;
+		struct poll_iocb	poll;
 	};
 
 	struct kioctx		*ki_ctx;
@@ -1565,6 +1574,96 @@ static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored,
 	return ret;
 }
 
+static void __aio_complete_poll(struct poll_iocb *req, __poll_t mask)
+{
+	fput(req->file);
+	aio_complete(container_of(req, struct aio_kiocb, poll),
+			mangle_poll(mask), 0);
+}
+
+static void aio_complete_poll(struct poll_iocb *req, __poll_t mask)
+{
+	struct aio_kiocb *iocb = container_of(req, struct aio_kiocb, poll);
+
+	if (!(iocb->flags & AIO_IOCB_CANCELLED))
+		__aio_complete_poll(req, mask);
+}
+
+static int aio_poll_cancel(struct kiocb *rw)
+{
+	struct aio_kiocb *iocb = container_of(rw, struct aio_kiocb, rw);
+
+	remove_wait_queue(iocb->poll.head, &iocb->poll.wait);
+	__aio_complete_poll(&iocb->poll, 0); /* no events to report */
+	return 0;
+}
+
+static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
+		void *key)
+{
+	struct poll_iocb *req = container_of(wait, struct poll_iocb, wait);
+	struct file *file = req->file;
+	__poll_t mask = key_to_poll(key);
+
+	assert_spin_locked(&req->head->lock);
+
+	/* for instances that support it check for an event match first: */
+	if (mask && !(mask & req->events))
+		return 0;
+
+	mask = vfs_poll_mask(file, req->events);
+	if (!mask)
+		return 0;
+
+	__remove_wait_queue(req->head, &req->wait);
+	aio_complete_poll(req, mask);
+	return 1;
+}
+
+static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb)
+{
+	struct poll_iocb *req = &aiocb->poll;
+	unsigned long flags;
+	__poll_t mask;
+
+	/* reject any unknown events outside the normal event mask. */
+	if ((u16)iocb->aio_buf != iocb->aio_buf)
+		return -EINVAL;
+	/* reject fields that are not defined for poll */
+	if (iocb->aio_offset || iocb->aio_nbytes || iocb->aio_rw_flags)
+		return -EINVAL;
+
+	req->events = demangle_poll(iocb->aio_buf) | POLLERR | POLLHUP;
+	req->file = fget(iocb->aio_fildes);
+	if (unlikely(!req->file))
+		return -EBADF;
+
+	req->head = vfs_get_poll_head(req->file, req->events);
+	if (!req->head) {
+		fput(req->file);
+		return -EINVAL; /* same as no support for IOCB_CMD_POLL */
+	}
+	if (IS_ERR(req->head)) {
+		mask = PTR_TO_POLL(req->head);
+		goto done;
+	}
+
+	init_waitqueue_func_entry(&req->wait, aio_poll_wake);
+
+	spin_lock_irqsave(&req->head->lock, flags);
+	mask = vfs_poll_mask(req->file, req->events);
+	if (!mask) {
+		__kiocb_set_cancel_fn(aiocb, aio_poll_cancel,
+				AIO_IOCB_DELAYED_CANCEL);
+		__add_wait_queue(req->head, &req->wait);
+	}
+	spin_unlock_irqrestore(&req->head->lock, flags);
+done:
+	if (mask)
+		aio_complete_poll(req, mask);
+	return -EIOCBQUEUED;
+}
+
 static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 			 struct iocb *iocb, bool compat)
 {
@@ -1628,6 +1727,9 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	case IOCB_CMD_PWRITEV:
 		ret = aio_write(&req->rw, iocb, true, compat);
 		break;
+	case IOCB_CMD_POLL:
+		ret = aio_poll(req, iocb);
+		break;
 	default:
 		pr_debug("invalid aio operation %d\n", iocb->aio_lio_opcode);
 		ret = -EINVAL;
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index 2c0a3415beee..ed0185945bb2 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -39,10 +39,8 @@ enum {
 	IOCB_CMD_PWRITE = 1,
 	IOCB_CMD_FSYNC = 2,
 	IOCB_CMD_FDSYNC = 3,
-	/* These two are experimental.
-	 * IOCB_CMD_PREADX = 4,
-	 * IOCB_CMD_POLL = 5,
-	 */
+	/* 4 was the experimental IOCB_CMD_PREADX */
+	IOCB_CMD_POLL = 5,
 	IOCB_CMD_NOOP = 6,
 	IOCB_CMD_PREADV = 7,
 	IOCB_CMD_PWRITEV = 8,
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 15/36] net: refactor socket_poll
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Factor out two busy poll related helpers for later reuse, and remove
a comment that isn't very helpful, especially with the __poll_t
annotations in place.
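
As a sanity check of the split, the following userspace model compares
the old combined logic with the two helpers for a caller that passes an
explicit event key.  It is a sketch only: the model_ names, the counter
standing in for sk_busy_loop() and the bit values are assumptions made
for this example, not kernel definitions.

```c
#include <assert.h>

typedef unsigned int model_poll_t;

#define MODEL_POLLIN		0x0001u
#define MODEL_POLL_BUSY_LOOP	0x8000u	/* stand-in value for the model */

struct model_sock {
	int can_busy_loop;	/* stands in for sk_can_busy_loop() */
	int busy_loops;		/* counts would-be sk_busy_loop() calls */
	model_poll_t ready;	/* what the protocol ->poll would report */
};

/* Before the patch: flag computation and busy loop are interleaved. */
static model_poll_t model_sock_poll_old(struct model_sock *sk,
					model_poll_t key)
{
	model_poll_t busy_flag = 0;

	if (sk->can_busy_loop) {
		busy_flag = MODEL_POLL_BUSY_LOOP;
		if (key & MODEL_POLL_BUSY_LOOP)
			sk->busy_loops++;
	}
	return busy_flag | sk->ready;
}

/* After the patch: one helper spins, the other reports the capability. */
static void model_sock_poll_busy_loop(struct model_sock *sk,
				      model_poll_t events)
{
	if (sk->can_busy_loop && (events & MODEL_POLL_BUSY_LOOP))
		sk->busy_loops++;
}

static model_poll_t model_sock_poll_busy_flag(const struct model_sock *sk)
{
	return sk->can_busy_loop ? MODEL_POLL_BUSY_LOOP : 0;
}

static model_poll_t model_sock_poll_new(struct model_sock *sk,
					model_poll_t events)
{
	model_sock_poll_busy_loop(sk, events);
	return sk->ready | model_sock_poll_busy_flag(sk);
}
```

The model deliberately does not cover a NULL poll table; how that case
behaves depends on what poll_requested_events() returns for it.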

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/net/busy_poll.h | 15 +++++++++++++++
 net/socket.c            | 21 ++++-----------------
 2 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 71c72a939bf8..c5187438af38 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -121,6 +121,21 @@ static inline void sk_busy_loop(struct sock *sk, int nonblock)
 #endif
 }
 
+static inline void sock_poll_busy_loop(struct socket *sock, __poll_t events)
+{
+	if (sk_can_busy_loop(sock->sk) &&
+	    events && (events & POLL_BUSY_LOOP)) {
+		/* once, only if requested by syscall */
+		sk_busy_loop(sock->sk, 1);
+	}
+}
+
+/* if this socket can poll_ll, tell the system call */
+static inline __poll_t sock_poll_busy_flag(struct socket *sock)
+{
+	return sk_can_busy_loop(sock->sk) ? POLL_BUSY_LOOP : 0;
+}
+
 /* used in the NIC receive handler to mark the skb */
 static inline void skb_mark_napi_id(struct sk_buff *skb,
 				    struct napi_struct *napi)
diff --git a/net/socket.c b/net/socket.c
index a93c99b518ca..3f859a07641a 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1117,24 +1117,11 @@ EXPORT_SYMBOL(sock_create_lite);
 /* No kernel lock held - perfect */
 static __poll_t sock_poll(struct file *file, poll_table *wait)
 {
-	__poll_t busy_flag = 0;
-	struct socket *sock;
-
-	/*
-	 *      We can't return errors to poll, so it's either yes or no.
-	 */
-	sock = file->private_data;
-
-	if (sk_can_busy_loop(sock->sk)) {
-		/* this socket can poll_ll so tell the system call */
-		busy_flag = POLL_BUSY_LOOP;
-
-		/* once, only if requested by syscall */
-		if (wait && (wait->_key & POLL_BUSY_LOOP))
-			sk_busy_loop(sock->sk, 1);
-	}
+	struct socket *sock = file->private_data;
+	__poll_t events = poll_requested_events(wait);
 
-	return busy_flag | sock->ops->poll(file, sock, wait);
+	sock_poll_busy_loop(sock, events);
+	return sock->ops->poll(file, sock, wait) | sock_poll_busy_flag(sock);
 }
 
 static int sock_mmap(struct file *file, struct vm_area_struct *vma)
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 16/36] net: add support for ->poll_mask in proto_ops
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

The socket file operations still implement ->poll until all protocols are
switched over.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/net.h |  3 +++
 net/socket.c        | 51 ++++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index 91216b16feb7..ce3d4dacb51e 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -147,6 +147,9 @@ struct proto_ops {
 	int		(*getname)   (struct socket *sock,
 				      struct sockaddr *addr,
 				      int *sockaddr_len, int peer);
+	struct wait_queue_head *(*get_poll_head)(struct socket *sock,
+				      __poll_t events);
+	__poll_t	(*poll_mask) (struct socket *sock, __poll_t events);
 	__poll_t	(*poll)	     (struct file *file, struct socket *sock,
 				      struct poll_table_struct *wait);
 	int		(*ioctl)     (struct socket *sock, unsigned int cmd,
diff --git a/net/socket.c b/net/socket.c
index 3f859a07641a..ceb69ddcd7bd 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -118,8 +118,10 @@ static ssize_t sock_write_iter(struct kiocb *iocb, struct iov_iter *from);
 static int sock_mmap(struct file *file, struct vm_area_struct *vma);
 
 static int sock_close(struct inode *inode, struct file *file);
-static __poll_t sock_poll(struct file *file,
-			      struct poll_table_struct *wait);
+static struct wait_queue_head *sock_get_poll_head(struct file *file,
+		__poll_t events);
+static __poll_t sock_poll_mask(struct file *file, __poll_t);
+static __poll_t sock_poll(struct file *file, struct poll_table_struct *wait);
 static long sock_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
 #ifdef CONFIG_COMPAT
 static long compat_sock_ioctl(struct file *file,
@@ -142,6 +144,8 @@ static const struct file_operations socket_file_ops = {
 	.llseek =	no_llseek,
 	.read_iter =	sock_read_iter,
 	.write_iter =	sock_write_iter,
+	.get_poll_head = sock_get_poll_head,
+	.poll_mask =	sock_poll_mask,
 	.poll =		sock_poll,
 	.unlocked_ioctl = sock_ioctl,
 #ifdef CONFIG_COMPAT
@@ -1114,14 +1118,51 @@ int sock_create_lite(int family, int type, int protocol, struct socket **res)
 }
 EXPORT_SYMBOL(sock_create_lite);
 
+static struct wait_queue_head *sock_get_poll_head(struct file *file,
+		__poll_t events)
+{
+	struct socket *sock = file->private_data;
+
+	if (!sock->ops->poll_mask)
+		return NULL;
+	if (sock->ops->get_poll_head)
+		return sock->ops->get_poll_head(sock, events);
+
+	sock_poll_busy_loop(sock, events);
+	return sk_sleep(sock->sk);
+}
+
+static __poll_t sock_poll_mask(struct file *file, __poll_t events)
+{
+	struct socket *sock = file->private_data;
+
+	/*
+	 * We need to be sure we are in sync with the socket flags modification.
+	 *
+	 * This memory barrier is paired in the wq_has_sleeper.
+	 */
+	smp_mb();
+
+	/* this socket can poll_ll so tell the system call */
+	return sock->ops->poll_mask(sock, events) |
+		(sk_can_busy_loop(sock->sk) ? POLL_BUSY_LOOP : 0);
+}
+
 /* No kernel lock held - perfect */
 static __poll_t sock_poll(struct file *file, poll_table *wait)
 {
 	struct socket *sock = file->private_data;
-	__poll_t events = poll_requested_events(wait);
+	__poll_t events = poll_requested_events(wait), mask = 0;
 
-	sock_poll_busy_loop(sock, events);
-	return sock->ops->poll(file, sock, wait) | sock_poll_busy_flag(sock);
+	if (sock->ops->poll) {
+		sock_poll_busy_loop(sock, events);
+		mask = sock->ops->poll(file, sock, wait);
+	} else if (sock->ops->poll_mask) {
+		sock_poll_wait(file, sock_get_poll_head(file, events), wait);
+		mask = sock->ops->poll_mask(sock, events);
+	}
+
+	return mask | sock_poll_busy_flag(sock);
 }
 
 static int sock_mmap(struct file *file, struct vm_area_struct *vma)
-- 
2.14.2
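
Editorial note on the fallback order in sock_poll() after this patch:
the following is a minimal userspace sketch, not kernel code. All demo_*
names, the flag values and the "waits" counter are assumptions for
illustration. It models only the dispatch: use ->poll if the protocol
still has one, otherwise register on the poll head and call ->poll_mask,
otherwise report no events. The busy-loop call and the memory barrier
from the real code are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins only; not the kernel's types or flag values. */
typedef unsigned int demo_poll_t;
#define DEMO_POLLIN         0x0001u
#define DEMO_POLL_BUSY_LOOP 0x8000u

struct demo_sock;

struct demo_proto_ops {
	/* legacy method: registers the waiter and returns the ready mask */
	demo_poll_t (*poll)(struct demo_sock *sock);
	/* new method: non-blocking readiness check only */
	demo_poll_t (*poll_mask)(struct demo_sock *sock, demo_poll_t events);
};

struct demo_sock {
	const struct demo_proto_ops *ops;
	int can_busy_loop;	/* models sk_can_busy_loop() */
	demo_poll_t ready;	/* events that are ready right now */
	int waits;		/* counts modeled sock_poll_wait() calls */
};

demo_poll_t demo_legacy_poll(struct demo_sock *sock)
{
	sock->waits++;		/* a legacy ->poll registers the waiter itself */
	return sock->ready;
}

demo_poll_t demo_poll_mask(struct demo_sock *sock, demo_poll_t events)
{
	(void)events;		/* only a hint in this sketch */
	return sock->ready;
}

const struct demo_proto_ops demo_legacy_ops = { .poll = demo_legacy_poll };
const struct demo_proto_ops demo_mask_ops = { .poll_mask = demo_poll_mask };
const struct demo_proto_ops demo_empty_ops = { NULL, NULL };

/* Models the dispatch in sock_poll(): ->poll, else ->poll_mask, else 0. */
demo_poll_t demo_sock_poll(struct demo_sock *sock, demo_poll_t events)
{
	demo_poll_t mask = 0;

	assert(sock != NULL && sock->ops != NULL);
	if (sock->ops->poll) {
		mask = sock->ops->poll(sock);
	} else if (sock->ops->poll_mask) {
		sock->waits++;	/* generic code registers on the poll head */
		mask = sock->ops->poll_mask(sock, events);
	}
	return mask | (sock->can_busy_loop ? DEMO_POLL_BUSY_LOOP : 0);
}
```

With demo_empty_ops the function returns 0 without registering a waiter,
which is the case a protocol with neither method set falls into.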


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 17/36] net: remove sock_no_poll
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Now that sock_poll handles a NULL ->poll or ->poll_mask there is no need
for a stub.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 crypto/af_alg.c             | 1 -
 crypto/algif_hash.c         | 2 --
 crypto/algif_rng.c          | 1 -
 drivers/isdn/mISDN/socket.c | 1 -
 drivers/net/ppp/pptp.c      | 1 -
 include/net/sock.h          | 2 --
 net/bluetooth/bnep/sock.c   | 1 -
 net/bluetooth/cmtp/sock.c   | 1 -
 net/bluetooth/hidp/sock.c   | 1 -
 net/core/sock.c             | 6 ------
 10 files changed, 17 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index c49766b03165..50d75de539f5 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -347,7 +347,6 @@ static const struct proto_ops alg_proto_ops = {
 	.sendpage	=	sock_no_sendpage,
 	.sendmsg	=	sock_no_sendmsg,
 	.recvmsg	=	sock_no_recvmsg,
-	.poll		=	sock_no_poll,
 
 	.bind		=	alg_bind,
 	.release	=	af_alg_release,
diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
index 6c9b1927a520..bfcf595fd8f9 100644
--- a/crypto/algif_hash.c
+++ b/crypto/algif_hash.c
@@ -288,7 +288,6 @@ static struct proto_ops algif_hash_ops = {
 	.mmap		=	sock_no_mmap,
 	.bind		=	sock_no_bind,
 	.setsockopt	=	sock_no_setsockopt,
-	.poll		=	sock_no_poll,
 
 	.release	=	af_alg_release,
 	.sendmsg	=	hash_sendmsg,
@@ -396,7 +395,6 @@ static struct proto_ops algif_hash_ops_nokey = {
 	.mmap		=	sock_no_mmap,
 	.bind		=	sock_no_bind,
 	.setsockopt	=	sock_no_setsockopt,
-	.poll		=	sock_no_poll,
 
 	.release	=	af_alg_release,
 	.sendmsg	=	hash_sendmsg_nokey,
diff --git a/crypto/algif_rng.c b/crypto/algif_rng.c
index 150c2b6480ed..22df3799a17b 100644
--- a/crypto/algif_rng.c
+++ b/crypto/algif_rng.c
@@ -106,7 +106,6 @@ static struct proto_ops algif_rng_ops = {
 	.bind		=	sock_no_bind,
 	.accept		=	sock_no_accept,
 	.setsockopt	=	sock_no_setsockopt,
-	.poll		=	sock_no_poll,
 	.sendmsg	=	sock_no_sendmsg,
 	.sendpage	=	sock_no_sendpage,
 
diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index c5603d1a07d6..c84270e16bdd 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -746,7 +746,6 @@ static const struct proto_ops base_sock_ops = {
 	.getname	= sock_no_getname,
 	.sendmsg	= sock_no_sendmsg,
 	.recvmsg	= sock_no_recvmsg,
-	.poll		= sock_no_poll,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= sock_no_setsockopt,
diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
index 6dde9a0cfe76..87f892f1d0fe 100644
--- a/drivers/net/ppp/pptp.c
+++ b/drivers/net/ppp/pptp.c
@@ -627,7 +627,6 @@ static const struct proto_ops pptp_ops = {
 	.socketpair = sock_no_socketpair,
 	.accept     = sock_no_accept,
 	.getname    = pptp_getname,
-	.poll       = sock_no_poll,
 	.listen     = sock_no_listen,
 	.shutdown   = sock_no_shutdown,
 	.setsockopt = sock_no_setsockopt,
diff --git a/include/net/sock.h b/include/net/sock.h
index 169c92afcafa..d9249fe65859 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1585,8 +1585,6 @@ int sock_no_connect(struct socket *, struct sockaddr *, int, int);
 int sock_no_socketpair(struct socket *, struct socket *);
 int sock_no_accept(struct socket *, struct socket *, int, bool);
 int sock_no_getname(struct socket *, struct sockaddr *, int *, int);
-__poll_t sock_no_poll(struct file *, struct socket *,
-			  struct poll_table_struct *);
 int sock_no_ioctl(struct socket *, unsigned int, unsigned long);
 int sock_no_listen(struct socket *, int);
 int sock_no_shutdown(struct socket *, int);
diff --git a/net/bluetooth/bnep/sock.c b/net/bluetooth/bnep/sock.c
index b5116fa9835e..00deacdcb51c 100644
--- a/net/bluetooth/bnep/sock.c
+++ b/net/bluetooth/bnep/sock.c
@@ -175,7 +175,6 @@ static const struct proto_ops bnep_sock_ops = {
 	.getname	= sock_no_getname,
 	.sendmsg	= sock_no_sendmsg,
 	.recvmsg	= sock_no_recvmsg,
-	.poll		= sock_no_poll,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= sock_no_setsockopt,
diff --git a/net/bluetooth/cmtp/sock.c b/net/bluetooth/cmtp/sock.c
index ce86a7bae844..e08f28fadd65 100644
--- a/net/bluetooth/cmtp/sock.c
+++ b/net/bluetooth/cmtp/sock.c
@@ -178,7 +178,6 @@ static const struct proto_ops cmtp_sock_ops = {
 	.getname	= sock_no_getname,
 	.sendmsg	= sock_no_sendmsg,
 	.recvmsg	= sock_no_recvmsg,
-	.poll		= sock_no_poll,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= sock_no_setsockopt,
diff --git a/net/bluetooth/hidp/sock.c b/net/bluetooth/hidp/sock.c
index 008ba439bd62..1eaac01f85de 100644
--- a/net/bluetooth/hidp/sock.c
+++ b/net/bluetooth/hidp/sock.c
@@ -208,7 +208,6 @@ static const struct proto_ops hidp_sock_ops = {
 	.getname	= sock_no_getname,
 	.sendmsg	= sock_no_sendmsg,
 	.recvmsg	= sock_no_recvmsg,
-	.poll		= sock_no_poll,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= sock_no_setsockopt,
diff --git a/net/core/sock.c b/net/core/sock.c
index c501499a04fe..b72b6ad050e4 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2503,12 +2503,6 @@ int sock_no_getname(struct socket *sock, struct sockaddr *saddr,
 }
 EXPORT_SYMBOL(sock_no_getname);
 
-__poll_t sock_no_poll(struct file *file, struct socket *sock, poll_table *pt)
-{
-	return 0;
-}
-EXPORT_SYMBOL(sock_no_poll);
-
 int sock_no_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 {
 	return -EOPNOTSUPP;
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 18/36] net/tcp: convert to ->poll_mask
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/net/tcp.h   |  4 ++--
 net/ipv4/af_inet.c  |  3 ++-
 net/ipv4/tcp.c      | 31 ++++++++++++++-----------------
 net/ipv6/af_inet6.c |  3 ++-
 4 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e3fc667f9ac2..fb52f93d556c 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -387,8 +387,8 @@ bool tcp_peer_is_proven(struct request_sock *req, struct dst_entry *dst);
 void tcp_close(struct sock *sk, long timeout);
 void tcp_init_sock(struct sock *sk);
 void tcp_init_transfer(struct sock *sk, int bpf_op);
-__poll_t tcp_poll(struct file *file, struct socket *sock,
-		      struct poll_table_struct *wait);
+struct wait_queue_head *tcp_get_poll_head(struct socket *sock, __poll_t events);
+__poll_t tcp_poll_mask(struct socket *sock, __poll_t events);
 int tcp_getsockopt(struct sock *sk, int level, int optname,
 		   char __user *optval, int __user *optlen);
 int tcp_setsockopt(struct sock *sk, int level, int optname,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index e4329e161943..ec32cc263b18 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -952,7 +952,8 @@ const struct proto_ops inet_stream_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = inet_accept,
 	.getname	   = inet_getname,
-	.poll		   = tcp_poll,
+	.get_poll_head	   = tcp_get_poll_head,
+	.poll_mask	   = tcp_poll_mask,
 	.ioctl		   = inet_ioctl,
 	.listen		   = inet_listen,
 	.shutdown	   = inet_shutdown,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 48636aee23c3..ad8e281066a0 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -484,33 +484,30 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
 	}
 }
 
+struct wait_queue_head *tcp_get_poll_head(struct socket *sock, __poll_t events)
+{
+	sock_poll_busy_loop(sock, events);
+	sock_rps_record_flow(sock->sk);
+	return sk_sleep(sock->sk);
+}
+EXPORT_SYMBOL(tcp_get_poll_head);
+
 /*
- *	Wait for a TCP event.
- *
- *	Note that we don't need to lock the socket, as the upper poll layers
- *	take care of normal races (between the test and the event) and we don't
- *	go look at any of the socket buffers directly.
+ * Socket is not locked. We are protected from async events by poll logic and
+ * correct handling of state changes made by other threads is impossible in
+ * any case.
  */
-__poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
+__poll_t tcp_poll_mask(struct socket *sock, __poll_t events)
 {
-	__poll_t mask;
 	struct sock *sk = sock->sk;
 	const struct tcp_sock *tp = tcp_sk(sk);
+	__poll_t mask = 0;
 	int state;
 
-	sock_poll_wait(file, sk_sleep(sk), wait);
-
 	state = inet_sk_state_load(sk);
 	if (state == TCP_LISTEN)
 		return inet_csk_listen_poll(sk);
 
-	/* Socket is not locked. We are protected from async events
-	 * by poll logic and correct handling of state changes
-	 * made by other threads is impossible in any case.
-	 */
-
-	mask = 0;
-
 	/*
 	 * EPOLLHUP is certainly not done right. But poll() doesn't
 	 * have a notion of HUP in just one direction, and for a
@@ -591,7 +588,7 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 
 	return mask;
 }
-EXPORT_SYMBOL(tcp_poll);
+EXPORT_SYMBOL(tcp_poll_mask);
 
 int tcp_ioctl(struct sock *sk, int cmd, unsigned long arg)
 {
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 416917719a6f..c470549d6ef9 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -547,7 +547,8 @@ const struct proto_ops inet6_stream_ops = {
 	.socketpair	   = sock_no_socketpair,	/* a do nothing	*/
 	.accept		   = inet_accept,		/* ok		*/
 	.getname	   = inet6_getname,
-	.poll		   = tcp_poll,			/* ok		*/
+	.get_poll_head	   = tcp_get_poll_head,
+	.poll_mask	   = tcp_poll_mask,		/* ok		*/
 	.ioctl		   = inet6_ioctl,		/* must change  */
 	.listen		   = inet_listen,		/* ok		*/
 	.shutdown	   = inet_shutdown,		/* ok		*/
-- 
2.14.2
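
Editorial note on the shape of this conversion: tcp_poll() did two jobs,
registering on the socket's wait queue and computing the event mask. The
patch splits them into tcp_get_poll_head() and tcp_poll_mask(). The
following is a minimal userspace sketch of that split, not kernel code.
The demo_* names, the flag values and the rx_bytes/tx_space fields are
assumptions for illustration and do not reflect how TCP actually derives
its mask.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins only; not the kernel's types or flag values. */
typedef unsigned int demo_poll_t;
#define DEMO_POLLIN  0x0001u
#define DEMO_POLLOUT 0x0004u

struct demo_wait_queue {
	int nwaiters;		/* how many waiters have been registered */
};

struct demo_sock {
	struct demo_wait_queue wq;
	size_t rx_bytes;	/* hypothetical: bytes queued for reading */
	size_t tx_space;	/* hypothetical: free send buffer space */
};

/* Step 1: hand out the queue to wait on; no readiness check here. */
struct demo_wait_queue *demo_get_poll_head(struct demo_sock *sk)
{
	assert(sk != NULL);
	return &sk->wq;
}

/* Step 2: pure, non-blocking readiness check; touches no wait queue. */
demo_poll_t demo_poll_mask(const struct demo_sock *sk)
{
	demo_poll_t mask = 0;

	assert(sk != NULL);
	if (sk->rx_bytes > 0)
		mask |= DEMO_POLLIN;
	if (sk->tx_space > 0)
		mask |= DEMO_POLLOUT;
	return mask;
}

/* Caller-side pattern: register once, then check the requested events. */
demo_poll_t demo_poll_once(struct demo_sock *sk, demo_poll_t events)
{
	demo_get_poll_head(sk)->nwaiters++;
	return demo_poll_mask(sk) & events;
}
```

Because the check in step 2 does not register anything, a caller that is
already queued can repeat it after a wakeup without going through the
registration again.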


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 19/36] net/unix: convert to ->poll_mask
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
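Note for readers: the conversion below drops the struct file and
poll_table arguments and passes the requested event mask directly, so
unix_dgram_poll_mask() can test `events` instead of calling
poll_requested_events(wait).  What follows is a minimal userspace sketch
of that pattern, not kernel code.  Every name in it (struct model_sock,
model_writable, model_dgram_poll_mask, the MODEL_* flags) is a
hypothetical stand-in invented for illustration.

```c
/*
 * Userspace model only.  None of these names exist in the kernel; they
 * are hypothetical stand-ins for struct sock state and the EPOLL* bits.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum {
	MODEL_IN  = 0x1,	/* stands in for EPOLLIN */
	MODEL_OUT = 0x4,	/* stands in for EPOLLOUT */
	MODEL_ERR = 0x8,	/* stands in for EPOLLERR */
};

struct model_sock {
	bool has_error;			/* stands in for sk->sk_err */
	size_t rx_queued;		/* stands in for the receive queue */
	size_t tx_free;			/* stands in for free send space */
	unsigned int writable_checks;	/* counts runs of the costly test */
};

/* Model of an "expensive" writability test; it only counts its calls. */
bool model_writable(struct model_sock *s)
{
	s->writable_checks++;
	return s->tx_free > 0;
}

/*
 * Non-blocking readiness check: no wait-queue registration happens here,
 * and the caller states which events it cares about via 'events'.
 */
unsigned int model_dgram_poll_mask(struct model_sock *s, unsigned int events)
{
	unsigned int mask = 0;

	assert(s != NULL);

	if (s->has_error)
		mask |= MODEL_ERR;
	if (s->rx_queued > 0)
		mask |= MODEL_IN;

	/* No write status requested, so skip the costly writability test. */
	if (!(events & MODEL_OUT))
		return mask;

	if (model_writable(s))
		mask |= MODEL_OUT;
	return mask;
}
```

Calling model_dgram_poll_mask(&s, MODEL_IN) on a socket with queued data
reports MODEL_IN and leaves writable_checks at zero; adding MODEL_OUT to
the request is what triggers the writability test.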
 net/unix/af_unix.c | 30 +++++++++++-------------------
 1 file changed, 11 insertions(+), 19 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 2d465bdeccbc..619c6921dd46 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -638,9 +638,8 @@ static int unix_stream_connect(struct socket *, struct sockaddr *,
 static int unix_socketpair(struct socket *, struct socket *);
 static int unix_accept(struct socket *, struct socket *, int, bool);
 static int unix_getname(struct socket *, struct sockaddr *, int *, int);
-static __poll_t unix_poll(struct file *, struct socket *, poll_table *);
-static __poll_t unix_dgram_poll(struct file *, struct socket *,
-				    poll_table *);
+static __poll_t unix_poll_mask(struct socket *, __poll_t);
+static __poll_t unix_dgram_poll_mask(struct socket *, __poll_t);
 static int unix_ioctl(struct socket *, unsigned int, unsigned long);
 static int unix_shutdown(struct socket *, int);
 static int unix_stream_sendmsg(struct socket *, struct msghdr *, size_t);
@@ -681,7 +680,7 @@ static const struct proto_ops unix_stream_ops = {
 	.socketpair =	unix_socketpair,
 	.accept =	unix_accept,
 	.getname =	unix_getname,
-	.poll =		unix_poll,
+	.poll_mask =	unix_poll_mask,
 	.ioctl =	unix_ioctl,
 	.listen =	unix_listen,
 	.shutdown =	unix_shutdown,
@@ -704,7 +703,7 @@ static const struct proto_ops unix_dgram_ops = {
 	.socketpair =	unix_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	unix_getname,
-	.poll =		unix_dgram_poll,
+	.poll_mask =	unix_dgram_poll_mask,
 	.ioctl =	unix_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	unix_shutdown,
@@ -726,7 +725,7 @@ static const struct proto_ops unix_seqpacket_ops = {
 	.socketpair =	unix_socketpair,
 	.accept =	unix_accept,
 	.getname =	unix_getname,
-	.poll =		unix_dgram_poll,
+	.poll_mask =	unix_dgram_poll_mask,
 	.ioctl =	unix_ioctl,
 	.listen =	unix_listen,
 	.shutdown =	unix_shutdown,
@@ -2640,13 +2639,10 @@ static int unix_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 	return err;
 }
 
-static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wait)
+static __poll_t unix_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
-	__poll_t mask;
-
-	sock_poll_wait(file, sk_sleep(sk), wait);
-	mask = 0;
+	__poll_t mask = 0;
 
 	/* exceptional events? */
 	if (sk->sk_err)
@@ -2675,15 +2671,11 @@ static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wa
 	return mask;
 }
 
-static __poll_t unix_dgram_poll(struct file *file, struct socket *sock,
-				    poll_table *wait)
+static __poll_t unix_dgram_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk, *other;
-	unsigned int writable;
-	__poll_t mask;
-
-	sock_poll_wait(file, sk_sleep(sk), wait);
-	mask = 0;
+	int writable;
+	__poll_t mask = 0;
 
 	/* exceptional events? */
 	if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue))
@@ -2709,7 +2701,7 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock,
 	}
 
 	/* No write status requested, avoid expensive OUT tests. */
-	if (!(poll_requested_events(wait) & (EPOLLWRBAND|EPOLLWRNORM|EPOLLOUT)))
+	if (!(events & (EPOLLWRBAND|EPOLLWRNORM|EPOLLOUT)))
 		return mask;
 
 	writable = unix_writable(sk);
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 20/36] net: convert datagram_poll users to ->poll_mask
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
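Note for readers: most of this patch is a mechanical rename, but
dn_poll_mask(), udp_poll_mask() and packet_poll_mask() show the layering:
each calls datagram_poll_mask() and then adjusts the returned mask.  The
sock_poll_wait() call is removed from datagram_poll_mask(), so these
functions no longer register on a wait queue themselves.  The sketch
below models that layering in userspace, under the assumption that a
generic caller registers the waiter once before invoking ->poll_mask;
the kernel's actual caller lives elsewhere in the series and is not
reproduced here.  All names are hypothetical.

```c
/*
 * Userspace model only.  None of these names exist in the kernel; they
 * are hypothetical stand-ins used to illustrate the call layering.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum {
	MODEL_IN     = 0x1,	/* stands in for EPOLLIN */
	MODEL_RDBAND = 0x2,	/* stands in for EPOLLRDBAND */
	MODEL_ERR    = 0x8,	/* stands in for EPOLLERR */
};

struct model_sock;
typedef unsigned int (*model_poll_mask_fn)(struct model_sock *, unsigned int);

struct model_sock {
	bool has_error;			/* stands in for sk->sk_err */
	size_t rx_queued;		/* stands in for the receive queue */
	size_t other_queued;		/* a protocol-private second queue */
	unsigned int waiters;		/* counts wait registrations */
	model_poll_mask_fn poll_mask;	/* stands in for ops->poll_mask */
};

/* Generic helper: a pure readiness check with no side effects. */
unsigned int model_datagram_poll_mask(struct model_sock *s, unsigned int events)
{
	unsigned int mask = 0;

	(void)events;	/* this simplified model has no write-side checks */
	if (s->has_error)
		mask |= MODEL_ERR;
	if (s->rx_queued > 0)
		mask |= MODEL_IN;
	return mask;
}

/* Protocol-specific variant: start from the generic mask, then add bits. */
unsigned int model_proto_poll_mask(struct model_sock *s, unsigned int events)
{
	unsigned int mask = model_datagram_poll_mask(s, events);

	if (s->other_queued > 0)
		mask |= MODEL_RDBAND;
	return mask;
}

/*
 * Assumed generic caller: registers the waiter exactly once (modelled
 * here as a counter), then asks the protocol for its readiness mask.
 */
unsigned int model_sock_poll(struct model_sock *s, unsigned int events)
{
	assert(s != NULL && s->poll_mask != NULL);

	s->waiters++;
	return s->poll_mask(s, events);
}
```

With poll_mask set to model_proto_poll_mask, one model_sock_poll() call
registers a single waiter even though two layers contribute bits to the
mask; that is the duplication the old per-protocol ->poll methods each
had to handle themselves.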
 drivers/isdn/mISDN/socket.c        |  2 +-
 drivers/net/ppp/pppoe.c            |  2 +-
 drivers/staging/ipx/af_ipx.c       |  2 +-
 drivers/staging/irda/net/af_irda.c |  6 +++---
 include/linux/skbuff.h             |  3 +--
 include/net/udp.h                  |  2 +-
 net/appletalk/ddp.c                |  2 +-
 net/ax25/af_ax25.c                 |  2 +-
 net/bluetooth/hci_sock.c           |  2 +-
 net/can/bcm.c                      |  2 +-
 net/can/raw.c                      |  2 +-
 net/core/datagram.c                | 13 ++++---------
 net/decnet/af_decnet.c             |  6 +++---
 net/ieee802154/socket.c            |  4 ++--
 net/ipv4/af_inet.c                 |  6 +++---
 net/ipv4/udp.c                     | 10 +++++-----
 net/ipv6/af_inet6.c                |  2 +-
 net/ipv6/raw.c                     |  4 ++--
 net/kcm/kcmsock.c                  |  4 ++--
 net/key/af_key.c                   |  2 +-
 net/l2tp/l2tp_ip.c                 |  2 +-
 net/l2tp/l2tp_ip6.c                |  2 +-
 net/l2tp/l2tp_ppp.c                |  2 +-
 net/llc/af_llc.c                   |  2 +-
 net/netlink/af_netlink.c           |  2 +-
 net/netrom/af_netrom.c             |  2 +-
 net/nfc/rawsock.c                  |  4 ++--
 net/packet/af_packet.c             |  9 ++++-----
 net/phonet/socket.c                |  2 +-
 net/qrtr/qrtr.c                    |  2 +-
 net/rose/af_rose.c                 |  2 +-
 net/x25/af_x25.c                   |  2 +-
 32 files changed, 52 insertions(+), 59 deletions(-)

diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index c84270e16bdd..61d6e4c9e7d1 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -589,7 +589,7 @@ static const struct proto_ops data_sock_ops = {
 	.getname	= data_sock_getname,
 	.sendmsg	= mISDN_sock_sendmsg,
 	.recvmsg	= mISDN_sock_recvmsg,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= data_sock_setsockopt,
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 5aa59f41bf8c..8c311e626884 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -1120,7 +1120,7 @@ static const struct proto_ops pppoe_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= sock_no_accept,
 	.getname	= pppoe_getname,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= sock_no_setsockopt,
diff --git a/drivers/staging/ipx/af_ipx.c b/drivers/staging/ipx/af_ipx.c
index d21a9d128d3e..3373f7f67d35 100644
--- a/drivers/staging/ipx/af_ipx.c
+++ b/drivers/staging/ipx/af_ipx.c
@@ -1967,7 +1967,7 @@ static const struct proto_ops ipx_dgram_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= sock_no_accept,
 	.getname	= ipx_getname,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.ioctl		= ipx_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= ipx_compat_ioctl,
diff --git a/drivers/staging/irda/net/af_irda.c b/drivers/staging/irda/net/af_irda.c
index 2f1e9ab3d6d0..77659b1c40ba 100644
--- a/drivers/staging/irda/net/af_irda.c
+++ b/drivers/staging/irda/net/af_irda.c
@@ -2600,7 +2600,7 @@ static const struct proto_ops irda_seqpacket_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	irda_accept,
 	.getname =	irda_getname,
-	.poll =		datagram_poll,
+	.poll_mask =	datagram_poll_mask,
 	.ioctl =	irda_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl =	irda_compat_ioctl,
@@ -2624,7 +2624,7 @@ static const struct proto_ops irda_dgram_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	irda_accept,
 	.getname =	irda_getname,
-	.poll =		datagram_poll,
+	.poll_mask =	datagram_poll_mask,
 	.ioctl =	irda_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl =	irda_compat_ioctl,
@@ -2649,7 +2649,7 @@ static const struct proto_ops irda_ultra_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	irda_getname,
-	.poll =		datagram_poll,
+	.poll_mask =	datagram_poll_mask,
 	.ioctl =	irda_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl =	irda_compat_ioctl,
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c1e66bdcf583..455f4660c2a2 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3246,8 +3246,7 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned flags,
 				    int *peeked, int *off, int *err);
 struct sk_buff *skb_recv_datagram(struct sock *sk, unsigned flags, int noblock,
 				  int *err);
-__poll_t datagram_poll(struct file *file, struct socket *sock,
-			   struct poll_table_struct *wait);
+__poll_t datagram_poll_mask(struct socket *sock, __poll_t events);
 int skb_copy_datagram_iter(const struct sk_buff *from, int offset,
 			   struct iov_iter *to, int size);
 static inline int skb_copy_datagram_msg(const struct sk_buff *from, int offset,
diff --git a/include/net/udp.h b/include/net/udp.h
index 850a8e581cce..03e8907ae57c 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -275,7 +275,7 @@ int udp_ioctl(struct sock *sk, int cmd, unsigned long arg);
 int udp_init_sock(struct sock *sk);
 int __udp_disconnect(struct sock *sk, int flags);
 int udp_disconnect(struct sock *sk, int flags);
-__poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait);
+__poll_t udp_poll_mask(struct socket *sock, __poll_t events);
 struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
 				       netdev_features_t features,
 				       bool is_ipv6);
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index 03a9fc0771c0..3ea5631fee29 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -1870,7 +1870,7 @@ static const struct proto_ops atalk_dgram_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= sock_no_accept,
 	.getname	= atalk_getname,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.ioctl		= atalk_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= atalk_compat_ioctl,
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 47fdd399626b..3282dbe7d9eb 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1954,7 +1954,7 @@ static const struct proto_ops ax25_proto_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= ax25_accept,
 	.getname	= ax25_getname,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.ioctl		= ax25_ioctl,
 	.listen		= ax25_listen,
 	.shutdown	= ax25_shutdown,
diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index 923e9a271872..46a547e4a0c8 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -1975,7 +1975,7 @@ static const struct proto_ops hci_sock_ops = {
 	.sendmsg	= hci_sock_sendmsg,
 	.recvmsg	= hci_sock_recvmsg,
 	.ioctl		= hci_sock_ioctl,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= hci_sock_setsockopt,
diff --git a/net/can/bcm.c b/net/can/bcm.c
index ac5e5e34fee3..30c51e0ce294 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -1669,7 +1669,7 @@ static const struct proto_ops bcm_ops = {
 	.socketpair    = sock_no_socketpair,
 	.accept        = sock_no_accept,
 	.getname       = sock_no_getname,
-	.poll          = datagram_poll,
+	.poll_mask     = datagram_poll_mask,
 	.ioctl         = can_ioctl,	/* use can_ioctl() from af_can.c */
 	.listen        = sock_no_listen,
 	.shutdown      = sock_no_shutdown,
diff --git a/net/can/raw.c b/net/can/raw.c
index f2ecc43376a1..d65678554979 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -845,7 +845,7 @@ static const struct proto_ops raw_ops = {
 	.socketpair    = sock_no_socketpair,
 	.accept        = sock_no_accept,
 	.getname       = raw_getname,
-	.poll          = datagram_poll,
+	.poll_mask     = datagram_poll_mask,
 	.ioctl         = can_ioctl,	/* use can_ioctl() from af_can.c */
 	.listen        = sock_no_listen,
 	.shutdown      = sock_no_shutdown,
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 9938952c5c78..f19bf3dc2bd6 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -819,9 +819,8 @@ EXPORT_SYMBOL(skb_copy_and_csum_datagram_msg);
 
 /**
  * 	datagram_poll - generic datagram poll
- *	@file: file struct
  *	@sock: socket
- *	@wait: poll table
+ *	@events: events to wait for
  *
  *	Datagram poll: Again totally generic. This also handles
  *	sequenced packet sockets providing the socket receive queue
@@ -831,14 +830,10 @@ EXPORT_SYMBOL(skb_copy_and_csum_datagram_msg);
  *	and you use a different write policy from sock_writeable()
  *	then please supply your own write_space callback.
  */
-__poll_t datagram_poll(struct file *file, struct socket *sock,
-			   poll_table *wait)
+__poll_t datagram_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
-	__poll_t mask;
-
-	sock_poll_wait(file, sk_sleep(sk), wait);
-	mask = 0;
+	__poll_t mask = 0;
 
 	/* exceptional events? */
 	if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue))
@@ -871,4 +866,4 @@ __poll_t datagram_poll(struct file *file, struct socket *sock,
 
 	return mask;
 }
-EXPORT_SYMBOL(datagram_poll);
+EXPORT_SYMBOL(datagram_poll_mask);
diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index 791aff68af88..7793b3829906 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -1209,11 +1209,11 @@ static int dn_getname(struct socket *sock, struct sockaddr *uaddr,int *uaddr_len
 }
 
 
-static __poll_t dn_poll(struct file *file, struct socket *sock, poll_table  *wait)
+static __poll_t dn_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	struct dn_scp *scp = DN_SK(sk);
-	__poll_t mask = datagram_poll(file, sock, wait);
+	__poll_t mask = datagram_poll_mask(sock, events);
 
 	if (!skb_queue_empty(&scp->other_receive_queue))
 		mask |= EPOLLRDBAND;
@@ -2346,7 +2346,7 @@ static const struct proto_ops dn_proto_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	dn_accept,
 	.getname =	dn_getname,
-	.poll =		dn_poll,
+	.poll_mask =	dn_poll_mask,
 	.ioctl =	dn_ioctl,
 	.listen =	dn_listen,
 	.shutdown =	dn_shutdown,
diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
index a60658c85a9a..a0768d2759b8 100644
--- a/net/ieee802154/socket.c
+++ b/net/ieee802154/socket.c
@@ -423,7 +423,7 @@ static const struct proto_ops ieee802154_raw_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = sock_no_accept,
 	.getname	   = sock_no_getname,
-	.poll		   = datagram_poll,
+	.poll_mask	   = datagram_poll_mask,
 	.ioctl		   = ieee802154_sock_ioctl,
 	.listen		   = sock_no_listen,
 	.shutdown	   = sock_no_shutdown,
@@ -969,7 +969,7 @@ static const struct proto_ops ieee802154_dgram_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = sock_no_accept,
 	.getname	   = sock_no_getname,
-	.poll		   = datagram_poll,
+	.poll_mask	   = datagram_poll_mask,
 	.ioctl		   = ieee802154_sock_ioctl,
 	.listen		   = sock_no_listen,
 	.shutdown	   = sock_no_shutdown,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index ec32cc263b18..4a2bfae06b28 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -985,7 +985,7 @@ const struct proto_ops inet_dgram_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = sock_no_accept,
 	.getname	   = inet_getname,
-	.poll		   = udp_poll,
+	.poll_mask	   = udp_poll_mask,
 	.ioctl		   = inet_ioctl,
 	.listen		   = sock_no_listen,
 	.shutdown	   = inet_shutdown,
@@ -1006,7 +1006,7 @@ EXPORT_SYMBOL(inet_dgram_ops);
 
 /*
  * For SOCK_RAW sockets; should be the same as inet_dgram_ops but without
- * udp_poll
+ * udp_poll_mask
  */
 static const struct proto_ops inet_sockraw_ops = {
 	.family		   = PF_INET,
@@ -1017,7 +1017,7 @@ static const struct proto_ops inet_sockraw_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = sock_no_accept,
 	.getname	   = inet_getname,
-	.poll		   = datagram_poll,
+	.poll_mask	   = datagram_poll_mask,
 	.ioctl		   = inet_ioctl,
 	.listen		   = sock_no_listen,
 	.shutdown	   = inet_shutdown,
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e5ef7c38c934..56427047f7e1 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2491,7 +2491,7 @@ int compat_udp_getsockopt(struct sock *sk, int level, int optname,
  * 	udp_poll - wait for a UDP event.
  *	@file - file struct
  *	@sock - socket
- *	@wait - poll table
+ *	@events - events to wait for
  *
  *	This is same as datagram poll, except for the special case of
  *	blocking sockets. If application is using a blocking fd
@@ -2500,23 +2500,23 @@ int compat_udp_getsockopt(struct sock *sk, int level, int optname,
  *	but then block when reading it. Add special case code
  *	to work around these arguably broken applications.
  */
-__poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait)
+__poll_t udp_poll_mask(struct socket *sock, __poll_t events)
 {
-	__poll_t mask = datagram_poll(file, sock, wait);
+	__poll_t mask = datagram_poll_mask(sock, events);
 	struct sock *sk = sock->sk;
 
 	if (!skb_queue_empty(&udp_sk(sk)->reader_queue))
 		mask |= EPOLLIN | EPOLLRDNORM;
 
 	/* Check for false positives due to checksum errors */
-	if ((mask & EPOLLRDNORM) && !(file->f_flags & O_NONBLOCK) &&
+	if ((mask & EPOLLRDNORM) && !(sock->file->f_flags & O_NONBLOCK) &&
 	    !(sk->sk_shutdown & RCV_SHUTDOWN) && first_packet_length(sk) == -1)
 		mask &= ~(EPOLLIN | EPOLLRDNORM);
 
 	return mask;
 
 }
-EXPORT_SYMBOL(udp_poll);
+EXPORT_SYMBOL(udp_poll_mask);
 
 int udp_abort(struct sock *sk, int err)
 {
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index c470549d6ef9..82c192b92358 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -578,7 +578,7 @@ const struct proto_ops inet6_dgram_ops = {
 	.socketpair	   = sock_no_socketpair,	/* a do nothing	*/
 	.accept		   = sock_no_accept,		/* a do nothing	*/
 	.getname	   = inet6_getname,
-	.poll		   = udp_poll,			/* ok		*/
+	.poll_mask	   = udp_poll_mask,		/* ok		*/
 	.ioctl		   = inet6_ioctl,		/* must change  */
 	.listen		   = sock_no_listen,		/* ok		*/
 	.shutdown	   = inet_shutdown,		/* ok		*/
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 4c25339b1984..fdd9916f1475 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -1345,7 +1345,7 @@ void raw6_proc_exit(void)
 }
 #endif	/* CONFIG_PROC_FS */
 
-/* Same as inet6_dgram_ops, sans udp_poll.  */
+/* Same as inet6_dgram_ops, sans udp_poll_mask.  */
 const struct proto_ops inet6_sockraw_ops = {
 	.family		   = PF_INET6,
 	.owner		   = THIS_MODULE,
@@ -1355,7 +1355,7 @@ const struct proto_ops inet6_sockraw_ops = {
 	.socketpair	   = sock_no_socketpair,	/* a do nothing	*/
 	.accept		   = sock_no_accept,		/* a do nothing	*/
 	.getname	   = inet6_getname,
-	.poll		   = datagram_poll,		/* ok		*/
+	.poll_mask	   = datagram_poll_mask,	/* ok		*/
 	.ioctl		   = inet6_ioctl,		/* must change  */
 	.listen		   = sock_no_listen,		/* ok		*/
 	.shutdown	   = inet_shutdown,		/* ok		*/
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index f297d53a11aa..305c8c38c5d9 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -1889,7 +1889,7 @@ static const struct proto_ops kcm_dgram_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	sock_no_getname,
-	.poll =		datagram_poll,
+	.poll_mask =	datagram_poll_mask,
 	.ioctl =	kcm_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	sock_no_shutdown,
@@ -1910,7 +1910,7 @@ static const struct proto_ops kcm_seqpacket_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	sock_no_getname,
-	.poll =		datagram_poll,
+	.poll_mask =	datagram_poll_mask,
 	.ioctl =	kcm_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	sock_no_shutdown,
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 7e2e7188e7f4..7654607e728b 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3726,7 +3726,7 @@ static const struct proto_ops pfkey_ops = {
 
 	/* Now the operations that really occur. */
 	.release	=	pfkey_release,
-	.poll		=	datagram_poll,
+	.poll_mask	=	datagram_poll_mask,
 	.sendmsg	=	pfkey_sendmsg,
 	.recvmsg	=	pfkey_recvmsg,
 };
diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index ff61124fdf59..aa3fced17a75 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -618,7 +618,7 @@ static const struct proto_ops l2tp_ip_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = sock_no_accept,
 	.getname	   = l2tp_ip_getname,
-	.poll		   = datagram_poll,
+	.poll_mask	   = datagram_poll_mask,
 	.ioctl		   = inet_ioctl,
 	.listen		   = sock_no_listen,
 	.shutdown	   = inet_shutdown,
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index 192344688c06..8ca5486ce952 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -757,7 +757,7 @@ static const struct proto_ops l2tp_ip6_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = sock_no_accept,
 	.getname	   = l2tp_ip6_getname,
-	.poll		   = datagram_poll,
+	.poll_mask	   = datagram_poll_mask,
 	.ioctl		   = inet6_ioctl,
 	.listen		   = sock_no_listen,
 	.shutdown	   = inet_shutdown,
diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index 59f246d7b290..d6918a56af5e 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -1786,7 +1786,7 @@ static const struct proto_ops pppol2tp_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= sock_no_accept,
 	.getname	= pppol2tp_getname,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= pppol2tp_setsockopt,
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index c38d16f22d2a..67e5db6157ef 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -1180,7 +1180,7 @@ static const struct proto_ops llc_ui_ops = {
 	.socketpair  = sock_no_socketpair,
 	.accept      = llc_ui_accept,
 	.getname     = llc_ui_getname,
-	.poll	     = datagram_poll,
+	.poll_mask   = datagram_poll_mask,
 	.ioctl       = llc_ui_ioctl,
 	.listen      = llc_ui_listen,
 	.shutdown    = llc_ui_shutdown,
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 07e8478068f0..9e4ff69deddb 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2669,7 +2669,7 @@ static const struct proto_ops netlink_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	netlink_getname,
-	.poll =		datagram_poll,
+	.poll_mask =	datagram_poll_mask,
 	.ioctl =	netlink_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	sock_no_shutdown,
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index 9ba30c63be3d..22636fca311e 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -1366,7 +1366,7 @@ static const struct proto_ops nr_proto_ops = {
 	.socketpair	=	sock_no_socketpair,
 	.accept		=	nr_accept,
 	.getname	=	nr_getname,
-	.poll		=	datagram_poll,
+	.poll_mask	=	datagram_poll_mask,
 	.ioctl		=	nr_ioctl,
 	.listen		=	nr_listen,
 	.shutdown	=	sock_no_shutdown,
diff --git a/net/nfc/rawsock.c b/net/nfc/rawsock.c
index e2188deb08dc..60c322531c49 100644
--- a/net/nfc/rawsock.c
+++ b/net/nfc/rawsock.c
@@ -284,7 +284,7 @@ static const struct proto_ops rawsock_ops = {
 	.socketpair     = sock_no_socketpair,
 	.accept         = sock_no_accept,
 	.getname        = sock_no_getname,
-	.poll           = datagram_poll,
+	.poll_mask      = datagram_poll_mask,
 	.ioctl          = sock_no_ioctl,
 	.listen         = sock_no_listen,
 	.shutdown       = sock_no_shutdown,
@@ -304,7 +304,7 @@ static const struct proto_ops rawsock_raw_ops = {
 	.socketpair     = sock_no_socketpair,
 	.accept         = sock_no_accept,
 	.getname        = sock_no_getname,
-	.poll           = datagram_poll,
+	.poll_mask      = datagram_poll_mask,
 	.ioctl          = sock_no_ioctl,
 	.listen         = sock_no_listen,
 	.shutdown       = sock_no_shutdown,
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index e0f3f4aeeb4f..f50ec1244281 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -4074,12 +4074,11 @@ static int packet_ioctl(struct socket *sock, unsigned int cmd,
 	return 0;
 }
 
-static __poll_t packet_poll(struct file *file, struct socket *sock,
-				poll_table *wait)
+static __poll_t packet_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	struct packet_sock *po = pkt_sk(sk);
-	__poll_t mask = datagram_poll(file, sock, wait);
+	__poll_t mask = datagram_poll_mask(sock, events);
 
 	spin_lock_bh(&sk->sk_receive_queue.lock);
 	if (po->rx_ring.pg_vec) {
@@ -4424,7 +4423,7 @@ static const struct proto_ops packet_ops_spkt = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	packet_getname_spkt,
-	.poll =		datagram_poll,
+	.poll_mask =	datagram_poll_mask,
 	.ioctl =	packet_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	sock_no_shutdown,
@@ -4445,7 +4444,7 @@ static const struct proto_ops packet_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	packet_getname,
-	.poll =		packet_poll,
+	.poll_mask =	packet_poll_mask,
 	.ioctl =	packet_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	sock_no_shutdown,
diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index fffcd69f63ff..28d981512f5f 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -449,7 +449,7 @@ const struct proto_ops phonet_dgram_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= sock_no_accept,
 	.getname	= pn_socket_getname,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.ioctl		= pn_socket_ioctl,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
diff --git a/net/qrtr/qrtr.c b/net/qrtr/qrtr.c
index 5fb3929e3d7d..7f273529a0b1 100644
--- a/net/qrtr/qrtr.c
+++ b/net/qrtr/qrtr.c
@@ -1024,7 +1024,7 @@ static const struct proto_ops qrtr_proto_ops = {
 	.recvmsg	= qrtr_recvmsg,
 	.getname	= qrtr_getname,
 	.ioctl		= qrtr_ioctl,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= sock_no_setsockopt,
 	.getsockopt	= sock_no_getsockopt,
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index 083bd251406f..f80a5c0804f1 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -1483,7 +1483,7 @@ static const struct proto_ops rose_proto_ops = {
 	.socketpair	=	sock_no_socketpair,
 	.accept		=	rose_accept,
 	.getname	=	rose_getname,
-	.poll		=	datagram_poll,
+	.poll_mask	=	datagram_poll_mask,
 	.ioctl		=	rose_ioctl,
 	.listen		=	rose_listen,
 	.shutdown	=	sock_no_shutdown,
diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
index 562cc11131f6..b94b8f3339f3 100644
--- a/net/x25/af_x25.c
+++ b/net/x25/af_x25.c
@@ -1750,7 +1750,7 @@ static const struct proto_ops x25_proto_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	x25_accept,
 	.getname =	x25_getname,
-	.poll =		datagram_poll,
+	.poll_mask =	datagram_poll_mask,
 	.ioctl =	x25_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl = compat_x25_ioctl,
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 20/36] net: convert datagram_poll users to ->poll_mask
@ 2018-03-05 21:27   ` Christoph Hellwig
  0 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/isdn/mISDN/socket.c        |  2 +-
 drivers/net/ppp/pppoe.c            |  2 +-
 drivers/staging/ipx/af_ipx.c       |  2 +-
 drivers/staging/irda/net/af_irda.c |  6 +++---
 include/linux/skbuff.h             |  3 +--
 include/net/udp.h                  |  2 +-
 net/appletalk/ddp.c                |  2 +-
 net/ax25/af_ax25.c                 |  2 +-
 net/bluetooth/hci_sock.c           |  2 +-
 net/can/bcm.c                      |  2 +-
 net/can/raw.c                      |  2 +-
 net/core/datagram.c                | 13 ++++---------
 net/decnet/af_decnet.c             |  6 +++---
 net/ieee802154/socket.c            |  4 ++--
 net/ipv4/af_inet.c                 |  6 +++---
 net/ipv4/udp.c                     | 10 +++++-----
 net/ipv6/af_inet6.c                |  2 +-
 net/ipv6/raw.c                     |  4 ++--
 net/kcm/kcmsock.c                  |  4 ++--
 net/key/af_key.c                   |  2 +-
 net/l2tp/l2tp_ip.c                 |  2 +-
 net/l2tp/l2tp_ip6.c                |  2 +-
 net/l2tp/l2tp_ppp.c                |  2 +-
 net/llc/af_llc.c                   |  2 +-
 net/netlink/af_netlink.c           |  2 +-
 net/netrom/af_netrom.c             |  2 +-
 net/nfc/rawsock.c                  |  4 ++--
 net/packet/af_packet.c             |  9 ++++-----
 net/phonet/socket.c                |  2 +-
 net/qrtr/qrtr.c                    |  2 +-
 net/rose/af_rose.c                 |  2 +-
 net/x25/af_x25.c                   |  2 +-
 32 files changed, 52 insertions(+), 59 deletions(-)

diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index c84270e16bdd..61d6e4c9e7d1 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -589,7 +589,7 @@ static const struct proto_ops data_sock_ops = {
 	.getname	= data_sock_getname,
 	.sendmsg	= mISDN_sock_sendmsg,
 	.recvmsg	= mISDN_sock_recvmsg,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= data_sock_setsockopt,
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 5aa59f41bf8c..8c311e626884 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -1120,7 +1120,7 @@ static const struct proto_ops pppoe_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= sock_no_accept,
 	.getname	= pppoe_getname,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= sock_no_setsockopt,
diff --git a/drivers/staging/ipx/af_ipx.c b/drivers/staging/ipx/af_ipx.c
index d21a9d128d3e..3373f7f67d35 100644
--- a/drivers/staging/ipx/af_ipx.c
+++ b/drivers/staging/ipx/af_ipx.c
@@ -1967,7 +1967,7 @@ static const struct proto_ops ipx_dgram_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= sock_no_accept,
 	.getname	= ipx_getname,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.ioctl		= ipx_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= ipx_compat_ioctl,
diff --git a/drivers/staging/irda/net/af_irda.c b/drivers/staging/irda/net/af_irda.c
index 2f1e9ab3d6d0..77659b1c40ba 100644
--- a/drivers/staging/irda/net/af_irda.c
+++ b/drivers/staging/irda/net/af_irda.c
@@ -2600,7 +2600,7 @@ static const struct proto_ops irda_seqpacket_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	irda_accept,
 	.getname =	irda_getname,
-	.poll =		datagram_poll,
+	.poll_mask =	datagram_poll_mask,
 	.ioctl =	irda_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl =	irda_compat_ioctl,
@@ -2624,7 +2624,7 @@ static const struct proto_ops irda_dgram_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	irda_accept,
 	.getname =	irda_getname,
-	.poll =		datagram_poll,
+	.poll_mask =	datagram_poll_mask,
 	.ioctl =	irda_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl =	irda_compat_ioctl,
@@ -2649,7 +2649,7 @@ static const struct proto_ops irda_ultra_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	irda_getname,
-	.poll =		datagram_poll,
+	.poll_mask =	datagram_poll_mask,
 	.ioctl =	irda_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl =	irda_compat_ioctl,
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c1e66bdcf583..455f4660c2a2 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3246,8 +3246,7 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned flags,
 				    int *peeked, int *off, int *err);
 struct sk_buff *skb_recv_datagram(struct sock *sk, unsigned flags, int noblock,
 				  int *err);
-__poll_t datagram_poll(struct file *file, struct socket *sock,
-			   struct poll_table_struct *wait);
+__poll_t datagram_poll_mask(struct socket *sock, __poll_t events);
 int skb_copy_datagram_iter(const struct sk_buff *from, int offset,
 			   struct iov_iter *to, int size);
 static inline int skb_copy_datagram_msg(const struct sk_buff *from, int offset,
diff --git a/include/net/udp.h b/include/net/udp.h
index 850a8e581cce..03e8907ae57c 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -275,7 +275,7 @@ int udp_ioctl(struct sock *sk, int cmd, unsigned long arg);
 int udp_init_sock(struct sock *sk);
 int __udp_disconnect(struct sock *sk, int flags);
 int udp_disconnect(struct sock *sk, int flags);
-__poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait);
+__poll_t udp_poll_mask(struct socket *sock, __poll_t events);
 struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
 				       netdev_features_t features,
 				       bool is_ipv6);
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index 03a9fc0771c0..3ea5631fee29 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -1870,7 +1870,7 @@ static const struct proto_ops atalk_dgram_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= sock_no_accept,
 	.getname	= atalk_getname,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.ioctl		= atalk_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= atalk_compat_ioctl,
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 47fdd399626b..3282dbe7d9eb 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1954,7 +1954,7 @@ static const struct proto_ops ax25_proto_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= ax25_accept,
 	.getname	= ax25_getname,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.ioctl		= ax25_ioctl,
 	.listen		= ax25_listen,
 	.shutdown	= ax25_shutdown,
diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index 923e9a271872..46a547e4a0c8 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -1975,7 +1975,7 @@ static const struct proto_ops hci_sock_ops = {
 	.sendmsg	= hci_sock_sendmsg,
 	.recvmsg	= hci_sock_recvmsg,
 	.ioctl		= hci_sock_ioctl,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= hci_sock_setsockopt,
diff --git a/net/can/bcm.c b/net/can/bcm.c
index ac5e5e34fee3..30c51e0ce294 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -1669,7 +1669,7 @@ static const struct proto_ops bcm_ops = {
 	.socketpair    = sock_no_socketpair,
 	.accept        = sock_no_accept,
 	.getname       = sock_no_getname,
-	.poll          = datagram_poll,
+	.poll_mask     = datagram_poll_mask,
 	.ioctl         = can_ioctl,	/* use can_ioctl() from af_can.c */
 	.listen        = sock_no_listen,
 	.shutdown      = sock_no_shutdown,
diff --git a/net/can/raw.c b/net/can/raw.c
index f2ecc43376a1..d65678554979 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -845,7 +845,7 @@ static const struct proto_ops raw_ops = {
 	.socketpair    = sock_no_socketpair,
 	.accept        = sock_no_accept,
 	.getname       = raw_getname,
-	.poll          = datagram_poll,
+	.poll_mask     = datagram_poll_mask,
 	.ioctl         = can_ioctl,	/* use can_ioctl() from af_can.c */
 	.listen        = sock_no_listen,
 	.shutdown      = sock_no_shutdown,
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 9938952c5c78..f19bf3dc2bd6 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -819,9 +819,8 @@ EXPORT_SYMBOL(skb_copy_and_csum_datagram_msg);
 
 /**
  * 	datagram_poll - generic datagram poll
- *	@file: file struct
  *	@sock: socket
- *	@wait: poll table
+ *	@events: events to wait for
  *
  *	Datagram poll: Again totally generic. This also handles
  *	sequenced packet sockets providing the socket receive queue
@@ -831,14 +830,10 @@ EXPORT_SYMBOL(skb_copy_and_csum_datagram_msg);
  *	and you use a different write policy from sock_writeable()
  *	then please supply your own write_space callback.
  */
-__poll_t datagram_poll(struct file *file, struct socket *sock,
-			   poll_table *wait)
+__poll_t datagram_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
-	__poll_t mask;
-
-	sock_poll_wait(file, sk_sleep(sk), wait);
-	mask = 0;
+	__poll_t mask = 0;
 
 	/* exceptional events? */
 	if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue))
@@ -871,4 +866,4 @@ __poll_t datagram_poll(struct file *file, struct socket *sock,
 
 	return mask;
 }
-EXPORT_SYMBOL(datagram_poll);
+EXPORT_SYMBOL(datagram_poll_mask);
diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index 791aff68af88..7793b3829906 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -1209,11 +1209,11 @@ static int dn_getname(struct socket *sock, struct sockaddr *uaddr,int *uaddr_len
 }
 
 
-static __poll_t dn_poll(struct file *file, struct socket *sock, poll_table  *wait)
+static __poll_t dn_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	struct dn_scp *scp = DN_SK(sk);
-	__poll_t mask = datagram_poll(file, sock, wait);
+	__poll_t mask = datagram_poll_mask(sock, events);
 
 	if (!skb_queue_empty(&scp->other_receive_queue))
 		mask |= EPOLLRDBAND;
@@ -2346,7 +2346,7 @@ static const struct proto_ops dn_proto_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	dn_accept,
 	.getname =	dn_getname,
-	.poll =		dn_poll,
+	.poll_mask =	dn_poll_mask,
 	.ioctl =	dn_ioctl,
 	.listen =	dn_listen,
 	.shutdown =	dn_shutdown,
diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
index a60658c85a9a..a0768d2759b8 100644
--- a/net/ieee802154/socket.c
+++ b/net/ieee802154/socket.c
@@ -423,7 +423,7 @@ static const struct proto_ops ieee802154_raw_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = sock_no_accept,
 	.getname	   = sock_no_getname,
-	.poll		   = datagram_poll,
+	.poll_mask	   = datagram_poll_mask,
 	.ioctl		   = ieee802154_sock_ioctl,
 	.listen		   = sock_no_listen,
 	.shutdown	   = sock_no_shutdown,
@@ -969,7 +969,7 @@ static const struct proto_ops ieee802154_dgram_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = sock_no_accept,
 	.getname	   = sock_no_getname,
-	.poll		   = datagram_poll,
+	.poll_mask	   = datagram_poll_mask,
 	.ioctl		   = ieee802154_sock_ioctl,
 	.listen		   = sock_no_listen,
 	.shutdown	   = sock_no_shutdown,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index ec32cc263b18..4a2bfae06b28 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -985,7 +985,7 @@ const struct proto_ops inet_dgram_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = sock_no_accept,
 	.getname	   = inet_getname,
-	.poll		   = udp_poll,
+	.poll_mask	   = udp_poll_mask,
 	.ioctl		   = inet_ioctl,
 	.listen		   = sock_no_listen,
 	.shutdown	   = inet_shutdown,
@@ -1006,7 +1006,7 @@ EXPORT_SYMBOL(inet_dgram_ops);
 
 /*
  * For SOCK_RAW sockets; should be the same as inet_dgram_ops but without
- * udp_poll
+ * udp_poll_mask
  */
 static const struct proto_ops inet_sockraw_ops = {
 	.family		   = PF_INET,
@@ -1017,7 +1017,7 @@ static const struct proto_ops inet_sockraw_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = sock_no_accept,
 	.getname	   = inet_getname,
-	.poll		   = datagram_poll,
+	.poll_mask	   = datagram_poll_mask,
 	.ioctl		   = inet_ioctl,
 	.listen		   = sock_no_listen,
 	.shutdown	   = inet_shutdown,
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e5ef7c38c934..56427047f7e1 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2491,7 +2491,7 @@ int compat_udp_getsockopt(struct sock *sk, int level, int optname,
  * 	udp_poll - wait for a UDP event.
  *	@file - file struct
  *	@sock - socket
- *	@wait - poll table
+ *	@events - events to wait for
  *
  *	This is same as datagram poll, except for the special case of
  *	blocking sockets. If application is using a blocking fd
@@ -2500,23 +2500,23 @@ int compat_udp_getsockopt(struct sock *sk, int level, int optname,
  *	but then block when reading it. Add special case code
  *	to work around these arguably broken applications.
  */
-__poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait)
+__poll_t udp_poll_mask(struct socket *sock, __poll_t events)
 {
-	__poll_t mask = datagram_poll(file, sock, wait);
+	__poll_t mask = datagram_poll_mask(sock, events);
 	struct sock *sk = sock->sk;
 
 	if (!skb_queue_empty(&udp_sk(sk)->reader_queue))
 		mask |= EPOLLIN | EPOLLRDNORM;
 
 	/* Check for false positives due to checksum errors */
-	if ((mask & EPOLLRDNORM) && !(file->f_flags & O_NONBLOCK) &&
+	if ((mask & EPOLLRDNORM) && !(sock->file->f_flags & O_NONBLOCK) &&
 	    !(sk->sk_shutdown & RCV_SHUTDOWN) && first_packet_length(sk) == -1)
 		mask &= ~(EPOLLIN | EPOLLRDNORM);
 
 	return mask;
 
 }
-EXPORT_SYMBOL(udp_poll);
+EXPORT_SYMBOL(udp_poll_mask);
 
 int udp_abort(struct sock *sk, int err)
 {
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index c470549d6ef9..82c192b92358 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -578,7 +578,7 @@ const struct proto_ops inet6_dgram_ops = {
 	.socketpair	   = sock_no_socketpair,	/* a do nothing	*/
 	.accept		   = sock_no_accept,		/* a do nothing	*/
 	.getname	   = inet6_getname,
-	.poll		   = udp_poll,			/* ok		*/
+	.poll_mask	   = udp_poll_mask,		/* ok		*/
 	.ioctl		   = inet6_ioctl,		/* must change  */
 	.listen		   = sock_no_listen,		/* ok		*/
 	.shutdown	   = inet_shutdown,		/* ok		*/
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 4c25339b1984..fdd9916f1475 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -1345,7 +1345,7 @@ void raw6_proc_exit(void)
 }
 #endif	/* CONFIG_PROC_FS */
 
-/* Same as inet6_dgram_ops, sans udp_poll.  */
+/* Same as inet6_dgram_ops, sans udp_poll_mask.  */
 const struct proto_ops inet6_sockraw_ops = {
 	.family		   = PF_INET6,
 	.owner		   = THIS_MODULE,
@@ -1355,7 +1355,7 @@ const struct proto_ops inet6_sockraw_ops = {
 	.socketpair	   = sock_no_socketpair,	/* a do nothing	*/
 	.accept		   = sock_no_accept,		/* a do nothing	*/
 	.getname	   = inet6_getname,
-	.poll		   = datagram_poll,		/* ok		*/
+	.poll_mask	   = datagram_poll_mask,	/* ok		*/
 	.ioctl		   = inet6_ioctl,		/* must change  */
 	.listen		   = sock_no_listen,		/* ok		*/
 	.shutdown	   = inet_shutdown,		/* ok		*/
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index f297d53a11aa..305c8c38c5d9 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -1889,7 +1889,7 @@ static const struct proto_ops kcm_dgram_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	sock_no_getname,
-	.poll =		datagram_poll,
+	.poll_mask =	datagram_poll_mask,
 	.ioctl =	kcm_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	sock_no_shutdown,
@@ -1910,7 +1910,7 @@ static const struct proto_ops kcm_seqpacket_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	sock_no_getname,
-	.poll =		datagram_poll,
+	.poll_mask =	datagram_poll_mask,
 	.ioctl =	kcm_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	sock_no_shutdown,
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 7e2e7188e7f4..7654607e728b 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3726,7 +3726,7 @@ static const struct proto_ops pfkey_ops = {
 
 	/* Now the operations that really occur. */
 	.release	=	pfkey_release,
-	.poll		=	datagram_poll,
+	.poll_mask	=	datagram_poll_mask,
 	.sendmsg	=	pfkey_sendmsg,
 	.recvmsg	=	pfkey_recvmsg,
 };
diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index ff61124fdf59..aa3fced17a75 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -618,7 +618,7 @@ static const struct proto_ops l2tp_ip_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = sock_no_accept,
 	.getname	   = l2tp_ip_getname,
-	.poll		   = datagram_poll,
+	.poll_mask	   = datagram_poll_mask,
 	.ioctl		   = inet_ioctl,
 	.listen		   = sock_no_listen,
 	.shutdown	   = inet_shutdown,
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index 192344688c06..8ca5486ce952 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -757,7 +757,7 @@ static const struct proto_ops l2tp_ip6_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = sock_no_accept,
 	.getname	   = l2tp_ip6_getname,
-	.poll		   = datagram_poll,
+	.poll_mask	   = datagram_poll_mask,
 	.ioctl		   = inet6_ioctl,
 	.listen		   = sock_no_listen,
 	.shutdown	   = inet_shutdown,
diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index 59f246d7b290..d6918a56af5e 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -1786,7 +1786,7 @@ static const struct proto_ops pppol2tp_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= sock_no_accept,
 	.getname	= pppol2tp_getname,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= pppol2tp_setsockopt,
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index c38d16f22d2a..67e5db6157ef 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -1180,7 +1180,7 @@ static const struct proto_ops llc_ui_ops = {
 	.socketpair  = sock_no_socketpair,
 	.accept      = llc_ui_accept,
 	.getname     = llc_ui_getname,
-	.poll	     = datagram_poll,
+	.poll_mask   = datagram_poll_mask,
 	.ioctl       = llc_ui_ioctl,
 	.listen      = llc_ui_listen,
 	.shutdown    = llc_ui_shutdown,
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 07e8478068f0..9e4ff69deddb 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2669,7 +2669,7 @@ static const struct proto_ops netlink_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	netlink_getname,
-	.poll =		datagram_poll,
+	.poll_mask =	datagram_poll_mask,
 	.ioctl =	netlink_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	sock_no_shutdown,
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index 9ba30c63be3d..22636fca311e 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -1366,7 +1366,7 @@ static const struct proto_ops nr_proto_ops = {
 	.socketpair	=	sock_no_socketpair,
 	.accept		=	nr_accept,
 	.getname	=	nr_getname,
-	.poll		=	datagram_poll,
+	.poll_mask	=	datagram_poll_mask,
 	.ioctl		=	nr_ioctl,
 	.listen		=	nr_listen,
 	.shutdown	=	sock_no_shutdown,
diff --git a/net/nfc/rawsock.c b/net/nfc/rawsock.c
index e2188deb08dc..60c322531c49 100644
--- a/net/nfc/rawsock.c
+++ b/net/nfc/rawsock.c
@@ -284,7 +284,7 @@ static const struct proto_ops rawsock_ops = {
 	.socketpair     = sock_no_socketpair,
 	.accept         = sock_no_accept,
 	.getname        = sock_no_getname,
-	.poll           = datagram_poll,
+	.poll_mask      = datagram_poll_mask,
 	.ioctl          = sock_no_ioctl,
 	.listen         = sock_no_listen,
 	.shutdown       = sock_no_shutdown,
@@ -304,7 +304,7 @@ static const struct proto_ops rawsock_raw_ops = {
 	.socketpair     = sock_no_socketpair,
 	.accept         = sock_no_accept,
 	.getname        = sock_no_getname,
-	.poll           = datagram_poll,
+	.poll_mask      = datagram_poll_mask,
 	.ioctl          = sock_no_ioctl,
 	.listen         = sock_no_listen,
 	.shutdown       = sock_no_shutdown,
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index e0f3f4aeeb4f..f50ec1244281 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -4074,12 +4074,11 @@ static int packet_ioctl(struct socket *sock, unsigned int cmd,
 	return 0;
 }
 
-static __poll_t packet_poll(struct file *file, struct socket *sock,
-				poll_table *wait)
+static __poll_t packet_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	struct packet_sock *po = pkt_sk(sk);
-	__poll_t mask = datagram_poll(file, sock, wait);
+	__poll_t mask = datagram_poll_mask(sock, events);
 
 	spin_lock_bh(&sk->sk_receive_queue.lock);
 	if (po->rx_ring.pg_vec) {
@@ -4424,7 +4423,7 @@ static const struct proto_ops packet_ops_spkt = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	packet_getname_spkt,
-	.poll =		datagram_poll,
+	.poll_mask =	datagram_poll_mask,
 	.ioctl =	packet_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	sock_no_shutdown,
@@ -4445,7 +4444,7 @@ static const struct proto_ops packet_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	packet_getname,
-	.poll =		packet_poll,
+	.poll_mask =	packet_poll_mask,
 	.ioctl =	packet_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	sock_no_shutdown,
diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index fffcd69f63ff..28d981512f5f 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -449,7 +449,7 @@ const struct proto_ops phonet_dgram_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= sock_no_accept,
 	.getname	= pn_socket_getname,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.ioctl		= pn_socket_ioctl,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
diff --git a/net/qrtr/qrtr.c b/net/qrtr/qrtr.c
index 5fb3929e3d7d..7f273529a0b1 100644
--- a/net/qrtr/qrtr.c
+++ b/net/qrtr/qrtr.c
@@ -1024,7 +1024,7 @@ static const struct proto_ops qrtr_proto_ops = {
 	.recvmsg	= qrtr_recvmsg,
 	.getname	= qrtr_getname,
 	.ioctl		= qrtr_ioctl,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= sock_no_setsockopt,
 	.getsockopt	= sock_no_getsockopt,
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index 083bd251406f..f80a5c0804f1 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -1483,7 +1483,7 @@ static const struct proto_ops rose_proto_ops = {
 	.socketpair	=	sock_no_socketpair,
 	.accept		=	rose_accept,
 	.getname	=	rose_getname,
-	.poll		=	datagram_poll,
+	.poll_mask	=	datagram_poll_mask,
 	.ioctl		=	rose_ioctl,
 	.listen		=	rose_listen,
 	.shutdown	=	sock_no_shutdown,
diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
index 562cc11131f6..b94b8f3339f3 100644
--- a/net/x25/af_x25.c
+++ b/net/x25/af_x25.c
@@ -1750,7 +1750,7 @@ static const struct proto_ops x25_proto_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	x25_accept,
 	.getname =	x25_getname,
-	.poll =		datagram_poll,
+	.poll_mask =	datagram_poll_mask,
 	.ioctl =	x25_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl = compat_x25_ioctl,
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 21/36] net/dccp: convert to ->poll_mask
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/dccp/dccp.h  |  3 +--
 net/dccp/ipv4.c  |  2 +-
 net/dccp/ipv6.c  |  2 +-
 net/dccp/proto.c | 13 ++-----------
 4 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index f91e3816806b..0ea2ee56ac1b 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -316,8 +316,7 @@ int dccp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 		 int flags, int *addr_len);
 void dccp_shutdown(struct sock *sk, int how);
 int inet_dccp_listen(struct socket *sock, int backlog);
-__poll_t dccp_poll(struct file *file, struct socket *sock,
-		       poll_table *wait);
+__poll_t dccp_poll_mask(struct socket *sock, __poll_t events);
 int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len);
 void dccp_req_err(struct sock *sk, u64 seq);
 
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index e65fcb45c3f6..e8476f319efd 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -983,7 +983,7 @@ static const struct proto_ops inet_dccp_ops = {
 	.accept		   = inet_accept,
 	.getname	   = inet_getname,
 	/* FIXME: work on tcp_poll to rename it to inet_csk_poll */
-	.poll		   = dccp_poll,
+	.poll_mask	   = dccp_poll_mask,
 	.ioctl		   = inet_ioctl,
 	/* FIXME: work on inet_listen to rename it to sock_common_listen */
 	.listen		   = inet_dccp_listen,
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 5df7857fc0f3..f0aac8e4b888 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -1069,7 +1069,7 @@ static const struct proto_ops inet6_dccp_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = inet_accept,
 	.getname	   = inet6_getname,
-	.poll		   = dccp_poll,
+	.poll_mask	   = dccp_poll_mask,
 	.ioctl		   = inet6_ioctl,
 	.listen		   = inet_dccp_listen,
 	.shutdown	   = inet_shutdown,
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 15bdc002d90c..26816032a7c2 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -314,20 +314,11 @@ int dccp_disconnect(struct sock *sk, int flags)
 
 EXPORT_SYMBOL_GPL(dccp_disconnect);
 
-/*
- *	Wait for a DCCP event.
- *
- *	Note that we don't need to lock the socket, as the upper poll layers
- *	take care of normal races (between the test and the event) and we don't
- *	go look at any of the socket buffers directly.
- */
-__poll_t dccp_poll(struct file *file, struct socket *sock,
-		       poll_table *wait)
+__poll_t dccp_poll_mask(struct socket *sock, __poll_t events)
 {
 	__poll_t mask;
 	struct sock *sk = sock->sk;
 
-	sock_poll_wait(file, sk_sleep(sk), wait);
 	if (sk->sk_state == DCCP_LISTEN)
 		return inet_csk_listen_poll(sk);
 
@@ -369,7 +360,7 @@ __poll_t dccp_poll(struct file *file, struct socket *sock,
 	return mask;
 }
 
-EXPORT_SYMBOL_GPL(dccp_poll);
+EXPORT_SYMBOL_GPL(dccp_poll_mask);
 
 int dccp_ioctl(struct sock *sk, int cmd, unsigned long arg)
 {
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 22/36] net/atm: convert to ->poll_mask
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/atm/common.c | 11 +++--------
 net/atm/common.h |  2 +-
 net/atm/pvc.c    |  2 +-
 net/atm/svc.c    |  2 +-
 4 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/net/atm/common.c b/net/atm/common.c
index fc78a0508ae1..1f2af59935db 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -648,16 +648,11 @@ int vcc_sendmsg(struct socket *sock, struct msghdr *m, size_t size)
 	return error;
 }
 
-__poll_t vcc_poll(struct file *file, struct socket *sock, poll_table *wait)
+__poll_t vcc_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
-	struct atm_vcc *vcc;
-	__poll_t mask;
-
-	sock_poll_wait(file, sk_sleep(sk), wait);
-	mask = 0;
-
-	vcc = ATM_SD(sock);
+	struct atm_vcc *vcc = ATM_SD(sock);
+	__poll_t mask = 0;
 
 	/* exceptional events */
 	if (sk->sk_err)
diff --git a/net/atm/common.h b/net/atm/common.h
index 5850649068bb..526796ad230f 100644
--- a/net/atm/common.h
+++ b/net/atm/common.h
@@ -17,7 +17,7 @@ int vcc_connect(struct socket *sock, int itf, short vpi, int vci);
 int vcc_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 		int flags);
 int vcc_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len);
-__poll_t vcc_poll(struct file *file, struct socket *sock, poll_table *wait);
+__poll_t vcc_poll_mask(struct socket *sock, __poll_t events);
 int vcc_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
 int vcc_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
 int vcc_setsockopt(struct socket *sock, int level, int optname,
diff --git a/net/atm/pvc.c b/net/atm/pvc.c
index e1140b3bdcaa..930651c5e77c 100644
--- a/net/atm/pvc.c
+++ b/net/atm/pvc.c
@@ -114,7 +114,7 @@ static const struct proto_ops pvc_proto_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	pvc_getname,
-	.poll =		vcc_poll,
+	.poll_mask =	vcc_poll_mask,
 	.ioctl =	vcc_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl = vcc_compat_ioctl,
diff --git a/net/atm/svc.c b/net/atm/svc.c
index c458adcbc177..ad0e6ffb9cfe 100644
--- a/net/atm/svc.c
+++ b/net/atm/svc.c
@@ -637,7 +637,7 @@ static const struct proto_ops svc_proto_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	svc_accept,
 	.getname =	svc_getname,
-	.poll =		vcc_poll,
+	.poll_mask =	vcc_poll_mask,
 	.ioctl =	svc_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl =	svc_compat_ioctl,
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 23/36] net/vmw_vsock: convert to ->poll_mask
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/vmw_vsock/af_vsock.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index e0fc84daed94..b9210329bda8 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -850,18 +850,11 @@ static int vsock_shutdown(struct socket *sock, int mode)
 	return err;
 }
 
-static __poll_t vsock_poll(struct file *file, struct socket *sock,
-			       poll_table *wait)
+static __poll_t vsock_poll_mask(struct socket *sock, __poll_t events)
 {
-	struct sock *sk;
-	__poll_t mask;
-	struct vsock_sock *vsk;
-
-	sk = sock->sk;
-	vsk = vsock_sk(sk);
-
-	poll_wait(file, sk_sleep(sk), wait);
-	mask = 0;
+	struct sock *sk = sock->sk;
+	struct vsock_sock *vsk = vsock_sk(sk);
+	__poll_t mask = 0;
 
 	if (sk->sk_err)
 		/* Signify that there has been an error on this socket. */
@@ -1091,7 +1084,7 @@ static const struct proto_ops vsock_dgram_ops = {
 	.socketpair = sock_no_socketpair,
 	.accept = sock_no_accept,
 	.getname = vsock_getname,
-	.poll = vsock_poll,
+	.poll_mask = vsock_poll_mask,
 	.ioctl = sock_no_ioctl,
 	.listen = sock_no_listen,
 	.shutdown = vsock_shutdown,
@@ -1849,7 +1842,7 @@ static const struct proto_ops vsock_stream_ops = {
 	.socketpair = sock_no_socketpair,
 	.accept = vsock_accept,
 	.getname = vsock_getname,
-	.poll = vsock_poll,
+	.poll_mask = vsock_poll_mask,
 	.ioctl = sock_no_ioctl,
 	.listen = vsock_listen,
 	.shutdown = vsock_shutdown,
-- 
2.14.2


* [PATCH 24/36] net/tipc: convert to ->poll_mask
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/tipc/socket.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index b0323ec7971e..1ea1666e8e95 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -694,10 +694,9 @@ static int tipc_getname(struct socket *sock, struct sockaddr *uaddr,
 }
 
 /**
- * tipc_poll - read and possibly block on pollmask
+ * tipc_poll - read pollmask
  * @file: file structure associated with the socket
  * @sock: socket for which to calculate the poll bits
- * @wait: ???
  *
  * Returns pollmask value
  *
@@ -711,15 +710,12 @@ static int tipc_getname(struct socket *sock, struct sockaddr *uaddr,
  * imply that the operation will succeed, merely that it should be performed
  * and will not block.
  */
-static __poll_t tipc_poll(struct file *file, struct socket *sock,
-			      poll_table *wait)
+static __poll_t tipc_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	struct tipc_sock *tsk = tipc_sk(sk);
 	__poll_t revents = 0;
 
-	sock_poll_wait(file, sk_sleep(sk), wait);
-
 	if (sk->sk_shutdown & RCV_SHUTDOWN)
 		revents |= EPOLLRDHUP | EPOLLIN | EPOLLRDNORM;
 	if (sk->sk_shutdown == SHUTDOWN_MASK)
@@ -3019,7 +3015,7 @@ static const struct proto_ops msg_ops = {
 	.socketpair	= tipc_socketpair,
 	.accept		= sock_no_accept,
 	.getname	= tipc_getname,
-	.poll		= tipc_poll,
+	.poll_mask	= tipc_poll_mask,
 	.ioctl		= tipc_ioctl,
 	.listen		= sock_no_listen,
 	.shutdown	= tipc_shutdown,
@@ -3040,7 +3036,7 @@ static const struct proto_ops packet_ops = {
 	.socketpair	= tipc_socketpair,
 	.accept		= tipc_accept,
 	.getname	= tipc_getname,
-	.poll		= tipc_poll,
+	.poll_mask	= tipc_poll_mask,
 	.ioctl		= tipc_ioctl,
 	.listen		= tipc_listen,
 	.shutdown	= tipc_shutdown,
@@ -3061,7 +3057,7 @@ static const struct proto_ops stream_ops = {
 	.socketpair	= tipc_socketpair,
 	.accept		= tipc_accept,
 	.getname	= tipc_getname,
-	.poll		= tipc_poll,
+	.poll_mask	= tipc_poll_mask,
 	.ioctl		= tipc_ioctl,
 	.listen		= tipc_listen,
 	.shutdown	= tipc_shutdown,
-- 
2.14.2


* [PATCH 25/36] net/sctp: convert to ->poll_mask
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/net/sctp/sctp.h | 3 +--
 net/sctp/ipv6.c         | 2 +-
 net/sctp/protocol.c     | 2 +-
 net/sctp/socket.c       | 4 +---
 4 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index f7ae6b0a21d0..37abd5ba4a3f 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -107,8 +107,7 @@ int sctp_backlog_rcv(struct sock *sk, struct sk_buff *skb);
 int sctp_inet_listen(struct socket *sock, int backlog);
 void sctp_write_space(struct sock *sk);
 void sctp_data_ready(struct sock *sk);
-__poll_t sctp_poll(struct file *file, struct socket *sock,
-		poll_table *wait);
+__poll_t sctp_poll_mask(struct socket *sock, __poll_t events);
 void sctp_sock_rfree(struct sk_buff *skb);
 void sctp_copy_sock(struct sock *newsk, struct sock *sk,
 		    struct sctp_association *asoc);
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index e35d4f73d2df..6b0b8fc5b75a 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -976,7 +976,7 @@ static const struct proto_ops inet6_seqpacket_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = inet_accept,
 	.getname	   = sctp_getname,
-	.poll		   = sctp_poll,
+	.poll_mask	   = sctp_poll_mask,
 	.ioctl		   = inet6_ioctl,
 	.listen		   = sctp_inet_listen,
 	.shutdown	   = inet_shutdown,
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 91813e686c67..20c544890e80 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1024,7 +1024,7 @@ static const struct proto_ops inet_seqpacket_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = inet_accept,
 	.getname	   = inet_getname,	/* Semantics are different.  */
-	.poll		   = sctp_poll,
+	.poll_mask	   = sctp_poll_mask,
 	.ioctl		   = inet_ioctl,
 	.listen		   = sctp_inet_listen,
 	.shutdown	   = inet_shutdown,	/* Looks harmless.  */
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index bf271f8c2dc9..097454740929 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -7587,14 +7587,12 @@ int sctp_inet_listen(struct socket *sock, int backlog)
  * here, again, by modeling the current TCP/UDP code.  We don't have
  * a good way to test with it yet.
  */
-__poll_t sctp_poll(struct file *file, struct socket *sock, poll_table *wait)
+__poll_t sctp_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	struct sctp_sock *sp = sctp_sk(sk);
 	__poll_t mask;
 
-	poll_wait(file, sk_sleep(sk), wait);
-
 	sock_rps_record_flow(sk);
 
 	/* A TCP-style listening socket becomes readable when the accept queue
-- 
2.14.2


* [PATCH 26/36] net/bluetooth: convert to ->poll_mask
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/net/bluetooth/bluetooth.h | 2 +-
 net/bluetooth/af_bluetooth.c      | 7 ++-----
 net/bluetooth/l2cap_sock.c        | 2 +-
 net/bluetooth/rfcomm/sock.c       | 2 +-
 net/bluetooth/sco.c               | 2 +-
 5 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h
index ec9d6bc65855..53ce8176c313 100644
--- a/include/net/bluetooth/bluetooth.h
+++ b/include/net/bluetooth/bluetooth.h
@@ -271,7 +271,7 @@ int  bt_sock_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 		     int flags);
 int  bt_sock_stream_recvmsg(struct socket *sock, struct msghdr *msg,
 			    size_t len, int flags);
-__poll_t bt_sock_poll(struct file *file, struct socket *sock, poll_table *wait);
+__poll_t bt_sock_poll_mask(struct socket *sock, __poll_t events);
 int  bt_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
 int  bt_sock_wait_state(struct sock *sk, int state, unsigned long timeo);
 int  bt_sock_wait_ready(struct sock *sk, unsigned long flags);
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 84d92a077834..80033a7e1de2 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -437,16 +437,13 @@ static inline __poll_t bt_accept_poll(struct sock *parent)
 	return 0;
 }
 
-__poll_t bt_sock_poll(struct file *file, struct socket *sock,
-			  poll_table *wait)
+__poll_t bt_sock_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	__poll_t mask = 0;
 
 	BT_DBG("sock %p, sk %p", sock, sk);
 
-	poll_wait(file, sk_sleep(sk), wait);
-
 	if (sk->sk_state == BT_LISTEN)
 		return bt_accept_poll(sk);
 
@@ -478,7 +475,7 @@ __poll_t bt_sock_poll(struct file *file, struct socket *sock,
 
 	return mask;
 }
-EXPORT_SYMBOL(bt_sock_poll);
+EXPORT_SYMBOL(bt_sock_poll_mask);
 
 int bt_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 {
diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index 67a8642f57ea..d20b33daa80f 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -1654,7 +1654,7 @@ static const struct proto_ops l2cap_sock_ops = {
 	.getname	= l2cap_sock_getname,
 	.sendmsg	= l2cap_sock_sendmsg,
 	.recvmsg	= l2cap_sock_recvmsg,
-	.poll		= bt_sock_poll,
+	.poll_mask	= bt_sock_poll_mask,
 	.ioctl		= bt_sock_ioctl,
 	.mmap		= sock_no_mmap,
 	.socketpair	= sock_no_socketpair,
diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index 1aaccf637479..b4dc96481d92 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -1049,7 +1049,7 @@ static const struct proto_ops rfcomm_sock_ops = {
 	.setsockopt	= rfcomm_sock_setsockopt,
 	.getsockopt	= rfcomm_sock_getsockopt,
 	.ioctl		= rfcomm_sock_ioctl,
-	.poll		= bt_sock_poll,
+	.poll_mask	= bt_sock_poll_mask,
 	.socketpair	= sock_no_socketpair,
 	.mmap		= sock_no_mmap
 };
diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
index 08df57665e1f..b2bf5c767b3e 100644
--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -1198,7 +1198,7 @@ static const struct proto_ops sco_sock_ops = {
 	.getname	= sco_sock_getname,
 	.sendmsg	= sco_sock_sendmsg,
 	.recvmsg	= sco_sock_recvmsg,
-	.poll		= bt_sock_poll,
+	.poll_mask	= bt_sock_poll_mask,
 	.ioctl		= bt_sock_ioctl,
 	.mmap		= sock_no_mmap,
 	.socketpair	= sock_no_socketpair,
-- 
2.14.2


* [PATCH 27/36] net/caif: convert to ->poll_mask
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/caif/caif_socket.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index a6fb1b3bcad9..c7991867d622 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -934,15 +934,11 @@ static int caif_release(struct socket *sock)
 }
 
 /* Copied from af_unix.c:unix_poll(), added CAIF tx_flow handling */
-static __poll_t caif_poll(struct file *file,
-			      struct socket *sock, poll_table *wait)
+static __poll_t caif_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
-	__poll_t mask;
 	struct caifsock *cf_sk = container_of(sk, struct caifsock, sk);
-
-	sock_poll_wait(file, sk_sleep(sk), wait);
-	mask = 0;
+	__poll_t mask = 0;
 
 	/* exceptional events? */
 	if (sk->sk_err)
@@ -976,7 +972,7 @@ static const struct proto_ops caif_seqpacket_ops = {
 	.socketpair = sock_no_socketpair,
 	.accept = sock_no_accept,
 	.getname = sock_no_getname,
-	.poll = caif_poll,
+	.poll_mask = caif_poll_mask,
 	.ioctl = sock_no_ioctl,
 	.listen = sock_no_listen,
 	.shutdown = sock_no_shutdown,
@@ -997,7 +993,7 @@ static const struct proto_ops caif_stream_ops = {
 	.socketpair = sock_no_socketpair,
 	.accept = sock_no_accept,
 	.getname = sock_no_getname,
-	.poll = caif_poll,
+	.poll_mask = caif_poll_mask,
 	.ioctl = sock_no_ioctl,
 	.listen = sock_no_listen,
 	.shutdown = sock_no_shutdown,
-- 
2.14.2


* [PATCH 28/36] net/nfc: convert to ->poll_mask
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/nfc/llcp_sock.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
index 376040092142..b6010750e634 100644
--- a/net/nfc/llcp_sock.c
+++ b/net/nfc/llcp_sock.c
@@ -549,16 +549,13 @@ static inline __poll_t llcp_accept_poll(struct sock *parent)
 	return 0;
 }
 
-static __poll_t llcp_sock_poll(struct file *file, struct socket *sock,
-				   poll_table *wait)
+static __poll_t llcp_sock_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	__poll_t mask = 0;
 
 	pr_debug("%p\n", sk);
 
-	sock_poll_wait(file, sk_sleep(sk), wait);
-
 	if (sk->sk_state == LLCP_LISTEN)
 		return llcp_accept_poll(sk);
 
@@ -900,7 +897,7 @@ static const struct proto_ops llcp_sock_ops = {
 	.socketpair     = sock_no_socketpair,
 	.accept         = llcp_sock_accept,
 	.getname        = llcp_sock_getname,
-	.poll           = llcp_sock_poll,
+	.poll_mask      = llcp_sock_poll_mask,
 	.ioctl          = sock_no_ioctl,
 	.listen         = llcp_sock_listen,
 	.shutdown       = sock_no_shutdown,
@@ -920,7 +917,7 @@ static const struct proto_ops llcp_rawsock_ops = {
 	.socketpair     = sock_no_socketpair,
 	.accept         = sock_no_accept,
 	.getname        = llcp_sock_getname,
-	.poll           = llcp_sock_poll,
+	.poll_mask      = llcp_sock_poll_mask,
 	.ioctl          = sock_no_ioctl,
 	.listen         = sock_no_listen,
 	.shutdown       = sock_no_shutdown,
-- 
2.14.2


* [PATCH 29/36] net/phonet: convert to ->poll_mask
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/phonet/socket.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index 28d981512f5f..70ac4539d5b7 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -341,15 +341,12 @@ static int pn_socket_getname(struct socket *sock, struct sockaddr *addr,
 	return 0;
 }
 
-static __poll_t pn_socket_poll(struct file *file, struct socket *sock,
-					poll_table *wait)
+static __poll_t pn_socket_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	struct pep_sock *pn = pep_sk(sk);
 	__poll_t mask = 0;
 
-	poll_wait(file, sk_sleep(sk), wait);
-
 	if (sk->sk_state == TCP_CLOSE)
 		return EPOLLERR;
 	if (!skb_queue_empty(&sk->sk_receive_queue))
@@ -474,7 +471,7 @@ const struct proto_ops phonet_stream_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= pn_socket_accept,
 	.getname	= pn_socket_getname,
-	.poll		= pn_socket_poll,
+	.poll_mask	= pn_socket_poll_mask,
 	.ioctl		= pn_socket_ioctl,
 	.listen		= pn_socket_listen,
 	.shutdown	= sock_no_shutdown,
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 30/36] net/iucv: convert to ->poll_mask
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/net/iucv/af_iucv.h | 2 --
 net/iucv/af_iucv.c         | 7 ++-----
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/include/net/iucv/af_iucv.h b/include/net/iucv/af_iucv.h
index f4c21b5a1242..b0eaeb02d46d 100644
--- a/include/net/iucv/af_iucv.h
+++ b/include/net/iucv/af_iucv.h
@@ -153,8 +153,6 @@ struct iucv_sock_list {
 	atomic_t	  autobind_name;
 };
 
-__poll_t iucv_sock_poll(struct file *file, struct socket *sock,
-			    poll_table *wait);
 void iucv_sock_link(struct iucv_sock_list *l, struct sock *s);
 void iucv_sock_unlink(struct iucv_sock_list *l, struct sock *s);
 void iucv_accept_enqueue(struct sock *parent, struct sock *sk);
diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index 1e8cc7bcbca3..539a312dc481 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -1489,14 +1489,11 @@ static inline __poll_t iucv_accept_poll(struct sock *parent)
 	return 0;
 }
 
-__poll_t iucv_sock_poll(struct file *file, struct socket *sock,
-			    poll_table *wait)
+static __poll_t iucv_sock_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	__poll_t mask = 0;
 
-	sock_poll_wait(file, sk_sleep(sk), wait);
-
 	if (sk->sk_state == IUCV_LISTEN)
 		return iucv_accept_poll(sk);
 
@@ -2389,7 +2386,7 @@ static const struct proto_ops iucv_sock_ops = {
 	.getname	= iucv_sock_getname,
 	.sendmsg	= iucv_sock_sendmsg,
 	.recvmsg	= iucv_sock_recvmsg,
-	.poll		= iucv_sock_poll,
+	.poll_mask	= iucv_sock_poll_mask,
 	.ioctl		= sock_no_ioctl,
 	.mmap		= sock_no_mmap,
 	.socketpair	= sock_no_socketpair,
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 31/36] net/rxrpc: convert to ->poll_mask
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/rxrpc/af_rxrpc.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 0c9c18aa7c77..d2440d5c3ce8 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -729,15 +729,11 @@ static int rxrpc_getsockopt(struct socket *sock, int level, int optname,
 /*
  * permit an RxRPC socket to be polled
  */
-static __poll_t rxrpc_poll(struct file *file, struct socket *sock,
-			       poll_table *wait)
+static __poll_t rxrpc_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	struct rxrpc_sock *rx = rxrpc_sk(sk);
-	__poll_t mask;
-
-	sock_poll_wait(file, sk_sleep(sk), wait);
-	mask = 0;
+	__poll_t mask = 0;
 
 	/* the socket is readable if there are any messages waiting on the Rx
 	 * queue */
@@ -940,7 +936,7 @@ static const struct proto_ops rxrpc_rpc_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= sock_no_accept,
 	.getname	= sock_no_getname,
-	.poll		= rxrpc_poll,
+	.poll_mask	= rxrpc_poll_mask,
 	.ioctl		= sock_no_ioctl,
 	.listen		= rxrpc_listen,
 	.shutdown	= rxrpc_shutdown,
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 32/36] crypto: af_alg: convert to ->poll_mask
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 crypto/af_alg.c         | 13 +++----------
 crypto/algif_aead.c     |  4 ++--
 crypto/algif_skcipher.c |  4 ++--
 include/crypto/if_alg.h |  3 +--
 4 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index 50d75de539f5..330aef1cd08b 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -1060,19 +1060,12 @@ void af_alg_async_cb(struct crypto_async_request *_req, int err)
 }
 EXPORT_SYMBOL_GPL(af_alg_async_cb);
 
-/**
- * af_alg_poll - poll system call handler
- */
-__poll_t af_alg_poll(struct file *file, struct socket *sock,
-			 poll_table *wait)
+__poll_t af_alg_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	struct alg_sock *ask = alg_sk(sk);
 	struct af_alg_ctx *ctx = ask->private;
-	__poll_t mask;
-
-	sock_poll_wait(file, sk_sleep(sk), wait);
-	mask = 0;
+	__poll_t mask = 0;
 
 	if (!ctx->more || ctx->used)
 		mask |= EPOLLIN | EPOLLRDNORM;
@@ -1082,7 +1075,7 @@ __poll_t af_alg_poll(struct file *file, struct socket *sock,
 
 	return mask;
 }
-EXPORT_SYMBOL_GPL(af_alg_poll);
+EXPORT_SYMBOL_GPL(af_alg_poll_mask);
 
 /**
  * af_alg_alloc_areq - allocate struct af_alg_async_req
diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
index 4b07edd5a9ff..330cf9f2b767 100644
--- a/crypto/algif_aead.c
+++ b/crypto/algif_aead.c
@@ -375,7 +375,7 @@ static struct proto_ops algif_aead_ops = {
 	.sendmsg	=	aead_sendmsg,
 	.sendpage	=	af_alg_sendpage,
 	.recvmsg	=	aead_recvmsg,
-	.poll		=	af_alg_poll,
+	.poll_mask	=	af_alg_poll_mask,
 };
 
 static int aead_check_key(struct socket *sock)
@@ -471,7 +471,7 @@ static struct proto_ops algif_aead_ops_nokey = {
 	.sendmsg	=	aead_sendmsg_nokey,
 	.sendpage	=	aead_sendpage_nokey,
 	.recvmsg	=	aead_recvmsg_nokey,
-	.poll		=	af_alg_poll,
+	.poll_mask	=	af_alg_poll_mask,
 };
 
 static void *aead_bind(const char *name, u32 type, u32 mask)
diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index c4e885df4564..15cf3c5222e0 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -205,7 +205,7 @@ static struct proto_ops algif_skcipher_ops = {
 	.sendmsg	=	skcipher_sendmsg,
 	.sendpage	=	af_alg_sendpage,
 	.recvmsg	=	skcipher_recvmsg,
-	.poll		=	af_alg_poll,
+	.poll_mask	=	af_alg_poll_mask,
 };
 
 static int skcipher_check_key(struct socket *sock)
@@ -301,7 +301,7 @@ static struct proto_ops algif_skcipher_ops_nokey = {
 	.sendmsg	=	skcipher_sendmsg_nokey,
 	.sendpage	=	skcipher_sendpage_nokey,
 	.recvmsg	=	skcipher_recvmsg_nokey,
-	.poll		=	af_alg_poll,
+	.poll_mask	=	af_alg_poll_mask,
 };
 
 static void *skcipher_bind(const char *name, u32 type, u32 mask)
diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h
index 482461d8931d..cc414db9da0a 100644
--- a/include/crypto/if_alg.h
+++ b/include/crypto/if_alg.h
@@ -245,8 +245,7 @@ ssize_t af_alg_sendpage(struct socket *sock, struct page *page,
 			int offset, size_t size, int flags);
 void af_alg_free_resources(struct af_alg_async_req *areq);
 void af_alg_async_cb(struct crypto_async_request *_req, int err);
-__poll_t af_alg_poll(struct file *file, struct socket *sock,
-			 poll_table *wait);
+__poll_t af_alg_poll_mask(struct socket *sock, __poll_t events);
 struct af_alg_async_req *af_alg_alloc_areq(struct sock *sk,
 					   unsigned int areqlen);
 int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags,
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 33/36] pipe: convert to ->poll_mask
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/pipe.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 7b1954caf388..81937590ea0a 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -509,19 +509,22 @@ static long pipe_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 	}
 }
 
-/* No kernel lock held - fine */
-static __poll_t
-pipe_poll(struct file *filp, poll_table *wait)
+static struct wait_queue_head *
+pipe_get_poll_head(struct file *filp, __poll_t events)
 {
-	__poll_t mask;
 	struct pipe_inode_info *pipe = filp->private_data;
-	int nrbufs;
 
-	poll_wait(filp, &pipe->wait, wait);
+	return &pipe->wait;
+}
+
+/* No kernel lock held - fine */
+static __poll_t pipe_poll_mask(struct file *filp, __poll_t events)
+{
+	struct pipe_inode_info *pipe = filp->private_data;
+	int nrbufs = pipe->nrbufs;
+	__poll_t mask = 0;
 
 	/* Reading only -- no need for acquiring the semaphore.  */
-	nrbufs = pipe->nrbufs;
-	mask = 0;
 	if (filp->f_mode & FMODE_READ) {
 		mask = (nrbufs > 0) ? EPOLLIN | EPOLLRDNORM : 0;
 		if (!pipe->writers && filp->f_version != pipe->w_counter)
@@ -1015,7 +1018,8 @@ const struct file_operations pipefifo_fops = {
 	.llseek		= no_llseek,
 	.read_iter	= pipe_read,
 	.write_iter	= pipe_write,
-	.poll		= pipe_poll,
+	.get_poll_head	= pipe_get_poll_head,
+	.poll_mask	= pipe_poll_mask,
 	.unlocked_ioctl	= pipe_ioctl,
 	.release	= pipe_release,
 	.fasync		= pipe_fasync,
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 34/36] eventfd: switch to ->poll_mask
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/eventfd.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 012f5bd46dfa..d70b4907f978 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -101,14 +101,20 @@ static int eventfd_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
-static __poll_t eventfd_poll(struct file *file, poll_table *wait)
+static struct wait_queue_head *
+eventfd_get_poll_head(struct file *file, __poll_t events)
+{
+	struct eventfd_ctx *ctx = file->private_data;
+
+	return &ctx->wqh;
+}
+
+static __poll_t eventfd_poll_mask(struct file *file, __poll_t eventmask)
 {
 	struct eventfd_ctx *ctx = file->private_data;
 	__poll_t events = 0;
 	u64 count;
 
-	poll_wait(file, &ctx->wqh, wait);
-
 	/*
 	 * All writes to ctx->count occur within ctx->wqh.lock.  This read
 	 * can be done outside ctx->wqh.lock because we know that poll_wait
@@ -305,7 +311,8 @@ static const struct file_operations eventfd_fops = {
 	.show_fdinfo	= eventfd_show_fdinfo,
 #endif
 	.release	= eventfd_release,
-	.poll		= eventfd_poll,
+	.get_poll_head	= eventfd_get_poll_head,
+	.poll_mask	= eventfd_poll_mask,
 	.read		= eventfd_read,
 	.write		= eventfd_write,
 	.llseek		= noop_llseek,
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 35/36] timerfd: convert to ->poll_mask
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/timerfd.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index cdad49da3ff7..d84a2bee4f82 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -226,21 +226,20 @@ static int timerfd_release(struct inode *inode, struct file *file)
 	kfree_rcu(ctx, rcu);
 	return 0;
 }
-
-static __poll_t timerfd_poll(struct file *file, poll_table *wait)
+	
+static struct wait_queue_head *timerfd_get_poll_head(struct file *file,
+		__poll_t eventmask)
 {
 	struct timerfd_ctx *ctx = file->private_data;
-	__poll_t events = 0;
-	unsigned long flags;
 
-	poll_wait(file, &ctx->wqh, wait);
+	return &ctx->wqh;
+}
 
-	spin_lock_irqsave(&ctx->wqh.lock, flags);
-	if (ctx->ticks)
-		events |= EPOLLIN;
-	spin_unlock_irqrestore(&ctx->wqh.lock, flags);
+static __poll_t timerfd_poll_mask(struct file *file, __poll_t eventmask)
+{
+	struct timerfd_ctx *ctx = file->private_data;
 
-	return events;
+	return ctx->ticks ? EPOLLIN : 0;
 }
 
 static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
@@ -364,7 +363,8 @@ static long timerfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg
 
 static const struct file_operations timerfd_fops = {
 	.release	= timerfd_release,
-	.poll		= timerfd_poll,
+	.get_poll_head	= timerfd_get_poll_head,
+	.poll_mask	= timerfd_poll_mask,
 	.read		= timerfd_read,
 	.llseek		= noop_llseek,
 	.show_fdinfo	= timerfd_show,
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* [PATCH 36/36] random: convert to ->poll_mask
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-05 21:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-05 21:27 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

The big change is that random_read_wait and random_write_wait are merged
into a single waitqueue that uses keyed wakeups.  Because wait_event_*
doesn't know about the keys, this will lead to occasional spurious wakeups
in _random_read and add_hwgenerator_randomness, but wait_event_* is
designed to handle these and we are not in a hot path there.
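
As a rough userspace illustration of the trade-off described above (not
kernel code): one shared queue, wakeups tagged with a key, and an unkeyed
waiter that is woken for every key and has to re-check its condition.
The names struct waiter, wake_keyed, KEY_IN/KEY_OUT and the thresholds 64
and 896 are made up for this sketch; it only models the key filtering, not
the real wait queue implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the POLLIN/POLLOUT wakeup keys. */
#define KEY_IN   0x1u
#define KEY_OUT  0x4u

struct waiter {
	unsigned int key_mask;      /* 0: unkeyed (wait_event_* style) waiter */
	bool (*cond)(int level);    /* condition re-checked after each wakeup */
	int woken;                  /* wakeups delivered to this waiter */
	int spurious;               /* wakeups where cond() was still false */
};

/* Hypothetical thresholds, loosely modelled on the read/write wakeup bits. */
static bool can_read(int level)  { return level >= 64; }
static bool can_write(int level) { return level < 896; }

/* Deliver one keyed wakeup to every waiter on the shared queue. */
static void wake_keyed(struct waiter *q, size_t n, unsigned int key, int level)
{
	for (size_t i = 0; i < n; i++) {
		/* Keyed (poll-style) waiters filter on the key ... */
		if (q[i].key_mask && !(q[i].key_mask & key))
			continue;
		/* ... unkeyed waiters are always woken and must re-check. */
		q[i].woken++;
		if (!q[i].cond(level))
			q[i].spurious++;
	}
}

/* One poll reader, one poll writer and one blocking (unkeyed) reader. */
static void demo(void)
{
	struct waiter q[3] = {
		{ KEY_IN,  can_read,  0, 0 },
		{ KEY_OUT, can_write, 0, 0 },
		{ 0,       can_read,  0, 0 },
	};

	wake_keyed(q, 3, KEY_IN, 1000);  /* pool filled: readers may proceed */
	wake_keyed(q, 3, KEY_OUT, 32);   /* pool drained: writers may proceed */

	assert(q[0].woken == 1 && q[0].spurious == 0);
	assert(q[1].woken == 1 && q[1].spurious == 0);
	/* The unkeyed reader also saw the writer wakeup and went back to sleep. */
	assert(q[2].woken == 2 && q[2].spurious == 1);
}
```

Calling demo() from a main() exercises the scenario: only the unkeyed
waiter pays for the merged queue, with one extra condition check per
unrelated wakeup.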

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/char/random.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index e5b3d3ba4660..840d80b64431 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -401,8 +401,7 @@ static struct poolinfo {
 /*
  * Static global variables
  */
-static DECLARE_WAIT_QUEUE_HEAD(random_read_wait);
-static DECLARE_WAIT_QUEUE_HEAD(random_write_wait);
+static DECLARE_WAIT_QUEUE_HEAD(random_wait);
 static struct fasync_struct *fasync;
 
 static DEFINE_SPINLOCK(random_ready_list_lock);
@@ -710,7 +709,7 @@ static void credit_entropy_bits(struct entropy_store *r, int nbits)
 
 		/* should we wake readers? */
 		if (entropy_bits >= random_read_wakeup_bits) {
-			wake_up_interruptible(&random_read_wait);
+			wake_up_interruptible_poll(&random_wait, POLLIN);
 			kill_fasync(&fasync, SIGIO, POLL_IN);
 		}
 		/* If the input pool is getting full, send some
@@ -1293,7 +1292,7 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,
 	trace_debit_entropy(r->name, 8 * ibytes);
 	if (ibytes &&
 	    (r->entropy_count >> ENTROPY_SHIFT) < random_write_wakeup_bits) {
-		wake_up_interruptible(&random_write_wait);
+		wake_up_interruptible_poll(&random_wait, POLLOUT);
 		kill_fasync(&fasync, SIGIO, POLL_OUT);
 	}
 
@@ -1748,7 +1747,7 @@ _random_read(int nonblock, char __user *buf, size_t nbytes)
 		if (nonblock)
 			return -EAGAIN;
 
-		wait_event_interruptible(random_read_wait,
+		wait_event_interruptible(random_wait,
 			ENTROPY_BITS(&input_pool) >=
 			random_read_wakeup_bits);
 		if (signal_pending(current))
@@ -1784,14 +1783,17 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
 	return ret;
 }
 
+static struct wait_queue_head *
+random_get_poll_head(struct file *file, __poll_t events)
+{
+	return &random_wait;
+}
+
 static __poll_t
-random_poll(struct file *file, poll_table * wait)
+random_poll_mask(struct file *file, __poll_t events)
 {
-	__poll_t mask;
+	__poll_t mask = 0;
 
-	poll_wait(file, &random_read_wait, wait);
-	poll_wait(file, &random_write_wait, wait);
-	mask = 0;
 	if (ENTROPY_BITS(&input_pool) >= random_read_wakeup_bits)
 		mask |= EPOLLIN | EPOLLRDNORM;
 	if (ENTROPY_BITS(&input_pool) < random_write_wakeup_bits)
@@ -1890,7 +1892,8 @@ static int random_fasync(int fd, struct file *filp, int on)
 const struct file_operations random_fops = {
 	.read  = random_read,
 	.write = random_write,
-	.poll  = random_poll,
+	.get_poll_head  = random_get_poll_head,
+	.poll_mask  = random_poll_mask,
 	.unlocked_ioctl = random_ioctl,
 	.fasync = random_fasync,
 	.llseek = noop_llseek,
@@ -2223,7 +2226,7 @@ void add_hwgenerator_randomness(const char *buffer, size_t count,
 	 * We'll be woken up again once below random_write_wakeup_thresh,
 	 * or when the calling thread is about to terminate.
 	 */
-	wait_event_interruptible(random_write_wait, kthread_should_stop() ||
+	wait_event_interruptible(random_wait, kthread_should_stop() ||
 			ENTROPY_BITS(&input_pool) <= random_write_wakeup_bits);
 	mix_pool_bytes(poolp, buffer, count);
 	credit_entropy_bits(poolp, entropy);
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/36] aio: implement io_pgetevents
  2018-03-05 21:27   ` Christoph Hellwig
  (?)
@ 2018-03-05 21:51     ` Jeff Moyer
  -1 siblings, 0 replies; 120+ messages in thread
From: Jeff Moyer @ 2018-03-05 21:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
	linux-kernel

Christoph Hellwig <hch@lst.de> writes:

> This is the io_getevents equivalent of ppoll/pselect.  It allows signals
> and aio completions to be mixed properly (especially with IOCB_CMD_POLL)
> and atomically executes the following sequence:
>
> 	sigset_t origmask;
>
> 	pthread_sigmask(SIG_SETMASK, &sigmask, &origmask);
> 	ret = io_getevents(ctx, min_nr, nr, events, timeout);
> 	pthread_sigmask(SIG_SETMASK, &origmask, NULL);
>
> Note that unlike many other signal related calls we do not pass a sigmask
> size, as that would get us to 7 arguments, which aren't easily supported
> by the syscall infrastructure.  It seems a lot less painful to just add a
> new syscall variant in the unlikely case we're going to increase the
> sigset size.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

I acked this in the last set, so...

Acked-by: Jeff Moyer <jmoyer@redhat.com>
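
For readers unfamiliar with the ppoll/pselect contract that the quoted
description refers to, here is a minimal userspace sketch.  It uses
pselect() rather than the new syscall, since io_pgetevents is only being
introduced by this series, and sigprocmask() instead of pthread_sigmask()
because the sketch is single-threaded.  The helper names on_usr1,
wait_with_mask_swap and demo are made up for illustration.  A signal that
is blocked and pending before the call is delivered during the wait, and
the original mask is back in place afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t got_signal;

static void on_usr1(int sig)
{
	(void)sig;
	got_signal = 1;
}

/*
 * Returns 0 if the pending signal interrupted the wait and the original
 * mask was restored afterwards, -1 on any other outcome.
 */
static int wait_with_mask_swap(void)
{
	struct timespec timeout = { .tv_sec = 1, .tv_nsec = 0 };
	sigset_t blocked, during_wait, after;
	struct sigaction sa;
	int ret, err;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_usr1;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL))
		return -1;

	/* Keep SIGUSR1 blocked everywhere except inside the wait. */
	sigemptyset(&blocked);
	sigaddset(&blocked, SIGUSR1);
	if (sigprocmask(SIG_BLOCK, &blocked, NULL))
		return -1;

	/* The signal is now pending, the handler must not have run yet. */
	if (raise(SIGUSR1) || got_signal)
		return -1;

	/* Mask to install for the duration of the wait only. */
	if (sigprocmask(SIG_SETMASK, NULL, &during_wait))
		return -1;
	sigdelset(&during_wait, SIGUSR1);

	/* Mask swap, wait and mask restore happen as one atomic step. */
	ret = pselect(0, NULL, NULL, NULL, &timeout, &during_wait);
	err = errno;
	if (ret != -1 || err != EINTR || !got_signal)
		return -1;

	/* SIGUSR1 must be blocked again once the call has returned. */
	if (sigprocmask(SIG_SETMASK, NULL, &after))
		return -1;
	return sigismember(&after, SIGUSR1) == 1 ? 0 : -1;
}

static void demo(void)
{
	assert(wait_with_mask_swap() == 0);
}
```

With the three-call sequence from the quoted description done by hand, a
signal arriving between the first sigmask call and the wait would be
handled before the wait starts, and the wait could then sleep for the
full timeout.  Doing the swap inside one call closes that window.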

> ---
>  arch/x86/entry/syscalls/syscall_32.tbl |   1 +
>  arch/x86/entry/syscalls/syscall_64.tbl |   1 +
>  fs/aio.c                               | 114 ++++++++++++++++++++++++++++++---
>  include/linux/compat.h                 |   7 ++
>  include/linux/syscalls.h               |   6 ++
>  include/uapi/asm-generic/unistd.h      |   4 +-
>  include/uapi/linux/aio_abi.h           |   6 ++
>  kernel/sys_ni.c                        |   2 +
>  8 files changed, 130 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index 448ac2161112..5997c3e9ac3e 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -391,3 +391,4 @@
>  382	i386	pkey_free		sys_pkey_free
>  383	i386	statx			sys_statx
>  384	i386	arch_prctl		sys_arch_prctl			compat_sys_arch_prctl
> +385	i386	io_pgetevents		sys_io_pgetevents		compat_sys_io_pgetevents
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 5aef183e2f85..e995cd2b4e65 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -339,6 +339,7 @@
>  330	common	pkey_alloc		sys_pkey_alloc
>  331	common	pkey_free		sys_pkey_free
>  332	common	statx			sys_statx
> +333	common	io_pgetevents		sys_io_pgetevents
>  
>  #
>  # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/fs/aio.c b/fs/aio.c
> index 9d7d6e4cde87..da87cbf7c67a 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1291,10 +1291,6 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr,
>  		wait_event_interruptible_hrtimeout(ctx->wait,
>  				aio_read_events(ctx, min_nr, nr, event, &ret),
>  				until);
> -
> -	if (!ret && signal_pending(current))
> -		ret = -EINTR;
> -
>  	return ret;
>  }
>  
> @@ -1874,13 +1870,60 @@ SYSCALL_DEFINE5(io_getevents, aio_context_t, ctx_id,
>  		struct timespec __user *, timeout)
>  {
>  	struct timespec64	ts;
> +	int			ret;
> +
> +	if (timeout && unlikely(get_timespec64(&ts, timeout)))
> +		return -EFAULT;
> +
> +	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
> +	if (!ret && signal_pending(current))
> +		ret = -EINTR;
> +	return ret;
> +}
> +
> +SYSCALL_DEFINE6(io_pgetevents,
> +		aio_context_t, ctx_id,
> +		long, min_nr,
> +		long, nr,
> +		struct io_event __user *, events,
> +		struct timespec __user *, timeout,
> +		const struct __aio_sigset __user *, usig)
> +{
> +	struct __aio_sigset	ksig = { NULL, };
> +	sigset_t		ksigmask, sigsaved;
> +	struct timespec64	ts;
> +	int ret;
> +
> +	if (timeout && unlikely(get_timespec64(&ts, timeout)))
> +		return -EFAULT;
>  
> -	if (timeout) {
> -		if (unlikely(get_timespec64(&ts, timeout)))
> +	if (usig && copy_from_user(&ksig, usig, sizeof(ksig)))
> +		return -EFAULT;
> +
> +	if (ksig.sigmask) {
> +		if (ksig.sigsetsize != sizeof(sigset_t))
> +			return -EINVAL;
> +		if (copy_from_user(&ksigmask, ksig.sigmask, sizeof(ksigmask)))
>  			return -EFAULT;
> +		sigdelsetmask(&ksigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
> +		sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved);
> +	}
> +
> +	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
> +	if (signal_pending(current)) {
> +		if (ksig.sigmask) {
> +			current->saved_sigmask = sigsaved;
> +			set_restore_sigmask();
> +		}
> +
> +		if (!ret)
> +			ret = -ERESTARTNOHAND;
> +	} else {
> +		if (ksig.sigmask)
> +			sigprocmask(SIG_SETMASK, &sigsaved, NULL);
>  	}
>  
> -	return do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
> +	return ret;
>  }
>  
>  #ifdef CONFIG_COMPAT
> @@ -1891,13 +1934,64 @@ COMPAT_SYSCALL_DEFINE5(io_getevents, compat_aio_context_t, ctx_id,
>  		       struct compat_timespec __user *, timeout)
>  {
>  	struct timespec64 t;
> +	int ret;
> +
> +	if (timeout && compat_get_timespec64(&t, timeout))
> +		return -EFAULT;
> +
> +	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
> +	if (!ret && signal_pending(current))
> +		ret = -EINTR;
> +	return ret;
> +}
> +
> +
> +struct __compat_aio_sigset {
> +	compat_sigset_t __user	*sigmask;
> +	compat_size_t		sigsetsize;
> +};
> +
> +COMPAT_SYSCALL_DEFINE6(io_pgetevents,
> +		compat_aio_context_t, ctx_id,
> +		compat_long_t, min_nr,
> +		compat_long_t, nr,
> +		struct io_event __user *, events,
> +		struct compat_timespec __user *, timeout,
> +		const struct __compat_aio_sigset __user *, usig)
> +{
> +	struct __compat_aio_sigset ksig = { NULL, };
> +	sigset_t ksigmask, sigsaved;
> +	struct timespec64 t;
> +	int ret;
> +
> +	if (timeout && compat_get_timespec64(&t, timeout))
> +		return -EFAULT;
>  
> -	if (timeout) {
> -		if (compat_get_timespec64(&t, timeout))
> +	if (usig && copy_from_user(&ksig, usig, sizeof(ksig)))
> +		return -EFAULT;
> +
> +	if (ksig.sigmask) {
> +		if (ksig.sigsetsize != sizeof(compat_sigset_t))
> +			return -EINVAL;
> +		if (get_compat_sigset(&ksigmask, ksig.sigmask))
>  			return -EFAULT;
> +		sigdelsetmask(&ksigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
> +		sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved);
> +	}
>  
> +	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
> +	if (signal_pending(current)) {
> +		if (ksig.sigmask) {
> +			current->saved_sigmask = sigsaved;
> +			set_restore_sigmask();
> +		}
> +		if (!ret)
> +			ret = -ERESTARTNOHAND;
> +	} else {
> +		if (ksig.sigmask)
> +			sigprocmask(SIG_SETMASK, &sigsaved, NULL);
>  	}
>  
> -	return do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
> +	return ret;
>  }
>  #endif
> diff --git a/include/linux/compat.h b/include/linux/compat.h
> index 8a9643857c4a..bfb8a94fbabd 100644
> --- a/include/linux/compat.h
> +++ b/include/linux/compat.h
> @@ -303,6 +303,7 @@ extern int put_compat_rusage(const struct rusage *,
>  			     struct compat_rusage __user *);
>  
>  struct compat_siginfo;
> +struct __compat_aio_sigset;
>  
>  extern asmlinkage long compat_sys_waitid(int, compat_pid_t,
>  		struct compat_siginfo __user *, int,
> @@ -634,6 +635,12 @@ asmlinkage long compat_sys_io_getevents(compat_aio_context_t ctx_id,
>  					compat_long_t nr,
>  					struct io_event __user *events,
>  					struct compat_timespec __user *timeout);
> +asmlinkage long compat_sys_io_pgetevents(compat_aio_context_t ctx_id,
> +					compat_long_t min_nr,
> +					compat_long_t nr,
> +					struct io_event __user *events,
> +					struct compat_timespec __user *timeout,
> +					const struct __compat_aio_sigset __user *usig);
>  asmlinkage long compat_sys_io_submit(compat_aio_context_t ctx_id, int nr,
>  				     u32 __user *iocb);
>  asmlinkage long compat_sys_mount(const char __user *dev_name,
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index a78186d826d7..8515ec53c81b 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -539,6 +539,12 @@ asmlinkage long sys_io_getevents(aio_context_t ctx_id,
>  				long nr,
>  				struct io_event __user *events,
>  				struct timespec __user *timeout);
> +asmlinkage long sys_io_pgetevents(aio_context_t ctx_id,
> +				long min_nr,
> +				long nr,
> +				struct io_event __user *events,
> +				struct timespec __user *timeout,
> +				const struct __aio_sigset *sig);
>  asmlinkage long sys_io_submit(aio_context_t, long,
>  				struct iocb __user * __user *);
>  asmlinkage long sys_io_cancel(aio_context_t ctx_id, struct iocb __user *iocb,
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index 8b87de067bc7..ce2ebbeece10 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -732,9 +732,11 @@ __SYSCALL(__NR_pkey_alloc,    sys_pkey_alloc)
>  __SYSCALL(__NR_pkey_free,     sys_pkey_free)
>  #define __NR_statx 291
>  __SYSCALL(__NR_statx,     sys_statx)
> +#define __NR_io_pgetevents 292
> +__SC_COMP(__NR_io_pgetevents, sys_io_pgetevents, compat_sys_io_pgetevents)
>  
>  #undef __NR_syscalls
> -#define __NR_syscalls 292
> +#define __NR_syscalls 293
>  
>  /*
>   * All syscalls below here should go away really,
> diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
> index a04adbc70ddf..2c0a3415beee 100644
> --- a/include/uapi/linux/aio_abi.h
> +++ b/include/uapi/linux/aio_abi.h
> @@ -29,6 +29,7 @@
>  
>  #include <linux/types.h>
>  #include <linux/fs.h>
> +#include <linux/signal.h>
>  #include <asm/byteorder.h>
>  
>  typedef __kernel_ulong_t aio_context_t;
> @@ -108,5 +109,10 @@ struct iocb {
>  #undef IFBIG
>  #undef IFLITTLE
>  
> +struct __aio_sigset {
> +	sigset_t __user	*sigmask;
> +	size_t		sigsetsize;
> +};
> +
>  #endif /* __LINUX__AIO_ABI_H */
>  
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index b5189762d275..8f7705559b38 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -151,9 +151,11 @@ cond_syscall(sys_io_destroy);
>  cond_syscall(sys_io_submit);
>  cond_syscall(sys_io_cancel);
>  cond_syscall(sys_io_getevents);
> +cond_syscall(sys_io_pgetevents);
>  cond_syscall(compat_sys_io_setup);
>  cond_syscall(compat_sys_io_submit);
>  cond_syscall(compat_sys_io_getevents);
> +cond_syscall(compat_sys_io_pgetevents);
>  cond_syscall(sys_sysfs);
>  cond_syscall(sys_syslog);
>  cond_syscall(sys_process_vm_readv);

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/36] aio: implement io_pgetevents
@ 2018-03-05 21:51     ` Jeff Moyer
  0 siblings, 0 replies; 120+ messages in thread
From: Jeff Moyer @ 2018-03-05 21:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
	linux-kernel

Christoph Hellwig <hch@lst.de> writes:

> This is the io_getevents equivalent of ppoll/pselect.  It allows signals
> and aio completions to be mixed properly (especially with IOCB_CMD_POLL)
> and atomically executes the following sequence:
>
> 	sigset_t origmask;
>
> 	pthread_sigmask(SIG_SETMASK, &sigmask, &origmask);
> 	ret = io_getevents(ctx, min_nr, nr, events, timeout);
> 	pthread_sigmask(SIG_SETMASK, &origmask, NULL);
>
> Note that unlike many other signal related calls we do not pass a sigmask
> size, as that would get us to 7 arguments, which aren't easily supported
> by the syscall infrastructure.  It seems a lot less painful to just add a
> new syscall variant in the unlikely case we're going to increase the
> sigset size.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

I acked this in the last set, so...

Acked-by: Jeff Moyer <jmoyer@redhat.com>

> ---
>  arch/x86/entry/syscalls/syscall_32.tbl |   1 +
>  arch/x86/entry/syscalls/syscall_64.tbl |   1 +
>  fs/aio.c                               | 114 ++++++++++++++++++++++++++++++---
>  include/linux/compat.h                 |   7 ++
>  include/linux/syscalls.h               |   6 ++
>  include/uapi/asm-generic/unistd.h      |   4 +-
>  include/uapi/linux/aio_abi.h           |   6 ++
>  kernel/sys_ni.c                        |   2 +
>  8 files changed, 130 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index 448ac2161112..5997c3e9ac3e 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -391,3 +391,4 @@
>  382	i386	pkey_free		sys_pkey_free
>  383	i386	statx			sys_statx
>  384	i386	arch_prctl		sys_arch_prctl			compat_sys_arch_prctl
> +385	i386	io_pgetevents		sys_io_pgetevents		compat_sys_io_pgetevents
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 5aef183e2f85..e995cd2b4e65 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -339,6 +339,7 @@
>  330	common	pkey_alloc		sys_pkey_alloc
>  331	common	pkey_free		sys_pkey_free
>  332	common	statx			sys_statx
> +333	common	io_pgetevents		sys_io_pgetevents
>  
>  #
>  # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/fs/aio.c b/fs/aio.c
> index 9d7d6e4cde87..da87cbf7c67a 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1291,10 +1291,6 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr,
>  		wait_event_interruptible_hrtimeout(ctx->wait,
>  				aio_read_events(ctx, min_nr, nr, event, &ret),
>  				until);
> -
> -	if (!ret && signal_pending(current))
> -		ret = -EINTR;
> -
>  	return ret;
>  }
>  
> @@ -1874,13 +1870,60 @@ SYSCALL_DEFINE5(io_getevents, aio_context_t, ctx_id,
>  		struct timespec __user *, timeout)
>  {
>  	struct timespec64	ts;
> +	int			ret;
> +
> +	if (timeout && unlikely(get_timespec64(&ts, timeout)))
> +		return -EFAULT;
> +
> +	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
> +	if (!ret && signal_pending(current))
> +		ret = -EINTR;
> +	return ret;
> +}
> +
> +SYSCALL_DEFINE6(io_pgetevents,
> +		aio_context_t, ctx_id,
> +		long, min_nr,
> +		long, nr,
> +		struct io_event __user *, events,
> +		struct timespec __user *, timeout,
> +		const struct __aio_sigset __user *, usig)
> +{
> +	struct __aio_sigset	ksig = { NULL, };
> +	sigset_t		ksigmask, sigsaved;
> +	struct timespec64	ts;
> +	int ret;
> +
> +	if (timeout && unlikely(get_timespec64(&ts, timeout)))
> +		return -EFAULT;
>  
> -	if (timeout) {
> -		if (unlikely(get_timespec64(&ts, timeout)))
> +	if (usig && copy_from_user(&ksig, usig, sizeof(ksig)))
> +		return -EFAULT;
> +
> +	if (ksig.sigmask) {
> +		if (ksig.sigsetsize != sizeof(sigset_t))
> +			return -EINVAL;
> +		if (copy_from_user(&ksigmask, ksig.sigmask, sizeof(ksigmask)))
>  			return -EFAULT;
> +		sigdelsetmask(&ksigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
> +		sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved);
> +	}
> +
> +	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
> +	if (signal_pending(current)) {
> +		if (ksig.sigmask) {
> +			current->saved_sigmask = sigsaved;
> +			set_restore_sigmask();
> +		}
> +
> +		if (!ret)
> +			ret = -ERESTARTNOHAND;
> +	} else {
> +		if (ksig.sigmask)
> +			sigprocmask(SIG_SETMASK, &sigsaved, NULL);
>  	}
>  
> -	return do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
> +	return ret;
>  }
>  
>  #ifdef CONFIG_COMPAT
> @@ -1891,13 +1934,64 @@ COMPAT_SYSCALL_DEFINE5(io_getevents, compat_aio_context_t, ctx_id,
>  		       struct compat_timespec __user *, timeout)
>  {
>  	struct timespec64 t;
> +	int ret;
> +
> +	if (timeout && compat_get_timespec64(&t, timeout))
> +		return -EFAULT;
> +
> +	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
> +	if (!ret && signal_pending(current))
> +		ret = -EINTR;
> +	return ret;
> +}
> +
> +
> +struct __compat_aio_sigset {
> +	compat_sigset_t __user	*sigmask;
> +	compat_size_t		sigsetsize;
> +};
> +
> +COMPAT_SYSCALL_DEFINE6(io_pgetevents,
> +		compat_aio_context_t, ctx_id,
> +		compat_long_t, min_nr,
> +		compat_long_t, nr,
> +		struct io_event __user *, events,
> +		struct compat_timespec __user *, timeout,
> +		const struct __compat_aio_sigset __user *, usig)
> +{
> +	struct __compat_aio_sigset ksig = { NULL, };
> +	sigset_t ksigmask, sigsaved;
> +	struct timespec64 t;
> +	int ret;
> +
> +	if (timeout && compat_get_timespec64(&t, timeout))
> +		return -EFAULT;
>  
> -	if (timeout) {
> -		if (compat_get_timespec64(&t, timeout))
> +	if (usig && copy_from_user(&ksig, usig, sizeof(ksig)))
> +		return -EFAULT;
> +
> +	if (ksig.sigmask) {
> +		if (ksig.sigsetsize != sizeof(compat_sigset_t))
> +			return -EINVAL;
> +		if (get_compat_sigset(&ksigmask, ksig.sigmask))
>  			return -EFAULT;
> +		sigdelsetmask(&ksigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
> +		sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved);
> +	}
>  
> +	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
> +	if (signal_pending(current)) {
> +		if (ksig.sigmask) {
> +			current->saved_sigmask = sigsaved;
> +			set_restore_sigmask();
> +		}
> +		if (!ret)
> +			ret = -ERESTARTNOHAND;
> +	} else {
> +		if (ksig.sigmask)
> +			sigprocmask(SIG_SETMASK, &sigsaved, NULL);
>  	}
>  
> -	return do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
> +	return ret;
>  }
>  #endif
> diff --git a/include/linux/compat.h b/include/linux/compat.h
> index 8a9643857c4a..bfb8a94fbabd 100644
> --- a/include/linux/compat.h
> +++ b/include/linux/compat.h
> @@ -303,6 +303,7 @@ extern int put_compat_rusage(const struct rusage *,
>  			     struct compat_rusage __user *);
>  
>  struct compat_siginfo;
> +struct __compat_aio_sigset;
>  
>  extern asmlinkage long compat_sys_waitid(int, compat_pid_t,
>  		struct compat_siginfo __user *, int,
> @@ -634,6 +635,12 @@ asmlinkage long compat_sys_io_getevents(compat_aio_context_t ctx_id,
>  					compat_long_t nr,
>  					struct io_event __user *events,
>  					struct compat_timespec __user *timeout);
> +asmlinkage long compat_sys_io_pgetevents(compat_aio_context_t ctx_id,
> +					compat_long_t min_nr,
> +					compat_long_t nr,
> +					struct io_event __user *events,
> +					struct compat_timespec __user *timeout,
> +					const struct __compat_aio_sigset __user *usig);
>  asmlinkage long compat_sys_io_submit(compat_aio_context_t ctx_id, int nr,
>  				     u32 __user *iocb);
>  asmlinkage long compat_sys_mount(const char __user *dev_name,
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index a78186d826d7..8515ec53c81b 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -539,6 +539,12 @@ asmlinkage long sys_io_getevents(aio_context_t ctx_id,
>  				long nr,
>  				struct io_event __user *events,
>  				struct timespec __user *timeout);
> +asmlinkage long sys_io_pgetevents(aio_context_t ctx_id,
> +				long min_nr,
> +				long nr,
> +				struct io_event __user *events,
> +				struct timespec __user *timeout,
> +				const struct __aio_sigset *sig);
>  asmlinkage long sys_io_submit(aio_context_t, long,
>  				struct iocb __user * __user *);
>  asmlinkage long sys_io_cancel(aio_context_t ctx_id, struct iocb __user *iocb,
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index 8b87de067bc7..ce2ebbeece10 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -732,9 +732,11 @@ __SYSCALL(__NR_pkey_alloc,    sys_pkey_alloc)
>  __SYSCALL(__NR_pkey_free,     sys_pkey_free)
>  #define __NR_statx 291
>  __SYSCALL(__NR_statx,     sys_statx)
> +#define __NR_io_pgetevents 292
> +__SC_COMP(__NR_io_pgetevents, sys_io_pgetevents, compat_sys_io_pgetevents)
>  
>  #undef __NR_syscalls
> -#define __NR_syscalls 292
> +#define __NR_syscalls 293
>  
>  /*
>   * All syscalls below here should go away really,
> diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
> index a04adbc70ddf..2c0a3415beee 100644
> --- a/include/uapi/linux/aio_abi.h
> +++ b/include/uapi/linux/aio_abi.h
> @@ -29,6 +29,7 @@
>  
>  #include <linux/types.h>
>  #include <linux/fs.h>
> +#include <linux/signal.h>
>  #include <asm/byteorder.h>
>  
>  typedef __kernel_ulong_t aio_context_t;
> @@ -108,5 +109,10 @@ struct iocb {
>  #undef IFBIG
>  #undef IFLITTLE
>  
> +struct __aio_sigset {
> +	sigset_t __user	*sigmask;
> +	size_t		sigsetsize;
> +};
> +
>  #endif /* __LINUX__AIO_ABI_H */
>  
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index b5189762d275..8f7705559b38 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -151,9 +151,11 @@ cond_syscall(sys_io_destroy);
>  cond_syscall(sys_io_submit);
>  cond_syscall(sys_io_cancel);
>  cond_syscall(sys_io_getevents);
> +cond_syscall(sys_io_pgetevents);
>  cond_syscall(compat_sys_io_setup);
>  cond_syscall(compat_sys_io_submit);
>  cond_syscall(compat_sys_io_getevents);
> +cond_syscall(compat_sys_io_pgetevents);
>  cond_syscall(sys_sysfs);
>  cond_syscall(sys_syslog);
>  cond_syscall(sys_process_vm_readv);
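
For readers following along, here is a rough userspace sketch of how the
sixth argument from the patch above might be packed.  It is not taken from
libaio: the names aio_sigset_arg, pack_aio_sigset and raw_io_pgetevents are
made up for illustration, and it assumes a 64-bit Linux build where
_NSIG / 8 matches the size of the kernel's sigset_t.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for syscall() */
#endif
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>	/* aio_context_t, struct io_event */

/*
 * Assumed size of the kernel's sigset_t in bytes.  glibc's sigset_t is
 * larger, so sizeof(sigset_t) from userspace would fail the sigsetsize
 * check in the patch with -EINVAL.
 */
#define KERNEL_SIGSET_BYTES	(_NSIG / 8)

/* Hypothetical userspace mirror of struct __aio_sigset from the patch. */
struct aio_sigset_arg {
	const sigset_t	*sigmask;
	size_t		sigsetsize;
};

/* Pack the mask and its size so the syscall stays at six arguments. */
static inline struct aio_sigset_arg pack_aio_sigset(const sigset_t *mask)
{
	struct aio_sigset_arg arg = { mask, mask ? KERNEL_SIGSET_BYTES : 0 };

	return arg;
}

/* Hypothetical raw wrapper; real programs would normally go via libaio. */
static inline long raw_io_pgetevents(aio_context_t ctx, long min_nr, long nr,
				     struct io_event *events,
				     struct timespec *timeout,
				     const sigset_t *mask)
{
#ifdef SYS_io_pgetevents
	struct aio_sigset_arg arg = pack_aio_sigset(mask);

	/* A NULL sixth argument leaves the signal mask untouched. */
	return syscall(SYS_io_pgetevents, ctx, min_nr, nr, events, timeout,
		       mask ? &arg : NULL);
#else
	errno = ENOSYS;		/* installed headers predate the syscall */
	return -1;
#endif
}
```

Going by the patch, the kernel drops SIGKILL and SIGSTOP from the supplied
mask itself, so the caller does not need to clear them first.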

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 14/36] aio: implement IOCB_CMD_POLL
  2018-03-05 21:27   ` Christoph Hellwig
  (?)
@ 2018-03-05 21:51   ` Jeff Moyer
  -1 siblings, 0 replies; 120+ messages in thread
From: Jeff Moyer @ 2018-03-05 21:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
	linux-kernel

Christoph Hellwig <hch@lst.de> writes:

> Simple one-shot poll through the io_submit() interface.  To poll for
> a file descriptor the application should submit an iocb of type
> IOCB_CMD_POLL.  It will poll the fd for the events specified in the
> first 32 bits of the aio_buf field of the iocb.
>
> Unlike poll or epoll without EPOLLONESHOT this interface always works
> in one shot mode, that is once the iocb is completed, it will have to be
> resubmitted.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Also acked this one in the last posting.

Acked-by: Jeff Moyer <jmoyer@redhat.com>
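
A small userspace sketch of what the commit message describes, for
illustration only.  prep_poll_iocb and check_poll_iocb are hypothetical
helpers, not libaio or kernel functions, and the opcode is spelled out
locally (using the value 5 from this patch) because older installed headers
do not define IOCB_CMD_POLL.  The second helper mirrors the submit-time
checks in aio_poll(), which only accepts a 16-bit event mask.

```c
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <linux/aio_abi.h>	/* struct iocb */

/* Opcode value taken from the patch; local name is an assumption. */
#define AIO_OPCODE_POLL	5

/* Hypothetical helper: fill an iocb for a one-shot poll on fd. */
static inline void prep_poll_iocb(struct iocb *cb, int fd, short events)
{
	memset(cb, 0, sizeof(*cb));	/* offset, nbytes, rw_flags stay 0 */
	cb->aio_lio_opcode = AIO_OPCODE_POLL;
	cb->aio_fildes = fd;
	cb->aio_buf = (uint16_t)events;	/* event mask travels in aio_buf */
}

/*
 * Mirror of the checks in aio_poll(): returns 0 if the iocb would pass.
 * The patch also rejects a non-zero aio_rw_flags; that check is left out
 * here because older headers name the field differently.
 */
static inline int check_poll_iocb(const struct iocb *cb)
{
	if ((uint16_t)cb->aio_buf != cb->aio_buf)
		return -EINVAL;		/* bits outside the 16-bit mask */
	if (cb->aio_offset || cb->aio_nbytes)
		return -EINVAL;		/* fields not defined for poll */
	return 0;
}
```

An iocb prepared this way would be handed to io_submit() and reaped with
io_getevents() or io_pgetevents(); per the patch the ready events come back
in the res field of the io_event.  Since the interface is one-shot, the
iocb has to be submitted again to keep watching the fd.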


> ---
>  fs/aio.c                     | 102 +++++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/aio_abi.h |   6 +--
>  2 files changed, 104 insertions(+), 4 deletions(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index da87cbf7c67a..0bafc4975d51 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -5,6 +5,7 @@
>   *	Implements an efficient asynchronous io interface.
>   *
>   *	Copyright 2000, 2001, 2002 Red Hat, Inc.  All Rights Reserved.
> + *	Copyright 2018 Christoph Hellwig.
>   *
>   *	See ../COPYING for licensing terms.
>   */
> @@ -156,9 +157,17 @@ struct kioctx {
>  	unsigned		id;
>  };
>  
> +struct poll_iocb {
> +	struct file		*file;
> +	__poll_t		events;
> +	struct wait_queue_head	*head;
> +	struct wait_queue_entry	wait;
> +};
> +
>  struct aio_kiocb {
>  	union {
>  		struct kiocb		rw;
> +		struct poll_iocb	poll;
>  	};
>  
>  	struct kioctx		*ki_ctx;
> @@ -1565,6 +1574,96 @@ static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored,
>  	return ret;
>  }
>  
> +static void __aio_complete_poll(struct poll_iocb *req, __poll_t mask)
> +{
> +	fput(req->file);
> +	aio_complete(container_of(req, struct aio_kiocb, poll),
> +			mangle_poll(mask), 0);
> +}
> +
> +static void aio_complete_poll(struct poll_iocb *req, __poll_t mask)
> +{
> +	struct aio_kiocb *iocb = container_of(req, struct aio_kiocb, poll);
> +
> +	if (!(iocb->flags & AIO_IOCB_CANCELLED))
> +		__aio_complete_poll(req, mask);
> +}
> +
> +static int aio_poll_cancel(struct kiocb *rw)
> +{
> +	struct aio_kiocb *iocb = container_of(rw, struct aio_kiocb, rw);
> +
> +	remove_wait_queue(iocb->poll.head, &iocb->poll.wait);
> +	__aio_complete_poll(&iocb->poll, 0); /* no events to report */
> +	return 0;
> +}
> +
> +static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
> +		void *key)
> +{
> +	struct poll_iocb *req = container_of(wait, struct poll_iocb, wait);
> +	struct file *file = req->file;
> +	__poll_t mask = key_to_poll(key);
> +
> +	assert_spin_locked(&req->head->lock);
> +
> +	/* for instances that support it check for an event match first: */
> +	if (mask && !(mask & req->events))
> +		return 0;
> +
> +	mask = vfs_poll_mask(file, req->events);
> +	if (!mask)
> +		return 0;
> +
> +	__remove_wait_queue(req->head, &req->wait);
> +	aio_complete_poll(req, mask);
> +	return 1;
> +}
> +
> +static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb)
> +{
> +	struct poll_iocb *req = &aiocb->poll;
> +	unsigned long flags;
> +	__poll_t mask;
> +
> +	/* reject any unknown events outside the normal event mask. */
> +	if ((u16)iocb->aio_buf != iocb->aio_buf)
> +		return -EINVAL;
> +	/* reject fields that are not defined for poll */
> +	if (iocb->aio_offset || iocb->aio_nbytes || iocb->aio_rw_flags)
> +		return -EINVAL;
> +
> +	req->events = demangle_poll(iocb->aio_buf) | POLLERR | POLLHUP;
> +	req->file = fget(iocb->aio_fildes);
> +	if (unlikely(!req->file))
> +		return -EBADF;
> +
> +	req->head = vfs_get_poll_head(req->file, req->events);
> +	if (!req->head) {
> +		fput(req->file);
> +		return -EINVAL; /* same as no support for IOCB_CMD_POLL */
> +	}
> +	if (IS_ERR(req->head)) {
> +		mask = PTR_TO_POLL(req->head);
> +		goto done;
> +	}
> +
> +	init_waitqueue_func_entry(&req->wait, aio_poll_wake);
> +
> +	spin_lock_irqsave(&req->head->lock, flags);
> +	mask = vfs_poll_mask(req->file, req->events);
> +	if (!mask) {
> +		__kiocb_set_cancel_fn(aiocb, aio_poll_cancel,
> +				AIO_IOCB_DELAYED_CANCEL);
> +		__add_wait_queue(req->head, &req->wait);
> +	}
> +	spin_unlock_irqrestore(&req->head->lock, flags);
> +done:
> +	if (mask)
> +		aio_complete_poll(req, mask);
> +	return -EIOCBQUEUED;
> +}
> +
>  static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
>  			 struct iocb *iocb, bool compat)
>  {
> @@ -1628,6 +1727,9 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
>  	case IOCB_CMD_PWRITEV:
>  		ret = aio_write(&req->rw, iocb, true, compat);
>  		break;
> +	case IOCB_CMD_POLL:
> +		ret = aio_poll(req, iocb);
> +		break;
>  	default:
>  		pr_debug("invalid aio operation %d\n", iocb->aio_lio_opcode);
>  		ret = -EINVAL;
> diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
> index 2c0a3415beee..ed0185945bb2 100644
> --- a/include/uapi/linux/aio_abi.h
> +++ b/include/uapi/linux/aio_abi.h
> @@ -39,10 +39,8 @@ enum {
>  	IOCB_CMD_PWRITE = 1,
>  	IOCB_CMD_FSYNC = 2,
>  	IOCB_CMD_FDSYNC = 3,
> -	/* These two are experimental.
> -	 * IOCB_CMD_PREADX = 4,
> -	 * IOCB_CMD_POLL = 5,
> -	 */
> +	/* 4 was the experimental IOCB_CMD_PREADX */
> +	IOCB_CMD_POLL = 5,
>  	IOCB_CMD_NOOP = 6,
>  	IOCB_CMD_PREADV = 7,
>  	IOCB_CMD_PWRITEV = 8,

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: aio poll, io_pgetevents and a new in-kernel poll API V5
  2018-03-05 21:27 ` Christoph Hellwig
                   ` (36 preceding siblings ...)
  (?)
@ 2018-03-13  7:46 ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-13  7:46 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

ping?

On Mon, Mar 05, 2018 at 01:27:07PM -0800, Christoph Hellwig wrote:
> Hi all,
> 
> this series adds support for the IOCB_CMD_POLL operation to poll for the
> readiness of file descriptors using the aio subsystem.  The API is based
> on patches that existed in RHAS2.1 and RHEL3, which means it already is
> supported by libaio.  To implement the poll support efficiently new
> methods to poll are introduced in struct file_operations:  get_poll_head
> and poll_mask.  The first one returns a wait_queue_head to wait on
> (lifetime is bound by the file), and the second does a non-blocking
> check for the POLL* events.  This allows aio poll to work without
> any additional context switches, unlike epoll.
> 
> To make the interface fully useful a new io_pgetevents system call is
> added, which atomically saves and restores the signal mask over the
> io_pgetevents system call.  It is the logical equivalent to pselect and
> ppoll for io_pgetevents.
> 
> The corresponding libaio changes for io_pgetevents support and
> documentation, as well as a test case will be posted in a separate
> series.
> 
> The changes were sponsored by Scylladb, and improve performance
> of the seastar framework up to 10%, while also removing the need
> for a privileged SCHED_FIFO epoll listener thread.
> 
>     git://git.infradead.org/users/hch/vfs.git aio-poll.5
> 
> Gitweb:
> 
>     http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/aio-poll.5
> 
> Libaio changes:
> 
>     https://pagure.io/libaio.git io-poll
> 
> Seastar changes (not updated for the new io_pgetevents ABI yet):
> 
>     https://github.com/avikivity/seastar/commits/aio
> 
> Changes since V4:
>  - rebased on top of Linux 4.16-rc4
> 
> Changes since V3:
>  - remove the pre-sleep ->poll_mask call in vfs_poll,
>    allow ->get_poll_head to return POLL* values.
> 
> Changes since V2:
>  - removed a double initialization
>  - new vfs_get_poll_head helper
>  - document that ->get_poll_head can return NULL
>  - call ->poll_mask before sleeping
>  - various ACKs
>  - add conversion of random to ->poll_mask
>  - add conversion of af_alg to ->poll_mask
>  - lacking ->poll_mask support now returns -EINVAL for IOCB_CMD_POLL
>  - reshuffled the series so that prep patches and everything not
>    requiring the new in-kernel poll API is in the beginning
> 
> Changes since V1:
>  - handle the NULL ->poll case in vfs_poll
>  - dropped the file argument to the ->poll_mask socket operation
>  - replace the ->pre_poll socket operation with ->get_poll_head as
>    in the file operations
---end quoted text---

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: aio poll, io_pgetevents and a new in-kernel poll API V5
  2018-03-05 21:27 ` Christoph Hellwig
@ 2018-03-19  8:35   ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-19  8:35 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Hi everyone,

this posting has been out for 2 weeks now, without any replies except
for Jeff reminding me of the missing reviews he already gave me on the
previous version.

Can we make some progress on this?  Especially as all the aio bits
have gotten reviews from Jeff.

On Mon, Mar 05, 2018 at 01:27:07PM -0800, Christoph Hellwig wrote:
> Hi all,
> 
> this series adds support for the IOCB_CMD_POLL operation to poll for the
> readiness of file descriptors using the aio subsystem.  The API is based
> on patches that existed in RHAS2.1 and RHEL3, which means it already is
> supported by libaio.  To implement the poll support efficiently new
> methods to poll are introduced in struct file_operations:  get_poll_head
> and poll_mask.  The first one returns a wait_queue_head to wait on
> (lifetime is bound by the file), and the second does a non-blocking
> check for the POLL* events.  This allows aio poll to work without
> any additional context switches, unlike epoll.
> 
> To make the interface fully useful a new io_pgetevents system call is
> added, which atomically saves and restores the signal mask over the
> io_pgetevents system call.  It is the logical equivalent to pselect and
> ppoll for io_pgetevents.
> 
> The corresponding libaio changes for io_pgetevents support and
> documentation, as well as a test case will be posted in a separate
> series.
> 
> The changes were sponsored by Scylladb, and improve performance
> of the seastar framework up to 10%, while also removing the need
> for a privileged SCHED_FIFO epoll listener thread.
> 
>     git://git.infradead.org/users/hch/vfs.git aio-poll.5
> 
> Gitweb:
> 
>     http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/aio-poll.5
> 
> Libaio changes:
> 
>     https://pagure.io/libaio.git io-poll
> 
> Seastar changes (not updated for the new io_pgetevents ABI yet):
> 
>     https://github.com/avikivity/seastar/commits/aio
> 
> Changes since V4:
>  - rebased on top of Linux 4.16-rc4
> 
> Changes since V3:
>  - remove the pre-sleep ->poll_mask call in vfs_poll,
>    allow ->get_poll_head to return POLL* values.
> 
> Changes since V2:
>  - removed a double initialization
>  - new vfs_get_poll_head helper
>  - document that ->get_poll_head can return NULL
>  - call ->poll_mask before sleeping
>  - various ACKs
>  - add conversion of random to ->poll_mask
>  - add conversion of af_alg to ->poll_mask
>  - lacking ->poll_mask support now returns -EINVAL for IOCB_CMD_POLL
>  - reshuffled the series so that prep patches and everything not
>    requiring the new in-kernel poll API is in the beginning
> 
> Changes since V1:
>  - handle the NULL ->poll case in vfs_poll
>  - dropped the file argument to the ->poll_mask socket operation
>  - replace the ->pre_poll socket operation with ->get_poll_head as
>    in the file operations
> 
---end quoted text---

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 01/36] aio: don't print the page size at boot time
  2018-03-05 21:27   ` Christoph Hellwig
@ 2018-03-20  0:11     ` Darrick J. Wong
  -1 siblings, 0 replies; 120+ messages in thread
From: Darrick J. Wong @ 2018-03-20  0:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
	linux-kernel

On Mon, Mar 05, 2018 at 01:27:08PM -0800, Christoph Hellwig wrote:
> The page size is in no way related to the aio code, and printing it in
> the (debug) dmesg at every boot serves no purpose.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Moyer <jmoyer@redhat.com>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/aio.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index a062d75109cb..03d59593912d 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -264,9 +264,6 @@ static int __init aio_setup(void)
>  
>  	kiocb_cachep = KMEM_CACHE(aio_kiocb, SLAB_HWCACHE_ALIGN|SLAB_PANIC);
>  	kioctx_cachep = KMEM_CACHE(kioctx,SLAB_HWCACHE_ALIGN|SLAB_PANIC);
> -
> -	pr_debug("sizeof(struct page) = %zu\n", sizeof(struct page));
> -
>  	return 0;
>  }
>  __initcall(aio_setup);
> -- 
> 2.14.2
> 

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 02/36] aio: remove an outdated comment in aio_complete
  2018-03-05 21:27   ` Christoph Hellwig
@ 2018-03-20  0:12     ` Darrick J. Wong
  -1 siblings, 0 replies; 120+ messages in thread
From: Darrick J. Wong @ 2018-03-20  0:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
	linux-kernel

On Mon, Mar 05, 2018 at 01:27:09PM -0800, Christoph Hellwig wrote:
> These days we don't treat sync iocbs special in the aio completion code as
> they never use it.  Remove the old comment, and move the BUG_ON for a sync
> iocb to the top of the function.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Moyer <jmoyer@redhat.com>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/aio.c | 11 ++---------
>  1 file changed, 2 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index 03d59593912d..41fc8ce6bc7f 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1088,6 +1088,8 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2)
>  	unsigned tail, pos, head;
>  	unsigned long	flags;
>  
> +	BUG_ON(is_sync_kiocb(kiocb));
> +
>  	if (kiocb->ki_flags & IOCB_WRITE) {
>  		struct file *file = kiocb->ki_filp;
>  
> @@ -1100,15 +1102,6 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2)
>  		file_end_write(file);
>  	}
>  
> -	/*
> -	 * Special case handling for sync iocbs:
> -	 *  - events go directly into the iocb for fast handling
> -	 *  - the sync task with the iocb in its stack holds the single iocb
> -	 *    ref, no other paths have a way to get another ref
> -	 *  - the sync task helpfully left a reference to itself in the iocb
> -	 */
> -	BUG_ON(is_sync_kiocb(kiocb));
> -
>  	if (iocb->ki_list.next) {
>  		unsigned long flags;
>  
> -- 
> 2.14.2
> 

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 03/36] aio: refactor read/write iocb setup
  2018-03-05 21:27   ` Christoph Hellwig
@ 2018-03-20  0:19     ` Darrick J. Wong
  -1 siblings, 0 replies; 120+ messages in thread
From: Darrick J. Wong @ 2018-03-20  0:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
	linux-kernel

On Mon, Mar 05, 2018 at 01:27:10PM -0800, Christoph Hellwig wrote:
> Don't reference the kiocb structure from the common aio code, and move
> any use of it into helper specific to the read/write path.  This is in
> preparation for aio_poll support that wants to use the space for different
> fields.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Moyer <jmoyer@redhat.com>

Looks straightforward enough to me,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D
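
As an aside for anyone reading the archive, the idea in the commit message
(per-operation state in a union, common state outside it, and container_of
on the union member) can be sketched in plain C roughly as follows.  The
struct and function names are invented for illustration, and the macro is a
simplified version of the kernel's container_of without its type checking.

```c
#include <stddef.h>

/* Simplified container_of; the kernel macro adds type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct kiocb and a later poll payload. */
struct rw_part   { long pos; };
struct poll_part { unsigned int events; };

struct request {
	union {				/* per-operation state shares storage */
		struct rw_part	 rw;
		struct poll_part poll;
	};
	int id;				/* common state lives outside the union */
};

/* Recover the enclosing request from the read/write member. */
static inline struct request *request_from_rw(struct rw_part *rw)
{
	return container_of(rw, struct request, rw);
}

/* Same for the poll member, which starts at the same offset. */
static inline struct request *request_from_poll(struct poll_part *poll)
{
	return container_of(poll, struct request, poll);
}
```

Because the common code only ever sees the outer structure, adding another
union member for a new operation type does not change the helpers that
already exist.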

> ---
>  fs/aio.c | 171 ++++++++++++++++++++++++++++++++++++---------------------------
>  1 file changed, 97 insertions(+), 74 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index 41fc8ce6bc7f..6295fc00f104 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -170,7 +170,9 @@ struct kioctx {
>  #define KIOCB_CANCELLED		((void *) (~0ULL))
>  
>  struct aio_kiocb {
> -	struct kiocb		common;
> +	union {
> +		struct kiocb		rw;
> +	};
>  
>  	struct kioctx		*ki_ctx;
>  	kiocb_cancel_fn		*ki_cancel;
> @@ -549,7 +551,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
>  
>  void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
>  {
> -	struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, common);
> +	struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, rw);
>  	struct kioctx *ctx = req->ki_ctx;
>  	unsigned long flags;
>  
> @@ -582,7 +584,7 @@ static int kiocb_cancel(struct aio_kiocb *kiocb)
>  		cancel = cmpxchg(&kiocb->ki_cancel, old, KIOCB_CANCELLED);
>  	} while (cancel != old);
>  
> -	return cancel(&kiocb->common);
> +	return cancel(&kiocb->rw);
>  }
>  
>  static void free_ioctx(struct work_struct *work)
> @@ -1040,15 +1042,6 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
>  	return NULL;
>  }
>  
> -static void kiocb_free(struct aio_kiocb *req)
> -{
> -	if (req->common.ki_filp)
> -		fput(req->common.ki_filp);
> -	if (req->ki_eventfd != NULL)
> -		eventfd_ctx_put(req->ki_eventfd);
> -	kmem_cache_free(kiocb_cachep, req);
> -}
> -
>  static struct kioctx *lookup_ioctx(unsigned long ctx_id)
>  {
>  	struct aio_ring __user *ring  = (void __user *)ctx_id;
> @@ -1079,29 +1072,14 @@ static struct kioctx *lookup_ioctx(unsigned long ctx_id)
>  /* aio_complete
>   *	Called when the io request on the given iocb is complete.
>   */
> -static void aio_complete(struct kiocb *kiocb, long res, long res2)
> +static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
>  {
> -	struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, common);
>  	struct kioctx	*ctx = iocb->ki_ctx;
>  	struct aio_ring	*ring;
>  	struct io_event	*ev_page, *event;
>  	unsigned tail, pos, head;
>  	unsigned long	flags;
>  
> -	BUG_ON(is_sync_kiocb(kiocb));
> -
> -	if (kiocb->ki_flags & IOCB_WRITE) {
> -		struct file *file = kiocb->ki_filp;
> -
> -		/*
> -		 * Tell lockdep we inherited freeze protection from submission
> -		 * thread.
> -		 */
> -		if (S_ISREG(file_inode(file)->i_mode))
> -			__sb_writers_acquired(file_inode(file)->i_sb, SB_FREEZE_WRITE);
> -		file_end_write(file);
> -	}
> -
>  	if (iocb->ki_list.next) {
>  		unsigned long flags;
>  
> @@ -1163,11 +1141,12 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2)
>  	 * eventfd. The eventfd_signal() function is safe to be called
>  	 * from IRQ context.
>  	 */
> -	if (iocb->ki_eventfd != NULL)
> +	if (iocb->ki_eventfd) {
>  		eventfd_signal(iocb->ki_eventfd, 1);
> +		eventfd_ctx_put(iocb->ki_eventfd);
> +	}
>  
> -	/* everything turned out well, dispose of the aiocb. */
> -	kiocb_free(iocb);
> +	kmem_cache_free(kiocb_cachep, iocb);
>  
>  	/*
>  	 * We have to order our ring_info tail store above and test
> @@ -1430,6 +1409,47 @@ SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
>  	return -EINVAL;
>  }
>  
> +static void aio_complete_rw(struct kiocb *kiocb, long res, long res2)
> +{
> +	struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, rw);
> +
> +	WARN_ON_ONCE(is_sync_kiocb(kiocb));
> +
> +	if (kiocb->ki_flags & IOCB_WRITE) {
> +		struct inode *inode = file_inode(kiocb->ki_filp);
> +
> +		/*
> +		 * Tell lockdep we inherited freeze protection from submission
> +		 * thread.
> +		 */
> +		if (S_ISREG(inode->i_mode))
> +			__sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
> +		file_end_write(kiocb->ki_filp);
> +	}
> +
> +	fput(kiocb->ki_filp);
> +	aio_complete(iocb, res, res2);
> +}
> +
> +static int aio_prep_rw(struct kiocb *req, struct iocb *iocb)
> +{
> +	int ret;
> +
> +	req->ki_filp = fget(iocb->aio_fildes);
> +	if (unlikely(!req->ki_filp))
> +		return -EBADF;
> +	req->ki_complete = aio_complete_rw;
> +	req->ki_pos = iocb->aio_offset;
> +	req->ki_flags = iocb_flags(req->ki_filp);
> +	if (iocb->aio_flags & IOCB_FLAG_RESFD)
> +		req->ki_flags |= IOCB_EVENTFD;
> +	req->ki_hint = file_write_hint(req->ki_filp);
> +	ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags);
> +	if (unlikely(ret))
> +		fput(req->ki_filp);
> +	return ret;
> +}
> +
>  static int aio_setup_rw(int rw, struct iocb *iocb, struct iovec **iovec,
>  		bool vectored, bool compat, struct iov_iter *iter)
>  {
> @@ -1449,7 +1469,7 @@ static int aio_setup_rw(int rw, struct iocb *iocb, struct iovec **iovec,
>  	return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter);
>  }
>  
> -static inline ssize_t aio_ret(struct kiocb *req, ssize_t ret)
> +static inline ssize_t aio_rw_ret(struct kiocb *req, ssize_t ret)
>  {
>  	switch (ret) {
>  	case -EIOCBQUEUED:
> @@ -1465,7 +1485,7 @@ static inline ssize_t aio_ret(struct kiocb *req, ssize_t ret)
>  		ret = -EINTR;
>  		/*FALLTHRU*/
>  	default:
> -		aio_complete(req, ret, 0);
> +		aio_complete_rw(req, ret, 0);
>  		return 0;
>  	}
>  }
> @@ -1473,56 +1493,78 @@ static inline ssize_t aio_ret(struct kiocb *req, ssize_t ret)
>  static ssize_t aio_read(struct kiocb *req, struct iocb *iocb, bool vectored,
>  		bool compat)
>  {
> -	struct file *file = req->ki_filp;
>  	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
>  	struct iov_iter iter;
> +	struct file *file;
>  	ssize_t ret;
>  
> +	ret = aio_prep_rw(req, iocb);
> +	if (ret)
> +		return ret;
> +	file = req->ki_filp;
> +
> +	ret = -EBADF;
>  	if (unlikely(!(file->f_mode & FMODE_READ)))
> -		return -EBADF;
> +		goto out_fput;
> +	ret = -EINVAL;
>  	if (unlikely(!file->f_op->read_iter))
> -		return -EINVAL;
> +		goto out_fput;
>  
>  	ret = aio_setup_rw(READ, iocb, &iovec, vectored, compat, &iter);
>  	if (ret)
> -		return ret;
> +		goto out_fput;
>  	ret = rw_verify_area(READ, file, &req->ki_pos, iov_iter_count(&iter));
>  	if (!ret)
> -		ret = aio_ret(req, call_read_iter(file, req, &iter));
> +		ret = aio_rw_ret(req, call_read_iter(file, req, &iter));
>  	kfree(iovec);
> +out_fput:
> +	if (unlikely(ret && ret != -EIOCBQUEUED))
> +		fput(file);
>  	return ret;
>  }
>  
>  static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored,
>  		bool compat)
>  {
> -	struct file *file = req->ki_filp;
>  	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
>  	struct iov_iter iter;
> +	struct file *file;
>  	ssize_t ret;
>  
> +	ret = aio_prep_rw(req, iocb);
> +	if (ret)
> +		return ret;
> +	file = req->ki_filp;
> +
> +	ret = -EBADF;
>  	if (unlikely(!(file->f_mode & FMODE_WRITE)))
> -		return -EBADF;
> +		goto out_fput;
> +	ret = -EINVAL;
>  	if (unlikely(!file->f_op->write_iter))
> -		return -EINVAL;
> +		goto out_fput;
>  
>  	ret = aio_setup_rw(WRITE, iocb, &iovec, vectored, compat, &iter);
>  	if (ret)
> -		return ret;
> +		goto out_fput;
>  	ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter));
>  	if (!ret) {
> +		struct inode *inode = file_inode(file);
> +
>  		req->ki_flags |= IOCB_WRITE;
>  		file_start_write(file);
> -		ret = aio_ret(req, call_write_iter(file, req, &iter));
> +		ret = aio_rw_ret(req, call_write_iter(file, req, &iter));
>  		/*
> -		 * We release freeze protection in aio_complete().  Fool lockdep
> -		 * by telling it the lock got released so that it doesn't
> -		 * complain about held lock when we return to userspace.
> +		 * We release freeze protection in aio_complete_rw().  Fool
> +		 * lockdep by telling it the lock got released so that it
> +		 * doesn't complain about held lock when we return to userspace.
>  		 */
> -		if (S_ISREG(file_inode(file)->i_mode))
> -			__sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE);
> +		if (S_ISREG(inode->i_mode))
> +			__sb_writers_release(inode->i_sb, SB_FREEZE_WRITE);
>  	}
>  	kfree(iovec);
> +out_fput:
> +	if (unlikely(ret && ret != -EIOCBQUEUED))
> +		fput(file);
>  	return ret;
>  }
>  
> @@ -1530,7 +1572,6 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
>  			 struct iocb *iocb, bool compat)
>  {
>  	struct aio_kiocb *req;
> -	struct file *file;
>  	ssize_t ret;
>  
>  	/* enforce forwards compatibility on users */
> @@ -1553,16 +1594,6 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
>  	if (unlikely(!req))
>  		return -EAGAIN;
>  
> -	req->common.ki_filp = file = fget(iocb->aio_fildes);
> -	if (unlikely(!req->common.ki_filp)) {
> -		ret = -EBADF;
> -		goto out_put_req;
> -	}
> -	req->common.ki_pos = iocb->aio_offset;
> -	req->common.ki_complete = aio_complete;
> -	req->common.ki_flags = iocb_flags(req->common.ki_filp);
> -	req->common.ki_hint = file_write_hint(file);
> -
>  	if (iocb->aio_flags & IOCB_FLAG_RESFD) {
>  		/*
>  		 * If the IOCB_FLAG_RESFD flag of aio_flags is set, get an
> @@ -1576,14 +1607,6 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
>  			req->ki_eventfd = NULL;
>  			goto out_put_req;
>  		}
> -
> -		req->common.ki_flags |= IOCB_EVENTFD;
> -	}
> -
> -	ret = kiocb_set_rw_flags(&req->common, iocb->aio_rw_flags);
> -	if (unlikely(ret)) {
> -		pr_debug("EINVAL: aio_rw_flags\n");
> -		goto out_put_req;
>  	}
>  
>  	ret = put_user(KIOCB_KEY, &user_iocb->aio_key);
> @@ -1595,26 +1618,24 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
>  	req->ki_user_iocb = user_iocb;
>  	req->ki_user_data = iocb->aio_data;
>  
> -	get_file(file);
>  	switch (iocb->aio_lio_opcode) {
>  	case IOCB_CMD_PREAD:
> -		ret = aio_read(&req->common, iocb, false, compat);
> +		ret = aio_read(&req->rw, iocb, false, compat);
>  		break;
>  	case IOCB_CMD_PWRITE:
> -		ret = aio_write(&req->common, iocb, false, compat);
> +		ret = aio_write(&req->rw, iocb, false, compat);
>  		break;
>  	case IOCB_CMD_PREADV:
> -		ret = aio_read(&req->common, iocb, true, compat);
> +		ret = aio_read(&req->rw, iocb, true, compat);
>  		break;
>  	case IOCB_CMD_PWRITEV:
> -		ret = aio_write(&req->common, iocb, true, compat);
> +		ret = aio_write(&req->rw, iocb, true, compat);
>  		break;
>  	default:
>  		pr_debug("invalid aio operation %d\n", iocb->aio_lio_opcode);
>  		ret = -EINVAL;
>  		break;
>  	}
> -	fput(file);
>  
>  	if (ret && ret != -EIOCBQUEUED)
>  		goto out_put_req;
> @@ -1622,7 +1643,9 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
>  out_put_req:
>  	put_reqs_available(ctx, 1);
>  	percpu_ref_put(&ctx->reqs);
> -	kiocb_free(req);
> +	if (req->ki_eventfd)
> +		eventfd_ctx_put(req->ki_eventfd);
> +	kmem_cache_free(kiocb_cachep, req);
>  	return ret;
>  }
>  
> -- 
> 2.14.2
> 
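
Editorial illustration, not part of the patch: the refactoring quoted above
moves the kiocb into a union member named rw and has an rw-specific
completion handler recover the enclosing request before calling the common
completion code.  A minimal userspace sketch of that pattern follows.  All
toy_* names and the reduced structs are hypothetical stand-ins, and
container_of is redefined locally because the kernel macro is not available
in userspace.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced stand-in for struct kiocb. */
struct toy_kiocb {
	long pos;
	void (*complete)(struct toy_kiocb *kiocb, long res);
};

/* Hypothetical stand-in for struct aio_kiocb after the patch. */
struct toy_aio_kiocb {
	union {
		struct toy_kiocb rw;	/* other request types could share this space */
	};
	long result;
	int completed;
};

/* Common completion code: it never touches the rw member. */
static void toy_complete(struct toy_aio_kiocb *req, long res)
{
	req->result = res;
	req->completed = 1;
}

/* rw-specific completion: recover the outer request, then hand off. */
static void toy_complete_rw(struct toy_kiocb *kiocb, long res)
{
	struct toy_aio_kiocb *req =
		container_of(kiocb, struct toy_aio_kiocb, rw);

	toy_complete(req, res);
}

/* Set up the rw member and pretend the I/O finished at once with 'res'. */
static long toy_submit_read(struct toy_aio_kiocb *req, long pos, long res)
{
	req->completed = 0;
	req->rw.pos = pos;
	req->rw.complete = toy_complete_rw;
	req->rw.complete(&req->rw, res);
	return req->result;
}
```

Calling toy_submit_read(&req, 4096, 512) on a struct toy_aio_kiocb goes
through the rw handler and ends in the common completion, which is the split
the commit message describes between common aio code and read/write helpers.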

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 04/36] aio: sanitize ki_list handling
  2018-03-05 21:27   ` Christoph Hellwig
@ 2018-03-20  0:21     ` Darrick J. Wong
  -1 siblings, 0 replies; 120+ messages in thread
From: Darrick J. Wong @ 2018-03-20  0:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
	linux-kernel

On Mon, Mar 05, 2018 at 01:27:11PM -0800, Christoph Hellwig wrote:
> Instead of handcoded non-null checks always initialize ki_list to an
> empty list and use list_empty / list_empty_careful on it.  While we're
> at it also error out on a double call to kiocb_set_cancel_fn instead
> of ignoring it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Moyer <jmoyer@redhat.com>

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/aio.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index 6295fc00f104..c32c315f05b5 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -555,13 +555,12 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
>  	struct kioctx *ctx = req->ki_ctx;
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&ctx->ctx_lock, flags);
> -
> -	if (!req->ki_list.next)
> -		list_add(&req->ki_list, &ctx->active_reqs);
> +	if (WARN_ON_ONCE(!list_empty(&req->ki_list)))
> +		return;
>  
> +	spin_lock_irqsave(&ctx->ctx_lock, flags);
> +	list_add_tail(&req->ki_list, &ctx->active_reqs);
>  	req->ki_cancel = cancel;
> -
>  	spin_unlock_irqrestore(&ctx->ctx_lock, flags);
>  }
>  EXPORT_SYMBOL(kiocb_set_cancel_fn);
> @@ -1034,7 +1033,7 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
>  		goto out_put;
>  
>  	percpu_ref_get(&ctx->reqs);
> -
> +	INIT_LIST_HEAD(&req->ki_list);
>  	req->ki_ctx = ctx;
>  	return req;
>  out_put:
> @@ -1080,7 +1079,7 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
>  	unsigned tail, pos, head;
>  	unsigned long	flags;
>  
> -	if (iocb->ki_list.next) {
> +	if (!list_empty_careful(iocb->ki_list.next)) {
>  		unsigned long flags;
>  
>  		spin_lock_irqsave(&ctx->ctx_lock, flags);
> -- 
> 2.14.2
> 
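
Editorial illustration, not part of the patch: the commit message above
replaces hand-coded NULL checks on ki_list with an always-initialized list
node and emptiness tests, and rejects a second registration of a cancel
function.  A minimal userspace sketch of that idea follows.  The toy_* list
helpers are hypothetical re-implementations written for this example, not
the kernel's <linux/list.h>, and the -1 return value stands in for the
patch's WARN_ON_ONCE-and-return behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal doubly linked list, modelled on the kernel idiom. */
struct toy_list_head {
	struct toy_list_head *next, *prev;
};

/* An initialized node points at itself, which is what "empty" means. */
static void toy_list_init(struct toy_list_head *head)
{
	head->next = head;
	head->prev = head;
}

static bool toy_list_empty(const struct toy_list_head *head)
{
	return head->next == head;
}

static void toy_list_add_tail(struct toy_list_head *node,
			      struct toy_list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

struct toy_req {
	struct toy_list_head node;
	int (*cancel)(struct toy_req *req);
};

/* Always initialize the node, so emptiness is well defined from the start. */
static void toy_req_init(struct toy_req *req)
{
	toy_list_init(&req->node);
	req->cancel = NULL;
}

static int toy_noop_cancel(struct toy_req *req)
{
	(void)req;
	return 0;
}

/* Returns 0 on success, -1 if the request was already registered. */
static int toy_set_cancel_fn(struct toy_req *req, struct toy_list_head *active,
			     int (*cancel)(struct toy_req *req))
{
	if (!toy_list_empty(&req->node))
		return -1;
	toy_list_add_tail(&req->node, active);
	req->cancel = cancel;
	return 0;
}
```

With this layout a freshly initialized request reports an empty node, the
first toy_set_cancel_fn() call links it into the active list, and a second
call is refused instead of being silently ignored.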

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 05/36] aio: simplify cancellation
  2018-03-05 21:27   ` Christoph Hellwig
@ 2018-03-20  0:25     ` Darrick J. Wong
  -1 siblings, 0 replies; 120+ messages in thread
From: Darrick J. Wong @ 2018-03-20  0:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
	linux-kernel

On Mon, Mar 05, 2018 at 01:27:12PM -0800, Christoph Hellwig wrote:
> With the current aio code there is no need for the magic KIOCB_CANCELLED
> value, as a cancelation just kicks the driver to queue the completion
> ASAP, with all actual completion handling done in another thread. Given
> that both the completion path and cancelation take the context lock there
> is no need for magic cmpxchg loops either.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Moyer <jmoyer@redhat.com>
> ---
>  fs/aio.c | 37 +++++++++----------------------------
>  1 file changed, 9 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index c32c315f05b5..2d40cf5dd4ec 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -156,19 +156,6 @@ struct kioctx {
>  	unsigned		id;
>  };
>  
> -/*
> - * We use ki_cancel == KIOCB_CANCELLED to indicate that a kiocb has been either
> - * cancelled or completed (this makes a certain amount of sense because
> - * successful cancellation - io_cancel() - does deliver the completion to
> - * userspace).
> - *
> - * And since most things don't implement kiocb cancellation and we'd really like
> - * kiocb completion to be lockless when possible, we use ki_cancel to
> - * synchronize cancellation and completion - we only set it to KIOCB_CANCELLED
> - * with xchg() or cmpxchg(), see batch_complete_aio() and kiocb_cancel().
> - */
> -#define KIOCB_CANCELLED		((void *) (~0ULL))
> -
>  struct aio_kiocb {
>  	union {
>  		struct kiocb		rw;
> @@ -565,24 +552,18 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
>  }
>  EXPORT_SYMBOL(kiocb_set_cancel_fn);
>  
> +/*
> + * Only cancel if there ws a ki_cancel function to start with, and we
> + * are the one how managed to clear it (to protect against simulatinious

"...are the one who managed to clear it (to protect against simultaneous
cancel calls)." ?

Really only complaining because who/how are both English words...

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> + * cancel calls).
> + */
>  static int kiocb_cancel(struct aio_kiocb *kiocb)
>  {
> -	kiocb_cancel_fn *old, *cancel;
> -
> -	/*
> -	 * Don't want to set kiocb->ki_cancel = KIOCB_CANCELLED unless it
> -	 * actually has a cancel function, hence the cmpxchg()
> -	 */
> -
> -	cancel = READ_ONCE(kiocb->ki_cancel);
> -	do {
> -		if (!cancel || cancel == KIOCB_CANCELLED)
> -			return -EINVAL;
> -
> -		old = cancel;
> -		cancel = cmpxchg(&kiocb->ki_cancel, old, KIOCB_CANCELLED);
> -	} while (cancel != old);
> +	kiocb_cancel_fn *cancel = kiocb->ki_cancel;
>  
> +	if (!cancel)
> +		return -EINVAL;
> +	kiocb->ki_cancel = NULL;
>  	return cancel(&kiocb->rw);
>  }
>  
> -- 
> 2.14.2
> 
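
Editorial illustration, not part of the patch: the simplified cancellation
quoted above reads the cancel function pointer, clears it, and then calls it,
relying on a lock shared with the completion path rather than on a cmpxchg
loop.  A minimal userspace sketch of that shape follows.  The toy_* names are
hypothetical, and the sketch assumes the caller already holds whatever lock
serializes cancellation and completion; no locking is shown here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_req;
typedef int toy_cancel_fn(struct toy_req *req);

/* Hypothetical stand-in for a request with an optional cancel callback. */
struct toy_req {
	toy_cancel_fn *cancel;
	int cancel_calls;	/* counts how often the driver callback ran */
};

/* Hypothetical driver callback that just records the call. */
static int toy_driver_cancel(struct toy_req *req)
{
	req->cancel_calls++;
	return 0;
}

/*
 * Cancel at most once.  The caller is assumed to hold the lock that also
 * protects completion, so a plain load and store are sufficient here.
 */
static int toy_cancel(struct toy_req *req)
{
	toy_cancel_fn *cancel = req->cancel;

	if (!cancel)
		return -EINVAL;
	req->cancel = NULL;
	return cancel(req);
}
```

A request that registered toy_driver_cancel is cancelled by the first
toy_cancel() call; any later call, or a call on a request that never had a
callback, returns -EINVAL without invoking the driver again.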

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 05/36] aio: simplify cancellation
@ 2018-03-20  0:25     ` Darrick J. Wong
  0 siblings, 0 replies; 120+ messages in thread
From: Darrick J. Wong @ 2018-03-20  0:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
	linux-kernel

On Mon, Mar 05, 2018 at 01:27:12PM -0800, Christoph Hellwig wrote:
> With the current aio code there is no need for the magic KIOCB_CANCELLED
> value, as a cancelation just kicks the driver to queue the completion
> ASAP, with all actual completion handling done in another thread. Given
> that both the completion path and cancelation take the context lock there
> is no need for magic cmpxchg loops either.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Moyer <jmoyer@redhat.com>
> ---
>  fs/aio.c | 37 +++++++++----------------------------
>  1 file changed, 9 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index c32c315f05b5..2d40cf5dd4ec 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -156,19 +156,6 @@ struct kioctx {
>  	unsigned		id;
>  };
>  
> -/*
> - * We use ki_cancel == KIOCB_CANCELLED to indicate that a kiocb has been either
> - * cancelled or completed (this makes a certain amount of sense because
> - * successful cancellation - io_cancel() - does deliver the completion to
> - * userspace).
> - *
> - * And since most things don't implement kiocb cancellation and we'd really like
> - * kiocb completion to be lockless when possible, we use ki_cancel to
> - * synchronize cancellation and completion - we only set it to KIOCB_CANCELLED
> - * with xchg() or cmpxchg(), see batch_complete_aio() and kiocb_cancel().
> - */
> -#define KIOCB_CANCELLED		((void *) (~0ULL))
> -
>  struct aio_kiocb {
>  	union {
>  		struct kiocb		rw;
> @@ -565,24 +552,18 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
>  }
>  EXPORT_SYMBOL(kiocb_set_cancel_fn);
>  
> +/*
> + * Only cancel if there ws a ki_cancel function to start with, and we
> + * are the one how managed to clear it (to protect against simulatinious

"...are the one who managed to clear it (to protect against simultaneous
cancel calls)." ?

Really only complaining because who/how are both English words...

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> + * cancel calls).
> + */
>  static int kiocb_cancel(struct aio_kiocb *kiocb)
>  {
> -	kiocb_cancel_fn *old, *cancel;
> -
> -	/*
> -	 * Don't want to set kiocb->ki_cancel = KIOCB_CANCELLED unless it
> -	 * actually has a cancel function, hence the cmpxchg()
> -	 */
> -
> -	cancel = READ_ONCE(kiocb->ki_cancel);
> -	do {
> -		if (!cancel || cancel == KIOCB_CANCELLED)
> -			return -EINVAL;
> -
> -		old = cancel;
> -		cancel = cmpxchg(&kiocb->ki_cancel, old, KIOCB_CANCELLED);
> -	} while (cancel != old);
> +	kiocb_cancel_fn *cancel = kiocb->ki_cancel;
>  
> +	if (!cancel)
> +		return -EINVAL;
> +	kiocb->ki_cancel = NULL;
>  	return cancel(&kiocb->rw);
>  }
>  
> -- 
> 2.14.2
> 
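
A note for readers following the thread: the simplified kiocb_cancel() in the quoted hunk boils down to a check, clear and call sequence done under a lock. Below is a minimal userspace sketch of that pattern, not the kernel code. struct req, req_cancel() and demo_cancel() are made-up names, and the caller is assumed to hold whatever lock serialises cancellation against completion.

```c
#include <errno.h>
#include <stddef.h>

struct req;
typedef int (*cancel_fn)(struct req *);

/* Hypothetical stand-in for struct aio_kiocb; only the cancel hook is modelled. */
struct req {
	cancel_fn	cancel;		/* NULL: not cancellable, or already cancelled */
	int		cancelled;	/* bumped by the demo callback below */
};

/*
 * Assumption: the caller holds the lock that also serialises completion,
 * so a plain load and store is enough and no cmpxchg loop is needed.
 */
static int req_cancel(struct req *r)
{
	cancel_fn cancel = r->cancel;

	if (!cancel)
		return -EINVAL;
	r->cancel = NULL;	/* a later caller sees NULL and backs off */
	return cancel(r);
}

/* Demo callback: count how often the hook actually runs. */
static int demo_cancel(struct req *r)
{
	r->cancelled++;
	return 0;
}
```

Calling req_cancel() twice on the same request runs the hook once; the second call gets -EINVAL, which is the behaviour the new comment in the patch describes.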

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>


* Re: [PATCH 06/36] aio: delete iocbs from the active_reqs list in kiocb_cancel
  2018-03-05 21:27   ` Christoph Hellwig
@ 2018-03-20  0:34     ` Darrick J. Wong
  -1 siblings, 0 replies; 120+ messages in thread
From: Darrick J. Wong @ 2018-03-20  0:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
	linux-kernel

On Mon, Mar 05, 2018 at 01:27:13PM -0800, Christoph Hellwig wrote:
> Once we cancel an iocb there is no reason to keep it on the active_reqs
> list, given that the list is only used to look for cancelation candidates.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Moyer <jmoyer@redhat.com>

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/aio.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index 2d40cf5dd4ec..0b6394b4e528 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -561,6 +561,8 @@ static int kiocb_cancel(struct aio_kiocb *kiocb)
>  {
>  	kiocb_cancel_fn *cancel = kiocb->ki_cancel;
>  
> +	list_del_init(&kiocb->ki_list);
> +
>  	if (!cancel)
>  		return -EINVAL;
>  	kiocb->ki_cancel = NULL;
> @@ -607,8 +609,6 @@ static void free_ioctx_users(struct percpu_ref *ref)
>  	while (!list_empty(&ctx->active_reqs)) {
>  		req = list_first_entry(&ctx->active_reqs,
>  				       struct aio_kiocb, ki_list);
> -
> -		list_del_init(&req->ki_list);
>  		kiocb_cancel(req);
>  	}
>  
> -- 
> 2.14.2
> 
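
For illustration, here is why the unlink can move into the cancel helper without breaking the teardown loop: the loop in free_ioctx_users() only terminates if every iteration removes the first entry from the list. A small userspace sketch with a hand-rolled circular list follows; struct node, struct req, req_cancel() and drain() are hypothetical names, not kernel APIs.

```c
#include <stddef.h>

/* Minimal circular doubly-linked list, in the spirit of list_head. */
struct node {
	struct node *prev, *next;
};

static void node_init(struct node *n)
{
	n->prev = n->next = n;
}

static int node_list_empty(const struct node *head)
{
	return head->next == head;
}

static void node_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void node_del_init(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);		/* safe to delete again later */
}

struct req {
	struct node	link;
	int		cancelled;
};

/* The cancel helper unlinks the request itself. */
static void req_cancel(struct req *r)
{
	node_del_init(&r->link);
	r->cancelled = 1;
}

/* Teardown loop: relies on req_cancel() removing the first entry. */
static int drain(struct node *head)
{
	int n = 0;

	while (!node_list_empty(head)) {
		struct req *r = (struct req *)((char *)head->next -
					       offsetof(struct req, link));

		req_cancel(r);
		n++;
	}
	return n;
}
```

If req_cancel() did not unlink, drain() would pick the same first entry forever, so the caller and the helper have to agree on who deletes.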


* Re: [PATCH 08/36] aio: implement io_pgetevents
  2018-03-05 21:27   ` Christoph Hellwig
@ 2018-03-20  2:12     ` Darrick J. Wong
  -1 siblings, 0 replies; 120+ messages in thread
From: Darrick J. Wong @ 2018-03-20  2:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
	linux-kernel

On Mon, Mar 05, 2018 at 01:27:15PM -0800, Christoph Hellwig wrote:
> This is the io_getevents equivalent of ppoll/pselect. It allows signals
> and aio completions (especially with IOCB_CMD_POLL) to be mixed properly,
> and atomically executes the following sequence:
> 
> 	sigset_t origmask;
> 
> 	pthread_sigmask(SIG_SETMASK, &sigmask, &origmask);
> 	ret = io_getevents(ctx, min_nr, nr, events, timeout);
> 	pthread_sigmask(SIG_SETMASK, &origmask, NULL);
> 
> Note that unlike many other signal related calls we do not pass a sigmask
> size, as that would get us to 7 arguments, which aren't easily supported
> by the syscall infrastructure.  It seems a lot less painful to just add a
> new syscall variant in the unlikely case we're going to increase the
> sigset size.

I'm assuming there's a proposed manpage update for this somewhere? :)

--D
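
As an aside, the three-step sequence from the commit message can be written as a userspace helper. This sketch is deliberately the non-atomic version that io_pgetevents is meant to replace. call_with_sigmask() and usr1_blocked() are hypothetical names, and sigprocmask() stands in for pthread_sigmask() on the assumption of a single-threaded program.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper: run fn(arg) with *mask installed, then restore. */
static int call_with_sigmask(const sigset_t *mask, int (*fn)(void *), void *arg)
{
	sigset_t origmask;
	int ret;

	if (sigprocmask(SIG_SETMASK, mask, &origmask) == -1)
		return -1;
	/*
	 * Race window: a signal that *mask unblocks can be delivered
	 * right here, before fn() has started waiting.
	 */
	ret = fn(arg);
	sigprocmask(SIG_SETMASK, &origmask, NULL);
	return ret;
}

/* Demo callback: report whether SIGUSR1 is blocked while it runs. */
static int usr1_blocked(void *unused)
{
	sigset_t cur;

	(void)unused;
	sigprocmask(SIG_BLOCK, NULL, &cur);	/* NULL set: query only */
	return sigismember(&cur, SIGUSR1);
}
```

A signal arriving in the marked window is handled before the wait begins, so the wait can then sleep without ever noticing it. Switching the mask inside the syscall closes that window, which is the same argument that motivates ppoll and pselect.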

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  arch/x86/entry/syscalls/syscall_32.tbl |   1 +
>  arch/x86/entry/syscalls/syscall_64.tbl |   1 +
>  fs/aio.c                               | 114 ++++++++++++++++++++++++++++++---
>  include/linux/compat.h                 |   7 ++
>  include/linux/syscalls.h               |   6 ++
>  include/uapi/asm-generic/unistd.h      |   4 +-
>  include/uapi/linux/aio_abi.h           |   6 ++
>  kernel/sys_ni.c                        |   2 +
>  8 files changed, 130 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index 448ac2161112..5997c3e9ac3e 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -391,3 +391,4 @@
>  382	i386	pkey_free		sys_pkey_free
>  383	i386	statx			sys_statx
>  384	i386	arch_prctl		sys_arch_prctl			compat_sys_arch_prctl
> +385	i386	io_pgetevents		sys_io_pgetevents		compat_sys_io_pgetevents
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 5aef183e2f85..e995cd2b4e65 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -339,6 +339,7 @@
>  330	common	pkey_alloc		sys_pkey_alloc
>  331	common	pkey_free		sys_pkey_free
>  332	common	statx			sys_statx
> +333	common	io_pgetevents		sys_io_pgetevents
>  
>  #
>  # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/fs/aio.c b/fs/aio.c
> index 9d7d6e4cde87..da87cbf7c67a 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1291,10 +1291,6 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr,
>  		wait_event_interruptible_hrtimeout(ctx->wait,
>  				aio_read_events(ctx, min_nr, nr, event, &ret),
>  				until);
> -
> -	if (!ret && signal_pending(current))
> -		ret = -EINTR;
> -
>  	return ret;
>  }
>  
> @@ -1874,13 +1870,60 @@ SYSCALL_DEFINE5(io_getevents, aio_context_t, ctx_id,
>  		struct timespec __user *, timeout)
>  {
>  	struct timespec64	ts;
> +	int			ret;
> +
> +	if (timeout && unlikely(get_timespec64(&ts, timeout)))
> +		return -EFAULT;
> +
> +	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
> +	if (!ret && signal_pending(current))
> +		ret = -EINTR;
> +	return ret;
> +}
> +
> +SYSCALL_DEFINE6(io_pgetevents,
> +		aio_context_t, ctx_id,
> +		long, min_nr,
> +		long, nr,
> +		struct io_event __user *, events,
> +		struct timespec __user *, timeout,
> +		const struct __aio_sigset __user *, usig)
> +{
> +	struct __aio_sigset	ksig = { NULL, };
> +	sigset_t		ksigmask, sigsaved;
> +	struct timespec64	ts;
> +	int ret;
> +
> +	if (timeout && unlikely(get_timespec64(&ts, timeout)))
> +		return -EFAULT;
>  
> -	if (timeout) {
> -		if (unlikely(get_timespec64(&ts, timeout)))
> +	if (usig && copy_from_user(&ksig, usig, sizeof(ksig)))
> +		return -EFAULT;
> +
> +	if (ksig.sigmask) {
> +		if (ksig.sigsetsize != sizeof(sigset_t))
> +			return -EINVAL;
> +		if (copy_from_user(&ksigmask, ksig.sigmask, sizeof(ksigmask)))
>  			return -EFAULT;
> +		sigdelsetmask(&ksigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
> +		sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved);
> +	}
> +
> +	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
> +	if (signal_pending(current)) {
> +		if (ksig.sigmask) {
> +			current->saved_sigmask = sigsaved;
> +			set_restore_sigmask();
> +		}
> +
> +		if (!ret)
> +			ret = -ERESTARTNOHAND;
> +	} else {
> +		if (ksig.sigmask)
> +			sigprocmask(SIG_SETMASK, &sigsaved, NULL);
>  	}
>  
> -	return do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
> +	return ret;
>  }
>  
>  #ifdef CONFIG_COMPAT
> @@ -1891,13 +1934,64 @@ COMPAT_SYSCALL_DEFINE5(io_getevents, compat_aio_context_t, ctx_id,
>  		       struct compat_timespec __user *, timeout)
>  {
>  	struct timespec64 t;
> +	int ret;
> +
> +	if (timeout && compat_get_timespec64(&t, timeout))
> +		return -EFAULT;
> +
> +	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
> +	if (!ret && signal_pending(current))
> +		ret = -EINTR;
> +	return ret;
> +}
> +
> +
> +struct __compat_aio_sigset {
> +	compat_sigset_t __user	*sigmask;
> +	compat_size_t		sigsetsize;
> +};
> +
> +COMPAT_SYSCALL_DEFINE6(io_pgetevents,
> +		compat_aio_context_t, ctx_id,
> +		compat_long_t, min_nr,
> +		compat_long_t, nr,
> +		struct io_event __user *, events,
> +		struct compat_timespec __user *, timeout,
> +		const struct __compat_aio_sigset __user *, usig)
> +{
> +	struct __compat_aio_sigset ksig = { NULL, };
> +	sigset_t ksigmask, sigsaved;
> +	struct timespec64 t;
> +	int ret;
> +
> +	if (timeout && compat_get_timespec64(&t, timeout))
> +		return -EFAULT;
>  
> -	if (timeout) {
> -		if (compat_get_timespec64(&t, timeout))
> +	if (usig && copy_from_user(&ksig, usig, sizeof(ksig)))
> +		return -EFAULT;
> +
> +	if (ksig.sigmask) {
> +		if (ksig.sigsetsize != sizeof(compat_sigset_t))
> +			return -EINVAL;
> +		if (get_compat_sigset(&ksigmask, ksig.sigmask))
>  			return -EFAULT;
> +		sigdelsetmask(&ksigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
> +		sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved);
> +	}
>  
> +	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
> +	if (signal_pending(current)) {
> +		if (ksig.sigmask) {
> +			current->saved_sigmask = sigsaved;
> +			set_restore_sigmask();
> +		}
> +		if (!ret)
> +			ret = -ERESTARTNOHAND;
> +	} else {
> +		if (ksig.sigmask)
> +			sigprocmask(SIG_SETMASK, &sigsaved, NULL);
>  	}
>  
> -	return do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
> +	return ret;
>  }
>  #endif
> diff --git a/include/linux/compat.h b/include/linux/compat.h
> index 8a9643857c4a..bfb8a94fbabd 100644
> --- a/include/linux/compat.h
> +++ b/include/linux/compat.h
> @@ -303,6 +303,7 @@ extern int put_compat_rusage(const struct rusage *,
>  			     struct compat_rusage __user *);
>  
>  struct compat_siginfo;
> +struct __compat_aio_sigset;
>  
>  extern asmlinkage long compat_sys_waitid(int, compat_pid_t,
>  		struct compat_siginfo __user *, int,
> @@ -634,6 +635,12 @@ asmlinkage long compat_sys_io_getevents(compat_aio_context_t ctx_id,
>  					compat_long_t nr,
>  					struct io_event __user *events,
>  					struct compat_timespec __user *timeout);
> +asmlinkage long compat_sys_io_pgetevents(compat_aio_context_t ctx_id,
> +					compat_long_t min_nr,
> +					compat_long_t nr,
> +					struct io_event __user *events,
> +					struct compat_timespec __user *timeout,
> +					const struct __compat_aio_sigset __user *usig);
>  asmlinkage long compat_sys_io_submit(compat_aio_context_t ctx_id, int nr,
>  				     u32 __user *iocb);
>  asmlinkage long compat_sys_mount(const char __user *dev_name,
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index a78186d826d7..8515ec53c81b 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -539,6 +539,12 @@ asmlinkage long sys_io_getevents(aio_context_t ctx_id,
>  				long nr,
>  				struct io_event __user *events,
>  				struct timespec __user *timeout);
> +asmlinkage long sys_io_pgetevents(aio_context_t ctx_id,
> +				long min_nr,
> +				long nr,
> +				struct io_event __user *events,
> +				struct timespec __user *timeout,
> +				const struct __aio_sigset *sig);
>  asmlinkage long sys_io_submit(aio_context_t, long,
>  				struct iocb __user * __user *);
>  asmlinkage long sys_io_cancel(aio_context_t ctx_id, struct iocb __user *iocb,
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index 8b87de067bc7..ce2ebbeece10 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -732,9 +732,11 @@ __SYSCALL(__NR_pkey_alloc,    sys_pkey_alloc)
>  __SYSCALL(__NR_pkey_free,     sys_pkey_free)
>  #define __NR_statx 291
>  __SYSCALL(__NR_statx,     sys_statx)
> +#define __NR_io_pgetevents 292
> +__SC_COMP(__NR_io_pgetevents, sys_io_pgetevents, compat_sys_io_pgetevents)
>  
>  #undef __NR_syscalls
> -#define __NR_syscalls 292
> +#define __NR_syscalls 293
>  
>  /*
>   * All syscalls below here should go away really,
> diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
> index a04adbc70ddf..2c0a3415beee 100644
> --- a/include/uapi/linux/aio_abi.h
> +++ b/include/uapi/linux/aio_abi.h
> @@ -29,6 +29,7 @@
>  
>  #include <linux/types.h>
>  #include <linux/fs.h>
> +#include <linux/signal.h>
>  #include <asm/byteorder.h>
>  
>  typedef __kernel_ulong_t aio_context_t;
> @@ -108,5 +109,10 @@ struct iocb {
>  #undef IFBIG
>  #undef IFLITTLE
>  
> +struct __aio_sigset {
> +	sigset_t __user	*sigmask;
> +	size_t		sigsetsize;
> +};
> +
>  #endif /* __LINUX__AIO_ABI_H */
>  
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index b5189762d275..8f7705559b38 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -151,9 +151,11 @@ cond_syscall(sys_io_destroy);
>  cond_syscall(sys_io_submit);
>  cond_syscall(sys_io_cancel);
>  cond_syscall(sys_io_getevents);
> +cond_syscall(sys_io_pgetevents);
>  cond_syscall(compat_sys_io_setup);
>  cond_syscall(compat_sys_io_submit);
>  cond_syscall(compat_sys_io_getevents);
> +cond_syscall(compat_sys_io_pgetevents);
>  cond_syscall(sys_sysfs);
>  cond_syscall(sys_syslog);
>  cond_syscall(sys_process_vm_readv);
> -- 
> 2.14.2
> 
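
Until a libaio wrapper is available, the call can presumably be made through syscall(2). What follows is a rough sketch under several assumptions: the installed headers define __NR_io_pgetevents, the ABI is the one in this version of the patch (a pointer to a two-member struct as the sixth argument), and struct timespec matches what the kernel expects, as on 64-bit builds. struct aio_sigset_arg and io_pgetevents_raw() are hypothetical names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Local mirror of the patch's struct __aio_sigset (the name is made up). */
struct aio_sigset_arg {
	const sigset_t	*sigmask;
	size_t		sigsetsize;
};

/*
 * Hypothetical raw wrapper.  ctx is the aio_context_t value, events
 * points to an array of struct io_event (left as void * here so the
 * sketch does not depend on the aio headers).
 */
static long io_pgetevents_raw(unsigned long ctx, long min_nr, long nr,
			      void *events, struct timespec *timeout,
			      const sigset_t *mask)
{
#ifdef __NR_io_pgetevents
	struct aio_sigset_arg sig = {
		.sigmask	= mask,
		/* size of the kernel's sigset in bytes, not libc's sizeof(sigset_t) */
		.sigsetsize	= _NSIG / 8,
	};

	return syscall(__NR_io_pgetevents, ctx, min_nr, nr, events, timeout,
		       mask ? &sig : NULL);
#else
	errno = ENOSYS;		/* headers predate the syscall */
	return -1;
#endif
}
```

The quoted code compares ksig.sigsetsize with the kernel's sizeof(sigset_t), so passing the libc structure size would most likely be rejected with EINVAL. Note also that, although the commit message says no sigmask size is passed, the size does travel inside the struct.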


* Re: [PATCH 09/36] fs: unexport poll_schedule_timeout
  2018-03-05 21:27   ` Christoph Hellwig
@ 2018-03-20  2:13     ` Darrick J. Wong
  -1 siblings, 0 replies; 120+ messages in thread
From: Darrick J. Wong @ 2018-03-20  2:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
	linux-kernel

On Mon, Mar 05, 2018 at 01:27:16PM -0800, Christoph Hellwig wrote:
> No users outside of select.c.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/select.c          | 3 +--
>  include/linux/poll.h | 2 --
>  2 files changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/fs/select.c b/fs/select.c
> index b6c36254028a..686de7b3a1db 100644
> --- a/fs/select.c
> +++ b/fs/select.c
> @@ -233,7 +233,7 @@ static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
>  	add_wait_queue(wait_address, &entry->wait);
>  }
>  
> -int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
> +static int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
>  			  ktime_t *expires, unsigned long slack)
>  {
>  	int rc = -EINTR;
> @@ -258,7 +258,6 @@ int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
>  
>  	return rc;
>  }
> -EXPORT_SYMBOL(poll_schedule_timeout);
>  
>  /**
>   * poll_select_set_timeout - helper function to setup the timeout value
> diff --git a/include/linux/poll.h b/include/linux/poll.h
> index f45ebd017eaa..a3576da63377 100644
> --- a/include/linux/poll.h
> +++ b/include/linux/poll.h
> @@ -96,8 +96,6 @@ struct poll_wqueues {
>  
>  extern void poll_initwait(struct poll_wqueues *pwq);
>  extern void poll_freewait(struct poll_wqueues *pwq);
> -extern int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
> -				 ktime_t *expires, unsigned long slack);
>  extern u64 select_estimate_accuracy(struct timespec64 *tv);
>  
>  #define MAX_INT64_SECONDS (((s64)(~((u64)0)>>1)/HZ)-1)
> -- 
> 2.14.2
> 


* Re: [PATCH 10/36] fs: cleanup do_pollfd
  2018-03-05 21:27   ` Christoph Hellwig
@ 2018-03-20  2:14     ` Darrick J. Wong
  -1 siblings, 0 replies; 120+ messages in thread
From: Darrick J. Wong @ 2018-03-20  2:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
	linux-kernel

On Mon, Mar 05, 2018 at 01:27:17PM -0800, Christoph Hellwig wrote:
> Use straight-line code with failure handling gotos instead of a lot
> of nested conditionals.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/select.c | 48 +++++++++++++++++++++++-------------------------
>  1 file changed, 23 insertions(+), 25 deletions(-)
> 
> diff --git a/fs/select.c b/fs/select.c
> index 686de7b3a1db..c6c504a814f9 100644
> --- a/fs/select.c
> +++ b/fs/select.c
> @@ -806,34 +806,32 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, poll_table *pwait,
>  				     bool *can_busy_poll,
>  				     __poll_t busy_flag)
>  {
> -	__poll_t mask;
> -	int fd;
> -
> -	mask = 0;
> -	fd = pollfd->fd;
> -	if (fd >= 0) {
> -		struct fd f = fdget(fd);
> -		mask = EPOLLNVAL;
> -		if (f.file) {
> -			/* userland u16 ->events contains POLL... bitmap */
> -			__poll_t filter = demangle_poll(pollfd->events) |
> -						EPOLLERR | EPOLLHUP;
> -			mask = DEFAULT_POLLMASK;
> -			if (f.file->f_op->poll) {
> -				pwait->_key = filter;
> -				pwait->_key |= busy_flag;
> -				mask = f.file->f_op->poll(f.file, pwait);
> -				if (mask & busy_flag)
> -					*can_busy_poll = true;
> -			}
> -			/* Mask out unneeded events. */
> -			mask &= filter;
> -			fdput(f);
> -		}
> +	int fd = pollfd->fd;
> +	__poll_t mask = 0, filter;
> +	struct fd f;
> +
> +	if (fd < 0)
> +		goto out;
> +	mask = EPOLLNVAL;
> +	f = fdget(fd);
> +	if (!f.file)
> +		goto out;
> +
> +	/* userland u16 ->events contains POLL... bitmap */
> +	filter = demangle_poll(pollfd->events) | EPOLLERR | EPOLLHUP;
> +	mask = DEFAULT_POLLMASK;
> +	if (f.file->f_op->poll) {
> +		pwait->_key = filter | busy_flag;
> +		mask = f.file->f_op->poll(f.file, pwait);
> +		if (mask & busy_flag)
> +			*can_busy_poll = true;
>  	}
> +	mask &= filter;		/* Mask out unneeded events. */
> +	fdput(f);
> +
> +out:
>  	/* ... and so does ->revents */
>  	pollfd->revents = mangle_poll(mask);
> -
>  	return mask;
>  }
>  
> -- 
> 2.14.2
> 
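
For readers skimming the quoted diff, the control flow the cleanup arrives at
can be sketched in userspace C.  This is only an illustration of the
"early goto to a single exit label" shape, not the kernel code: every demo_*
name, the DEMO_* bit values and the tiny descriptor table are hypothetical
stand-ins for fdget()/fdput(), the EPOLL* masks and ->poll.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event bits; these are NOT the kernel's EPOLL* values. */
#define DEMO_IN      0x01u
#define DEMO_OUT     0x04u
#define DEMO_ERR     0x08u
#define DEMO_HUP     0x10u
#define DEMO_NVAL    0x20u
#define DEMO_DEFAULT (DEMO_IN | DEMO_OUT)
#define DEMO_MAX_FDS 4

struct demo_file {
	unsigned int (*poll)(struct demo_file *f);	/* may be NULL */
	int refs;
};

/* Stand-in for the per-process descriptor table. */
static struct demo_file *demo_table[DEMO_MAX_FDS];

/* Stand-in for fdget(): take a reference, or return NULL if fd is unused. */
static struct demo_file *demo_get(int fd)
{
	if (fd < 0 || fd >= DEMO_MAX_FDS || !demo_table[fd])
		return NULL;
	demo_table[fd]->refs++;
	return demo_table[fd];
}

/* Stand-in for fdput(): drop the reference taken by demo_get(). */
static void demo_put(struct demo_file *f)
{
	assert(f->refs > 0);
	f->refs--;
}

/* Straight-line body: each failure jumps to the one exit label. */
static unsigned int demo_do_pollfd(int fd, unsigned int events,
				   unsigned int *revents)
{
	unsigned int mask = 0, filter;
	struct demo_file *f;

	if (fd < 0)
		goto out;		/* negative fds are ignored: mask 0 */
	mask = DEMO_NVAL;
	f = demo_get(fd);
	if (!f)
		goto out;		/* not open: report NVAL unfiltered */

	filter = events | DEMO_ERR | DEMO_HUP;
	mask = DEMO_DEFAULT;
	if (f->poll)
		mask = f->poll(f);
	mask &= filter;			/* mask out unrequested events */
	demo_put(f);

out:
	*revents = mask;
	return mask;
}
```

The point of the shape is that the reference is dropped on exactly one path
and the result is stored at exactly one place, so adding another early-exit
condition later does not add another nesting level.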

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 11/36] fs: update documentation for __poll_t
  2018-03-05 21:27 ` [PATCH 11/36] fs: update documentation for __poll_t Christoph Hellwig
@ 2018-03-20  2:19     ` Darrick J. Wong
  0 siblings, 0 replies; 120+ messages in thread
From: Darrick J. Wong @ 2018-03-20  2:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
	linux-kernel

On Mon, Mar 05, 2018 at 01:27:18PM -0800, Christoph Hellwig wrote:

No commit message... "Update documentation to match the headers"?

--D

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  Documentation/filesystems/Locking | 2 +-
>  Documentation/filesystems/vfs.txt | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
> index 75d2d57e2c44..220bba28f72b 100644
> --- a/Documentation/filesystems/Locking
> +++ b/Documentation/filesystems/Locking
> @@ -439,7 +439,7 @@ prototypes:
>  	ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
>  	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
>  	int (*iterate) (struct file *, struct dir_context *);
> -	unsigned int (*poll) (struct file *, struct poll_table_struct *);
> +	__poll_t (*poll) (struct file *, struct poll_table_struct *);
>  	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
>  	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
>  	int (*mmap) (struct file *, struct vm_area_struct *);
> diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
> index 5fd325df59e2..f608180ad59d 100644
> --- a/Documentation/filesystems/vfs.txt
> +++ b/Documentation/filesystems/vfs.txt
> @@ -856,7 +856,7 @@ struct file_operations {
>  	ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
>  	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
>  	int (*iterate) (struct file *, struct dir_context *);
> -	unsigned int (*poll) (struct file *, struct poll_table_struct *);
> +	__poll_t (*poll) (struct file *, struct poll_table_struct *);
>  	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
>  	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
>  	int (*mmap) (struct file *, struct vm_area_struct *);
> -- 
> 2.14.2
> 

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 12/36] fs: add new vfs_poll and file_can_poll helpers
  2018-03-05 21:27   ` Christoph Hellwig
@ 2018-03-20  2:27     ` Darrick J. Wong
  -1 siblings, 0 replies; 120+ messages in thread
From: Darrick J. Wong @ 2018-03-20  2:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
	linux-kernel

On Mon, Mar 05, 2018 at 01:27:19PM -0800, Christoph Hellwig wrote:
> These abstract out calls to the poll method in preparation for changes
> in how we poll.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/staging/comedi/drivers/serial2002.c |  4 ++--
>  drivers/vfio/virqfd.c                       |  2 +-
>  drivers/vhost/vhost.c                       |  2 +-
>  fs/eventpoll.c                              |  5 ++---
>  fs/select.c                                 | 23 ++++++++---------------
>  include/linux/poll.h                        | 12 ++++++++++++
>  mm/memcontrol.c                             |  2 +-

For the fs/include/mm changes,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

The other conversions look fine to me too but I've never looked at them
before. :)

--D

>  net/9p/trans_fd.c                           | 18 ++++--------------
>  virt/kvm/eventfd.c                          |  2 +-
>  9 files changed, 32 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/staging/comedi/drivers/serial2002.c b/drivers/staging/comedi/drivers/serial2002.c
> index b3f3b4a201af..5471b2212a62 100644
> --- a/drivers/staging/comedi/drivers/serial2002.c
> +++ b/drivers/staging/comedi/drivers/serial2002.c
> @@ -113,7 +113,7 @@ static void serial2002_tty_read_poll_wait(struct file *f, int timeout)
>  		long elapsed;
>  		__poll_t mask;
>  
> -		mask = f->f_op->poll(f, &table.pt);
> +		mask = vfs_poll(f, &table.pt);
>  		if (mask & (EPOLLRDNORM | EPOLLRDBAND | EPOLLIN |
>  			    EPOLLHUP | EPOLLERR)) {
>  			break;
> @@ -136,7 +136,7 @@ static int serial2002_tty_read(struct file *f, int timeout)
>  
>  	result = -1;
>  	if (!IS_ERR(f)) {
> -		if (f->f_op->poll) {
> +		if (file_can_poll(f)) {
>  			serial2002_tty_read_poll_wait(f, timeout);
>  
>  			if (kernel_read(f, &ch, 1, &pos) == 1)
> diff --git a/drivers/vfio/virqfd.c b/drivers/vfio/virqfd.c
> index 085700f1be10..2a1be859ee71 100644
> --- a/drivers/vfio/virqfd.c
> +++ b/drivers/vfio/virqfd.c
> @@ -166,7 +166,7 @@ int vfio_virqfd_enable(void *opaque,
>  	init_waitqueue_func_entry(&virqfd->wait, virqfd_wakeup);
>  	init_poll_funcptr(&virqfd->pt, virqfd_ptable_queue_proc);
>  
> -	events = irqfd.file->f_op->poll(irqfd.file, &virqfd->pt);
> +	events = vfs_poll(irqfd.file, &virqfd->pt);
>  
>  	/*
>  	 * Check if there was an event already pending on the eventfd
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 1b3e8d2d5c8b..4d27e288bb1d 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -208,7 +208,7 @@ int vhost_poll_start(struct vhost_poll *poll, struct file *file)
>  	if (poll->wqh)
>  		return 0;
>  
> -	mask = file->f_op->poll(file, &poll->table);
> +	mask = vfs_poll(file, &poll->table);
>  	if (mask)
>  		vhost_poll_wakeup(&poll->wait, 0, 0, poll_to_key(mask));
>  	if (mask & EPOLLERR) {
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 0f3494ed3ed0..2bebae5a38cf 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -884,8 +884,7 @@ static __poll_t ep_item_poll(const struct epitem *epi, poll_table *pt,
>  
>  	pt->_key = epi->event.events;
>  	if (!is_file_epoll(epi->ffd.file))
> -		return epi->ffd.file->f_op->poll(epi->ffd.file, pt) &
> -		       epi->event.events;
> +		return vfs_poll(epi->ffd.file, pt) & epi->event.events;
>  
>  	ep = epi->ffd.file->private_data;
>  	poll_wait(epi->ffd.file, &ep->poll_wait, pt);
> @@ -2020,7 +2019,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
>  
>  	/* The target file descriptor must support poll */
>  	error = -EPERM;
> -	if (!tf.file->f_op->poll)
> +	if (!file_can_poll(tf.file))
>  		goto error_tgt_fput;
>  
>  	/* Check if EPOLLWAKEUP is allowed */
> diff --git a/fs/select.c b/fs/select.c
> index c6c504a814f9..ba91103707ea 100644
> --- a/fs/select.c
> +++ b/fs/select.c
> @@ -502,14 +502,10 @@ static int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)
>  					continue;
>  				f = fdget(i);
>  				if (f.file) {
> -					const struct file_operations *f_op;
> -					f_op = f.file->f_op;
> -					mask = DEFAULT_POLLMASK;
> -					if (f_op->poll) {
> -						wait_key_set(wait, in, out,
> -							     bit, busy_flag);
> -						mask = (*f_op->poll)(f.file, wait);
> -					}
> +					wait_key_set(wait, in, out, bit,
> +						     busy_flag);
> +					mask = vfs_poll(f.file, wait);
> +
>  					fdput(f);
>  					if ((mask & POLLIN_SET) && (in & bit)) {
>  						res_in |= bit;
> @@ -819,13 +815,10 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, poll_table *pwait,
>  
>  	/* userland u16 ->events contains POLL... bitmap */
>  	filter = demangle_poll(pollfd->events) | EPOLLERR | EPOLLHUP;
> -	mask = DEFAULT_POLLMASK;
> -	if (f.file->f_op->poll) {
> -		pwait->_key = filter | busy_flag;
> -		mask = f.file->f_op->poll(f.file, pwait);
> -		if (mask & busy_flag)
> -			*can_busy_poll = true;
> -	}
> +	pwait->_key = filter | busy_flag;
> +	mask = vfs_poll(f.file, pwait);
> +	if (mask & busy_flag)
> +		*can_busy_poll = true;
>  	mask &= filter;		/* Mask out unneeded events. */
>  	fdput(f);
>  
> diff --git a/include/linux/poll.h b/include/linux/poll.h
> index a3576da63377..7e0fdcf905d2 100644
> --- a/include/linux/poll.h
> +++ b/include/linux/poll.h
> @@ -74,6 +74,18 @@ static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
>  	pt->_key   = ~(__poll_t)0; /* all events enabled */
>  }
>  
> +static inline bool file_can_poll(struct file *file)
> +{
> +	return file->f_op->poll;
> +}
> +
> +static inline __poll_t vfs_poll(struct file *file, struct poll_table_struct *pt)
> +{
> +	if (unlikely(!file->f_op->poll))
> +		return DEFAULT_POLLMASK;
> +	return file->f_op->poll(file, pt);
> +}
> +
>  struct poll_table_entry {
>  	struct file *filp;
>  	__poll_t key;
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 670e99b68aa6..8774ece5c3c3 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3849,7 +3849,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
>  	if (ret)
>  		goto out_put_css;
>  
> -	efile.file->f_op->poll(efile.file, &event->pt);
> +	vfs_poll(efile.file, &event->pt);
>  
>  	spin_lock(&memcg->event_list_lock);
>  	list_add(&event->list, &memcg->event_list);
> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index 0cfba919d167..3811775692d0 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -231,7 +231,7 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
>  static __poll_t
>  p9_fd_poll(struct p9_client *client, struct poll_table_struct *pt, int *err)
>  {
> -	__poll_t ret, n;
> +	__poll_t ret;
>  	struct p9_trans_fd *ts = NULL;
>  
>  	if (client && client->status == Connected)
> @@ -243,19 +243,9 @@ p9_fd_poll(struct p9_client *client, struct poll_table_struct *pt, int *err)
>  		return EPOLLERR;
>  	}
>  
> -	if (!ts->rd->f_op->poll)
> -		ret = DEFAULT_POLLMASK;
> -	else
> -		ret = ts->rd->f_op->poll(ts->rd, pt);
> -
> -	if (ts->rd != ts->wr) {
> -		if (!ts->wr->f_op->poll)
> -			n = DEFAULT_POLLMASK;
> -		else
> -			n = ts->wr->f_op->poll(ts->wr, pt);
> -		ret = (ret & ~EPOLLOUT) | (n & ~EPOLLIN);
> -	}
> -
> +	ret = vfs_poll(ts->rd, pt);
> +	if (ts->rd != ts->wr)
> +		ret = (ret & ~EPOLLOUT) | (vfs_poll(ts->wr, pt) & ~EPOLLIN);
>  	return ret;
>  }
>  
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 6e865e8b5b10..90d30fbe95ae 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -397,7 +397,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
>  	 * Check if there was an event already pending on the eventfd
>  	 * before we registered, and trigger it as if we didn't miss it.
>  	 */
> -	events = f.file->f_op->poll(f.file, &irqfd->pt);
> +	events = vfs_poll(f.file, &irqfd->pt);
>  
>  	if (events & EPOLLIN)
>  		schedule_work(&irqfd->inject);
> -- 
> 2.14.2
> 
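
As a rough userspace illustration of what the two helpers in the quoted
include/linux/poll.h hunk buy the callers: the NULL check on the method
pointer and the default mask live in one place, so call sites no longer open
code them.  The sketch below is an assumption-laden model, not kernel code:
the demo_* types, the void * poll-table argument and the value of
DEMO_DEFAULT_POLLMASK are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical default mask; the kernel's DEFAULT_POLLMASK differs. */
#define DEMO_DEFAULT_POLLMASK 0x5u

struct demo_file;

struct demo_file_ops {
	/* Optional method: NULL means "this file has no poll support". */
	unsigned int (*poll)(struct demo_file *file, void *pt);
};

struct demo_file {
	const struct demo_file_ops *f_op;
	unsigned int ready;		/* events the demo file reports */
};

/* Analog of file_can_poll(): does the file implement ->poll at all? */
static inline bool demo_file_can_poll(const struct demo_file *file)
{
	assert(file && file->f_op);
	return file->f_op->poll != NULL;
}

/* Analog of vfs_poll(): fall back to the default mask without ->poll. */
static inline unsigned int demo_vfs_poll(struct demo_file *file, void *pt)
{
	assert(file && file->f_op);
	if (!file->f_op->poll)
		return DEMO_DEFAULT_POLLMASK;
	return file->f_op->poll(file, pt);
}

/* A trivial ->poll implementation that reports the stored ready bits. */
static unsigned int demo_ready_poll(struct demo_file *file, void *pt)
{
	(void)pt;
	return file->ready;
}

static const struct demo_file_ops demo_pollable_ops = { .poll = demo_ready_poll };
static const struct demo_file_ops demo_plain_ops = { .poll = NULL };
```

A caller that only wants the current mask uses demo_vfs_poll() and never
looks at the ops table; a caller that must refuse unpollable files (as
epoll_ctl does in the quoted hunk) asks demo_file_can_poll() first.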

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 07/36] aio: add delayed cancel support
  2018-03-05 21:27   ` Christoph Hellwig
@ 2018-03-20  3:19     ` Darrick J. Wong
  -1 siblings, 0 replies; 120+ messages in thread
From: Darrick J. Wong @ 2018-03-20  3:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
	linux-kernel

On Mon, Mar 05, 2018 at 01:27:14PM -0800, Christoph Hellwig wrote:
> The upcoming aio poll support would like to be able to complete the
> iocb inline from the cancellation context, but that would cause
> a lock order reversal.  Add support for optionally moving the cancellation
> outside the context lock to avoid this reversal.

Which lock order reversal does the commit message refer to?

I think the reason for adding delayed cancellations is that we want to
be able to call io_cancel -> kiocb_cancel -> aio_poll_cancel ->
aio_complete without double locking ctx_lock?

--D

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Moyer <jmoyer@redhat.com>
> ---
>  fs/aio.c | 49 ++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 38 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index 0b6394b4e528..9d7d6e4cde87 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -170,6 +170,10 @@ struct aio_kiocb {
>  	struct list_head	ki_list;	/* the aio core uses this
>  						 * for cancellation */
>  
> +	unsigned int		flags;		/* protected by ctx->ctx_lock */
> +#define AIO_IOCB_DELAYED_CANCEL	(1 << 0)
> +#define AIO_IOCB_CANCELLED	(1 << 1)
> +
>  	/*
>  	 * If the aio_resfd field of the userspace iocb is not zero,
>  	 * this is the underlying eventfd context to deliver events to.
> @@ -536,9 +540,9 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
>  #define AIO_EVENTS_FIRST_PAGE	((PAGE_SIZE - sizeof(struct aio_ring)) / sizeof(struct io_event))
>  #define AIO_EVENTS_OFFSET	(AIO_EVENTS_PER_PAGE - AIO_EVENTS_FIRST_PAGE)
>  
> -void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
> +static void __kiocb_set_cancel_fn(struct aio_kiocb *req,
> +		kiocb_cancel_fn *cancel, unsigned int iocb_flags)
>  {
> -	struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, rw);
>  	struct kioctx *ctx = req->ki_ctx;
>  	unsigned long flags;
>  
> @@ -548,8 +552,15 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
>  	spin_lock_irqsave(&ctx->ctx_lock, flags);
>  	list_add_tail(&req->ki_list, &ctx->active_reqs);
>  	req->ki_cancel = cancel;
> +	req->flags |= iocb_flags;
>  	spin_unlock_irqrestore(&ctx->ctx_lock, flags);
>  }
> +
> +void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
> +{
> +	return __kiocb_set_cancel_fn(container_of(iocb, struct aio_kiocb, rw),
> +			cancel, 0);
> +}
>  EXPORT_SYMBOL(kiocb_set_cancel_fn);
>  
>  /*
> @@ -603,17 +614,27 @@ static void free_ioctx_users(struct percpu_ref *ref)
>  {
>  	struct kioctx *ctx = container_of(ref, struct kioctx, users);
>  	struct aio_kiocb *req;
> +	LIST_HEAD(list);
>  
>  	spin_lock_irq(&ctx->ctx_lock);
> -
>  	while (!list_empty(&ctx->active_reqs)) {
>  		req = list_first_entry(&ctx->active_reqs,
>  				       struct aio_kiocb, ki_list);
> -		kiocb_cancel(req);
> -	}
>  
> +		if (req->flags & AIO_IOCB_DELAYED_CANCEL) {
> +			req->flags |= AIO_IOCB_CANCELLED;
> +			list_move_tail(&req->ki_list, &list);
> +		} else {
> +			kiocb_cancel(req);
> +		}
> +	}
>  	spin_unlock_irq(&ctx->ctx_lock);
>  
> +	while (!list_empty(&list)) {
> +		req = list_first_entry(&list, struct aio_kiocb, ki_list);
> +		kiocb_cancel(req);
> +	}
> +
>  	percpu_ref_kill(&ctx->reqs);
>  	percpu_ref_put(&ctx->reqs);
>  }
> @@ -1785,15 +1806,22 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
>  	if (unlikely(!ctx))
>  		return -EINVAL;
>  
> -	spin_lock_irq(&ctx->ctx_lock);
> +	ret = -EINVAL;
>  
> +	spin_lock_irq(&ctx->ctx_lock);
>  	kiocb = lookup_kiocb(ctx, iocb, key);
> +	if (kiocb) {
> +		if (kiocb->flags & AIO_IOCB_DELAYED_CANCEL) {
> +			kiocb->flags |= AIO_IOCB_CANCELLED;
> +		} else {
> +			ret = kiocb_cancel(kiocb);
> +			kiocb = NULL;
> +		}
> +	}
> +	spin_unlock_irq(&ctx->ctx_lock);
> +
>  	if (kiocb)
>  		ret = kiocb_cancel(kiocb);
> -	else
> -		ret = -EINVAL;
> -
> -	spin_unlock_irq(&ctx->ctx_lock);
>  
>  	if (!ret) {
>  		/*
> @@ -1805,7 +1833,6 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
>  	}
>  
>  	percpu_ref_put(&ctx->users);
> -
>  	return ret;
>  }
>  
> -- 
> 2.14.2
> 
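
If the reading in the question above is right (it is a guess in this reply,
not something the commit message confirms), the pattern in the quoted
free_ioctx_users hunk is "flag and collect under the lock, then run the
cancel callback after dropping it".  A minimal userspace sketch follows.
Everything in it is hypothetical: the demo_* names, the singly linked list,
and a toy lock that asserts instead of deadlocking when taken twice.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_REQ_DELAYED_CANCEL	(1u << 0)
#define DEMO_REQ_CANCELLED	(1u << 1)

/* Toy non-recursive lock: a second acquire trips an assertion. */
struct demo_lock { int held; };

static void demo_lock_acquire(struct demo_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void demo_lock_release(struct demo_lock *l)
{
	assert(l->held);
	l->held = 0;
}

struct demo_ctx;

struct demo_req {
	struct demo_req *next;
	unsigned int flags;		/* protected by ctx->lock */
	struct demo_ctx *ctx;
	void (*cancel)(struct demo_req *req);
};

struct demo_ctx {
	struct demo_lock lock;
	struct demo_req *active;	/* list of in-flight requests */
	int completed;
	int inline_cancels;
};

/* Completes the request inline; takes ctx->lock itself. */
static void demo_cancel_and_complete(struct demo_req *req)
{
	demo_lock_acquire(&req->ctx->lock);
	req->ctx->completed++;
	demo_lock_release(&req->ctx->lock);
}

/* Classic cancel: expects the caller to already hold ctx->lock. */
static void demo_cancel_inline(struct demo_req *req)
{
	req->ctx->inline_cancels++;
}

static void demo_cancel_all(struct demo_ctx *ctx)
{
	struct demo_req *deferred = NULL, *req;

	demo_lock_acquire(&ctx->lock);
	while ((req = ctx->active) != NULL) {
		ctx->active = req->next;
		if (req->flags & DEMO_REQ_DELAYED_CANCEL) {
			/* Only mark it here; the callback runs unlocked. */
			req->flags |= DEMO_REQ_CANCELLED;
			req->next = deferred;
			deferred = req;
		} else {
			req->next = NULL;
			req->cancel(req);
		}
	}
	demo_lock_release(&ctx->lock);

	/* Lock dropped: these callbacks may take ctx->lock themselves. */
	while ((req = deferred) != NULL) {
		deferred = req->next;
		req->next = NULL;
		req->cancel(req);
	}
}
```

Calling demo_cancel_and_complete() from inside the first loop would trip the
assertion in demo_lock_acquire(), which is the toy equivalent of the double
locking asked about above.  Note that the sketch pushes deferred requests
onto the front of a private list, so they are cancelled in reverse order,
whereas the quoted patch uses list_move_tail() and keeps the original order.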

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 13/36] fs: introduce new ->get_poll_head and ->poll_mask methods
  2018-03-05 21:27   ` Christoph Hellwig
@ 2018-03-20  3:29     ` Darrick J. Wong
  -1 siblings, 0 replies; 120+ messages in thread
From: Darrick J. Wong @ 2018-03-20  3:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
	linux-kernel

On Mon, Mar 05, 2018 at 01:27:20PM -0800, Christoph Hellwig wrote:
> ->get_poll_head returns the waitqueue that the poll operation is going
> to sleep on.  Note that this means we can only use a single waitqueue
> for the poll, unlike some current drivers that use two waitqueues for
> different events.  But now that we have keyed wakeups and heavily use
> those for poll there aren't that many good reasons left to keep the
> multiple waitqueues, and if there are any ->poll is still around, the
> driver just won't support aio poll.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

I've been wondering, how does a regular filesystem connect with this?
Also, does anything implement get_poll_head?  It looks to me like an aio
poll provider has to provide both...

--D

> ---
>  Documentation/filesystems/Locking |  7 ++++++-
>  Documentation/filesystems/vfs.txt | 13 +++++++++++++
>  fs/select.c                       | 28 ++++++++++++++++++++++++++++
>  include/linux/fs.h                |  2 ++
>  include/linux/poll.h              | 27 +++++++++++++++++++++++----
>  5 files changed, 72 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
> index 220bba28f72b..6d227f9d7bd9 100644
> --- a/Documentation/filesystems/Locking
> +++ b/Documentation/filesystems/Locking
> @@ -440,6 +440,8 @@ prototypes:
>  	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
>  	int (*iterate) (struct file *, struct dir_context *);
>  	__poll_t (*poll) (struct file *, struct poll_table_struct *);
> +	struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
> +	__poll_t (*poll_mask) (struct file *, __poll_t);
>  	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
>  	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
>  	int (*mmap) (struct file *, struct vm_area_struct *);
> @@ -470,7 +472,7 @@ prototypes:
>  };
>  
>  locking rules:
> -	All may block.
> +	All except for ->poll_mask may block.
>  
>  ->llseek() locking has moved from llseek to the individual llseek
>  implementations.  If your fs is not using generic_file_llseek, you
> @@ -498,6 +500,9 @@ in sys_read() and friends.
>  the lease within the individual filesystem to record the result of the
>  operation
>  
> +->poll_mask can be called with or without the waitqueue lock for the waitqueue
> +returned from ->get_poll_head.
> +
>  --------------------------- dquot_operations -------------------------------
>  prototypes:
>  	int (*write_dquot) (struct dquot *);
> diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
> index f608180ad59d..50ee13563271 100644
> --- a/Documentation/filesystems/vfs.txt
> +++ b/Documentation/filesystems/vfs.txt
> @@ -857,6 +857,8 @@ struct file_operations {
>  	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
>  	int (*iterate) (struct file *, struct dir_context *);
>  	__poll_t (*poll) (struct file *, struct poll_table_struct *);
> +	struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
> +	__poll_t (*poll_mask) (struct file *, __poll_t);
>  	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
>  	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
>  	int (*mmap) (struct file *, struct vm_area_struct *);
> @@ -901,6 +903,17 @@ otherwise noted.
>  	activity on this file and (optionally) go to sleep until there
>  	is activity. Called by the select(2) and poll(2) system calls
>  
> +  get_poll_head: Returns the struct wait_queue_head that poll, select,
> +  epoll or aio poll should wait on in case this instance only has a single
> +  waitqueue.  Can return NULL to indicate polling is not supported,
> +  or a POLL* value using the POLL_TO_PTR helper in case a grave error
> +  occurred and ->poll_mask shall not be called.
> +
> +  poll_mask: return the mask of POLL* values describing the file descriptor
> +  state.  Called either before going to sleep on the waitqueue returned by
> +  get_poll_head, or after it has been woken.  If ->get_poll_head and
> +  ->poll_mask are implemented, ->poll does not need to be implemented.
> +
>    unlocked_ioctl: called by the ioctl(2) system call.
>  
>    compat_ioctl: called by the ioctl(2) system call when 32 bit system calls
> diff --git a/fs/select.c b/fs/select.c
> index ba91103707ea..cc270d7f6192 100644
> --- a/fs/select.c
> +++ b/fs/select.c
> @@ -34,6 +34,34 @@
>  
>  #include <linux/uaccess.h>
>  
> +__poll_t vfs_poll(struct file *file, struct poll_table_struct *pt)
> +{
> +	__poll_t events = poll_requested_events(pt);
> +	struct wait_queue_head *head;
> +
> +	if (unlikely(!file_can_poll(file)))
> +		return DEFAULT_POLLMASK;
> +
> +	if (file->f_op->poll)
> +		return file->f_op->poll(file, pt);
> +
> +	/*
> +	 * Only get the poll head and do the first mask check if we are actually
> +	 * going to sleep on this file:
> +	 */
> +	if (pt && pt->_qproc) {
> +		head = vfs_get_poll_head(file, events);
> +		if (!head)
> +			return DEFAULT_POLLMASK;
> +		if (IS_ERR(head))
> +			return PTR_TO_POLL(head);
> +
> +		pt->_qproc(file, head, pt);
> +	}
> +
> +	return file->f_op->poll_mask(file, events);
> +}
> +EXPORT_SYMBOL_GPL(vfs_poll);
>  
>  /*
>   * Estimate expected accuracy in ns from a timeval.
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 79c413985305..6ea2c0843bb1 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1708,6 +1708,8 @@ struct file_operations {
>  	int (*iterate) (struct file *, struct dir_context *);
>  	int (*iterate_shared) (struct file *, struct dir_context *);
>  	__poll_t (*poll) (struct file *, struct poll_table_struct *);
> +	struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
> +	__poll_t (*poll_mask) (struct file *, __poll_t);
>  	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
>  	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
>  	int (*mmap) (struct file *, struct vm_area_struct *);
> diff --git a/include/linux/poll.h b/include/linux/poll.h
> index 7e0fdcf905d2..42e8e8665fb0 100644
> --- a/include/linux/poll.h
> +++ b/include/linux/poll.h
> @@ -74,18 +74,37 @@ static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
>  	pt->_key   = ~(__poll_t)0; /* all events enabled */
>  }
>  
> +/*
> + * ->get_poll_head can return a __poll_t in the PTR_ERR, use these macros
> + * to return the value and recover it.  It takes care of the negation as
> + * well as of the annotations.
> + */
> +#define POLL_TO_PTR(mask)	(ERR_PTR(-(__force int)(mask)))
> +#define PTR_TO_POLL(ptr)	((__force __poll_t)-PTR_ERR((ptr)))
> +
>  static inline bool file_can_poll(struct file *file)
>  {
> -	return file->f_op->poll;
> +	return file->f_op->poll ||
> +		(file->f_op->get_poll_head && file->f_op->poll_mask);
>  }
>  
> -static inline __poll_t vfs_poll(struct file *file, struct poll_table_struct *pt)
> +static inline struct wait_queue_head *vfs_get_poll_head(struct file *file,
> +		__poll_t events)
>  {
> -	if (unlikely(!file->f_op->poll))
> +	if (unlikely(!file->f_op->get_poll_head || !file->f_op->poll_mask))
> +		return NULL;
> +	return file->f_op->get_poll_head(file, events);
> +}
> +
> +static inline __poll_t vfs_poll_mask(struct file *file, __poll_t events)
> +{
> +	if (unlikely(!file->f_op->poll_mask))
>  		return DEFAULT_POLLMASK;
> -	return file->f_op->poll(file, pt);
> +	return file->f_op->poll_mask(file, events) & events;
>  }
>  
> +__poll_t vfs_poll(struct file *file, struct poll_table_struct *pt);
> +
>  struct poll_table_entry {
>  	struct file *filp;
>  	__poll_t key;
> -- 
> 2.14.2
> 

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 07/36] aio: add delayed cancel support
  2018-03-20  3:19     ` Darrick J. Wong
@ 2018-03-20 15:20       ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-20 15:20 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, viro, Avi Kivity, linux-aio, linux-fsdevel,
	netdev, linux-api, linux-kernel

On Mon, Mar 19, 2018 at 08:19:57PM -0700, Darrick J. Wong wrote:
> On Mon, Mar 05, 2018 at 01:27:14PM -0800, Christoph Hellwig wrote:
> > The upcoming aio poll support would like to be able to complete the
> > iocb inline from the cancellation context, but that would cause
> > a lock order reversal.  Add support for optionally moving the cancelation
> > outside the context lock to avoid this reversal.
> 
> I started to wonder which lock order reversal the commit message refers
> to?
> 
> I think the reason for adding delayed cancellations is that we want to
> be able to call io_cancel -> kiocb_cancel -> aio_poll_cancel ->
> aio_complete without double locking ctx_lock?

It is.  I've updated the commit message.
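
For illustration only, here is a minimal userspace sketch of the pattern
the patch applies in free_ioctx_users(): requests flagged for delayed
cancel are marked and moved to a private list while the lock is held, and
their completion runs only after the lock is dropped.  This is not the
kernel code.  struct ctx, struct req, complete_req() and cancel_all() are
hypothetical stand-ins, and a non-recursive pthread mutex plays the role
of ctx->ctx_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct ctx;

/* Hypothetical stand-in for struct aio_kiocb. */
struct req {
	struct req *next;
	struct ctx *ctx;
	int delayed;	/* models AIO_IOCB_DELAYED_CANCEL */
	int cancelled;	/* models AIO_IOCB_CANCELLED */
	int completed;
};

/* Hypothetical stand-in for struct kioctx. */
struct ctx {
	pthread_mutex_t lock;	/* models ctx->ctx_lock, not recursive */
	struct req *active;	/* models ctx->active_reqs */
	int nr_completed;
};

void ctx_init(struct ctx *c)
{
	pthread_mutex_init(&c->lock, NULL);
	c->active = NULL;
	c->nr_completed = 0;
}

/* Models aio_complete(): it takes the context lock itself. */
static void complete_req(struct req *r)
{
	pthread_mutex_lock(&r->ctx->lock);
	r->completed = 1;
	r->ctx->nr_completed++;
	pthread_mutex_unlock(&r->ctx->lock);
}

/*
 * Cancel every active request.  Calling complete_req() inside the first
 * loop would take c->lock twice; deferring it avoids that.
 * Returns the number of requests completed inline.
 */
int cancel_all(struct ctx *c)
{
	struct req *deferred = NULL, *r;
	int n = 0;

	pthread_mutex_lock(&c->lock);
	while ((r = c->active) != NULL) {
		assert(r->ctx == c);
		c->active = r->next;
		r->cancelled = 1;
		if (r->delayed) {
			/* park it on the private list for later */
			r->next = deferred;
			deferred = r;
		} else {
			/* classic cancel: done under the lock, no completion */
			r->next = NULL;
		}
	}
	pthread_mutex_unlock(&c->lock);

	/* the lock is dropped, so completing inline is safe now */
	while ((r = deferred) != NULL) {
		deferred = r->next;
		r->next = NULL;
		complete_req(r);
		n++;
	}
	return n;
}
```

With one delayed and one ordinary request queued, cancel_all() completes
only the delayed one, and does so after releasing the lock.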

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/36] aio: implement io_pgetevents
  2018-03-20  2:12     ` Darrick J. Wong
@ 2018-03-20 15:22       ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-20 15:22 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, viro, Avi Kivity, linux-aio, linux-fsdevel,
	netdev, linux-api, linux-kernel

On Mon, Mar 19, 2018 at 07:12:41PM -0700, Darrick J. Wong wrote:
> > Note that unlike many other signal related calls we do not pass a sigmask
> > size, as that would get us to 7 arguments, which aren't easily supported
> > by the syscall infrastructure.  It seems a lot less painful to just add a
> > new syscall variant in the unlikely case we're going to increase the
> > sigset size.
> 
> I'm assuming there's a proposed manpage update for this somewhere? :)

In this commit:

http://git.infradead.org/users/hch/libaio.git/commitdiff/49f77d595210393ce7b125cb00233cf737402f56

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/36] aio: implement io_pgetevents
  2018-03-20 15:22       ` Christoph Hellwig
  (?)
@ 2018-03-20 15:30         ` Jeff Moyer
  -1 siblings, 0 replies; 120+ messages in thread
From: Jeff Moyer @ 2018-03-20 15:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Darrick J. Wong, viro, Avi Kivity, linux-aio, linux-fsdevel,
	netdev, linux-api, linux-kernel

Christoph Hellwig <hch@lst.de> writes:

> On Mon, Mar 19, 2018 at 07:12:41PM -0700, Darrick J. Wong wrote:
>> > Note that unlike many other signal related calls we do not pass a sigmask
>> > size, as that would get us to 7 arguments, which aren't easily supported
>> > by the syscall infrastructure.  It seems a lot less painful to just add a
>> > new syscall variant in the unlikely case we're going to increase the
>> > sigset size.
>> 
>> I'm assuming there's a proposed manpage update for this somewhere? :)
>
> In this commit:
>
> http://git.infradead.org/users/hch/libaio.git/commitdiff/49f77d595210393ce7b125cb00233cf737402f56

BTW, the man pages are shipped in the man-pages package, now.
Christoph, I can forward the change to Michael after this series goes
in.  Just let me know what's easiest for you.

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/36] aio: implement io_pgetevents
  2018-03-20 15:30         ` Jeff Moyer
@ 2018-03-20 15:31           ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-20 15:31 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Christoph Hellwig, Darrick J. Wong, viro, Avi Kivity, linux-aio,
	linux-fsdevel, netdev, linux-api, linux-kernel

On Tue, Mar 20, 2018 at 11:30:29AM -0400, Jeff Moyer wrote:
> > In this commit:
> >
> > http://git.infradead.org/users/hch/libaio.git/commitdiff/49f77d595210393ce7b125cb00233cf737402f56
> 
> BTW, the man pages are shipped in the man-pages package, now.
> Christoph, I can forward the change to Michael after this series goes
> in.  Just let me know what's easiest for you.

I'd be more than happy to send a patch to Michael if we get these
patches merged finally :)

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/36] aio: implement io_pgetevents
  2018-03-20 15:31           ` Christoph Hellwig
  (?)
@ 2018-03-20 15:34             ` Jeff Moyer
  -1 siblings, 0 replies; 120+ messages in thread
From: Jeff Moyer @ 2018-03-20 15:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Darrick J. Wong, viro, Avi Kivity, linux-aio, linux-fsdevel,
	netdev, linux-api, linux-kernel

Christoph Hellwig <hch@lst.de> writes:

> On Tue, Mar 20, 2018 at 11:30:29AM -0400, Jeff Moyer wrote:
>> > In this commit:
>> >
>> > http://git.infradead.org/users/hch/libaio.git/commitdiff/49f77d595210393ce7b125cb00233cf737402f56
>> 
>> BTW, the man pages are shipped in the man-pages package, now.
>> Christoph, I can forward the change to Michael after this series goes
>> in.  Just let me know what's easiest for you.
>
> I'd be more than happy to send a patch to Michael if we get these
> patches merged finally :)

hahaha, least of your problems, I know.  :)

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 13/36] fs: introduce new ->get_poll_head and ->poll_mask methods
  2018-03-20  3:29     ` Darrick J. Wong
@ 2018-03-20 15:39       ` Christoph Hellwig
  -1 siblings, 0 replies; 120+ messages in thread
From: Christoph Hellwig @ 2018-03-20 15:39 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, viro, Avi Kivity, linux-aio, linux-fsdevel,
	netdev, linux-api, linux-kernel

On Mon, Mar 19, 2018 at 08:29:38PM -0700, Darrick J. Wong wrote:
> On Mon, Mar 05, 2018 at 01:27:20PM -0800, Christoph Hellwig wrote:
> > ->get_poll_head returns the waitqueue that the poll operation is going
> > to sleep on.  Note that this means we can only use a single waitqueue
> > for the poll, unlike some current drivers that use two waitqueues for
> > different events.  But now that we have keyed wakeups and heavily use
> > those for poll there aren't that many good reasons left to keep the
> > multiple waitqueues, and if there are any, ->poll is still around; the
> > driver just won't support aio poll.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> I've been wondering, how does a regular filesystem connect with this?

In general, it doesn't.  In Unix regular files aren't pollable.

> Also, does anything implement get_poll_head?  It looks to me like an aio
> poll provider has to provide both...

Yes.  Everyone who implements ->poll_mask also needs to implement
->get_poll_head.  For sockets we just happen to be able to use a generic
implementation in net/socket.c for most of them.

^ permalink raw reply	[flat|nested] 120+ messages in thread

end of thread, other threads:[~2018-03-20 15:39 UTC | newest]

Thread overview: 120+ messages
2018-03-05 21:27 aio poll, io_pgetevents and a new in-kernel poll API V5 Christoph Hellwig
2018-03-05 21:27 ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 01/36] aio: don't print the page size at boot time Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-20  0:11   ` Darrick J. Wong
2018-03-20  0:11     ` Darrick J. Wong
2018-03-05 21:27 ` [PATCH 02/36] aio: remove an outdated comment in aio_complete Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-20  0:12   ` Darrick J. Wong
2018-03-20  0:12     ` Darrick J. Wong
2018-03-05 21:27 ` [PATCH 03/36] aio: refactor read/write iocb setup Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-20  0:19   ` Darrick J. Wong
2018-03-20  0:19     ` Darrick J. Wong
2018-03-05 21:27 ` [PATCH 04/36] aio: sanitize ki_list handling Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-20  0:21   ` Darrick J. Wong
2018-03-20  0:21     ` Darrick J. Wong
2018-03-05 21:27 ` [PATCH 05/36] aio: simplify cancellation Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-20  0:25   ` Darrick J. Wong
2018-03-20  0:25     ` Darrick J. Wong
2018-03-05 21:27 ` [PATCH 06/36] aio: delete iocbs from the active_reqs list in kiocb_cancel Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-20  0:34   ` Darrick J. Wong
2018-03-20  0:34     ` Darrick J. Wong
2018-03-05 21:27 ` [PATCH 07/36] aio: add delayed cancel support Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-20  3:19   ` Darrick J. Wong
2018-03-20  3:19     ` Darrick J. Wong
2018-03-20 15:20     ` Christoph Hellwig
2018-03-20 15:20       ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 08/36] aio: implement io_pgetevents Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:51   ` Jeff Moyer
2018-03-05 21:51     ` Jeff Moyer
2018-03-05 21:51     ` Jeff Moyer
2018-03-20  2:12   ` Darrick J. Wong
2018-03-20  2:12     ` Darrick J. Wong
2018-03-20 15:22     ` Christoph Hellwig
2018-03-20 15:22       ` Christoph Hellwig
2018-03-20 15:30       ` Jeff Moyer
2018-03-20 15:30         ` Jeff Moyer
2018-03-20 15:30         ` Jeff Moyer
2018-03-20 15:31         ` Christoph Hellwig
2018-03-20 15:31           ` Christoph Hellwig
2018-03-20 15:34           ` Jeff Moyer
2018-03-20 15:34             ` Jeff Moyer
2018-03-20 15:34             ` Jeff Moyer
2018-03-05 21:27 ` [PATCH 09/36] fs: unexport poll_schedule_timeout Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-20  2:13   ` Darrick J. Wong
2018-03-20  2:13     ` Darrick J. Wong
2018-03-05 21:27 ` [PATCH 10/36] fs: cleanup do_pollfd Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-20  2:14   ` Darrick J. Wong
2018-03-20  2:14     ` Darrick J. Wong
2018-03-05 21:27 ` [PATCH 11/36] fs: update documentation for __poll_t Christoph Hellwig
2018-03-20  2:19   ` Darrick J. Wong
2018-03-20  2:19     ` Darrick J. Wong
2018-03-05 21:27 ` [PATCH 12/36] fs: add new vfs_poll and file_can_poll helpers Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-20  2:27   ` Darrick J. Wong
2018-03-20  2:27     ` Darrick J. Wong
2018-03-05 21:27 ` [PATCH 13/36] fs: introduce new ->get_poll_head and ->poll_mask methods Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-20  3:29   ` Darrick J. Wong
2018-03-20  3:29     ` Darrick J. Wong
2018-03-20 15:39     ` Christoph Hellwig
2018-03-20 15:39       ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 14/36] aio: implement IOCB_CMD_POLL Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:51   ` Jeff Moyer
2018-03-05 21:27 ` [PATCH 15/36] net: refactor socket_poll Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 16/36] net: add support for ->poll_mask in proto_ops Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 17/36] net: remove sock_no_poll Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 18/36] net/tcp: convert to ->poll_mask Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 19/36] net/unix: " Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 20/36] net: convert datagram_poll users tp ->poll_mask Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 21/36] net/dccp: convert to ->poll_mask Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 22/36] net/atm: " Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 23/36] net/vmw_vsock: " Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 24/36] net/tipc: " Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 25/36] net/sctp: " Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 26/36] net/bluetooth: " Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 27/36] net/caif: " Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 28/36] net/nfc: " Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 29/36] net/phonet: " Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 30/36] net/iucv: " Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 31/36] net/rxrpc: " Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 32/36] crypto: af_alg: " Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 33/36] pipe: " Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 34/36] eventfd: switch " Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 35/36] timerfd: convert " Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-05 21:27 ` [PATCH 36/36] random: " Christoph Hellwig
2018-03-05 21:27   ` Christoph Hellwig
2018-03-13  7:46 ` aio poll, io_pgetevents and a new in-kernel poll API V5 Christoph Hellwig
2018-03-19  8:35 ` Christoph Hellwig
2018-03-19  8:35   ` Christoph Hellwig