linux-fsdevel.vger.kernel.org archive mirror
* aio poll and a new in-kernel poll API V13
@ 2018-05-23 19:19 Christoph Hellwig
  2018-05-23 19:19 ` [PATCH 01/33] fix io_destroy()/aio_complete() race Christoph Hellwig
                   ` (33 more replies)
  0 siblings, 34 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:19 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Hi all,

this series adds support for the IOCB_CMD_POLL operation to poll for the
readiness of file descriptors using the aio subsystem.  The API is based
on patches that existed in RHAS2.1 and RHEL3, which means it already is
supported by libaio.  To implement the poll support efficiently new
methods to poll are introduced in struct file_operations:  get_poll_head
and poll_mask.  The first one returns a wait_queue_head to wait on
(lifetime is bound by the file), and the second does a non-blocking
check for the POLL* events.  This allows aio poll to work without
any additional context switches, unlike epoll.

This series sits on top of the aio-fsync series that also includes
support for io_pgetevents.

The changes were sponsored by ScyllaDB, and improve performance
of the Seastar framework by up to 10%, while also removing the need
for a privileged SCHED_FIFO epoll listener thread.

    git://git.infradead.org/users/hch/vfs.git aio-poll.13

Gitweb:

    http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/aio-poll.13

Libaio changes:

    https://pagure.io/libaio.git io-poll

Seastar changes (not updated for the new io_pgetevents ABI yet):

    https://github.com/avikivity/seastar/commits/aio

Changes since v12:
 - remove iocb from ki_list only after ki_cancel has completed
 - fix __poll_t annotations
 - turn __poll_t sparse checking on by default
 - call fput after aio_complete
 - only add the iocb to active_reqs if we wait for it

Changes since v11:
 - simplify cancellation by completing poll requests from a workqueue
   if we can't take the ctx_lock

Changes since v10:
 - fixed a mismerge that let a sock_rps_record_flow sneak into
   tcp_poll_mask
 - remove the now unused struct proto_ops get_poll_head method

Changes since v9:
 - add to the delayed_cancel_reqs earlier to avoid a race
 - get rid of POLL_TO_PTR magic

Changes since v8:
 - make delayed cancellation conditional again
 - add a cancel_kiocb file operation to split delayed vs normal cancel

Changes since v7:
 - make delayed cancellation safe and unconditional

Changes since v6:
 - reworked cancellation

Changes since v5:
 - small changelog updates
 - rebased on top of the aio-fsync changes

Changes since v4:
 - rebased on top of Linux 4.16-rc4

Changes since v3:
 - remove the pre-sleep ->poll_mask call in vfs_poll,
   allow ->get_poll_head to return POLL* values.

Changes since v2:
 - removed a double initialization
 - new vfs_get_poll_head helper
 - document that ->get_poll_head can return NULL
 - call ->poll_mask before sleeping
 - various ACKs
 - add conversion of random to ->poll_mask
 - add conversion of af_alg to ->poll_mask
 - lacking ->poll_mask support now returns -EINVAL for IOCB_CMD_POLL
 - reshuffled the series so that prep patches and everything not
   requiring the new in-kernel poll API is in the beginning

Changes since v1:
 - handle the NULL ->poll case in vfs_poll
 - dropped the file argument to the ->poll_mask socket operation
 - replace the ->pre_poll socket operation with ->get_poll_head as
   in the file operations


* [PATCH 01/33] fix io_destroy()/aio_complete() race
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
@ 2018-05-23 19:19 ` Christoph Hellwig
  2018-05-23 19:19 ` [PATCH 02/33] uapi: turn __poll_t sparse checkin on by default Christoph Hellwig
                   ` (32 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:19 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
	linux-kernel, stable

From: Al Viro <viro@zeniv.linux.org.uk>

If io_destroy() gets to cancelling everything that can be cancelled and
gets to kiocb_cancel() calling the function the driver has left in
->ki_cancel, it becomes vulnerable to a race with IO completion.  At that
point req is already taken off the list and aio_complete() does *NOT* spin
until we (in free_ioctx_users()) release ->ctx_lock.  As a result, it
proceeds to kiocb_free(), freeing req just as it gets passed to
->ki_cancel().

The fix is simple - remove req from the list after the call of
kiocb_cancel().  All instances of ->ki_cancel() already have to cope with
being called with the iocb still on the list - that's what happens in
io_cancel(2).
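
The ordering problem can be illustrated with a small userspace model.  This
is only a sketch, not kernel code: req_model, completion_waits_for_lock()
and the other names are hypothetical, and the model assumes (as described
above) that the completion path only waits for ->ctx_lock while the request
is still on the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct aio_kiocb; not the kernel type. */
struct req_model {
	bool on_list;		/* models "req is still on active_reqs" */
	bool seen_on_list;	/* what a racing completion would observe */
};

/*
 * Models the completion path: only a request that is still on the list
 * makes completion wait for the context lock before freeing the request.
 */
static bool completion_waits_for_lock(const struct req_model *req)
{
	return req->on_list;
}

/* Models a driver's cancel callback running while completion races in. */
static void cancel_model(struct req_model *req)
{
	assert(req != NULL);
	req->seen_on_list = completion_waits_for_lock(req);
}

/* Old order: unlink first, then cancel.  Completion would not wait. */
static bool old_order_is_safe(void)
{
	struct req_model req = { .on_list = true };

	req.on_list = false;
	cancel_model(&req);
	return req.seen_on_list;
}

/* New order: cancel first, then unlink.  Completion has to wait. */
static bool new_order_is_safe(void)
{
	struct req_model req = { .on_list = true };

	cancel_model(&req);
	req.on_list = false;
	return req.seen_on_list;
}
```

In this model only the new order keeps the request visible on the list for
the whole duration of the cancel callback, which is what forces a racing
completion to wait instead of freeing the request.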

Cc: stable@kernel.org
Fixes: 0460fef2a921 "aio: use cancellation list lazily"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/aio.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 755d3f57bcc8..1c383bb44b2d 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -639,9 +639,8 @@ static void free_ioctx_users(struct percpu_ref *ref)
 	while (!list_empty(&ctx->active_reqs)) {
 		req = list_first_entry(&ctx->active_reqs,
 				       struct aio_kiocb, ki_list);
-
-		list_del_init(&req->ki_list);
 		kiocb_cancel(req);
+		list_del_init(&req->ki_list);
 	}
 
 	spin_unlock_irq(&ctx->ctx_lock);
-- 
2.17.0


* [PATCH 02/33] uapi: turn __poll_t sparse checkin on by default
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
  2018-05-23 19:19 ` [PATCH 01/33] fix io_destroy()/aio_complete() race Christoph Hellwig
@ 2018-05-23 19:19 ` Christoph Hellwig
  2018-05-23 19:19 ` [PATCH 03/33] fs: unexport poll_schedule_timeout Christoph Hellwig
                   ` (31 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:19 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/uapi/linux/types.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/include/uapi/linux/types.h b/include/uapi/linux/types.h
index cd4f0b897a48..2fce8b6876e9 100644
--- a/include/uapi/linux/types.h
+++ b/include/uapi/linux/types.h
@@ -49,11 +49,7 @@ typedef __u32 __bitwise __wsum;
 #define __aligned_be64 __be64 __attribute__((aligned(8)))
 #define __aligned_le64 __le64 __attribute__((aligned(8)))
 
-#ifdef __CHECK_POLL
 typedef unsigned __bitwise __poll_t;
-#else
-typedef unsigned __poll_t;
-#endif
 
 #endif /*  __ASSEMBLY__ */
 #endif /* _UAPI_LINUX_TYPES_H */
-- 
2.17.0


* [PATCH 03/33] fs: unexport poll_schedule_timeout
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
  2018-05-23 19:19 ` [PATCH 01/33] fix io_destroy()/aio_complete() race Christoph Hellwig
  2018-05-23 19:19 ` [PATCH 02/33] uapi: turn __poll_t sparse checkin on by default Christoph Hellwig
@ 2018-05-23 19:19 ` Christoph Hellwig
  2018-05-23 19:19 ` [PATCH 04/33] fs: cleanup do_pollfd Christoph Hellwig
                   ` (30 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:19 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

No users outside of select.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/select.c          | 3 +--
 include/linux/poll.h | 2 --
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index ba879c51288f..a87f396f0313 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -233,7 +233,7 @@ static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
 	add_wait_queue(wait_address, &entry->wait);
 }
 
-int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
+static int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
 			  ktime_t *expires, unsigned long slack)
 {
 	int rc = -EINTR;
@@ -258,7 +258,6 @@ int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
 
 	return rc;
 }
-EXPORT_SYMBOL(poll_schedule_timeout);
 
 /**
  * poll_select_set_timeout - helper function to setup the timeout value
diff --git a/include/linux/poll.h b/include/linux/poll.h
index f45ebd017eaa..a3576da63377 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -96,8 +96,6 @@ struct poll_wqueues {
 
 extern void poll_initwait(struct poll_wqueues *pwq);
 extern void poll_freewait(struct poll_wqueues *pwq);
-extern int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
-				 ktime_t *expires, unsigned long slack);
 extern u64 select_estimate_accuracy(struct timespec64 *tv);
 
 #define MAX_INT64_SECONDS (((s64)(~((u64)0)>>1)/HZ)-1)
-- 
2.17.0


* [PATCH 04/33] fs: cleanup do_pollfd
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2018-05-23 19:19 ` [PATCH 03/33] fs: unexport poll_schedule_timeout Christoph Hellwig
@ 2018-05-23 19:19 ` Christoph Hellwig
  2018-05-23 19:19 ` [PATCH 05/33] fs: update documentation to mention __poll_t and match the code Christoph Hellwig
                   ` (29 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:19 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Use straightline code with failure handling gotos instead of a lot
of nested conditionals.
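
As a rough illustration of the style, here is a userspace sketch of the same
control flow.  The descriptor table fd_ready[], the EV_* constants and
poll_one() are made up for this example; only the shape (early exits through
a single "out" label) mirrors do_pollfd.

```c
#include <assert.h>
#include <stddef.h>

/* Event bits mirroring the usual POLL* values. */
#define EV_IN	0x01u
#define EV_OUT	0x04u
#define EV_ERR	0x08u
#define EV_HUP	0x10u
#define EV_NVAL	0x20u

/* Hypothetical descriptor table: pending events per fd, -1 if closed. */
static const int fd_ready[] = { EV_IN, -1, EV_IN | EV_OUT | EV_HUP };

static unsigned int poll_one(int fd, unsigned int wanted)
{
	unsigned int mask = 0, filter;

	if (fd < 0)
		goto out;	/* negative fds are ignored: no events */
	mask = EV_NVAL;
	if ((size_t)fd >= sizeof(fd_ready) / sizeof(fd_ready[0]) ||
	    fd_ready[fd] < 0)
		goto out;	/* not an open descriptor */

	/* Errors and hangups are always reported. */
	filter = wanted | EV_ERR | EV_HUP;
	mask = (unsigned int)fd_ready[fd] & filter;
out:
	return mask;
}

/* Small self-check exercising each exit path. */
static void poll_one_selfcheck(void)
{
	assert(poll_one(-1, EV_IN) == 0);
	assert(poll_one(1, EV_IN) == EV_NVAL);
	assert(poll_one(2, EV_IN) == (EV_IN | EV_HUP));
}
```

Each failure case sets the result and jumps to the common exit, so the
success path stays at a single indentation level.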

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/select.c | 48 +++++++++++++++++++++++-------------------------
 1 file changed, 23 insertions(+), 25 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index a87f396f0313..25da26253485 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -812,34 +812,32 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, poll_table *pwait,
 				     bool *can_busy_poll,
 				     __poll_t busy_flag)
 {
-	__poll_t mask;
-	int fd;
-
-	mask = 0;
-	fd = pollfd->fd;
-	if (fd >= 0) {
-		struct fd f = fdget(fd);
-		mask = EPOLLNVAL;
-		if (f.file) {
-			/* userland u16 ->events contains POLL... bitmap */
-			__poll_t filter = demangle_poll(pollfd->events) |
-						EPOLLERR | EPOLLHUP;
-			mask = DEFAULT_POLLMASK;
-			if (f.file->f_op->poll) {
-				pwait->_key = filter;
-				pwait->_key |= busy_flag;
-				mask = f.file->f_op->poll(f.file, pwait);
-				if (mask & busy_flag)
-					*can_busy_poll = true;
-			}
-			/* Mask out unneeded events. */
-			mask &= filter;
-			fdput(f);
-		}
+	int fd = pollfd->fd;
+	__poll_t mask = 0, filter;
+	struct fd f;
+
+	if (fd < 0)
+		goto out;
+	mask = EPOLLNVAL;
+	f = fdget(fd);
+	if (!f.file)
+		goto out;
+
+	/* userland u16 ->events contains POLL... bitmap */
+	filter = demangle_poll(pollfd->events) | EPOLLERR | EPOLLHUP;
+	mask = DEFAULT_POLLMASK;
+	if (f.file->f_op->poll) {
+		pwait->_key = filter | busy_flag;
+		mask = f.file->f_op->poll(f.file, pwait);
+		if (mask & busy_flag)
+			*can_busy_poll = true;
 	}
+	mask &= filter;		/* Mask out unneeded events. */
+	fdput(f);
+
+out:
 	/* ... and so does ->revents */
 	pollfd->revents = mangle_poll(mask);
-
 	return mask;
 }
 
-- 
2.17.0


* [PATCH 05/33] fs: update documentation to mention __poll_t and match the code
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2018-05-23 19:19 ` [PATCH 04/33] fs: cleanup do_pollfd Christoph Hellwig
@ 2018-05-23 19:19 ` Christoph Hellwig
  2018-05-23 19:19 ` [PATCH 06/33] fs: add new vfs_poll and file_can_poll helpers Christoph Hellwig
                   ` (28 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:19 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/filesystems/Locking | 2 +-
 Documentation/filesystems/vfs.txt | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 75d2d57e2c44..220bba28f72b 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -439,7 +439,7 @@ prototypes:
 	ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
 	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
 	int (*iterate) (struct file *, struct dir_context *);
-	unsigned int (*poll) (struct file *, struct poll_table_struct *);
+	__poll_t (*poll) (struct file *, struct poll_table_struct *);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 5fd325df59e2..f608180ad59d 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -856,7 +856,7 @@ struct file_operations {
 	ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
 	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
 	int (*iterate) (struct file *, struct dir_context *);
-	unsigned int (*poll) (struct file *, struct poll_table_struct *);
+	__poll_t (*poll) (struct file *, struct poll_table_struct *);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
-- 
2.17.0


* [PATCH 06/33] fs: add new vfs_poll and file_can_poll helpers
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2018-05-23 19:19 ` [PATCH 05/33] fs: update documentation to mention __poll_t and match the code Christoph Hellwig
@ 2018-05-23 19:19 ` Christoph Hellwig
  2018-05-23 19:19 ` [PATCH 07/33] fs: introduce new ->get_poll_head and ->poll_mask methods Christoph Hellwig
                   ` (27 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:19 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

These abstract out calls to the poll method in preparation for changes
in how we poll.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 drivers/staging/comedi/drivers/serial2002.c |  4 ++--
 drivers/vfio/virqfd.c                       |  2 +-
 drivers/vhost/vhost.c                       |  2 +-
 fs/eventpoll.c                              |  5 ++---
 fs/select.c                                 | 23 +++++++--------------
 include/linux/poll.h                        | 12 +++++++++++
 mm/memcontrol.c                             |  2 +-
 net/9p/trans_fd.c                           | 18 ++++------------
 virt/kvm/eventfd.c                          |  2 +-
 9 files changed, 32 insertions(+), 38 deletions(-)

diff --git a/drivers/staging/comedi/drivers/serial2002.c b/drivers/staging/comedi/drivers/serial2002.c
index b3f3b4a201af..5471b2212a62 100644
--- a/drivers/staging/comedi/drivers/serial2002.c
+++ b/drivers/staging/comedi/drivers/serial2002.c
@@ -113,7 +113,7 @@ static void serial2002_tty_read_poll_wait(struct file *f, int timeout)
 		long elapsed;
 		__poll_t mask;
 
-		mask = f->f_op->poll(f, &table.pt);
+		mask = vfs_poll(f, &table.pt);
 		if (mask & (EPOLLRDNORM | EPOLLRDBAND | EPOLLIN |
 			    EPOLLHUP | EPOLLERR)) {
 			break;
@@ -136,7 +136,7 @@ static int serial2002_tty_read(struct file *f, int timeout)
 
 	result = -1;
 	if (!IS_ERR(f)) {
-		if (f->f_op->poll) {
+		if (file_can_poll(f)) {
 			serial2002_tty_read_poll_wait(f, timeout);
 
 			if (kernel_read(f, &ch, 1, &pos) == 1)
diff --git a/drivers/vfio/virqfd.c b/drivers/vfio/virqfd.c
index 085700f1be10..2a1be859ee71 100644
--- a/drivers/vfio/virqfd.c
+++ b/drivers/vfio/virqfd.c
@@ -166,7 +166,7 @@ int vfio_virqfd_enable(void *opaque,
 	init_waitqueue_func_entry(&virqfd->wait, virqfd_wakeup);
 	init_poll_funcptr(&virqfd->pt, virqfd_ptable_queue_proc);
 
-	events = irqfd.file->f_op->poll(irqfd.file, &virqfd->pt);
+	events = vfs_poll(irqfd.file, &virqfd->pt);
 
 	/*
 	 * Check if there was an event already pending on the eventfd
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index f3bd8e941224..f6022881f147 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -208,7 +208,7 @@ int vhost_poll_start(struct vhost_poll *poll, struct file *file)
 	if (poll->wqh)
 		return 0;
 
-	mask = file->f_op->poll(file, &poll->table);
+	mask = vfs_poll(file, &poll->table);
 	if (mask)
 		vhost_poll_wakeup(&poll->wait, 0, 0, poll_to_key(mask));
 	if (mask & EPOLLERR) {
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 602ca4285b2e..67db22fe99c5 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -884,8 +884,7 @@ static __poll_t ep_item_poll(const struct epitem *epi, poll_table *pt,
 
 	pt->_key = epi->event.events;
 	if (!is_file_epoll(epi->ffd.file))
-		return epi->ffd.file->f_op->poll(epi->ffd.file, pt) &
-		       epi->event.events;
+		return vfs_poll(epi->ffd.file, pt) & epi->event.events;
 
 	ep = epi->ffd.file->private_data;
 	poll_wait(epi->ffd.file, &ep->poll_wait, pt);
@@ -2025,7 +2024,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 
 	/* The target file descriptor must support poll */
 	error = -EPERM;
-	if (!tf.file->f_op->poll)
+	if (!file_can_poll(tf.file))
 		goto error_tgt_fput;
 
 	/* Check if EPOLLWAKEUP is allowed */
diff --git a/fs/select.c b/fs/select.c
index 25da26253485..e30def680b2e 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -502,14 +502,10 @@ static int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)
 					continue;
 				f = fdget(i);
 				if (f.file) {
-					const struct file_operations *f_op;
-					f_op = f.file->f_op;
-					mask = DEFAULT_POLLMASK;
-					if (f_op->poll) {
-						wait_key_set(wait, in, out,
-							     bit, busy_flag);
-						mask = (*f_op->poll)(f.file, wait);
-					}
+					wait_key_set(wait, in, out, bit,
+						     busy_flag);
+					mask = vfs_poll(f.file, wait);
+
 					fdput(f);
 					if ((mask & POLLIN_SET) && (in & bit)) {
 						res_in |= bit;
@@ -825,13 +821,10 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, poll_table *pwait,
 
 	/* userland u16 ->events contains POLL... bitmap */
 	filter = demangle_poll(pollfd->events) | EPOLLERR | EPOLLHUP;
-	mask = DEFAULT_POLLMASK;
-	if (f.file->f_op->poll) {
-		pwait->_key = filter | busy_flag;
-		mask = f.file->f_op->poll(f.file, pwait);
-		if (mask & busy_flag)
-			*can_busy_poll = true;
-	}
+	pwait->_key = filter | busy_flag;
+	mask = vfs_poll(f.file, pwait);
+	if (mask & busy_flag)
+		*can_busy_poll = true;
 	mask &= filter;		/* Mask out unneeded events. */
 	fdput(f);
 
diff --git a/include/linux/poll.h b/include/linux/poll.h
index a3576da63377..7e0fdcf905d2 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -74,6 +74,18 @@ static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
 	pt->_key   = ~(__poll_t)0; /* all events enabled */
 }
 
+static inline bool file_can_poll(struct file *file)
+{
+	return file->f_op->poll;
+}
+
+static inline __poll_t vfs_poll(struct file *file, struct poll_table_struct *pt)
+{
+	if (unlikely(!file->f_op->poll))
+		return DEFAULT_POLLMASK;
+	return file->f_op->poll(file, pt);
+}
+
 struct poll_table_entry {
 	struct file *filp;
 	__poll_t key;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2bd3df3d101a..1695f38630f1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3849,7 +3849,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	if (ret)
 		goto out_put_css;
 
-	efile.file->f_op->poll(efile.file, &event->pt);
+	vfs_poll(efile.file, &event->pt);
 
 	spin_lock(&memcg->event_list_lock);
 	list_add(&event->list, &memcg->event_list);
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 848969fe7979..588bf88c3305 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -231,7 +231,7 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
 static __poll_t
 p9_fd_poll(struct p9_client *client, struct poll_table_struct *pt, int *err)
 {
-	__poll_t ret, n;
+	__poll_t ret;
 	struct p9_trans_fd *ts = NULL;
 
 	if (client && client->status == Connected)
@@ -243,19 +243,9 @@ p9_fd_poll(struct p9_client *client, struct poll_table_struct *pt, int *err)
 		return EPOLLERR;
 	}
 
-	if (!ts->rd->f_op->poll)
-		ret = DEFAULT_POLLMASK;
-	else
-		ret = ts->rd->f_op->poll(ts->rd, pt);
-
-	if (ts->rd != ts->wr) {
-		if (!ts->wr->f_op->poll)
-			n = DEFAULT_POLLMASK;
-		else
-			n = ts->wr->f_op->poll(ts->wr, pt);
-		ret = (ret & ~EPOLLOUT) | (n & ~EPOLLIN);
-	}
-
+	ret = vfs_poll(ts->rd, pt);
+	if (ts->rd != ts->wr)
+		ret = (ret & ~EPOLLOUT) | (vfs_poll(ts->wr, pt) & ~EPOLLIN);
 	return ret;
 }
 
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 6e865e8b5b10..90d30fbe95ae 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -397,7 +397,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	 * Check if there was an event already pending on the eventfd
 	 * before we registered, and trigger it as if we didn't miss it.
 	 */
-	events = f.file->f_op->poll(f.file, &irqfd->pt);
+	events = vfs_poll(f.file, &irqfd->pt);
 
 	if (events & EPOLLIN)
 		schedule_work(&irqfd->inject);
-- 
2.17.0


* [PATCH 07/33] fs: introduce new ->get_poll_head and ->poll_mask methods
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2018-05-23 19:19 ` [PATCH 06/33] fs: add new vfs_poll and file_can_poll helpers Christoph Hellwig
@ 2018-05-23 19:19 ` Christoph Hellwig
  2018-05-23 19:19 ` [PATCH 08/33] aio: simplify KIOCB_KEY handling Christoph Hellwig
                   ` (26 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:19 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

->get_poll_head returns the waitqueue that the poll operation is going
to sleep on.  Note that this means we can only use a single waitqueue
for the poll, unlike some current drivers that use two waitqueues for
different events.  But now that we have keyed wakeups and heavily use
those for poll there aren't that many good reasons left to keep the
multiple waitqueues, and if there are any, ->poll is still around; the
driver just won't support aio poll.
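
The split between the two methods can be sketched in userspace as follows.
This is a simplified model under stated assumptions, not the kernel
implementation: file_model, file_ops_model, wait_head_model and poll_model()
are hypothetical names, "registering" on the waitqueue is reduced to a
counter, and the ERR_PTR case of ->get_poll_head is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Event bits mirroring the usual POLL* values. */
#define EV_IN		0x001u
#define EV_OUT		0x004u
#define EV_RDNORM	0x040u
#define EV_WRNORM	0x100u
#define DEFAULT_MASK	(EV_IN | EV_OUT | EV_RDNORM | EV_WRNORM)

/* Hypothetical stand-in for struct wait_queue_head. */
struct wait_head_model {
	int nr_waiters;
};

struct file_model;

/* Hypothetical stand-in for the three poll-related file operations. */
struct file_ops_model {
	unsigned int (*poll)(struct file_model *f);
	struct wait_head_model *(*get_poll_head)(struct file_model *f,
						 unsigned int events);
	unsigned int (*poll_mask)(struct file_model *f, unsigned int events);
};

struct file_model {
	const struct file_ops_model *ops;
	struct wait_head_model wq;	/* the single waitqueue */
	unsigned int ready;		/* events currently pending */
};

/* Dispatch: legacy ->poll first, then the new pair, then the default. */
static unsigned int poll_model(struct file_model *f, unsigned int events,
			       bool want_wait)
{
	const struct file_ops_model *ops;

	assert(f != NULL && f->ops != NULL);
	ops = f->ops;

	if (ops->poll)
		return ops->poll(f);
	if (ops->get_poll_head && ops->poll_mask) {
		if (want_wait) {
			struct wait_head_model *head;

			head = ops->get_poll_head(f, events);
			if (!head)
				return DEFAULT_MASK;
			head->nr_waiters++;	/* register on the waitqueue */
		}
		return ops->poll_mask(f, events);	/* never blocks */
	}
	return DEFAULT_MASK;
}

/* A sample "driver" implementing only the two new methods. */
static struct wait_head_model *sample_get_poll_head(struct file_model *f,
						    unsigned int events)
{
	(void)events;
	return &f->wq;
}

static unsigned int sample_poll_mask(struct file_model *f,
				     unsigned int events)
{
	return f->ready & events;
}

static const struct file_ops_model sample_ops = {
	.get_poll_head	= sample_get_poll_head,
	.poll_mask	= sample_poll_mask,
};

/* A file with no poll support at all. */
static const struct file_ops_model no_poll_ops = { .poll = NULL };
```

The point of the split is that the waitqueue lookup and the readiness check
are separate steps, so a caller such as aio poll can add itself to the
waitqueue once and re-run the non-blocking check after each wakeup.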

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 Documentation/filesystems/Locking |  7 ++++++-
 Documentation/filesystems/vfs.txt | 13 +++++++++++++
 fs/select.c                       | 23 +++++++++++++++++++++++
 include/linux/fs.h                |  2 ++
 include/linux/poll.h              | 12 ++++++------
 5 files changed, 50 insertions(+), 7 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 220bba28f72b..6d227f9d7bd9 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -440,6 +440,8 @@ prototypes:
 	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
 	int (*iterate) (struct file *, struct dir_context *);
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);
+	struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
+	__poll_t (*poll_mask) (struct file *, __poll_t);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
@@ -470,7 +472,7 @@ prototypes:
 };
 
 locking rules:
-	All may block.
+	All except for ->poll_mask may block.
 
 ->llseek() locking has moved from llseek to the individual llseek
 implementations.  If your fs is not using generic_file_llseek, you
@@ -498,6 +500,9 @@ in sys_read() and friends.
 the lease within the individual filesystem to record the result of the
 operation
 
+->poll_mask can be called with or without the waitqueue lock for the waitqueue
+returned from ->get_poll_head.
+
 --------------------------- dquot_operations -------------------------------
 prototypes:
 	int (*write_dquot) (struct dquot *);
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index f608180ad59d..829a7b7857a4 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -857,6 +857,8 @@ struct file_operations {
 	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
 	int (*iterate) (struct file *, struct dir_context *);
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);
+	struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
+	__poll_t (*poll_mask) (struct file *, __poll_t);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
@@ -901,6 +903,17 @@ otherwise noted.
 	activity on this file and (optionally) go to sleep until there
 	is activity. Called by the select(2) and poll(2) system calls
 
+  get_poll_head: Returns the struct wait_queue_head that callers can
+  wait on.  Callers need to check the returned events using ->poll_mask
+  once woken.  Can return NULL to indicate polling is not supported,
+  or any error code using the ERR_PTR convention to indicate that a
+  grave error occured and ->poll_mask shall not be called.
+
+  poll_mask: return the mask of EPOLL* values describing the file descriptor
+  state.  Called either before going to sleep on the waitqueue returned by
+  get_poll_head, or after it has been woken.  If ->get_poll_head and
+  ->poll_mask are implemented ->poll does not need to be implement.
+
   unlocked_ioctl: called by the ioctl(2) system call.
 
   compat_ioctl: called by the ioctl(2) system call when 32 bit system calls
diff --git a/fs/select.c b/fs/select.c
index e30def680b2e..bc3cc0f98896 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -34,6 +34,29 @@
 
 #include <linux/uaccess.h>
 
+__poll_t vfs_poll(struct file *file, struct poll_table_struct *pt)
+{
+	if (file->f_op->poll) {
+		return file->f_op->poll(file, pt);
+	} else if (file_has_poll_mask(file)) {
+		unsigned int events = poll_requested_events(pt);
+		struct wait_queue_head *head;
+
+		if (pt && pt->_qproc) {
+			head = file->f_op->get_poll_head(file, events);
+			if (!head)
+				return DEFAULT_POLLMASK;
+			if (IS_ERR(head))
+				return EPOLLERR;
+			pt->_qproc(file, head, pt);
+		}
+
+		return file->f_op->poll_mask(file, events);
+	} else {
+		return DEFAULT_POLLMASK;
+	}
+}
+EXPORT_SYMBOL_GPL(vfs_poll);
 
 /*
  * Estimate expected accuracy in ns from a timeval.
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7f07977bdfd7..d467bd7b35b7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1711,6 +1711,8 @@ struct file_operations {
 	int (*iterate) (struct file *, struct dir_context *);
 	int (*iterate_shared) (struct file *, struct dir_context *);
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);
+	struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
+	__poll_t (*poll_mask) (struct file *, __poll_t);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
diff --git a/include/linux/poll.h b/include/linux/poll.h
index 7e0fdcf905d2..fdf86b4cbc71 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -74,18 +74,18 @@ static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
 	pt->_key   = ~(__poll_t)0; /* all events enabled */
 }
 
-static inline bool file_can_poll(struct file *file)
+static inline bool file_has_poll_mask(struct file *file)
 {
-	return file->f_op->poll;
+	return file->f_op->get_poll_head && file->f_op->poll_mask;
 }
 
-static inline __poll_t vfs_poll(struct file *file, struct poll_table_struct *pt)
+static inline bool file_can_poll(struct file *file)
 {
-	if (unlikely(!file->f_op->poll))
-		return DEFAULT_POLLMASK;
-	return file->f_op->poll(file, pt);
+	return file->f_op->poll || file_has_poll_mask(file);
 }
 
+__poll_t vfs_poll(struct file *file, struct poll_table_struct *pt);
+
 struct poll_table_entry {
 	struct file *filp;
 	__poll_t key;
-- 
2.17.0


* [PATCH 08/33] aio: simplify KIOCB_KEY handling
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2018-05-23 19:19 ` [PATCH 07/33] fs: introduce new ->get_poll_head and ->poll_mask methods Christoph Hellwig
@ 2018-05-23 19:19 ` Christoph Hellwig
  2018-05-23 19:19 ` [PATCH 09/33] aio: simplify cancellation Christoph Hellwig
                   ` (25 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:19 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

No need to pass the key field to lookup_kiocb to compare it with KIOCB_KEY,
as we can do that right after retrieving it from userspace.  Also move the
KIOCB_KEY definition to aio.c as it is an internal value not used by any
other place in the kernel.
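
A minimal userspace sketch of the resulting flow is shown below.  The names
iocb_model, lookup_model() and io_cancel_model() are hypothetical, the list
of active requests is reduced to a small array, and the return values for
the found and not-found cases are assumptions of the model rather than the
kernel's behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define KEY_MODEL	0u	/* mirrors KIOCB_KEY */
#define NR_ACTIVE	4

/* Hypothetical stand-in for the user-visible iocb. */
struct iocb_model {
	unsigned int key;
};

static int nr_lookups;				/* counts list walks */
static const struct iocb_model *active[NR_ACTIVE];	/* active requests */

/* The lookup no longer needs to know about the key at all. */
static const struct iocb_model *lookup_model(const struct iocb_model *iocb)
{
	nr_lookups++;
	for (size_t i = 0; i < NR_ACTIVE; i++)
		if (active[i] == iocb)
			return iocb;
	return NULL;
}

static int io_cancel_model(const struct iocb_model *iocb)
{
	assert(iocb != NULL);

	/* Validate the key right after reading it, before any lookup. */
	if (iocb->key != KEY_MODEL)
		return -EINVAL;

	/* The real code would take the context lock around the lookup. */
	return lookup_model(iocb) ? 0 : -EINVAL;
}
```

Rejecting a bad key up front means the lookup (and, in the real code, the
lock protecting it) is never reached for obviously invalid requests.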

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/aio.c            | 14 +++++++-------
 include/linux/aio.h |  2 --
 2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 1c383bb44b2d..50a90e5581ed 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -46,6 +46,8 @@
 
 #include "internal.h"
 
+#define KIOCB_KEY		0
+
 #define AIO_RING_MAGIC			0xa10a10a1
 #define AIO_RING_COMPAT_FEATURES	1
 #define AIO_RING_INCOMPAT_FEATURES	0
@@ -1811,15 +1813,12 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
  *	Finds a given iocb for cancellation.
  */
 static struct aio_kiocb *
-lookup_kiocb(struct kioctx *ctx, struct iocb __user *iocb, u32 key)
+lookup_kiocb(struct kioctx *ctx, struct iocb __user *iocb)
 {
 	struct aio_kiocb *kiocb;
 
 	assert_spin_locked(&ctx->ctx_lock);
 
-	if (key != KIOCB_KEY)
-		return NULL;
-
 	/* TODO: use a hash or array, this sucks. */
 	list_for_each_entry(kiocb, &ctx->active_reqs, ki_list) {
 		if (kiocb->ki_user_iocb == iocb)
@@ -1846,9 +1845,10 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
 	u32 key;
 	int ret;
 
-	ret = get_user(key, &iocb->aio_key);
-	if (unlikely(ret))
+	if (unlikely(get_user(key, &iocb->aio_key)))
 		return -EFAULT;
+	if (unlikely(key != KIOCB_KEY))
+		return -EINVAL;
 
 	ctx = lookup_ioctx(ctx_id);
 	if (unlikely(!ctx))
@@ -1856,7 +1856,7 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
 
 	spin_lock_irq(&ctx->ctx_lock);
 
-	kiocb = lookup_kiocb(ctx, iocb, key);
+	kiocb = lookup_kiocb(ctx, iocb);
 	if (kiocb)
 		ret = kiocb_cancel(kiocb);
 	else
diff --git a/include/linux/aio.h b/include/linux/aio.h
index 9d8aabecfe2d..b83e68dd006f 100644
--- a/include/linux/aio.h
+++ b/include/linux/aio.h
@@ -8,8 +8,6 @@ struct kioctx;
 struct kiocb;
 struct mm_struct;
 
-#define KIOCB_KEY		0
-
 typedef int (kiocb_cancel_fn)(struct kiocb *);
 
 /* prototypes */
-- 
2.17.0

* [PATCH 09/33] aio: simplify cancellation
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2018-05-23 19:19 ` [PATCH 08/33] aio: simplify KIOCB_KEY handling Christoph Hellwig
@ 2018-05-23 19:19 ` Christoph Hellwig
  2018-05-23 19:19 ` [PATCH 10/33] aio: implement IOCB_CMD_POLL Christoph Hellwig
                   ` (24 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:19 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

With the current aio code there is no need for the magic KIOCB_CANCELLED
value, as a cancellation just kicks the driver to queue the completion
ASAP, with all actual completion handling done in another thread.  Given
that both the completion path and cancellation take the context lock there
is no need for magic cmpxchg loops either.  If we remove iocbs from the
active list after calling ->ki_cancel (but with ctx_lock still held), we
can also rely on the invariant that anything found on the list has a
->ki_cancel callback and can be cancelled, further simplifying the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/aio.c | 48 ++++++------------------------------------------
 1 file changed, 6 insertions(+), 42 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 50a90e5581ed..0633cf3b325c 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -164,19 +164,6 @@ struct fsync_iocb {
 	bool			datasync;
 };
 
-/*
- * We use ki_cancel == KIOCB_CANCELLED to indicate that a kiocb has been either
- * cancelled or completed (this makes a certain amount of sense because
- * successful cancellation - io_cancel() - does deliver the completion to
- * userspace).
- *
- * And since most things don't implement kiocb cancellation and we'd really like
- * kiocb completion to be lockless when possible, we use ki_cancel to
- * synchronize cancellation and completion - we only set it to KIOCB_CANCELLED
- * with xchg() or cmpxchg(), see batch_complete_aio() and kiocb_cancel().
- */
-#define KIOCB_CANCELLED		((void *) (~0ULL))
-
 struct aio_kiocb {
 	union {
 		struct kiocb		rw;
@@ -574,27 +561,6 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
 }
 EXPORT_SYMBOL(kiocb_set_cancel_fn);
 
-static int kiocb_cancel(struct aio_kiocb *kiocb)
-{
-	kiocb_cancel_fn *old, *cancel;
-
-	/*
-	 * Don't want to set kiocb->ki_cancel = KIOCB_CANCELLED unless it
-	 * actually has a cancel function, hence the cmpxchg()
-	 */
-
-	cancel = READ_ONCE(kiocb->ki_cancel);
-	do {
-		if (!cancel || cancel == KIOCB_CANCELLED)
-			return -EINVAL;
-
-		old = cancel;
-		cancel = cmpxchg(&kiocb->ki_cancel, old, KIOCB_CANCELLED);
-	} while (cancel != old);
-
-	return cancel(&kiocb->rw);
-}
-
 /*
  * free_ioctx() should be RCU delayed to synchronize against the RCU
  * protected lookup_ioctx() and also needs process context to call
@@ -641,7 +607,7 @@ static void free_ioctx_users(struct percpu_ref *ref)
 	while (!list_empty(&ctx->active_reqs)) {
 		req = list_first_entry(&ctx->active_reqs,
 				       struct aio_kiocb, ki_list);
-		kiocb_cancel(req);
+		req->ki_cancel(&req->rw);
 		list_del_init(&req->ki_list);
 	}
 
@@ -1842,8 +1808,8 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
 {
 	struct kioctx *ctx;
 	struct aio_kiocb *kiocb;
+	int ret = -EINVAL;
 	u32 key;
-	int ret;
 
 	if (unlikely(get_user(key, &iocb->aio_key)))
 		return -EFAULT;
@@ -1855,13 +1821,11 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
 		return -EINVAL;
 
 	spin_lock_irq(&ctx->ctx_lock);
-
 	kiocb = lookup_kiocb(ctx, iocb);
-	if (kiocb)
-		ret = kiocb_cancel(kiocb);
-	else
-		ret = -EINVAL;
-
+	if (kiocb) {
+		ret = kiocb->ki_cancel(&kiocb->rw);
+		list_del_init(&kiocb->ki_list);
+	}
 	spin_unlock_irq(&ctx->ctx_lock);
 
 	if (!ret) {
-- 
2.17.0

* [PATCH 10/33] aio: implement IOCB_CMD_POLL
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2018-05-23 19:19 ` [PATCH 09/33] aio: simplify cancellation Christoph Hellwig
@ 2018-05-23 19:19 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 11/33] aio: try to complete poll iocbs without context switch Christoph Hellwig
                   ` (23 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:19 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Simple one-shot poll through the io_submit() interface.  To poll for
a file descriptor the application should submit an iocb of type
IOCB_CMD_POLL.  It will poll the fd for the events specified in the
first 32 bits of the aio_buf field of the iocb.

Unlike poll or epoll without EPOLLONESHOT this interface always works
in one shot mode, that is once the iocb is completed, it will have to be
resubmitted.
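
From userspace, the interface described above might be exercised roughly
as follows.  This is a minimal sketch and not part of the patch: it uses
the raw syscalls instead of libaio, demo_poll_once() and
DEMO_IOCB_CMD_POLL are names made up for this illustration (the value 5
is the one this patch assigns in aio_abi.h), and it assumes that the res
field of the completion event carries the ready event mask, which is what
__aio_poll_complete() hands to aio_complete() in this patch.

```c
#include <errno.h>
#include <linux/aio_abi.h>
#include <poll.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Opcode value taken from the aio_abi.h hunk of this patch. */
#define DEMO_IOCB_CMD_POLL 5

/*
 * Hypothetical helper: submit one poll iocb for fd and wait up to one
 * second for its completion.  Returns the ready event mask (>= 0) or a
 * negative errno value.
 */
static long demo_poll_once(int fd, short events)
{
	aio_context_t ctx = 0;
	struct iocb cb;
	struct iocb *list[1] = { &cb };
	struct io_event ev;
	struct timespec ts = { .tv_sec = 1, .tv_nsec = 0 };
	long ret;

	if (syscall(SYS_io_setup, 1, &ctx) < 0)
		return -errno;

	memset(&cb, 0, sizeof(cb));	/* offset, nbytes and rw_flags must be 0 */
	cb.aio_lio_opcode = DEMO_IOCB_CMD_POLL;
	cb.aio_fildes = fd;
	cb.aio_buf = (unsigned short)events;	/* requested poll events */

	ret = syscall(SYS_io_submit, ctx, 1, list);
	if (ret == 1) {
		ret = syscall(SYS_io_getevents, ctx, 1, 1, &ev, &ts);
		if (ret == 1)
			ret = ev.res;		/* events that are ready */
		else if (ret == 0)
			ret = -ETIMEDOUT;	/* nothing became ready in time */
		else
			ret = -errno;
	} else {
		ret = ret < 0 ? -errno : -EAGAIN;
	}

	syscall(SYS_io_destroy, ctx);
	return ret;
}
```

Because the request is one-shot, a caller that wants continuous
notification has to submit a fresh iocb after every completion.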

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/aio.c                     | 134 ++++++++++++++++++++++++++++++++++-
 include/uapi/linux/aio_abi.h |   6 +-
 2 files changed, 135 insertions(+), 5 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 0633cf3b325c..bd711cfa4a1f 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -5,6 +5,7 @@
  *	Implements an efficient asynchronous io interface.
  *
  *	Copyright 2000, 2001, 2002 Red Hat, Inc.  All Rights Reserved.
+ *	Copyright 2018 Christoph Hellwig.
  *
  *	See ../COPYING for licensing terms.
  */
@@ -164,10 +165,22 @@ struct fsync_iocb {
 	bool			datasync;
 };
 
+struct poll_iocb {
+	struct file		*file;
+	__poll_t		events;
+	struct wait_queue_head	*head;
+
+	union {
+		struct wait_queue_entry	wait;
+		struct work_struct	work;
+	};
+};
+
 struct aio_kiocb {
 	union {
 		struct kiocb		rw;
 		struct fsync_iocb	fsync;
+		struct poll_iocb	poll;
 	};
 
 	struct kioctx		*ki_ctx;
@@ -1558,7 +1571,6 @@ static int aio_fsync(struct fsync_iocb *req, struct iocb *iocb, bool datasync)
 	if (unlikely(iocb->aio_buf || iocb->aio_offset || iocb->aio_nbytes ||
 			iocb->aio_rw_flags))
 		return -EINVAL;
-
 	req->file = fget(iocb->aio_fildes);
 	if (unlikely(!req->file))
 		return -EBADF;
@@ -1573,6 +1585,124 @@ static int aio_fsync(struct fsync_iocb *req, struct iocb *iocb, bool datasync)
 	return -EIOCBQUEUED;
 }
 
+/* need to use list_del_init so we can check if item was present */
+static inline bool __aio_poll_remove(struct poll_iocb *req)
+{
+	if (list_empty(&req->wait.entry))
+		return false;
+	list_del_init(&req->wait.entry);
+	return true;
+}
+
+static inline void __aio_poll_complete(struct poll_iocb *req, __poll_t mask)
+{
+	struct aio_kiocb *iocb = container_of(req, struct aio_kiocb, poll);
+	struct file *file = req->file;
+
+	aio_complete(iocb, mangle_poll(mask), 0);
+	fput(file);
+}
+
+static void aio_poll_work(struct work_struct *work)
+{
+	struct poll_iocb *req = container_of(work, struct poll_iocb, work);
+
+	__aio_poll_complete(req, req->events);
+}
+
+static int aio_poll_cancel(struct kiocb *iocb)
+{
+	struct aio_kiocb *aiocb = container_of(iocb, struct aio_kiocb, rw);
+	struct poll_iocb *req = &aiocb->poll;
+	struct wait_queue_head *head = req->head;
+	bool found = false;
+
+	spin_lock(&head->lock);
+	found = __aio_poll_remove(req);
+	spin_unlock(&head->lock);
+
+	if (found) {
+		req->events = 0;
+		INIT_WORK(&req->work, aio_poll_work);
+		schedule_work(&req->work);
+	}
+	return 0;
+}
+
+static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
+		void *key)
+{
+	struct poll_iocb *req = container_of(wait, struct poll_iocb, wait);
+	struct file *file = req->file;
+	__poll_t mask = key_to_poll(key);
+
+	assert_spin_locked(&req->head->lock);
+
+	/* for instances that support it check for an event match first: */
+	if (mask && !(mask & req->events))
+		return 0;
+
+	mask = file->f_op->poll_mask(file, req->events);
+	if (!mask)
+		return 0;
+
+	__aio_poll_remove(req);
+
+	req->events = mask;
+	INIT_WORK(&req->work, aio_poll_work);
+	schedule_work(&req->work);
+	return 1;
+}
+
+static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb)
+{
+	struct kioctx *ctx = aiocb->ki_ctx;
+	struct poll_iocb *req = &aiocb->poll;
+	__poll_t mask;
+
+	/* reject any unknown events outside the normal event mask. */
+	if ((u16)iocb->aio_buf != iocb->aio_buf)
+		return -EINVAL;
+	/* reject fields that are not defined for poll */
+	if (iocb->aio_offset || iocb->aio_nbytes || iocb->aio_rw_flags)
+		return -EINVAL;
+
+	req->events = demangle_poll(iocb->aio_buf) | EPOLLERR | EPOLLHUP;
+	req->file = fget(iocb->aio_fildes);
+	if (unlikely(!req->file))
+		return -EBADF;
+	if (!file_has_poll_mask(req->file))
+		goto out_fail;
+
+	req->head = req->file->f_op->get_poll_head(req->file, req->events);
+	if (!req->head)
+		goto out_fail;
+	if (IS_ERR(req->head)) {
+		mask = EPOLLERR;
+		goto done;
+	}
+
+	init_waitqueue_func_entry(&req->wait, aio_poll_wake);
+	aiocb->ki_cancel = aio_poll_cancel;
+
+	spin_lock_irq(&ctx->ctx_lock);
+	spin_lock(&req->head->lock);
+	mask = req->file->f_op->poll_mask(req->file, req->events);
+	if (!mask) {
+		__add_wait_queue(req->head, &req->wait);
+		list_add_tail(&aiocb->ki_list, &ctx->active_reqs);
+	}
+	spin_unlock(&req->head->lock);
+	spin_unlock_irq(&ctx->ctx_lock);
+done:
+	if (mask)
+		__aio_poll_complete(req, mask);
+	return -EIOCBQUEUED;
+out_fail:
+	fput(req->file);
+	return -EINVAL; /* same as no support for IOCB_CMD_POLL */
+}
+
 static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 			 struct iocb *iocb, bool compat)
 {
@@ -1641,6 +1771,9 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 		break;
 	case IOCB_CMD_FDSYNC:
 		ret = aio_fsync(&req->fsync, iocb, true);
+		break;
+	case IOCB_CMD_POLL:
+		ret = aio_poll(req, iocb);
 		break;
 	default:
 		pr_debug("invalid aio operation %d\n", iocb->aio_lio_opcode);
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index 2c0a3415beee..ed0185945bb2 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -39,10 +39,8 @@ enum {
 	IOCB_CMD_PWRITE = 1,
 	IOCB_CMD_FSYNC = 2,
 	IOCB_CMD_FDSYNC = 3,
-	/* These two are experimental.
-	 * IOCB_CMD_PREADX = 4,
-	 * IOCB_CMD_POLL = 5,
-	 */
+	/* 4 was the experimental IOCB_CMD_PREADX */
+	IOCB_CMD_POLL = 5,
 	IOCB_CMD_NOOP = 6,
 	IOCB_CMD_PREADV = 7,
 	IOCB_CMD_PWRITEV = 8,
-- 
2.17.0

* [PATCH 11/33] aio: try to complete poll iocbs without context switch
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2018-05-23 19:19 ` [PATCH 10/33] aio: implement IOCB_CMD_POLL Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 12/33] net: refactor socket_poll Christoph Hellwig
                   ` (22 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

If we can acquire ctx_lock without spinning we can just remove our
iocb from the active_reqs list, and thus complete the iocbs from the
wakeup context.
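
The lock-order constraint can be modelled in userspace.  In the previous
patch the submission path takes ctx_lock first and the wait queue lock
second, while the wakeup callback runs with the wait queue lock already
held, so it may only try ctx_lock.  Below is a minimal sketch with a
pthread mutex standing in for the spinlock; struct demo_ctx, demo_wake()
and the deferred flag (standing in for schedule_work()) are invented for
illustration and are not kernel APIs.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kioctx: one lock and two outcome flags. */
struct demo_ctx {
	pthread_mutex_t	ctx_lock;
	bool		completed_inline;	/* finished in the wakeup path */
	bool		deferred;		/* handed off, like schedule_work() */
};

/*
 * Called with the (imaginary) wait queue lock held.  Blocking on ctx_lock
 * here would invert the ctx_lock -> wait queue lock order used at submit
 * time, so only a trylock is safe.
 */
static void demo_wake(struct demo_ctx *ctx)
{
	if (pthread_mutex_trylock(&ctx->ctx_lock) == 0) {
		/* Uncontended: drop the request from the active list here. */
		pthread_mutex_unlock(&ctx->ctx_lock);
		ctx->completed_inline = true;
	} else {
		/* Contended: let another context take ctx_lock the slow way. */
		ctx->deferred = true;
	}
}
```

The deferred branch costs a context switch, which is why the fast path is
attempted first.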

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/aio.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index bd711cfa4a1f..8274d09d44a2 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1633,6 +1633,7 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 		void *key)
 {
 	struct poll_iocb *req = container_of(wait, struct poll_iocb, wait);
+	struct aio_kiocb *iocb = container_of(req, struct aio_kiocb, poll);
 	struct file *file = req->file;
 	__poll_t mask = key_to_poll(key);
 
@@ -1648,9 +1649,22 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 
 	__aio_poll_remove(req);
 
-	req->events = mask;
-	INIT_WORK(&req->work, aio_poll_work);
-	schedule_work(&req->work);
+	/*
+	 * Try completing without a context switch if we can acquire ctx_lock
+	 * without spinning.  Otherwise we need to defer to a workqueue to
+	 * avoid a deadlock due to the lock order.
+	 */
+	if (spin_trylock(&iocb->ki_ctx->ctx_lock)) {
+		list_del_init(&iocb->ki_list);
+		spin_unlock(&iocb->ki_ctx->ctx_lock);
+
+		__aio_poll_complete(req, mask);
+	} else {
+		req->events = mask;
+		INIT_WORK(&req->work, aio_poll_work);
+		schedule_work(&req->work);
+	}
+
 	return 1;
 }
 
-- 
2.17.0

* [PATCH 12/33] net: refactor socket_poll
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 11/33] aio: try to complete poll iocbs without context switch Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 13/33] net: add support for ->poll_mask in proto_ops Christoph Hellwig
                   ` (21 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Factor out two busy poll related helpers for later reuse, and remove
a comment that isn't very helpful, especially with the __poll_t
annotations in place.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/net/busy_poll.h | 15 +++++++++++++++
 net/socket.c            | 21 ++++-----------------
 2 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 71c72a939bf8..c5187438af38 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -121,6 +121,21 @@ static inline void sk_busy_loop(struct sock *sk, int nonblock)
 #endif
 }
 
+static inline void sock_poll_busy_loop(struct socket *sock, __poll_t events)
+{
+	if (sk_can_busy_loop(sock->sk) &&
+	    events && (events & POLL_BUSY_LOOP)) {
+		/* once, only if requested by syscall */
+		sk_busy_loop(sock->sk, 1);
+	}
+}
+
+/* if this socket can poll_ll, tell the system call */
+static inline __poll_t sock_poll_busy_flag(struct socket *sock)
+{
+	return sk_can_busy_loop(sock->sk) ? POLL_BUSY_LOOP : 0;
+}
+
 /* used in the NIC receive handler to mark the skb */
 static inline void skb_mark_napi_id(struct sk_buff *skb,
 				    struct napi_struct *napi)
diff --git a/net/socket.c b/net/socket.c
index f10f1d947c78..571ee4005192 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1117,24 +1117,11 @@ EXPORT_SYMBOL(sock_create_lite);
 /* No kernel lock held - perfect */
 static __poll_t sock_poll(struct file *file, poll_table *wait)
 {
-	__poll_t busy_flag = 0;
-	struct socket *sock;
-
-	/*
-	 *      We can't return errors to poll, so it's either yes or no.
-	 */
-	sock = file->private_data;
-
-	if (sk_can_busy_loop(sock->sk)) {
-		/* this socket can poll_ll so tell the system call */
-		busy_flag = POLL_BUSY_LOOP;
-
-		/* once, only if requested by syscall */
-		if (wait && (wait->_key & POLL_BUSY_LOOP))
-			sk_busy_loop(sock->sk, 1);
-	}
+	struct socket *sock = file->private_data;
+	__poll_t events = poll_requested_events(wait);
 
-	return busy_flag | sock->ops->poll(file, sock, wait);
+	sock_poll_busy_loop(sock, events);
+	return sock->ops->poll(file, sock, wait) | sock_poll_busy_flag(sock);
 }
 
 static int sock_mmap(struct file *file, struct vm_area_struct *vma)
-- 
2.17.0

* [PATCH 13/33] net: add support for ->poll_mask in proto_ops
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 12/33] net: refactor socket_poll Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 14/33] net: remove sock_no_poll Christoph Hellwig
                   ` (20 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

The socket file operations still implement ->poll until all protocols are
switched over.
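
The transition logic can be pictured with a small userspace model.  It is
a sketch only: struct demo_ops, demo_sock_poll() and the two sample
callbacks are made up for this illustration, and the real sock_poll()
additionally registers on the wait queue and handles busy polling.  It
shows the order of preference: a protocol's legacy ->poll is used if
present, otherwise ->poll_mask, and a protocol with neither reports no
events.

```c
/* Hypothetical miniature of struct proto_ops with only the two poll hooks. */
struct demo_ops {
	unsigned int (*poll)(void);			/* legacy method */
	unsigned int (*poll_mask)(unsigned int events);	/* new method */
};

/* Mirror the branch order in sock_poll(): ->poll, then ->poll_mask, else 0. */
static unsigned int demo_sock_poll(const struct demo_ops *ops,
				   unsigned int events)
{
	if (ops->poll)
		return ops->poll();
	if (ops->poll_mask)
		return ops->poll_mask(events);
	return 0;
}

/* Sample callbacks; the values 0x1 and 0x4 are arbitrary markers. */
static unsigned int demo_legacy_poll(void)
{
	return 0x1;
}

static unsigned int demo_new_poll_mask(unsigned int events)
{
	return events & 0x4;
}
```

A protocol converted to ->poll_mask simply stops setting ->poll, so the
second branch takes over without any change to the callers.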

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/net.h |  1 +
 net/socket.c        | 48 ++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index 2248a052061d..3fd9d8c16581 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -147,6 +147,7 @@ struct proto_ops {
 	int		(*getname)   (struct socket *sock,
 				      struct sockaddr *addr,
 				      int peer);
+	__poll_t	(*poll_mask) (struct socket *sock, __poll_t events);
 	__poll_t	(*poll)	     (struct file *file, struct socket *sock,
 				      struct poll_table_struct *wait);
 	int		(*ioctl)     (struct socket *sock, unsigned int cmd,
diff --git a/net/socket.c b/net/socket.c
index 571ee4005192..2d752e9eb3f9 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -117,8 +117,10 @@ static ssize_t sock_write_iter(struct kiocb *iocb, struct iov_iter *from);
 static int sock_mmap(struct file *file, struct vm_area_struct *vma);
 
 static int sock_close(struct inode *inode, struct file *file);
-static __poll_t sock_poll(struct file *file,
-			      struct poll_table_struct *wait);
+static struct wait_queue_head *sock_get_poll_head(struct file *file,
+		__poll_t events);
+static __poll_t sock_poll_mask(struct file *file, __poll_t);
+static __poll_t sock_poll(struct file *file, struct poll_table_struct *wait);
 static long sock_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
 #ifdef CONFIG_COMPAT
 static long compat_sock_ioctl(struct file *file,
@@ -141,6 +143,8 @@ static const struct file_operations socket_file_ops = {
 	.llseek =	no_llseek,
 	.read_iter =	sock_read_iter,
 	.write_iter =	sock_write_iter,
+	.get_poll_head = sock_get_poll_head,
+	.poll_mask =	sock_poll_mask,
 	.poll =		sock_poll,
 	.unlocked_ioctl = sock_ioctl,
 #ifdef CONFIG_COMPAT
@@ -1114,14 +1118,48 @@ int sock_create_lite(int family, int type, int protocol, struct socket **res)
 }
 EXPORT_SYMBOL(sock_create_lite);
 
+static struct wait_queue_head *sock_get_poll_head(struct file *file,
+		__poll_t events)
+{
+	struct socket *sock = file->private_data;
+
+	if (!sock->ops->poll_mask)
+		return NULL;
+	sock_poll_busy_loop(sock, events);
+	return sk_sleep(sock->sk);
+}
+
+static __poll_t sock_poll_mask(struct file *file, __poll_t events)
+{
+	struct socket *sock = file->private_data;
+
+	/*
+	 * We need to be sure we are in sync with the socket flags modification.
+	 *
+	 * This memory barrier is paired in the wq_has_sleeper.
+	 */
+	smp_mb();
+
+	/* this socket can poll_ll so tell the system call */
+	return sock->ops->poll_mask(sock, events) |
+		(sk_can_busy_loop(sock->sk) ? POLL_BUSY_LOOP : 0);
+}
+
 /* No kernel lock held - perfect */
 static __poll_t sock_poll(struct file *file, poll_table *wait)
 {
 	struct socket *sock = file->private_data;
-	__poll_t events = poll_requested_events(wait);
+	__poll_t events = poll_requested_events(wait), mask = 0;
 
-	sock_poll_busy_loop(sock, events);
-	return sock->ops->poll(file, sock, wait) | sock_poll_busy_flag(sock);
+	if (sock->ops->poll) {
+		sock_poll_busy_loop(sock, events);
+		mask = sock->ops->poll(file, sock, wait);
+	} else if (sock->ops->poll_mask) {
+		sock_poll_wait(file, sock_get_poll_head(file, events), wait);
+		mask = sock->ops->poll_mask(sock, events);
+	}
+
+	return mask | sock_poll_busy_flag(sock);
 }
 
 static int sock_mmap(struct file *file, struct vm_area_struct *vma)
-- 
2.17.0

* [PATCH 14/33] net: remove sock_no_poll
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (12 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 13/33] net: add support for ->poll_mask in proto_ops Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 15/33] net/tcp: convert to ->poll_mask Christoph Hellwig
                   ` (19 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Now that sock_poll handles a NULL ->poll or ->poll_mask there is no need
for a stub.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 crypto/af_alg.c             | 1 -
 crypto/algif_hash.c         | 2 --
 crypto/algif_rng.c          | 1 -
 drivers/isdn/mISDN/socket.c | 1 -
 drivers/net/ppp/pptp.c      | 1 -
 include/net/sock.h          | 2 --
 net/bluetooth/bnep/sock.c   | 1 -
 net/bluetooth/cmtp/sock.c   | 1 -
 net/bluetooth/hidp/sock.c   | 1 -
 net/core/sock.c             | 6 ------
 10 files changed, 17 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index 7846c0c20cfe..80838c1cef94 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -347,7 +347,6 @@ static const struct proto_ops alg_proto_ops = {
 	.sendpage	=	sock_no_sendpage,
 	.sendmsg	=	sock_no_sendmsg,
 	.recvmsg	=	sock_no_recvmsg,
-	.poll		=	sock_no_poll,
 
 	.bind		=	alg_bind,
 	.release	=	af_alg_release,
diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
index 6c9b1927a520..bfcf595fd8f9 100644
--- a/crypto/algif_hash.c
+++ b/crypto/algif_hash.c
@@ -288,7 +288,6 @@ static struct proto_ops algif_hash_ops = {
 	.mmap		=	sock_no_mmap,
 	.bind		=	sock_no_bind,
 	.setsockopt	=	sock_no_setsockopt,
-	.poll		=	sock_no_poll,
 
 	.release	=	af_alg_release,
 	.sendmsg	=	hash_sendmsg,
@@ -396,7 +395,6 @@ static struct proto_ops algif_hash_ops_nokey = {
 	.mmap		=	sock_no_mmap,
 	.bind		=	sock_no_bind,
 	.setsockopt	=	sock_no_setsockopt,
-	.poll		=	sock_no_poll,
 
 	.release	=	af_alg_release,
 	.sendmsg	=	hash_sendmsg_nokey,
diff --git a/crypto/algif_rng.c b/crypto/algif_rng.c
index 150c2b6480ed..22df3799a17b 100644
--- a/crypto/algif_rng.c
+++ b/crypto/algif_rng.c
@@ -106,7 +106,6 @@ static struct proto_ops algif_rng_ops = {
 	.bind		=	sock_no_bind,
 	.accept		=	sock_no_accept,
 	.setsockopt	=	sock_no_setsockopt,
-	.poll		=	sock_no_poll,
 	.sendmsg	=	sock_no_sendmsg,
 	.sendpage	=	sock_no_sendpage,
 
diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index 1f8f489b4167..18c0a1281914 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -745,7 +745,6 @@ static const struct proto_ops base_sock_ops = {
 	.getname	= sock_no_getname,
 	.sendmsg	= sock_no_sendmsg,
 	.recvmsg	= sock_no_recvmsg,
-	.poll		= sock_no_poll,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= sock_no_setsockopt,
diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
index c4267ecefd85..157b67c1bf8e 100644
--- a/drivers/net/ppp/pptp.c
+++ b/drivers/net/ppp/pptp.c
@@ -624,7 +624,6 @@ static const struct proto_ops pptp_ops = {
 	.socketpair = sock_no_socketpair,
 	.accept     = sock_no_accept,
 	.getname    = pptp_getname,
-	.poll       = sock_no_poll,
 	.listen     = sock_no_listen,
 	.shutdown   = sock_no_shutdown,
 	.setsockopt = sock_no_setsockopt,
diff --git a/include/net/sock.h b/include/net/sock.h
index 74d725fdbe0f..4d2e8ad98985 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1591,8 +1591,6 @@ int sock_no_connect(struct socket *, struct sockaddr *, int, int);
 int sock_no_socketpair(struct socket *, struct socket *);
 int sock_no_accept(struct socket *, struct socket *, int, bool);
 int sock_no_getname(struct socket *, struct sockaddr *, int);
-__poll_t sock_no_poll(struct file *, struct socket *,
-			  struct poll_table_struct *);
 int sock_no_ioctl(struct socket *, unsigned int, unsigned long);
 int sock_no_listen(struct socket *, int);
 int sock_no_shutdown(struct socket *, int);
diff --git a/net/bluetooth/bnep/sock.c b/net/bluetooth/bnep/sock.c
index b5116fa9835e..00deacdcb51c 100644
--- a/net/bluetooth/bnep/sock.c
+++ b/net/bluetooth/bnep/sock.c
@@ -175,7 +175,6 @@ static const struct proto_ops bnep_sock_ops = {
 	.getname	= sock_no_getname,
 	.sendmsg	= sock_no_sendmsg,
 	.recvmsg	= sock_no_recvmsg,
-	.poll		= sock_no_poll,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= sock_no_setsockopt,
diff --git a/net/bluetooth/cmtp/sock.c b/net/bluetooth/cmtp/sock.c
index ce86a7bae844..e08f28fadd65 100644
--- a/net/bluetooth/cmtp/sock.c
+++ b/net/bluetooth/cmtp/sock.c
@@ -178,7 +178,6 @@ static const struct proto_ops cmtp_sock_ops = {
 	.getname	= sock_no_getname,
 	.sendmsg	= sock_no_sendmsg,
 	.recvmsg	= sock_no_recvmsg,
-	.poll		= sock_no_poll,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= sock_no_setsockopt,
diff --git a/net/bluetooth/hidp/sock.c b/net/bluetooth/hidp/sock.c
index 008ba439bd62..1eaac01f85de 100644
--- a/net/bluetooth/hidp/sock.c
+++ b/net/bluetooth/hidp/sock.c
@@ -208,7 +208,6 @@ static const struct proto_ops hidp_sock_ops = {
 	.getname	= sock_no_getname,
 	.sendmsg	= sock_no_sendmsg,
 	.recvmsg	= sock_no_recvmsg,
-	.poll		= sock_no_poll,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= sock_no_setsockopt,
diff --git a/net/core/sock.c b/net/core/sock.c
index 835a22f94bc5..b542c6a84165 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2567,12 +2567,6 @@ int sock_no_getname(struct socket *sock, struct sockaddr *saddr,
 }
 EXPORT_SYMBOL(sock_no_getname);
 
-__poll_t sock_no_poll(struct file *file, struct socket *sock, poll_table *pt)
-{
-	return 0;
-}
-EXPORT_SYMBOL(sock_no_poll);
-
 int sock_no_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 {
 	return -EOPNOTSUPP;
-- 
2.17.0

* [PATCH 15/33] net/tcp: convert to ->poll_mask
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (13 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 14/33] net: remove sock_no_poll Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 16/33] net/unix: " Christoph Hellwig
                   ` (18 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/net/tcp.h   |  3 +--
 net/ipv4/af_inet.c  |  2 +-
 net/ipv4/tcp.c      | 23 ++++++-----------------
 net/ipv6/af_inet6.c |  2 +-
 4 files changed, 9 insertions(+), 21 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 51dc7a26a2fa..f88f8a2cab0d 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -388,8 +388,7 @@ bool tcp_peer_is_proven(struct request_sock *req, struct dst_entry *dst);
 void tcp_close(struct sock *sk, long timeout);
 void tcp_init_sock(struct sock *sk);
 void tcp_init_transfer(struct sock *sk, int bpf_op);
-__poll_t tcp_poll(struct file *file, struct socket *sock,
-		      struct poll_table_struct *wait);
+__poll_t tcp_poll_mask(struct socket *sock, __poll_t events);
 int tcp_getsockopt(struct sock *sk, int level, int optname,
 		   char __user *optval, int __user *optlen);
 int tcp_setsockopt(struct sock *sk, int level, int optname,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index eaed0367e669..116e3cd11515 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -986,7 +986,7 @@ const struct proto_ops inet_stream_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = inet_accept,
 	.getname	   = inet_getname,
-	.poll		   = tcp_poll,
+	.poll_mask	   = tcp_poll_mask,
 	.ioctl		   = inet_ioctl,
 	.listen		   = inet_listen,
 	.shutdown	   = inet_shutdown,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c9d00ef54dec..dec47e6789e7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -494,32 +494,21 @@ static inline bool tcp_stream_is_readable(const struct tcp_sock *tp,
 }
 
 /*
- *	Wait for a TCP event.
- *
- *	Note that we don't need to lock the socket, as the upper poll layers
- *	take care of normal races (between the test and the event) and we don't
- *	go look at any of the socket buffers directly.
+ * Socket is not locked. We are protected from async events by poll logic and
+ * correct handling of state changes made by other threads is impossible in
+ * any case.
  */
-__poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
+__poll_t tcp_poll_mask(struct socket *sock, __poll_t events)
 {
-	__poll_t mask;
 	struct sock *sk = sock->sk;
 	const struct tcp_sock *tp = tcp_sk(sk);
+	__poll_t mask = 0;
 	int state;
 
-	sock_poll_wait(file, sk_sleep(sk), wait);
-
 	state = inet_sk_state_load(sk);
 	if (state == TCP_LISTEN)
 		return inet_csk_listen_poll(sk);
 
-	/* Socket is not locked. We are protected from async events
-	 * by poll logic and correct handling of state changes
-	 * made by other threads is impossible in any case.
-	 */
-
-	mask = 0;
-
 	/*
 	 * EPOLLHUP is certainly not done right. But poll() doesn't
 	 * have a notion of HUP in just one direction, and for a
@@ -600,7 +589,7 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 
 	return mask;
 }
-EXPORT_SYMBOL(tcp_poll);
+EXPORT_SYMBOL(tcp_poll_mask);
 
 int tcp_ioctl(struct sock *sk, int cmd, unsigned long arg)
 {
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 8da0b513f188..57b85ea438e9 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -571,7 +571,7 @@ const struct proto_ops inet6_stream_ops = {
 	.socketpair	   = sock_no_socketpair,	/* a do nothing	*/
 	.accept		   = inet_accept,		/* ok		*/
 	.getname	   = inet6_getname,
-	.poll		   = tcp_poll,			/* ok		*/
+	.poll_mask	   = tcp_poll_mask,		/* ok		*/
 	.ioctl		   = inet6_ioctl,		/* must change  */
 	.listen		   = inet_listen,		/* ok		*/
 	.shutdown	   = inet_shutdown,		/* ok		*/
-- 
2.17.0

* [PATCH 16/33] net/unix: convert to ->poll_mask
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (14 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 15/33] net/tcp: convert to ->poll_mask Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 17/33] net: convert datagram_poll users tp ->poll_mask Christoph Hellwig
                   ` (17 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/unix/af_unix.c | 30 +++++++++++-------------------
 1 file changed, 11 insertions(+), 19 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index e5473c03d667..95b02a71fd47 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -638,9 +638,8 @@ static int unix_stream_connect(struct socket *, struct sockaddr *,
 static int unix_socketpair(struct socket *, struct socket *);
 static int unix_accept(struct socket *, struct socket *, int, bool);
 static int unix_getname(struct socket *, struct sockaddr *, int);
-static __poll_t unix_poll(struct file *, struct socket *, poll_table *);
-static __poll_t unix_dgram_poll(struct file *, struct socket *,
-				    poll_table *);
+static __poll_t unix_poll_mask(struct socket *, __poll_t);
+static __poll_t unix_dgram_poll_mask(struct socket *, __poll_t);
 static int unix_ioctl(struct socket *, unsigned int, unsigned long);
 static int unix_shutdown(struct socket *, int);
 static int unix_stream_sendmsg(struct socket *, struct msghdr *, size_t);
@@ -681,7 +680,7 @@ static const struct proto_ops unix_stream_ops = {
 	.socketpair =	unix_socketpair,
 	.accept =	unix_accept,
 	.getname =	unix_getname,
-	.poll =		unix_poll,
+	.poll_mask =	unix_poll_mask,
 	.ioctl =	unix_ioctl,
 	.listen =	unix_listen,
 	.shutdown =	unix_shutdown,
@@ -704,7 +703,7 @@ static const struct proto_ops unix_dgram_ops = {
 	.socketpair =	unix_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	unix_getname,
-	.poll =		unix_dgram_poll,
+	.poll_mask =	unix_dgram_poll_mask,
 	.ioctl =	unix_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	unix_shutdown,
@@ -726,7 +725,7 @@ static const struct proto_ops unix_seqpacket_ops = {
 	.socketpair =	unix_socketpair,
 	.accept =	unix_accept,
 	.getname =	unix_getname,
-	.poll =		unix_dgram_poll,
+	.poll_mask =	unix_dgram_poll_mask,
 	.ioctl =	unix_ioctl,
 	.listen =	unix_listen,
 	.shutdown =	unix_shutdown,
@@ -2630,13 +2629,10 @@ static int unix_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 	return err;
 }
 
-static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wait)
+static __poll_t unix_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
-	__poll_t mask;
-
-	sock_poll_wait(file, sk_sleep(sk), wait);
-	mask = 0;
+	__poll_t mask = 0;
 
 	/* exceptional events? */
 	if (sk->sk_err)
@@ -2665,15 +2661,11 @@ static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wa
 	return mask;
 }
 
-static __poll_t unix_dgram_poll(struct file *file, struct socket *sock,
-				    poll_table *wait)
+static __poll_t unix_dgram_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk, *other;
-	unsigned int writable;
-	__poll_t mask;
-
-	sock_poll_wait(file, sk_sleep(sk), wait);
-	mask = 0;
+	int writable;
+	__poll_t mask = 0;
 
 	/* exceptional events? */
 	if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue))
@@ -2699,7 +2691,7 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock,
 	}
 
 	/* No write status requested, avoid expensive OUT tests. */
-	if (!(poll_requested_events(wait) & (EPOLLWRBAND|EPOLLWRNORM|EPOLLOUT)))
+	if (!(events & (EPOLLWRBAND|EPOLLWRNORM|EPOLLOUT)))
 		return mask;
 
 	writable = unix_writable(sk);
-- 
2.17.0

* [PATCH 17/33] net: convert datagram_poll users to ->poll_mask
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (15 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 16/33] net/unix: " Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 18/33] net/dccp: convert to ->poll_mask Christoph Hellwig
                   ` (16 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/isdn/mISDN/socket.c  |  2 +-
 drivers/net/ppp/pppoe.c      |  2 +-
 drivers/staging/ipx/af_ipx.c |  2 +-
 include/linux/skbuff.h       |  3 +--
 include/net/udp.h            |  2 +-
 net/appletalk/ddp.c          |  2 +-
 net/ax25/af_ax25.c           |  2 +-
 net/bluetooth/hci_sock.c     |  2 +-
 net/can/bcm.c                |  2 +-
 net/can/raw.c                |  2 +-
 net/core/datagram.c          | 13 ++++---------
 net/decnet/af_decnet.c       |  6 +++---
 net/ieee802154/socket.c      |  4 ++--
 net/ipv4/af_inet.c           |  6 +++---
 net/ipv4/udp.c               | 10 +++++-----
 net/ipv6/af_inet6.c          |  2 +-
 net/ipv6/raw.c               |  4 ++--
 net/kcm/kcmsock.c            | 10 +++++-----
 net/key/af_key.c             |  2 +-
 net/l2tp/l2tp_ip.c           |  2 +-
 net/l2tp/l2tp_ip6.c          |  2 +-
 net/l2tp/l2tp_ppp.c          |  2 +-
 net/llc/af_llc.c             |  2 +-
 net/netlink/af_netlink.c     |  2 +-
 net/netrom/af_netrom.c       |  2 +-
 net/nfc/rawsock.c            |  4 ++--
 net/packet/af_packet.c       |  9 ++++-----
 net/phonet/socket.c          |  2 +-
 net/qrtr/qrtr.c              |  2 +-
 net/rose/af_rose.c           |  2 +-
 net/x25/af_x25.c             |  2 +-
 31 files changed, 52 insertions(+), 59 deletions(-)
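
Note (not part of the patch): a minimal userspace sketch of the wrapper pattern
used by udp_poll_mask, dn_poll_mask and packet_poll_mask below, under stated
assumptions.  Each wrapper calls the generic datagram helper and then adjusts
the mask.  Because the new signature has no struct file argument, the UDP
blocking-socket check reads the flags through the socket's file back-pointer.
All demo_* names and DEMO_* values are hypothetical, and `first_packet_valid`
is a simplification: the kernel's first_packet_length() also has side effects
that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for EPOLLIN / EPOLLRDNORM; values are illustrative. */
#define DEMO_POLLIN     0x001u
#define DEMO_POLLRDNORM 0x040u

typedef unsigned int demo_poll_t;

/* Hypothetical reduced file: only the O_NONBLOCK state matters here. */
struct demo_file {
	bool nonblock;
};

/* Hypothetical reduced socket with a back-pointer, like sock->file. */
struct demo_socket {
	struct demo_file *file;
	bool rx_nonempty;           /* generic receive queue has data */
	bool reader_queue_nonempty; /* UDP-specific reader queue has data */
	bool rcv_shutdown;          /* models sk_shutdown & RCV_SHUTDOWN */
	bool first_packet_valid;    /* false models first_packet_length() == -1 */
};

/* Generic helper: reports readability from the shared receive queue. */
demo_poll_t demo_datagram_poll_mask(const struct demo_socket *sock,
				    demo_poll_t events)
{
	(void)events; /* the read-side checks in this sketch ignore it */
	assert(sock != NULL);
	return sock->rx_nonempty ? (DEMO_POLLIN | DEMO_POLLRDNORM) : 0;
}

/* Protocol wrapper: start from the generic mask, then refine it. */
demo_poll_t demo_udp_poll_mask(const struct demo_socket *sock,
			       demo_poll_t events)
{
	demo_poll_t mask = demo_datagram_poll_mask(sock, events);

	if (sock->reader_queue_nonempty)
		mask |= DEMO_POLLIN | DEMO_POLLRDNORM;

	/* Blocking socket and no usable first packet: drop the false positive. */
	if ((mask & DEMO_POLLRDNORM) && !sock->file->nonblock &&
	    !sock->rcv_shutdown && !sock->first_packet_valid)
		mask &= ~(DEMO_POLLIN | DEMO_POLLRDNORM);

	return mask;
}
```

In this model a non-blocking socket keeps the readable bits even when the
first packet is unusable, since a later read would return an error instead of
blocking.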

diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index 18c0a1281914..98f90aadd141 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -588,7 +588,7 @@ static const struct proto_ops data_sock_ops = {
 	.getname	= data_sock_getname,
 	.sendmsg	= mISDN_sock_sendmsg,
 	.recvmsg	= mISDN_sock_recvmsg,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= data_sock_setsockopt,
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index ce61231e96ea..de51e8f70f44 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -1107,7 +1107,7 @@ static const struct proto_ops pppoe_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= sock_no_accept,
 	.getname	= pppoe_getname,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= sock_no_setsockopt,
diff --git a/drivers/staging/ipx/af_ipx.c b/drivers/staging/ipx/af_ipx.c
index 5703dd176787..208b5c161631 100644
--- a/drivers/staging/ipx/af_ipx.c
+++ b/drivers/staging/ipx/af_ipx.c
@@ -1965,7 +1965,7 @@ static const struct proto_ops ipx_dgram_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= sock_no_accept,
 	.getname	= ipx_getname,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.ioctl		= ipx_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= ipx_compat_ioctl,
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 9065477ed255..89198379b39d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3250,8 +3250,7 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned flags,
 				    int *peeked, int *off, int *err);
 struct sk_buff *skb_recv_datagram(struct sock *sk, unsigned flags, int noblock,
 				  int *err);
-__poll_t datagram_poll(struct file *file, struct socket *sock,
-			   struct poll_table_struct *wait);
+__poll_t datagram_poll_mask(struct socket *sock, __poll_t events);
 int skb_copy_datagram_iter(const struct sk_buff *from, int offset,
 			   struct iov_iter *to, int size);
 static inline int skb_copy_datagram_msg(const struct sk_buff *from, int offset,
diff --git a/include/net/udp.h b/include/net/udp.h
index 621778b80e3d..d8ca3b26964d 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -276,7 +276,7 @@ int udp_init_sock(struct sock *sk);
 int udp_pre_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len);
 int __udp_disconnect(struct sock *sk, int flags);
 int udp_disconnect(struct sock *sk, int flags);
-__poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait);
+__poll_t udp_poll_mask(struct socket *sock, __poll_t events);
 struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
 				       netdev_features_t features,
 				       bool is_ipv6);
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index 9b6bc5abe946..55fdba05d7d9 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -1869,7 +1869,7 @@ static const struct proto_ops atalk_dgram_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= sock_no_accept,
 	.getname	= atalk_getname,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.ioctl		= atalk_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= atalk_compat_ioctl,
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index c603d33d5410..d1d2442ce573 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1941,7 +1941,7 @@ static const struct proto_ops ax25_proto_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= ax25_accept,
 	.getname	= ax25_getname,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.ioctl		= ax25_ioctl,
 	.listen		= ax25_listen,
 	.shutdown	= ax25_shutdown,
diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index 1506e1632394..d6c099861538 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -1975,7 +1975,7 @@ static const struct proto_ops hci_sock_ops = {
 	.sendmsg	= hci_sock_sendmsg,
 	.recvmsg	= hci_sock_recvmsg,
 	.ioctl		= hci_sock_ioctl,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= hci_sock_setsockopt,
diff --git a/net/can/bcm.c b/net/can/bcm.c
index 6ad89f49b341..97fedff3f0c4 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -1657,7 +1657,7 @@ static const struct proto_ops bcm_ops = {
 	.socketpair    = sock_no_socketpair,
 	.accept        = sock_no_accept,
 	.getname       = sock_no_getname,
-	.poll          = datagram_poll,
+	.poll_mask     = datagram_poll_mask,
 	.ioctl         = can_ioctl,	/* use can_ioctl() from af_can.c */
 	.listen        = sock_no_listen,
 	.shutdown      = sock_no_shutdown,
diff --git a/net/can/raw.c b/net/can/raw.c
index 1051eee82581..fd7e2f49ea6a 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -843,7 +843,7 @@ static const struct proto_ops raw_ops = {
 	.socketpair    = sock_no_socketpair,
 	.accept        = sock_no_accept,
 	.getname       = raw_getname,
-	.poll          = datagram_poll,
+	.poll_mask     = datagram_poll_mask,
 	.ioctl         = can_ioctl,	/* use can_ioctl() from af_can.c */
 	.listen        = sock_no_listen,
 	.shutdown      = sock_no_shutdown,
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 9938952c5c78..f19bf3dc2bd6 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -819,9 +819,8 @@ EXPORT_SYMBOL(skb_copy_and_csum_datagram_msg);
 
 /**
  * 	datagram_poll - generic datagram poll
- *	@file: file struct
  *	@sock: socket
- *	@wait: poll table
+ *	@events: events to wait for
  *
  *	Datagram poll: Again totally generic. This also handles
  *	sequenced packet sockets providing the socket receive queue
@@ -831,14 +830,10 @@ EXPORT_SYMBOL(skb_copy_and_csum_datagram_msg);
  *	and you use a different write policy from sock_writeable()
  *	then please supply your own write_space callback.
  */
-__poll_t datagram_poll(struct file *file, struct socket *sock,
-			   poll_table *wait)
+__poll_t datagram_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
-	__poll_t mask;
-
-	sock_poll_wait(file, sk_sleep(sk), wait);
-	mask = 0;
+	__poll_t mask = 0;
 
 	/* exceptional events? */
 	if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue))
@@ -871,4 +866,4 @@ __poll_t datagram_poll(struct file *file, struct socket *sock,
 
 	return mask;
 }
-EXPORT_SYMBOL(datagram_poll);
+EXPORT_SYMBOL(datagram_poll_mask);
diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index 7d6ff983ba2c..9a686d890bfa 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -1207,11 +1207,11 @@ static int dn_getname(struct socket *sock, struct sockaddr *uaddr,int peer)
 }
 
 
-static __poll_t dn_poll(struct file *file, struct socket *sock, poll_table  *wait)
+static __poll_t dn_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	struct dn_scp *scp = DN_SK(sk);
-	__poll_t mask = datagram_poll(file, sock, wait);
+	__poll_t mask = datagram_poll_mask(sock, events);
 
 	if (!skb_queue_empty(&scp->other_receive_queue))
 		mask |= EPOLLRDBAND;
@@ -2331,7 +2331,7 @@ static const struct proto_ops dn_proto_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	dn_accept,
 	.getname =	dn_getname,
-	.poll =		dn_poll,
+	.poll_mask =	dn_poll_mask,
 	.ioctl =	dn_ioctl,
 	.listen =	dn_listen,
 	.shutdown =	dn_shutdown,
diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
index a60658c85a9a..a0768d2759b8 100644
--- a/net/ieee802154/socket.c
+++ b/net/ieee802154/socket.c
@@ -423,7 +423,7 @@ static const struct proto_ops ieee802154_raw_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = sock_no_accept,
 	.getname	   = sock_no_getname,
-	.poll		   = datagram_poll,
+	.poll_mask	   = datagram_poll_mask,
 	.ioctl		   = ieee802154_sock_ioctl,
 	.listen		   = sock_no_listen,
 	.shutdown	   = sock_no_shutdown,
@@ -969,7 +969,7 @@ static const struct proto_ops ieee802154_dgram_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = sock_no_accept,
 	.getname	   = sock_no_getname,
-	.poll		   = datagram_poll,
+	.poll_mask	   = datagram_poll_mask,
 	.ioctl		   = ieee802154_sock_ioctl,
 	.listen		   = sock_no_listen,
 	.shutdown	   = sock_no_shutdown,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 116e3cd11515..8a59428e63ab 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1018,7 +1018,7 @@ const struct proto_ops inet_dgram_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = sock_no_accept,
 	.getname	   = inet_getname,
-	.poll		   = udp_poll,
+	.poll_mask	   = udp_poll_mask,
 	.ioctl		   = inet_ioctl,
 	.listen		   = sock_no_listen,
 	.shutdown	   = inet_shutdown,
@@ -1039,7 +1039,7 @@ EXPORT_SYMBOL(inet_dgram_ops);
 
 /*
  * For SOCK_RAW sockets; should be the same as inet_dgram_ops but without
- * udp_poll
+ * udp_poll_mask
  */
 static const struct proto_ops inet_sockraw_ops = {
 	.family		   = PF_INET,
@@ -1050,7 +1050,7 @@ static const struct proto_ops inet_sockraw_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = sock_no_accept,
 	.getname	   = inet_getname,
-	.poll		   = datagram_poll,
+	.poll_mask	   = datagram_poll_mask,
 	.ioctl		   = inet_ioctl,
 	.listen		   = sock_no_listen,
 	.shutdown	   = inet_shutdown,
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 051a43ff3fb8..675433eb53a8 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2501,7 +2501,7 @@ int compat_udp_getsockopt(struct sock *sk, int level, int optname,
  * 	udp_poll - wait for a UDP event.
  *	@file - file struct
  *	@sock - socket
- *	@wait - poll table
+ *	@events - events to wait for
  *
  *	This is same as datagram poll, except for the special case of
  *	blocking sockets. If application is using a blocking fd
@@ -2510,23 +2510,23 @@ int compat_udp_getsockopt(struct sock *sk, int level, int optname,
  *	but then block when reading it. Add special case code
  *	to work around these arguably broken applications.
  */
-__poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait)
+__poll_t udp_poll_mask(struct socket *sock, __poll_t events)
 {
-	__poll_t mask = datagram_poll(file, sock, wait);
+	__poll_t mask = datagram_poll_mask(sock, events);
 	struct sock *sk = sock->sk;
 
 	if (!skb_queue_empty(&udp_sk(sk)->reader_queue))
 		mask |= EPOLLIN | EPOLLRDNORM;
 
 	/* Check for false positives due to checksum errors */
-	if ((mask & EPOLLRDNORM) && !(file->f_flags & O_NONBLOCK) &&
+	if ((mask & EPOLLRDNORM) && !(sock->file->f_flags & O_NONBLOCK) &&
 	    !(sk->sk_shutdown & RCV_SHUTDOWN) && first_packet_length(sk) == -1)
 		mask &= ~(EPOLLIN | EPOLLRDNORM);
 
 	return mask;
 
 }
-EXPORT_SYMBOL(udp_poll);
+EXPORT_SYMBOL(udp_poll_mask);
 
 int udp_abort(struct sock *sk, int err)
 {
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 57b85ea438e9..d443c18b45fe 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -601,7 +601,7 @@ const struct proto_ops inet6_dgram_ops = {
 	.socketpair	   = sock_no_socketpair,	/* a do nothing	*/
 	.accept		   = sock_no_accept,		/* a do nothing	*/
 	.getname	   = inet6_getname,
-	.poll		   = udp_poll,			/* ok		*/
+	.poll_mask	   = udp_poll_mask,		/* ok		*/
 	.ioctl		   = inet6_ioctl,		/* must change  */
 	.listen		   = sock_no_listen,		/* ok		*/
 	.shutdown	   = inet_shutdown,		/* ok		*/
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index afc307c89d1a..ce6f0d15b5dd 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -1334,7 +1334,7 @@ void raw6_proc_exit(void)
 }
 #endif	/* CONFIG_PROC_FS */
 
-/* Same as inet6_dgram_ops, sans udp_poll.  */
+/* Same as inet6_dgram_ops, sans udp_poll_mask.  */
 const struct proto_ops inet6_sockraw_ops = {
 	.family		   = PF_INET6,
 	.owner		   = THIS_MODULE,
@@ -1344,7 +1344,7 @@ const struct proto_ops inet6_sockraw_ops = {
 	.socketpair	   = sock_no_socketpair,	/* a do nothing	*/
 	.accept		   = sock_no_accept,		/* a do nothing	*/
 	.getname	   = inet6_getname,
-	.poll		   = datagram_poll,		/* ok		*/
+	.poll_mask	   = datagram_poll_mask,	/* ok		*/
 	.ioctl		   = inet6_ioctl,		/* must change  */
 	.listen		   = sock_no_listen,		/* ok		*/
 	.shutdown	   = inet_shutdown,		/* ok		*/
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index dc76bc346829..d67734c99027 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -1336,9 +1336,9 @@ static void init_kcm_sock(struct kcm_sock *kcm, struct kcm_mux *mux)
 	struct list_head *head;
 	int index = 0;
 
-	/* For SOCK_SEQPACKET sock type, datagram_poll checks the sk_state, so
-	 * we set sk_state, otherwise epoll_wait always returns right away with
-	 * EPOLLHUP
+	/* For SOCK_SEQPACKET sock type, datagram_poll_mask checks the sk_state,
+	 * so we set sk_state, otherwise epoll_wait always returns right away
+	 * with EPOLLHUP
 	 */
 	kcm->sk.sk_state = TCP_ESTABLISHED;
 
@@ -1903,7 +1903,7 @@ static const struct proto_ops kcm_dgram_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	sock_no_getname,
-	.poll =		datagram_poll,
+	.poll_mask =	datagram_poll_mask,
 	.ioctl =	kcm_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	sock_no_shutdown,
@@ -1924,7 +1924,7 @@ static const struct proto_ops kcm_seqpacket_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	sock_no_getname,
-	.poll =		datagram_poll,
+	.poll_mask =	datagram_poll_mask,
 	.ioctl =	kcm_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	sock_no_shutdown,
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 5e1d2946ffbf..8bdc1cbe490a 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3751,7 +3751,7 @@ static const struct proto_ops pfkey_ops = {
 
 	/* Now the operations that really occur. */
 	.release	=	pfkey_release,
-	.poll		=	datagram_poll,
+	.poll_mask	=	datagram_poll_mask,
 	.sendmsg	=	pfkey_sendmsg,
 	.recvmsg	=	pfkey_recvmsg,
 };
diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index a9c05b2bc1b0..181073bf6925 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -613,7 +613,7 @@ static const struct proto_ops l2tp_ip_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = sock_no_accept,
 	.getname	   = l2tp_ip_getname,
-	.poll		   = datagram_poll,
+	.poll_mask	   = datagram_poll_mask,
 	.ioctl		   = inet_ioctl,
 	.listen		   = sock_no_listen,
 	.shutdown	   = inet_shutdown,
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index 957369192ca1..336e4c00abbc 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -754,7 +754,7 @@ static const struct proto_ops l2tp_ip6_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = sock_no_accept,
 	.getname	   = l2tp_ip6_getname,
-	.poll		   = datagram_poll,
+	.poll_mask	   = datagram_poll_mask,
 	.ioctl		   = inet6_ioctl,
 	.listen		   = sock_no_listen,
 	.shutdown	   = inet_shutdown,
diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index 830469766c1f..3d8ca1231f8f 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -1788,7 +1788,7 @@ static const struct proto_ops pppol2tp_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= sock_no_accept,
 	.getname	= pppol2tp_getname,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= pppol2tp_setsockopt,
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 1beeea9549fa..804de8490186 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -1192,7 +1192,7 @@ static const struct proto_ops llc_ui_ops = {
 	.socketpair  = sock_no_socketpair,
 	.accept      = llc_ui_accept,
 	.getname     = llc_ui_getname,
-	.poll	     = datagram_poll,
+	.poll_mask   = datagram_poll_mask,
 	.ioctl       = llc_ui_ioctl,
 	.listen      = llc_ui_listen,
 	.shutdown    = llc_ui_shutdown,
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 393573a99a5a..1189b84413d5 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2658,7 +2658,7 @@ static const struct proto_ops netlink_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	netlink_getname,
-	.poll =		datagram_poll,
+	.poll_mask =	datagram_poll_mask,
 	.ioctl =	netlink_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	sock_no_shutdown,
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index c2888c78d4c1..b97eb766a1d5 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -1355,7 +1355,7 @@ static const struct proto_ops nr_proto_ops = {
 	.socketpair	=	sock_no_socketpair,
 	.accept		=	nr_accept,
 	.getname	=	nr_getname,
-	.poll		=	datagram_poll,
+	.poll_mask	=	datagram_poll_mask,
 	.ioctl		=	nr_ioctl,
 	.listen		=	nr_listen,
 	.shutdown	=	sock_no_shutdown,
diff --git a/net/nfc/rawsock.c b/net/nfc/rawsock.c
index e2188deb08dc..60c322531c49 100644
--- a/net/nfc/rawsock.c
+++ b/net/nfc/rawsock.c
@@ -284,7 +284,7 @@ static const struct proto_ops rawsock_ops = {
 	.socketpair     = sock_no_socketpair,
 	.accept         = sock_no_accept,
 	.getname        = sock_no_getname,
-	.poll           = datagram_poll,
+	.poll_mask      = datagram_poll_mask,
 	.ioctl          = sock_no_ioctl,
 	.listen         = sock_no_listen,
 	.shutdown       = sock_no_shutdown,
@@ -304,7 +304,7 @@ static const struct proto_ops rawsock_raw_ops = {
 	.socketpair     = sock_no_socketpair,
 	.accept         = sock_no_accept,
 	.getname        = sock_no_getname,
-	.poll           = datagram_poll,
+	.poll_mask      = datagram_poll_mask,
 	.ioctl          = sock_no_ioctl,
 	.listen         = sock_no_listen,
 	.shutdown       = sock_no_shutdown,
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 833e65252f1f..78c32d6fe4ce 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -4108,12 +4108,11 @@ static int packet_ioctl(struct socket *sock, unsigned int cmd,
 	return 0;
 }
 
-static __poll_t packet_poll(struct file *file, struct socket *sock,
-				poll_table *wait)
+static __poll_t packet_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	struct packet_sock *po = pkt_sk(sk);
-	__poll_t mask = datagram_poll(file, sock, wait);
+	__poll_t mask = datagram_poll_mask(sock, events);
 
 	spin_lock_bh(&sk->sk_receive_queue.lock);
 	if (po->rx_ring.pg_vec) {
@@ -4455,7 +4454,7 @@ static const struct proto_ops packet_ops_spkt = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	packet_getname_spkt,
-	.poll =		datagram_poll,
+	.poll_mask =	datagram_poll_mask,
 	.ioctl =	packet_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	sock_no_shutdown,
@@ -4476,7 +4475,7 @@ static const struct proto_ops packet_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	packet_getname,
-	.poll =		packet_poll,
+	.poll_mask =	packet_poll_mask,
 	.ioctl =	packet_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	sock_no_shutdown,
diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index 30187990257f..59f5b5dc5308 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -448,7 +448,7 @@ const struct proto_ops phonet_dgram_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= sock_no_accept,
 	.getname	= pn_socket_getname,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.ioctl		= pn_socket_ioctl,
 	.listen		= sock_no_listen,
 	.shutdown	= sock_no_shutdown,
diff --git a/net/qrtr/qrtr.c b/net/qrtr/qrtr.c
index 2aa07b547b16..1b5025ea5b04 100644
--- a/net/qrtr/qrtr.c
+++ b/net/qrtr/qrtr.c
@@ -1023,7 +1023,7 @@ static const struct proto_ops qrtr_proto_ops = {
 	.recvmsg	= qrtr_recvmsg,
 	.getname	= qrtr_getname,
 	.ioctl		= qrtr_ioctl,
-	.poll		= datagram_poll,
+	.poll_mask	= datagram_poll_mask,
 	.shutdown	= sock_no_shutdown,
 	.setsockopt	= sock_no_setsockopt,
 	.getsockopt	= sock_no_getsockopt,
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index 22a7f2b413ac..5b73fea849df 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -1470,7 +1470,7 @@ static const struct proto_ops rose_proto_ops = {
 	.socketpair	=	sock_no_socketpair,
 	.accept		=	rose_accept,
 	.getname	=	rose_getname,
-	.poll		=	datagram_poll,
+	.poll_mask	=	datagram_poll_mask,
 	.ioctl		=	rose_ioctl,
 	.listen		=	rose_listen,
 	.shutdown	=	sock_no_shutdown,
diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
index d49aa79b7997..f93365ae0fdd 100644
--- a/net/x25/af_x25.c
+++ b/net/x25/af_x25.c
@@ -1750,7 +1750,7 @@ static const struct proto_ops x25_proto_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	x25_accept,
 	.getname =	x25_getname,
-	.poll =		datagram_poll,
+	.poll_mask =	datagram_poll_mask,
 	.ioctl =	x25_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl = compat_x25_ioctl,
-- 
2.17.0

* [PATCH 18/33] net/dccp: convert to ->poll_mask
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (16 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 17/33] net: convert datagram_poll users to ->poll_mask Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 19/33] net/atm: " Christoph Hellwig
                   ` (15 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/dccp/dccp.h  |  3 +--
 net/dccp/ipv4.c  |  2 +-
 net/dccp/ipv6.c  |  2 +-
 net/dccp/proto.c | 13 ++-----------
 4 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index f91e3816806b..0ea2ee56ac1b 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -316,8 +316,7 @@ int dccp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 		 int flags, int *addr_len);
 void dccp_shutdown(struct sock *sk, int how);
 int inet_dccp_listen(struct socket *sock, int backlog);
-__poll_t dccp_poll(struct file *file, struct socket *sock,
-		       poll_table *wait);
+__poll_t dccp_poll_mask(struct socket *sock, __poll_t events);
 int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len);
 void dccp_req_err(struct sock *sk, u64 seq);
 
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index b08feb219b44..a9e478cd3787 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -984,7 +984,7 @@ static const struct proto_ops inet_dccp_ops = {
 	.accept		   = inet_accept,
 	.getname	   = inet_getname,
 	/* FIXME: work on tcp_poll to rename it to inet_csk_poll */
-	.poll		   = dccp_poll,
+	.poll_mask	   = dccp_poll_mask,
 	.ioctl		   = inet_ioctl,
 	/* FIXME: work on inet_listen to rename it to sock_common_listen */
 	.listen		   = inet_dccp_listen,
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 6344f1b18a6a..17fc4e0166ba 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -1070,7 +1070,7 @@ static const struct proto_ops inet6_dccp_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = inet_accept,
 	.getname	   = inet6_getname,
-	.poll		   = dccp_poll,
+	.poll_mask	   = dccp_poll_mask,
 	.ioctl		   = inet6_ioctl,
 	.listen		   = inet_dccp_listen,
 	.shutdown	   = inet_shutdown,
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 84cd4e3fd01b..88b668db244b 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -314,20 +314,11 @@ int dccp_disconnect(struct sock *sk, int flags)
 
 EXPORT_SYMBOL_GPL(dccp_disconnect);
 
-/*
- *	Wait for a DCCP event.
- *
- *	Note that we don't need to lock the socket, as the upper poll layers
- *	take care of normal races (between the test and the event) and we don't
- *	go look at any of the socket buffers directly.
- */
-__poll_t dccp_poll(struct file *file, struct socket *sock,
-		       poll_table *wait)
+__poll_t dccp_poll_mask(struct socket *sock, __poll_t events)
 {
 	__poll_t mask;
 	struct sock *sk = sock->sk;
 
-	sock_poll_wait(file, sk_sleep(sk), wait);
 	if (sk->sk_state == DCCP_LISTEN)
 		return inet_csk_listen_poll(sk);
 
@@ -369,7 +360,7 @@ __poll_t dccp_poll(struct file *file, struct socket *sock,
 	return mask;
 }
 
-EXPORT_SYMBOL_GPL(dccp_poll);
+EXPORT_SYMBOL_GPL(dccp_poll_mask);
 
 int dccp_ioctl(struct sock *sk, int cmd, unsigned long arg)
 {
-- 
2.17.0

* [PATCH 19/33] net/atm: convert to ->poll_mask
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (17 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 18/33] net/dccp: convert to ->poll_mask Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 20/33] net/vmw_vsock: " Christoph Hellwig
                   ` (14 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/atm/common.c | 11 +++--------
 net/atm/common.h |  2 +-
 net/atm/pvc.c    |  2 +-
 net/atm/svc.c    |  2 +-
 4 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/net/atm/common.c b/net/atm/common.c
index fc78a0508ae1..1f2af59935db 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -648,16 +648,11 @@ int vcc_sendmsg(struct socket *sock, struct msghdr *m, size_t size)
 	return error;
 }
 
-__poll_t vcc_poll(struct file *file, struct socket *sock, poll_table *wait)
+__poll_t vcc_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
-	struct atm_vcc *vcc;
-	__poll_t mask;
-
-	sock_poll_wait(file, sk_sleep(sk), wait);
-	mask = 0;
-
-	vcc = ATM_SD(sock);
+	struct atm_vcc *vcc = ATM_SD(sock);
+	__poll_t mask = 0;
 
 	/* exceptional events */
 	if (sk->sk_err)
diff --git a/net/atm/common.h b/net/atm/common.h
index 5850649068bb..526796ad230f 100644
--- a/net/atm/common.h
+++ b/net/atm/common.h
@@ -17,7 +17,7 @@ int vcc_connect(struct socket *sock, int itf, short vpi, int vci);
 int vcc_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 		int flags);
 int vcc_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len);
-__poll_t vcc_poll(struct file *file, struct socket *sock, poll_table *wait);
+__poll_t vcc_poll_mask(struct socket *sock, __poll_t events);
 int vcc_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
 int vcc_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
 int vcc_setsockopt(struct socket *sock, int level, int optname,
diff --git a/net/atm/pvc.c b/net/atm/pvc.c
index 2cb10af16afc..9f75092fe778 100644
--- a/net/atm/pvc.c
+++ b/net/atm/pvc.c
@@ -113,7 +113,7 @@ static const struct proto_ops pvc_proto_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	pvc_getname,
-	.poll =		vcc_poll,
+	.poll_mask =	vcc_poll_mask,
 	.ioctl =	vcc_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl = vcc_compat_ioctl,
diff --git a/net/atm/svc.c b/net/atm/svc.c
index 2f91b766ac42..53f4ad7087b1 100644
--- a/net/atm/svc.c
+++ b/net/atm/svc.c
@@ -636,7 +636,7 @@ static const struct proto_ops svc_proto_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	svc_accept,
 	.getname =	svc_getname,
-	.poll =		vcc_poll,
+	.poll_mask =	vcc_poll_mask,
 	.ioctl =	svc_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl =	svc_compat_ioctl,
-- 
2.17.0

* [PATCH 20/33] net/vmw_vsock: convert to ->poll_mask
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (18 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 19/33] net/atm: " Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 21/33] net/tipc: " Christoph Hellwig
                   ` (13 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/vmw_vsock/af_vsock.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index c1076c19b858..bb5d5fa68c35 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -850,18 +850,11 @@ static int vsock_shutdown(struct socket *sock, int mode)
 	return err;
 }
 
-static __poll_t vsock_poll(struct file *file, struct socket *sock,
-			       poll_table *wait)
+static __poll_t vsock_poll_mask(struct socket *sock, __poll_t events)
 {
-	struct sock *sk;
-	__poll_t mask;
-	struct vsock_sock *vsk;
-
-	sk = sock->sk;
-	vsk = vsock_sk(sk);
-
-	poll_wait(file, sk_sleep(sk), wait);
-	mask = 0;
+	struct sock *sk = sock->sk;
+	struct vsock_sock *vsk = vsock_sk(sk);
+	__poll_t mask = 0;
 
 	if (sk->sk_err)
 		/* Signify that there has been an error on this socket. */
@@ -1091,7 +1084,7 @@ static const struct proto_ops vsock_dgram_ops = {
 	.socketpair = sock_no_socketpair,
 	.accept = sock_no_accept,
 	.getname = vsock_getname,
-	.poll = vsock_poll,
+	.poll_mask = vsock_poll_mask,
 	.ioctl = sock_no_ioctl,
 	.listen = sock_no_listen,
 	.shutdown = vsock_shutdown,
@@ -1849,7 +1842,7 @@ static const struct proto_ops vsock_stream_ops = {
 	.socketpair = sock_no_socketpair,
 	.accept = vsock_accept,
 	.getname = vsock_getname,
-	.poll = vsock_poll,
+	.poll_mask = vsock_poll_mask,
 	.ioctl = sock_no_ioctl,
 	.listen = vsock_listen,
 	.shutdown = vsock_shutdown,
-- 
2.17.0

* [PATCH 21/33] net/tipc: convert to ->poll_mask
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (19 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 20/33] net/vmw_vsock: " Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 22/33] net/sctp: " Christoph Hellwig
                   ` (12 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/tipc/socket.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 6be21575503a..3bb45042e833 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -692,10 +692,9 @@ static int tipc_getname(struct socket *sock, struct sockaddr *uaddr,
 }
 
 /**
- * tipc_poll - read and possibly block on pollmask
+ * tipc_poll - read pollmask
  * @file: file structure associated with the socket
  * @sock: socket for which to calculate the poll bits
- * @wait: ???
  *
  * Returns pollmask value
  *
@@ -709,15 +708,12 @@ static int tipc_getname(struct socket *sock, struct sockaddr *uaddr,
  * imply that the operation will succeed, merely that it should be performed
  * and will not block.
  */
-static __poll_t tipc_poll(struct file *file, struct socket *sock,
-			      poll_table *wait)
+static __poll_t tipc_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	struct tipc_sock *tsk = tipc_sk(sk);
 	__poll_t revents = 0;
 
-	sock_poll_wait(file, sk_sleep(sk), wait);
-
 	if (sk->sk_shutdown & RCV_SHUTDOWN)
 		revents |= EPOLLRDHUP | EPOLLIN | EPOLLRDNORM;
 	if (sk->sk_shutdown == SHUTDOWN_MASK)
@@ -3028,7 +3024,7 @@ static const struct proto_ops msg_ops = {
 	.socketpair	= tipc_socketpair,
 	.accept		= sock_no_accept,
 	.getname	= tipc_getname,
-	.poll		= tipc_poll,
+	.poll_mask	= tipc_poll_mask,
 	.ioctl		= tipc_ioctl,
 	.listen		= sock_no_listen,
 	.shutdown	= tipc_shutdown,
@@ -3049,7 +3045,7 @@ static const struct proto_ops packet_ops = {
 	.socketpair	= tipc_socketpair,
 	.accept		= tipc_accept,
 	.getname	= tipc_getname,
-	.poll		= tipc_poll,
+	.poll_mask	= tipc_poll_mask,
 	.ioctl		= tipc_ioctl,
 	.listen		= tipc_listen,
 	.shutdown	= tipc_shutdown,
@@ -3070,7 +3066,7 @@ static const struct proto_ops stream_ops = {
 	.socketpair	= tipc_socketpair,
 	.accept		= tipc_accept,
 	.getname	= tipc_getname,
-	.poll		= tipc_poll,
+	.poll_mask	= tipc_poll_mask,
 	.ioctl		= tipc_ioctl,
 	.listen		= tipc_listen,
 	.shutdown	= tipc_shutdown,
-- 
2.17.0

* [PATCH 22/33] net/sctp: convert to ->poll_mask
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (20 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 21/33] net/tipc: " Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 23/33] net/bluetooth: " Christoph Hellwig
                   ` (11 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/net/sctp/sctp.h | 3 +--
 net/sctp/ipv6.c         | 2 +-
 net/sctp/protocol.c     | 2 +-
 net/sctp/socket.c       | 4 +---
 4 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 28b996d63490..206100cc665b 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -107,8 +107,7 @@ int sctp_backlog_rcv(struct sock *sk, struct sk_buff *skb);
 int sctp_inet_listen(struct socket *sock, int backlog);
 void sctp_write_space(struct sock *sk);
 void sctp_data_ready(struct sock *sk);
-__poll_t sctp_poll(struct file *file, struct socket *sock,
-		poll_table *wait);
+__poll_t sctp_poll_mask(struct socket *sock, __poll_t events);
 void sctp_sock_rfree(struct sk_buff *skb);
 void sctp_copy_sock(struct sock *newsk, struct sock *sk,
 		    struct sctp_association *asoc);
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 42247110d842..2bcbc41aaffb 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -1010,7 +1010,7 @@ static const struct proto_ops inet6_seqpacket_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = inet_accept,
 	.getname	   = sctp_getname,
-	.poll		   = sctp_poll,
+	.poll_mask	   = sctp_poll_mask,
 	.ioctl		   = inet6_ioctl,
 	.listen		   = sctp_inet_listen,
 	.shutdown	   = inet_shutdown,
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index d685f8456762..a1d2ea3ff4c9 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1016,7 +1016,7 @@ static const struct proto_ops inet_seqpacket_ops = {
 	.socketpair	   = sock_no_socketpair,
 	.accept		   = inet_accept,
 	.getname	   = inet_getname,	/* Semantics are different.  */
-	.poll		   = sctp_poll,
+	.poll_mask	   = sctp_poll_mask,
 	.ioctl		   = inet_ioctl,
 	.listen		   = sctp_inet_listen,
 	.shutdown	   = inet_shutdown,	/* Looks harmless.  */
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 80835ac26d2c..f6bb1b89525c 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -7701,14 +7701,12 @@ int sctp_inet_listen(struct socket *sock, int backlog)
  * here, again, by modeling the current TCP/UDP code.  We don't have
  * a good way to test with it yet.
  */
-__poll_t sctp_poll(struct file *file, struct socket *sock, poll_table *wait)
+__poll_t sctp_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	struct sctp_sock *sp = sctp_sk(sk);
 	__poll_t mask;
 
-	poll_wait(file, sk_sleep(sk), wait);
-
 	sock_rps_record_flow(sk);
 
 	/* A TCP-style listening socket becomes readable when the accept queue
-- 
2.17.0

* [PATCH 23/33] net/bluetooth: convert to ->poll_mask
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (21 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 22/33] net/sctp: " Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 24/33] net/caif: " Christoph Hellwig
                   ` (10 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/net/bluetooth/bluetooth.h | 2 +-
 net/bluetooth/af_bluetooth.c      | 7 ++-----
 net/bluetooth/l2cap_sock.c        | 2 +-
 net/bluetooth/rfcomm/sock.c       | 2 +-
 net/bluetooth/sco.c               | 2 +-
 5 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h
index ec9d6bc65855..53ce8176c313 100644
--- a/include/net/bluetooth/bluetooth.h
+++ b/include/net/bluetooth/bluetooth.h
@@ -271,7 +271,7 @@ int  bt_sock_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 		     int flags);
 int  bt_sock_stream_recvmsg(struct socket *sock, struct msghdr *msg,
 			    size_t len, int flags);
-__poll_t bt_sock_poll(struct file *file, struct socket *sock, poll_table *wait);
+__poll_t bt_sock_poll_mask(struct socket *sock, __poll_t events);
 int  bt_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
 int  bt_sock_wait_state(struct sock *sk, int state, unsigned long timeo);
 int  bt_sock_wait_ready(struct sock *sk, unsigned long flags);
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 3264e1873219..510ab4f55df5 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -437,16 +437,13 @@ static inline __poll_t bt_accept_poll(struct sock *parent)
 	return 0;
 }
 
-__poll_t bt_sock_poll(struct file *file, struct socket *sock,
-			  poll_table *wait)
+__poll_t bt_sock_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	__poll_t mask = 0;
 
 	BT_DBG("sock %p, sk %p", sock, sk);
 
-	poll_wait(file, sk_sleep(sk), wait);
-
 	if (sk->sk_state == BT_LISTEN)
 		return bt_accept_poll(sk);
 
@@ -478,7 +475,7 @@ __poll_t bt_sock_poll(struct file *file, struct socket *sock,
 
 	return mask;
 }
-EXPORT_SYMBOL(bt_sock_poll);
+EXPORT_SYMBOL(bt_sock_poll_mask);
 
 int bt_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 {
diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index 686bdc6b35b0..742a190034e6 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -1653,7 +1653,7 @@ static const struct proto_ops l2cap_sock_ops = {
 	.getname	= l2cap_sock_getname,
 	.sendmsg	= l2cap_sock_sendmsg,
 	.recvmsg	= l2cap_sock_recvmsg,
-	.poll		= bt_sock_poll,
+	.poll_mask	= bt_sock_poll_mask,
 	.ioctl		= bt_sock_ioctl,
 	.mmap		= sock_no_mmap,
 	.socketpair	= sock_no_socketpair,
diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index d606e9212291..1cf57622473a 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -1049,7 +1049,7 @@ static const struct proto_ops rfcomm_sock_ops = {
 	.setsockopt	= rfcomm_sock_setsockopt,
 	.getsockopt	= rfcomm_sock_getsockopt,
 	.ioctl		= rfcomm_sock_ioctl,
-	.poll		= bt_sock_poll,
+	.poll_mask	= bt_sock_poll_mask,
 	.socketpair	= sock_no_socketpair,
 	.mmap		= sock_no_mmap
 };
diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
index 413b8ee49fec..d60dbc61d170 100644
--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -1197,7 +1197,7 @@ static const struct proto_ops sco_sock_ops = {
 	.getname	= sco_sock_getname,
 	.sendmsg	= sco_sock_sendmsg,
 	.recvmsg	= sco_sock_recvmsg,
-	.poll		= bt_sock_poll,
+	.poll_mask	= bt_sock_poll_mask,
 	.ioctl		= bt_sock_ioctl,
 	.mmap		= sock_no_mmap,
 	.socketpair	= sock_no_socketpair,
-- 
2.17.0

* [PATCH 24/33] net/caif: convert to ->poll_mask
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (22 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 23/33] net/bluetooth: " Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 25/33] net/nfc: " Christoph Hellwig
                   ` (9 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/caif/caif_socket.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index a6fb1b3bcad9..c7991867d622 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -934,15 +934,11 @@ static int caif_release(struct socket *sock)
 }
 
 /* Copied from af_unix.c:unix_poll(), added CAIF tx_flow handling */
-static __poll_t caif_poll(struct file *file,
-			      struct socket *sock, poll_table *wait)
+static __poll_t caif_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
-	__poll_t mask;
 	struct caifsock *cf_sk = container_of(sk, struct caifsock, sk);
-
-	sock_poll_wait(file, sk_sleep(sk), wait);
-	mask = 0;
+	__poll_t mask = 0;
 
 	/* exceptional events? */
 	if (sk->sk_err)
@@ -976,7 +972,7 @@ static const struct proto_ops caif_seqpacket_ops = {
 	.socketpair = sock_no_socketpair,
 	.accept = sock_no_accept,
 	.getname = sock_no_getname,
-	.poll = caif_poll,
+	.poll_mask = caif_poll_mask,
 	.ioctl = sock_no_ioctl,
 	.listen = sock_no_listen,
 	.shutdown = sock_no_shutdown,
@@ -997,7 +993,7 @@ static const struct proto_ops caif_stream_ops = {
 	.socketpair = sock_no_socketpair,
 	.accept = sock_no_accept,
 	.getname = sock_no_getname,
-	.poll = caif_poll,
+	.poll_mask = caif_poll_mask,
 	.ioctl = sock_no_ioctl,
 	.listen = sock_no_listen,
 	.shutdown = sock_no_shutdown,
-- 
2.17.0

* [PATCH 25/33] net/nfc: convert to ->poll_mask
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (23 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 24/33] net/caif: " Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 26/33] net/phonet: " Christoph Hellwig
                   ` (8 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/nfc/llcp_sock.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
index ea0c0c6f1874..ab5bb14b49af 100644
--- a/net/nfc/llcp_sock.c
+++ b/net/nfc/llcp_sock.c
@@ -548,16 +548,13 @@ static inline __poll_t llcp_accept_poll(struct sock *parent)
 	return 0;
 }
 
-static __poll_t llcp_sock_poll(struct file *file, struct socket *sock,
-				   poll_table *wait)
+static __poll_t llcp_sock_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	__poll_t mask = 0;
 
 	pr_debug("%p\n", sk);
 
-	sock_poll_wait(file, sk_sleep(sk), wait);
-
 	if (sk->sk_state == LLCP_LISTEN)
 		return llcp_accept_poll(sk);
 
@@ -899,7 +896,7 @@ static const struct proto_ops llcp_sock_ops = {
 	.socketpair     = sock_no_socketpair,
 	.accept         = llcp_sock_accept,
 	.getname        = llcp_sock_getname,
-	.poll           = llcp_sock_poll,
+	.poll_mask      = llcp_sock_poll_mask,
 	.ioctl          = sock_no_ioctl,
 	.listen         = llcp_sock_listen,
 	.shutdown       = sock_no_shutdown,
@@ -919,7 +916,7 @@ static const struct proto_ops llcp_rawsock_ops = {
 	.socketpair     = sock_no_socketpair,
 	.accept         = sock_no_accept,
 	.getname        = llcp_sock_getname,
-	.poll           = llcp_sock_poll,
+	.poll_mask      = llcp_sock_poll_mask,
 	.ioctl          = sock_no_ioctl,
 	.listen         = sock_no_listen,
 	.shutdown       = sock_no_shutdown,
-- 
2.17.0

* [PATCH 26/33] net/phonet: convert to ->poll_mask
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (24 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 25/33] net/nfc: " Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 27/33] net/iucv: " Christoph Hellwig
                   ` (7 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/phonet/socket.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index 59f5b5dc5308..c295c4e20f01 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -340,15 +340,12 @@ static int pn_socket_getname(struct socket *sock, struct sockaddr *addr,
 	return sizeof(struct sockaddr_pn);
 }
 
-static __poll_t pn_socket_poll(struct file *file, struct socket *sock,
-					poll_table *wait)
+static __poll_t pn_socket_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	struct pep_sock *pn = pep_sk(sk);
 	__poll_t mask = 0;
 
-	poll_wait(file, sk_sleep(sk), wait);
-
 	if (sk->sk_state == TCP_CLOSE)
 		return EPOLLERR;
 	if (!skb_queue_empty(&sk->sk_receive_queue))
@@ -473,7 +470,7 @@ const struct proto_ops phonet_stream_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= pn_socket_accept,
 	.getname	= pn_socket_getname,
-	.poll		= pn_socket_poll,
+	.poll_mask	= pn_socket_poll_mask,
 	.ioctl		= pn_socket_ioctl,
 	.listen		= pn_socket_listen,
 	.shutdown	= sock_no_shutdown,
-- 
2.17.0

* [PATCH 27/33] net/iucv: convert to ->poll_mask
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (25 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 26/33] net/phonet: " Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 28/33] net/rxrpc: " Christoph Hellwig
                   ` (6 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/net/iucv/af_iucv.h | 2 --
 net/iucv/af_iucv.c         | 7 ++-----
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/include/net/iucv/af_iucv.h b/include/net/iucv/af_iucv.h
index f4c21b5a1242..b0eaeb02d46d 100644
--- a/include/net/iucv/af_iucv.h
+++ b/include/net/iucv/af_iucv.h
@@ -153,8 +153,6 @@ struct iucv_sock_list {
 	atomic_t	  autobind_name;
 };
 
-__poll_t iucv_sock_poll(struct file *file, struct socket *sock,
-			    poll_table *wait);
 void iucv_sock_link(struct iucv_sock_list *l, struct sock *s);
 void iucv_sock_unlink(struct iucv_sock_list *l, struct sock *s);
 void iucv_accept_enqueue(struct sock *parent, struct sock *sk);
diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index 893a022f9620..68e86257a549 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -1488,14 +1488,11 @@ static inline __poll_t iucv_accept_poll(struct sock *parent)
 	return 0;
 }
 
-__poll_t iucv_sock_poll(struct file *file, struct socket *sock,
-			    poll_table *wait)
+static __poll_t iucv_sock_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	__poll_t mask = 0;
 
-	sock_poll_wait(file, sk_sleep(sk), wait);
-
 	if (sk->sk_state == IUCV_LISTEN)
 		return iucv_accept_poll(sk);
 
@@ -2388,7 +2385,7 @@ static const struct proto_ops iucv_sock_ops = {
 	.getname	= iucv_sock_getname,
 	.sendmsg	= iucv_sock_sendmsg,
 	.recvmsg	= iucv_sock_recvmsg,
-	.poll		= iucv_sock_poll,
+	.poll_mask	= iucv_sock_poll_mask,
 	.ioctl		= sock_no_ioctl,
 	.mmap		= sock_no_mmap,
 	.socketpair	= sock_no_socketpair,
-- 
2.17.0

* [PATCH 28/33] net/rxrpc: convert to ->poll_mask
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (26 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 27/33] net/iucv: " Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 29/33] crypto: af_alg: " Christoph Hellwig
                   ` (5 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/rxrpc/af_rxrpc.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 2b463047dd7b..3b1ac93efee2 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -734,15 +734,11 @@ static int rxrpc_getsockopt(struct socket *sock, int level, int optname,
 /*
  * permit an RxRPC socket to be polled
  */
-static __poll_t rxrpc_poll(struct file *file, struct socket *sock,
-			       poll_table *wait)
+static __poll_t rxrpc_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	struct rxrpc_sock *rx = rxrpc_sk(sk);
-	__poll_t mask;
-
-	sock_poll_wait(file, sk_sleep(sk), wait);
-	mask = 0;
+	__poll_t mask = 0;
 
 	/* the socket is readable if there are any messages waiting on the Rx
 	 * queue */
@@ -949,7 +945,7 @@ static const struct proto_ops rxrpc_rpc_ops = {
 	.socketpair	= sock_no_socketpair,
 	.accept		= sock_no_accept,
 	.getname	= sock_no_getname,
-	.poll		= rxrpc_poll,
+	.poll_mask	= rxrpc_poll_mask,
 	.ioctl		= sock_no_ioctl,
 	.listen		= rxrpc_listen,
 	.shutdown	= rxrpc_shutdown,
-- 
2.17.0

* [PATCH 29/33] crypto: af_alg: convert to ->poll_mask
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (27 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 28/33] net/rxrpc: " Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 30/33] pipe: " Christoph Hellwig
                   ` (4 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 crypto/af_alg.c         | 13 +++----------
 crypto/algif_aead.c     |  4 ++--
 crypto/algif_skcipher.c |  4 ++--
 include/crypto/if_alg.h |  3 +--
 4 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index 80838c1cef94..89ed613c017e 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -1060,19 +1060,12 @@ void af_alg_async_cb(struct crypto_async_request *_req, int err)
 }
 EXPORT_SYMBOL_GPL(af_alg_async_cb);
 
-/**
- * af_alg_poll - poll system call handler
- */
-__poll_t af_alg_poll(struct file *file, struct socket *sock,
-			 poll_table *wait)
+__poll_t af_alg_poll_mask(struct socket *sock, __poll_t events)
 {
 	struct sock *sk = sock->sk;
 	struct alg_sock *ask = alg_sk(sk);
 	struct af_alg_ctx *ctx = ask->private;
-	__poll_t mask;
-
-	sock_poll_wait(file, sk_sleep(sk), wait);
-	mask = 0;
+	__poll_t mask = 0;
 
 	if (!ctx->more || ctx->used)
 		mask |= EPOLLIN | EPOLLRDNORM;
@@ -1082,7 +1075,7 @@ __poll_t af_alg_poll(struct file *file, struct socket *sock,
 
 	return mask;
 }
-EXPORT_SYMBOL_GPL(af_alg_poll);
+EXPORT_SYMBOL_GPL(af_alg_poll_mask);
 
 /**
  * af_alg_alloc_areq - allocate struct af_alg_async_req
diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
index 4b07edd5a9ff..330cf9f2b767 100644
--- a/crypto/algif_aead.c
+++ b/crypto/algif_aead.c
@@ -375,7 +375,7 @@ static struct proto_ops algif_aead_ops = {
 	.sendmsg	=	aead_sendmsg,
 	.sendpage	=	af_alg_sendpage,
 	.recvmsg	=	aead_recvmsg,
-	.poll		=	af_alg_poll,
+	.poll_mask	=	af_alg_poll_mask,
 };
 
 static int aead_check_key(struct socket *sock)
@@ -471,7 +471,7 @@ static struct proto_ops algif_aead_ops_nokey = {
 	.sendmsg	=	aead_sendmsg_nokey,
 	.sendpage	=	aead_sendpage_nokey,
 	.recvmsg	=	aead_recvmsg_nokey,
-	.poll		=	af_alg_poll,
+	.poll_mask	=	af_alg_poll_mask,
 };
 
 static void *aead_bind(const char *name, u32 type, u32 mask)
diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index c4e885df4564..15cf3c5222e0 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -205,7 +205,7 @@ static struct proto_ops algif_skcipher_ops = {
 	.sendmsg	=	skcipher_sendmsg,
 	.sendpage	=	af_alg_sendpage,
 	.recvmsg	=	skcipher_recvmsg,
-	.poll		=	af_alg_poll,
+	.poll_mask	=	af_alg_poll_mask,
 };
 
 static int skcipher_check_key(struct socket *sock)
@@ -301,7 +301,7 @@ static struct proto_ops algif_skcipher_ops_nokey = {
 	.sendmsg	=	skcipher_sendmsg_nokey,
 	.sendpage	=	skcipher_sendpage_nokey,
 	.recvmsg	=	skcipher_recvmsg_nokey,
-	.poll		=	af_alg_poll,
+	.poll_mask	=	af_alg_poll_mask,
 };
 
 static void *skcipher_bind(const char *name, u32 type, u32 mask)
diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h
index 482461d8931d..cc414db9da0a 100644
--- a/include/crypto/if_alg.h
+++ b/include/crypto/if_alg.h
@@ -245,8 +245,7 @@ ssize_t af_alg_sendpage(struct socket *sock, struct page *page,
 			int offset, size_t size, int flags);
 void af_alg_free_resources(struct af_alg_async_req *areq);
 void af_alg_async_cb(struct crypto_async_request *_req, int err);
-__poll_t af_alg_poll(struct file *file, struct socket *sock,
-			 poll_table *wait);
+__poll_t af_alg_poll_mask(struct socket *sock, __poll_t events);
 struct af_alg_async_req *af_alg_alloc_areq(struct sock *sk,
 					   unsigned int areqlen);
 int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags,
-- 
2.17.0

* [PATCH 30/33] pipe: convert to ->poll_mask
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (28 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 29/33] crypto: af_alg: " Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 31/33] eventfd: switch " Christoph Hellwig
                   ` (3 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/pipe.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 39d6f431da83..bb0840e234f3 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -509,19 +509,22 @@ static long pipe_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 	}
 }
 
-/* No kernel lock held - fine */
-static __poll_t
-pipe_poll(struct file *filp, poll_table *wait)
+static struct wait_queue_head *
+pipe_get_poll_head(struct file *filp, __poll_t events)
 {
-	__poll_t mask;
 	struct pipe_inode_info *pipe = filp->private_data;
-	int nrbufs;
 
-	poll_wait(filp, &pipe->wait, wait);
+	return &pipe->wait;
+}
+
+/* No kernel lock held - fine */
+static __poll_t pipe_poll_mask(struct file *filp, __poll_t events)
+{
+	struct pipe_inode_info *pipe = filp->private_data;
+	int nrbufs = pipe->nrbufs;
+	__poll_t mask = 0;
 
 	/* Reading only -- no need for acquiring the semaphore.  */
-	nrbufs = pipe->nrbufs;
-	mask = 0;
 	if (filp->f_mode & FMODE_READ) {
 		mask = (nrbufs > 0) ? EPOLLIN | EPOLLRDNORM : 0;
 		if (!pipe->writers && filp->f_version != pipe->w_counter)
@@ -1020,7 +1023,8 @@ const struct file_operations pipefifo_fops = {
 	.llseek		= no_llseek,
 	.read_iter	= pipe_read,
 	.write_iter	= pipe_write,
-	.poll		= pipe_poll,
+	.get_poll_head	= pipe_get_poll_head,
+	.poll_mask	= pipe_poll_mask,
 	.unlocked_ioctl	= pipe_ioctl,
 	.release	= pipe_release,
 	.fasync		= pipe_fasync,
-- 
2.17.0

* [PATCH 31/33] eventfd: switch to ->poll_mask
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (29 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 30/33] pipe: " Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 32/33] timerfd: convert " Christoph Hellwig
                   ` (2 subsequent siblings)
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/eventfd.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 08d3bd602f73..61c9514da5e9 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -101,14 +101,20 @@ static int eventfd_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
-static __poll_t eventfd_poll(struct file *file, poll_table *wait)
+static struct wait_queue_head *
+eventfd_get_poll_head(struct file *file, __poll_t events)
+{
+	struct eventfd_ctx *ctx = file->private_data;
+
+	return &ctx->wqh;
+}
+
+static __poll_t eventfd_poll_mask(struct file *file, __poll_t eventmask)
 {
 	struct eventfd_ctx *ctx = file->private_data;
 	__poll_t events = 0;
 	u64 count;
 
-	poll_wait(file, &ctx->wqh, wait);
-
 	/*
 	 * All writes to ctx->count occur within ctx->wqh.lock.  This read
 	 * can be done outside ctx->wqh.lock because we know that poll_wait
@@ -305,7 +311,8 @@ static const struct file_operations eventfd_fops = {
 	.show_fdinfo	= eventfd_show_fdinfo,
 #endif
 	.release	= eventfd_release,
-	.poll		= eventfd_poll,
+	.get_poll_head	= eventfd_get_poll_head,
+	.poll_mask	= eventfd_poll_mask,
 	.read		= eventfd_read,
 	.write		= eventfd_write,
 	.llseek		= noop_llseek,
-- 
2.17.0

* [PATCH 32/33] timerfd: convert to ->poll_mask
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (30 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 31/33] eventfd: switch " Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-23 19:20 ` [PATCH 33/33] random: " Christoph Hellwig
  2018-05-26  0:11 ` aio poll and a new in-kernel poll API V13 Al Viro
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/timerfd.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index cdad49da3ff7..d84a2bee4f82 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -226,21 +226,20 @@ static int timerfd_release(struct inode *inode, struct file *file)
 	kfree_rcu(ctx, rcu);
 	return 0;
 }
-
-static __poll_t timerfd_poll(struct file *file, poll_table *wait)
+	
+static struct wait_queue_head *timerfd_get_poll_head(struct file *file,
+		__poll_t eventmask)
 {
 	struct timerfd_ctx *ctx = file->private_data;
-	__poll_t events = 0;
-	unsigned long flags;
 
-	poll_wait(file, &ctx->wqh, wait);
+	return &ctx->wqh;
+}
 
-	spin_lock_irqsave(&ctx->wqh.lock, flags);
-	if (ctx->ticks)
-		events |= EPOLLIN;
-	spin_unlock_irqrestore(&ctx->wqh.lock, flags);
+static __poll_t timerfd_poll_mask(struct file *file, __poll_t eventmask)
+{
+	struct timerfd_ctx *ctx = file->private_data;
 
-	return events;
+	return ctx->ticks ? EPOLLIN : 0;
 }
 
 static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
@@ -364,7 +363,8 @@ static long timerfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg
 
 static const struct file_operations timerfd_fops = {
 	.release	= timerfd_release,
-	.poll		= timerfd_poll,
+	.get_poll_head	= timerfd_get_poll_head,
+	.poll_mask	= timerfd_poll_mask,
 	.read		= timerfd_read,
 	.llseek		= noop_llseek,
 	.show_fdinfo	= timerfd_show,
-- 
2.17.0

* [PATCH 33/33] random: convert to ->poll_mask
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (31 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 32/33] timerfd: convert " Christoph Hellwig
@ 2018-05-23 19:20 ` Christoph Hellwig
  2018-05-26  0:11 ` aio poll and a new in-kernel poll API V13 Al Viro
  33 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-23 19:20 UTC (permalink / raw)
  To: viro
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

The big change is that random_read_wait and random_write_wait are merged
into a single waitqueue that uses keyed wakeups.  Because wait_event_*
doesn't know about the keys, this will lead to occasional spurious wakeups
in _random_read and add_hwgenerator_randomness, but wait_event_* is
designed to handle these and we are not in a hot path there.
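
As a rough userspace model of why the extra wakeups are harmless (this is
not kernel code; struct waiter, the EV_* values and wake_up_keyed() are
made up for illustration): a poll-style sleeper filters on the wakeup key,
while a wait_event_*-style sleeper is woken for any key and simply
rechecks its condition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event bits, standing in for POLLIN / POLLOUT. */
#define EV_IN	0x1u
#define EV_OUT	0x4u

/*
 * One sleeper on the shared queue.  key == 0 models a wait_event_*()
 * sleeper that does not look at the wakeup key at all.
 */
struct waiter {
	unsigned int key;
	int wakeups;
};

/* Wake every sleeper whose key matches, plus all key-unaware sleepers. */
static int wake_up_keyed(struct waiter *w, size_t n, unsigned int key)
{
	int woken = 0;
	size_t i;

	assert(key != 0);	/* a keyed wakeup must name at least one event */
	for (i = 0; i < n; i++) {
		if (w[i].key == 0 || (w[i].key & key)) {
			w[i].wakeups++;
			woken++;
		}
	}
	return woken;
}
```

With sleepers keyed { EV_IN, EV_OUT, 0 }, a wakeup keyed EV_OUT skips the
first, wakes the second, and wakes the third spuriously; the third is
expected to recheck its condition and go back to sleep.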

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/char/random.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index cd888d4ee605..a8fb0020ba5c 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -402,8 +402,7 @@ static struct poolinfo {
 /*
  * Static global variables
  */
-static DECLARE_WAIT_QUEUE_HEAD(random_read_wait);
-static DECLARE_WAIT_QUEUE_HEAD(random_write_wait);
+static DECLARE_WAIT_QUEUE_HEAD(random_wait);
 static struct fasync_struct *fasync;
 
 static DEFINE_SPINLOCK(random_ready_list_lock);
@@ -722,8 +721,8 @@ static void credit_entropy_bits(struct entropy_store *r, int nbits)
 
 		/* should we wake readers? */
 		if (entropy_bits >= random_read_wakeup_bits &&
-		    wq_has_sleeper(&random_read_wait)) {
-			wake_up_interruptible(&random_read_wait);
+		    wq_has_sleeper(&random_wait)) {
+			wake_up_interruptible_poll(&random_wait, POLLIN);
 			kill_fasync(&fasync, SIGIO, POLL_IN);
 		}
 		/* If the input pool is getting full, send some
@@ -1397,7 +1396,7 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,
 	trace_debit_entropy(r->name, 8 * ibytes);
 	if (ibytes &&
 	    (r->entropy_count >> ENTROPY_SHIFT) < random_write_wakeup_bits) {
-		wake_up_interruptible(&random_write_wait);
+		wake_up_interruptible_poll(&random_wait, POLLOUT);
 		kill_fasync(&fasync, SIGIO, POLL_OUT);
 	}
 
@@ -1839,7 +1838,7 @@ _random_read(int nonblock, char __user *buf, size_t nbytes)
 		if (nonblock)
 			return -EAGAIN;
 
-		wait_event_interruptible(random_read_wait,
+		wait_event_interruptible(random_wait,
 			ENTROPY_BITS(&input_pool) >=
 			random_read_wakeup_bits);
 		if (signal_pending(current))
@@ -1876,14 +1875,17 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
 	return ret;
 }
 
+static struct wait_queue_head *
+random_get_poll_head(struct file *file, __poll_t events)
+{
+	return &random_wait;
+}
+
 static __poll_t
-random_poll(struct file *file, poll_table * wait)
+random_poll_mask(struct file *file, __poll_t events)
 {
-	__poll_t mask;
+	__poll_t mask = 0;
 
-	poll_wait(file, &random_read_wait, wait);
-	poll_wait(file, &random_write_wait, wait);
-	mask = 0;
 	if (ENTROPY_BITS(&input_pool) >= random_read_wakeup_bits)
 		mask |= EPOLLIN | EPOLLRDNORM;
 	if (ENTROPY_BITS(&input_pool) < random_write_wakeup_bits)
@@ -1990,7 +1992,8 @@ static int random_fasync(int fd, struct file *filp, int on)
 const struct file_operations random_fops = {
 	.read  = random_read,
 	.write = random_write,
-	.poll  = random_poll,
+	.get_poll_head  = random_get_poll_head,
+	.poll_mask  = random_poll_mask,
 	.unlocked_ioctl = random_ioctl,
 	.fasync = random_fasync,
 	.llseek = noop_llseek,
@@ -2323,7 +2326,7 @@ void add_hwgenerator_randomness(const char *buffer, size_t count,
 	 * We'll be woken up again once below random_write_wakeup_thresh,
 	 * or when the calling thread is about to terminate.
 	 */
-	wait_event_interruptible(random_write_wait, kthread_should_stop() ||
+	wait_event_interruptible(random_wait, kthread_should_stop() ||
 			ENTROPY_BITS(&input_pool) <= random_write_wakeup_bits);
 	mix_pool_bytes(poolp, buffer, count);
 	credit_entropy_bits(poolp, entropy);
-- 
2.17.0

* Re: aio poll and a new in-kernel poll API V13
  2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
                   ` (32 preceding siblings ...)
  2018-05-23 19:20 ` [PATCH 33/33] random: " Christoph Hellwig
@ 2018-05-26  0:11 ` Al Viro
  2018-05-26  7:09   ` Al Viro
  33 siblings, 1 reply; 72+ messages in thread
From: Al Viro @ 2018-05-26  0:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

On Wed, May 23, 2018 at 09:19:49PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this series adds support for the IOCB_CMD_POLL operation to poll for the
> readyness of file descriptors using the aio subsystem.  The API is based
> on patches that existed in RHAS2.1 and RHEL3, which means it already is
> supported by libaio.  To implement the poll support efficiently new
> methods to poll are introduced in struct file_operations:  get_poll_head
> and poll_mask.  The first one returns a wait_queue_head to wait on
> (lifetime is bound by the file), and the second does a non-blocking
> check for the POLL* events.  This allows aio poll to work without
> any additional context switches, unlike epoll.
> 
> This series sits on top of the aio-fsync series that also includes
> support for io_pgetevents.

OK, I can live with that, except for one problem - the first patch shouldn't
be sitting on top of arseloads of next window fodder.

Please, rebase the rest of the series on top of merge of vfs.git#fixes
(4faa99965e02) with your aio-fsync.4 and tell me what to pull.

* Re: aio poll and a new in-kernel poll API V13
  2018-05-26  0:11 ` aio poll and a new in-kernel poll API V13 Al Viro
@ 2018-05-26  7:09   ` Al Viro
  2018-05-26  7:23     ` Christoph Hellwig
  0 siblings, 1 reply; 72+ messages in thread
From: Al Viro @ 2018-05-26  7:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

On Sat, May 26, 2018 at 01:11:11AM +0100, Al Viro wrote:
> On Wed, May 23, 2018 at 09:19:49PM +0200, Christoph Hellwig wrote:
> > Hi all,
> > 
> > this series adds support for the IOCB_CMD_POLL operation to poll for the
> > readyness of file descriptors using the aio subsystem.  The API is based
> > on patches that existed in RHAS2.1 and RHEL3, which means it already is
> > supported by libaio.  To implement the poll support efficiently new
> > methods to poll are introduced in struct file_operations:  get_poll_head
> > and poll_mask.  The first one returns a wait_queue_head to wait on
> > (lifetime is bound by the file), and the second does a non-blocking
> > check for the POLL* events.  This allows aio poll to work without
> > any additional context switches, unlike epoll.
> > 
> > This series sits on top of the aio-fsync series that also includes
> > support for io_pgetevents.
> 
> OK, I can live with that, except for one problem - the first patch shouldn't
> be sitting on top of arseloads of next window fodder.
> 
> Please, rebase the rest of the series on top of merge of vfs.git#fixes
> (4faa99965e02) with your aio-fsync.4 and tell me what to pull.

UGH

You've based it on vfs.git#hch.aio (== your aio-fsync.4) + baf10564fbb6
(== vfs.git#fixes^), *and* started with cherry-pick of vfs.git#fixes
on top of that, followed by your series.

That makes no sense whatsoever.  Please, take your aio-fsync.4, merge
vfs.git#fixes (== 4faa99965e02, "fix io_destroy()/aio_complete() race",
same change as your 4e79230e5254) into it and rebase the rest of your
branch on top of that (from "uapi: turn __poll_t sparse checkin
on by default" to "random: convert to ->poll_mask").  BTW, you probably
want s/checkin/checks/ in the first one of those...

* Re: aio poll and a new in-kernel poll API V13
  2018-05-26  7:09   ` Al Viro
@ 2018-05-26  7:23     ` Christoph Hellwig
  2018-05-27 22:27       ` Al Viro
  0 siblings, 1 reply; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-26  7:23 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Avi Kivity, linux-aio, linux-fsdevel, netdev,
	linux-api, linux-kernel

I'm still waking up..

* Re: aio poll and a new in-kernel poll API V13
  2018-05-26  7:23     ` Christoph Hellwig
@ 2018-05-27 22:27       ` Al Viro
  2018-05-27 22:28         ` [PATCH 1/4] aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way Al Viro
  0 siblings, 1 reply; 72+ messages in thread
From: Al Viro @ 2018-05-27 22:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api, linux-kernel

	OK, it's in -next now; there are several cleanups I'd put
into vfs.git#work.aio:
      aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way
      aio_read_events_ring(): make a bit more readable
      aio: shift copyin of iocb into io_submit_one()
      aio: fold do_io_submit() into callers
Those are *not* on -next yet and if anybody has objections against
any of those, please yell.  Individual patches in followups...

* [PATCH 1/4] aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way
  2018-05-27 22:27       ` Al Viro
@ 2018-05-27 22:28         ` Al Viro
  2018-05-27 22:28           ` [PATCH 2/4] aio_read_events_ring(): make a bit more readable Al Viro
                             ` (3 more replies)
  0 siblings, 4 replies; 72+ messages in thread
From: Al Viro @ 2018-05-27 22:28 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig

From: Al Viro <viro@zeniv.linux.org.uk>

... so just make them return 0 when caller does not need to destroy iocb
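
A minimal userspace sketch of that convention, assuming nothing beyond
what the patch states (struct req, enum op, handle() and submit_one() are
hypothetical stand-ins, not the fs/aio.c types): a handler returns 0 both
when it completed the request itself and when it arranged for completion
to happen later, and only a non-zero return asks the caller to destroy
the request.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request state, tracked only so the outcome is visible. */
struct req {
	bool completed;		/* completion already delivered */
	bool queued;		/* completion will be delivered later */
	bool destroyed;		/* caller tore the request down */
};

enum op { OP_SYNC, OP_ASYNC, OP_BAD };

/* Handler: 0 means "completion is taken care of", whichever way. */
static int handle(struct req *r, enum op op)
{
	switch (op) {
	case OP_SYNC:
		r->completed = true;
		return 0;
	case OP_ASYNC:
		r->queued = true;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Caller: a single "if (ret)" test decides whether to clean up. */
static int submit_one(struct req *r, enum op op)
{
	int ret;

	assert(r);
	ret = handle(r, op);
	if (ret)
		r->destroyed = true;
	return ret;
}
```

The point of the sketch is only that the caller no longer needs a special
case for a "queued" return value.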

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 8274d09d44a2..c6f29d9d006c 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1456,11 +1456,11 @@ static int aio_setup_rw(int rw, struct iocb *iocb, struct iovec **iovec,
 	return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter);
 }
 
-static inline ssize_t aio_rw_ret(struct kiocb *req, ssize_t ret)
+static inline void aio_rw_ret(struct kiocb *req, ssize_t ret)
 {
 	switch (ret) {
 	case -EIOCBQUEUED:
-		return ret;
+		break;
 	case -ERESTARTSYS:
 	case -ERESTARTNOINTR:
 	case -ERESTARTNOHAND:
@@ -1473,7 +1473,6 @@ static inline ssize_t aio_rw_ret(struct kiocb *req, ssize_t ret)
 		/*FALLTHRU*/
 	default:
 		aio_complete_rw(req, ret, 0);
-		return 0;
 	}
 }
 
@@ -1502,10 +1501,10 @@ static ssize_t aio_read(struct kiocb *req, struct iocb *iocb, bool vectored,
 		goto out_fput;
 	ret = rw_verify_area(READ, file, &req->ki_pos, iov_iter_count(&iter));
 	if (!ret)
-		ret = aio_rw_ret(req, call_read_iter(file, req, &iter));
+		aio_rw_ret(req, call_read_iter(file, req, &iter));
 	kfree(iovec);
 out_fput:
-	if (unlikely(ret && ret != -EIOCBQUEUED))
+	if (unlikely(ret))
 		fput(file);
 	return ret;
 }
@@ -1547,11 +1546,11 @@ static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored,
 			__sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE);
 		}
 		req->ki_flags |= IOCB_WRITE;
-		ret = aio_rw_ret(req, call_write_iter(file, req, &iter));
+		aio_rw_ret(req, call_write_iter(file, req, &iter));
 	}
 	kfree(iovec);
 out_fput:
-	if (unlikely(ret && ret != -EIOCBQUEUED))
+	if (unlikely(ret))
 		fput(file);
 	return ret;
 }
@@ -1582,7 +1581,7 @@ static int aio_fsync(struct fsync_iocb *req, struct iocb *iocb, bool datasync)
 	req->datasync = datasync;
 	INIT_WORK(&req->work, aio_fsync_work);
 	schedule_work(&req->work);
-	return -EIOCBQUEUED;
+	return 0;
 }
 
 /* need to use list_del_init so we can check if item was present */
@@ -1711,7 +1710,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb)
 done:
 	if (mask)
 		__aio_poll_complete(req, mask);
-	return -EIOCBQUEUED;
+	return 0;
 out_fail:
 	fput(req->file);
 	return -EINVAL; /* same as no support for IOCB_CMD_POLL */
@@ -1795,12 +1794,11 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	}
 
 	/*
-	 * If ret is -EIOCBQUEUED, ownership of the file reference acquired
-	 * above passed to the file system, which at this point might have
-	 * dropped the reference, so we must be careful to not reference it
-	 * once we have called into the file system.
+	 * If ret is 0, we'd either done aio_complete() ourselves or have
+	 * arranged for that to be done asynchronously.  Anything non-zero
+	 * means that we need to destroy req ourselves.
 	 */
-	if (ret && ret != -EIOCBQUEUED)
+	if (ret)
 		goto out_put_req;
 	return 0;
 out_put_req:
-- 
2.11.0

* [PATCH 2/4] aio_read_events_ring(): make a bit more readable
  2018-05-27 22:28         ` [PATCH 1/4] aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way Al Viro
@ 2018-05-27 22:28           ` Al Viro
  2018-05-27 22:28           ` [PATCH 3/4] aio: shift copyin of iocb into io_submit_one() Al Viro
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 72+ messages in thread
From: Al Viro @ 2018-05-27 22:28 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig

From: Al Viro <viro@zeniv.linux.org.uk>

The logic for 'avail' is
	* not past the tail of cyclic buffer
	* no more than asked
	* not past the end of buffer
	* not past the end of a page

Unobfuscate the last part.
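
The four limits above can be sketched in userspace as follows.  This is
an illustration only: EVENTS_PER_PAGE and EVENTS_OFFSET are hypothetical
values rather than the real AIO_EVENTS_* constants, ring_avail() is a
made-up helper, and the first line of the computation is my reading of
how "not past the tail" and "not past the end of buffer" combine.

```c
#include <assert.h>

/* Hypothetical layout: 128 event slots per page, 1 slot used by a header. */
#define EVENTS_PER_PAGE	128L
#define EVENTS_OFFSET	1L

static long min_long(long a, long b)
{
	return a < b ? a : b;
}

/*
 * How many events can be copied in one go, starting at 'head', when the
 * caller still wants 'wanted' events from a ring of 'nr_events' slots.
 */
static long ring_avail(long head, long tail, long nr_events, long wanted)
{
	long avail, pos;

	assert(head >= 0 && head < nr_events);
	assert(tail >= 0 && tail < nr_events);

	/* not past the tail, and not past the end of the cyclic buffer */
	avail = (head <= tail ? tail : nr_events) - head;
	if (avail == 0)
		return 0;

	/* slot index of 'head' within its page */
	pos = (head + EVENTS_OFFSET) % EVENTS_PER_PAGE;

	/* no more than asked */
	avail = min_long(avail, wanted);
	/* not past the end of the page */
	avail = min_long(avail, EVENTS_PER_PAGE - pos);
	return avail;
}
```

Computing pos first is what lets the page limit be written as
EVENTS_PER_PAGE - pos instead of repeating the offset arithmetic.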

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index c6f29d9d006c..cb99a92a5324 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1205,14 +1205,13 @@ static long aio_read_events_ring(struct kioctx *ctx,
 		if (head == tail)
 			break;
 
-		avail = min(avail, nr - ret);
-		avail = min_t(long, avail, AIO_EVENTS_PER_PAGE -
-			    ((head + AIO_EVENTS_OFFSET) % AIO_EVENTS_PER_PAGE));
-
 		pos = head + AIO_EVENTS_OFFSET;
 		page = ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE];
 		pos %= AIO_EVENTS_PER_PAGE;
 
+		avail = min(avail, nr - ret);
+		avail = min_t(long, avail, AIO_EVENTS_PER_PAGE - pos);
+
 		ev = kmap(page);
 		copy_ret = copy_to_user(event + ret, ev + pos,
 					sizeof(*ev) * avail);
-- 
2.11.0

* [PATCH 3/4] aio: shift copyin of iocb into io_submit_one()
  2018-05-27 22:28         ` [PATCH 1/4] aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way Al Viro
  2018-05-27 22:28           ` [PATCH 2/4] aio_read_events_ring(): make a bit more readable Al Viro
@ 2018-05-27 22:28           ` Al Viro
  2018-05-28  5:16             ` Christoph Hellwig
  2018-05-27 22:28           ` [PATCH 4/4] aio: fold do_io_submit() into callers Al Viro
  2018-05-28  5:15           ` [PATCH 1/4] aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way Christoph Hellwig
  3 siblings, 1 reply; 72+ messages in thread
From: Al Viro @ 2018-05-27 22:28 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 46 ++++++++++++++++++++++------------------------
 1 file changed, 22 insertions(+), 24 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index cb99a92a5324..29fa2f3c3cba 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1716,22 +1716,26 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb)
 }
 
 static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
-			 struct iocb *iocb, bool compat)
+			 bool compat)
 {
 	struct aio_kiocb *req;
+	struct iocb iocb;
 	ssize_t ret;
 
+	if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb))))
+		return -EFAULT;
+
 	/* enforce forwards compatibility on users */
-	if (unlikely(iocb->aio_reserved2)) {
+	if (unlikely(iocb.aio_reserved2)) {
 		pr_debug("EINVAL: reserve field set\n");
 		return -EINVAL;
 	}
 
 	/* prevent overflows */
 	if (unlikely(
-	    (iocb->aio_buf != (unsigned long)iocb->aio_buf) ||
-	    (iocb->aio_nbytes != (size_t)iocb->aio_nbytes) ||
-	    ((ssize_t)iocb->aio_nbytes < 0)
+	    (iocb.aio_buf != (unsigned long)iocb.aio_buf) ||
+	    (iocb.aio_nbytes != (size_t)iocb.aio_nbytes) ||
+	    ((ssize_t)iocb.aio_nbytes < 0)
 	   )) {
 		pr_debug("EINVAL: overflow check\n");
 		return -EINVAL;
@@ -1741,14 +1745,14 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	if (unlikely(!req))
 		return -EAGAIN;
 
-	if (iocb->aio_flags & IOCB_FLAG_RESFD) {
+	if (iocb.aio_flags & IOCB_FLAG_RESFD) {
 		/*
 		 * If the IOCB_FLAG_RESFD flag of aio_flags is set, get an
 		 * instance of the file* now. The file descriptor must be
 		 * an eventfd() fd, and will be signaled for each completed
 		 * event using the eventfd_signal() function.
 		 */
-		req->ki_eventfd = eventfd_ctx_fdget((int) iocb->aio_resfd);
+		req->ki_eventfd = eventfd_ctx_fdget((int) iocb.aio_resfd);
 		if (IS_ERR(req->ki_eventfd)) {
 			ret = PTR_ERR(req->ki_eventfd);
 			req->ki_eventfd = NULL;
@@ -1763,31 +1767,31 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	}
 
 	req->ki_user_iocb = user_iocb;
-	req->ki_user_data = iocb->aio_data;
+	req->ki_user_data = iocb.aio_data;
 
-	switch (iocb->aio_lio_opcode) {
+	switch (iocb.aio_lio_opcode) {
 	case IOCB_CMD_PREAD:
-		ret = aio_read(&req->rw, iocb, false, compat);
+		ret = aio_read(&req->rw, &iocb, false, compat);
 		break;
 	case IOCB_CMD_PWRITE:
-		ret = aio_write(&req->rw, iocb, false, compat);
+		ret = aio_write(&req->rw, &iocb, false, compat);
 		break;
 	case IOCB_CMD_PREADV:
-		ret = aio_read(&req->rw, iocb, true, compat);
+		ret = aio_read(&req->rw, &iocb, true, compat);
 		break;
 	case IOCB_CMD_PWRITEV:
-		ret = aio_write(&req->rw, iocb, true, compat);
+		ret = aio_write(&req->rw, &iocb, true, compat);
 		break;
 	case IOCB_CMD_FSYNC:
-		ret = aio_fsync(&req->fsync, iocb, false);
+		ret = aio_fsync(&req->fsync, &iocb, false);
 		break;
 	case IOCB_CMD_FDSYNC:
-		ret = aio_fsync(&req->fsync, iocb, true);
+		ret = aio_fsync(&req->fsync, &iocb, true);
 	case IOCB_CMD_POLL:
-		ret = aio_poll(req, iocb);
+		ret = aio_poll(req, &iocb);
 		break;
 	default:
-		pr_debug("invalid aio operation %d\n", iocb->aio_lio_opcode);
+		pr_debug("invalid aio operation %d\n", iocb.aio_lio_opcode);
 		ret = -EINVAL;
 		break;
 	}
@@ -1840,19 +1844,13 @@ static long do_io_submit(aio_context_t ctx_id, long nr,
 	 */
 	for (i=0; i<nr; i++) {
 		struct iocb __user *user_iocb;
-		struct iocb tmp;
 
 		if (unlikely(__get_user(user_iocb, iocbpp + i))) {
 			ret = -EFAULT;
 			break;
 		}
 
-		if (unlikely(copy_from_user(&tmp, user_iocb, sizeof(tmp)))) {
-			ret = -EFAULT;
-			break;
-		}
-
-		ret = io_submit_one(ctx, user_iocb, &tmp, compat);
+		ret = io_submit_one(ctx, user_iocb, compat);
 		if (ret)
 			break;
 	}
-- 
2.11.0

* [PATCH 4/4] aio: fold do_io_submit() into callers
  2018-05-27 22:28         ` [PATCH 1/4] aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way Al Viro
  2018-05-27 22:28           ` [PATCH 2/4] aio_read_events_ring(): make a bit more readable Al Viro
  2018-05-27 22:28           ` [PATCH 3/4] aio: shift copyin of iocb into io_submit_one() Al Viro
@ 2018-05-27 22:28           ` Al Viro
  2018-05-27 23:14             ` Al Viro
  2018-05-28  5:15           ` [PATCH 1/4] aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way Christoph Hellwig
  3 siblings, 1 reply; 72+ messages in thread
From: Al Viro @ 2018-05-27 22:28 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig

From: Al Viro <viro@zeniv.linux.org.uk>

sanitize the limit checking and get rid of insane "copy array of
32bit pointers into an array of native ones" glue.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 110 +++++++++++++++++++++++++++++----------------------------------
 1 file changed, 50 insertions(+), 60 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 29fa2f3c3cba..6a4d7796681e 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1813,8 +1813,20 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	return ret;
 }
 
-static long do_io_submit(aio_context_t ctx_id, long nr,
-			  struct iocb __user *__user *iocbpp, bool compat)
+/* sys_io_submit:
+ *	Queue the nr iocbs pointed to by iocbpp for processing.  Returns
+ *	the number of iocbs queued.  May return -EINVAL if the aio_context
+ *	specified by ctx_id is invalid, if nr is < 0, if the iocb at
+ *	*iocbpp[0] is not properly initialized, if the operation specified
+ *	is invalid for the file descriptor in the iocb.  May fail with
+ *	-EFAULT if any of the data structures point to invalid data.  May
+ *	fail with -EBADF if the file descriptor specified in the first
+ *	iocb is invalid.  May fail with -EAGAIN if insufficient resources
+ *	are available to queue any iocbs.  Will return 0 if nr is 0.  Will
+ *	fail with -ENOSYS if not implemented.
+ */
+SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
+		struct iocb __user * __user *, iocbpp)
 {
 	struct kioctx *ctx;
 	long ret = 0;
@@ -1824,33 +1836,25 @@ static long do_io_submit(aio_context_t ctx_id, long nr,
 	if (unlikely(nr < 0))
 		return -EINVAL;
 
-	if (unlikely(nr > LONG_MAX/sizeof(*iocbpp)))
-		nr = LONG_MAX/sizeof(*iocbpp);
-
-	if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(*iocbpp)))))
-		return -EFAULT;
-
 	ctx = lookup_ioctx(ctx_id);
 	if (unlikely(!ctx)) {
 		pr_debug("EINVAL: invalid context id\n");
 		return -EINVAL;
 	}
 
-	blk_start_plug(&plug);
+	if (nr > ctx->nr_events)
+		nr = ctx->nr_events;
 
-	/*
-	 * AKPM: should this return a partial result if some of the IOs were
-	 * successfully submitted?
-	 */
-	for (i=0; i<nr; i++) {
+	blk_start_plug(&plug);
+	for (i = 0; i < nr; i++) {
 		struct iocb __user *user_iocb;
 
-		if (unlikely(__get_user(user_iocb, iocbpp + i))) {
+		if (unlikely(get_user(user_iocb, iocbpp + i))) {
 			ret = -EFAULT;
 			break;
 		}
 
-		ret = io_submit_one(ctx, user_iocb, compat);
+		ret = io_submit_one(ctx, user_iocb, false);
 		if (ret)
 			break;
 	}
@@ -1860,59 +1864,45 @@ static long do_io_submit(aio_context_t ctx_id, long nr,
 	return i ? i : ret;
 }
 
-/* sys_io_submit:
- *	Queue the nr iocbs pointed to by iocbpp for processing.  Returns
- *	the number of iocbs queued.  May return -EINVAL if the aio_context
- *	specified by ctx_id is invalid, if nr is < 0, if the iocb at
- *	*iocbpp[0] is not properly initialized, if the operation specified
- *	is invalid for the file descriptor in the iocb.  May fail with
- *	-EFAULT if any of the data structures point to invalid data.  May
- *	fail with -EBADF if the file descriptor specified in the first
- *	iocb is invalid.  May fail with -EAGAIN if insufficient resources
- *	are available to queue any iocbs.  Will return 0 if nr is 0.  Will
- *	fail with -ENOSYS if not implemented.
- */
-SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
-		struct iocb __user * __user *, iocbpp)
-{
-	return do_io_submit(ctx_id, nr, iocbpp, 0);
-}
-
 #ifdef CONFIG_COMPAT
-static inline long
-copy_iocb(long nr, u32 __user *ptr32, struct iocb __user * __user *ptr64)
-{
-	compat_uptr_t uptr;
-	int i;
-
-	for (i = 0; i < nr; ++i) {
-		if (get_user(uptr, ptr32 + i))
-			return -EFAULT;
-		if (put_user(compat_ptr(uptr), ptr64 + i))
-			return -EFAULT;
-	}
-	return 0;
-}
-
-#define MAX_AIO_SUBMITS 	(PAGE_SIZE/sizeof(struct iocb *))
 
 COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
-		       int, nr, u32 __user *, iocb)
+		       int, nr, compat_uptr_t __user *, iocb)
 {
-	struct iocb __user * __user *iocb64;
-	long ret;
+	struct kioctx *ctx;
+	long ret = 0;
+	int i = 0;
+	struct blk_plug plug;
 
 	if (unlikely(nr < 0))
 		return -EINVAL;
 
-	if (nr > MAX_AIO_SUBMITS)
-		nr = MAX_AIO_SUBMITS;
+	ctx = lookup_ioctx(ctx_id);
+	if (unlikely(!ctx)) {
+		pr_debug("EINVAL: invalid context id\n");
+		return -EINVAL;
+	}
+
+	if (nr > ctx->nr_events)
+		nr = ctx->nr_events;
 
-	iocb64 = compat_alloc_user_space(nr * sizeof(*iocb64));
-	ret = copy_iocb(nr, iocb, iocb64);
-	if (!ret)
-		ret = do_io_submit(ctx_id, nr, iocb64, 1);
-	return ret;
+	blk_start_plug(&plug);
+	for (i = 0; i < nr; i++) {
+		compat_uptr_t user_iocb;
+
+		if (unlikely(get_user(user_iocb, iocbpp + i))) {
+			ret = -EFAULT;
+			break;
+		}
+
+		ret = io_submit_one(ctx, compat_ptr(user_iocb), true);
+		if (ret)
+			break;
+	}
+	blk_finish_plug(&plug);
+
+	percpu_ref_put(&ctx->users);
+	return i ? i : ret;
 }
 #endif
 
-- 
2.11.0

* Re: [PATCH 4/4] aio: fold do_io_submit() into callers
  2018-05-27 22:28           ` [PATCH 4/4] aio: fold do_io_submit() into callers Al Viro
@ 2018-05-27 23:14             ` Al Viro
  2018-05-28  5:24               ` Christoph Hellwig
  0 siblings, 1 reply; 72+ messages in thread
From: Al Viro @ 2018-05-27 23:14 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig

and now with dumb braino fixed:

aio: fold do_io_submit() into callers
    
sanitize the limit checking and get rid of insane "copy array of
32bit pointers into an array of native ones" glue.
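
The shape of the replacement loop, modelled in userspace (a sketch under
assumptions: compat_ptr32, widen_ptr(), submit_all_compat() and the
reject_null() callback are hypothetical stand-ins for compat_uptr_t,
compat_ptr(), the syscall body and io_submit_one()): each 32bit pointer
is fetched and widened at the point of use, so no second array of native
pointers is ever built.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef uint32_t compat_ptr32;		/* stands in for compat_uptr_t */

/* Widen one 32bit user pointer to a native one. */
static void *widen_ptr(compat_ptr32 p)
{
	return (void *)(uintptr_t)p;
}

typedef int (*submit_fn)(void *user_iocb);

/* Example callback for the sketch: fails on a NULL iocb pointer. */
static int reject_null(void *user_iocb)
{
	return user_iocb ? 0 : -EFAULT;
}

/*
 * Submit up to 'nr' entries, clamped to 'limit'.  Returns the number
 * submitted if at least one went through, otherwise the first error.
 */
static long submit_all_compat(const compat_ptr32 *ptrs, long nr, long limit,
			      submit_fn submit)
{
	long i;
	int ret = 0;

	if (nr < 0)
		return -EINVAL;
	if (nr > limit)
		nr = limit;

	for (i = 0; i < nr; i++) {
		ret = submit(widen_ptr(ptrs[i]));
		if (ret)
			break;
	}
	return i ? i : ret;
}
```

For the array { 0x1000, 0x2000, 0, 0x3000 } and reject_null(), the first
two entries are submitted and the third fails, so the caller sees a
partial count rather than the error.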

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/aio.c b/fs/aio.c
index 29fa2f3c3cba..ef33944aed7c 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1813,8 +1813,20 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	return ret;
 }
 
-static long do_io_submit(aio_context_t ctx_id, long nr,
-			  struct iocb __user *__user *iocbpp, bool compat)
+/* sys_io_submit:
+ *	Queue the nr iocbs pointed to by iocbpp for processing.  Returns
+ *	the number of iocbs queued.  May return -EINVAL if the aio_context
+ *	specified by ctx_id is invalid, if nr is < 0, if the iocb at
+ *	*iocbpp[0] is not properly initialized, if the operation specified
+ *	is invalid for the file descriptor in the iocb.  May fail with
+ *	-EFAULT if any of the data structures point to invalid data.  May
+ *	fail with -EBADF if the file descriptor specified in the first
+ *	iocb is invalid.  May fail with -EAGAIN if insufficient resources
+ *	are available to queue any iocbs.  Will return 0 if nr is 0.  Will
+ *	fail with -ENOSYS if not implemented.
+ */
+SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
+		struct iocb __user * __user *, iocbpp)
 {
 	struct kioctx *ctx;
 	long ret = 0;
@@ -1824,33 +1836,25 @@ static long do_io_submit(aio_context_t ctx_id, long nr,
 	if (unlikely(nr < 0))
 		return -EINVAL;
 
-	if (unlikely(nr > LONG_MAX/sizeof(*iocbpp)))
-		nr = LONG_MAX/sizeof(*iocbpp);
-
-	if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(*iocbpp)))))
-		return -EFAULT;
-
 	ctx = lookup_ioctx(ctx_id);
 	if (unlikely(!ctx)) {
 		pr_debug("EINVAL: invalid context id\n");
 		return -EINVAL;
 	}
 
-	blk_start_plug(&plug);
+	if (nr > ctx->nr_events)
+		nr = ctx->nr_events;
 
-	/*
-	 * AKPM: should this return a partial result if some of the IOs were
-	 * successfully submitted?
-	 */
-	for (i=0; i<nr; i++) {
+	blk_start_plug(&plug);
+	for (i = 0; i < nr; i++) {
 		struct iocb __user *user_iocb;
 
-		if (unlikely(__get_user(user_iocb, iocbpp + i))) {
+		if (unlikely(get_user(user_iocb, iocbpp + i))) {
 			ret = -EFAULT;
 			break;
 		}
 
-		ret = io_submit_one(ctx, user_iocb, compat);
+		ret = io_submit_one(ctx, user_iocb, false);
 		if (ret)
 			break;
 	}
@@ -1860,59 +1864,45 @@ static long do_io_submit(aio_context_t ctx_id, long nr,
 	return i ? i : ret;
 }
 
-/* sys_io_submit:
- *	Queue the nr iocbs pointed to by iocbpp for processing.  Returns
- *	the number of iocbs queued.  May return -EINVAL if the aio_context
- *	specified by ctx_id is invalid, if nr is < 0, if the iocb at
- *	*iocbpp[0] is not properly initialized, if the operation specified
- *	is invalid for the file descriptor in the iocb.  May fail with
- *	-EFAULT if any of the data structures point to invalid data.  May
- *	fail with -EBADF if the file descriptor specified in the first
- *	iocb is invalid.  May fail with -EAGAIN if insufficient resources
- *	are available to queue any iocbs.  Will return 0 if nr is 0.  Will
- *	fail with -ENOSYS if not implemented.
- */
-SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
-		struct iocb __user * __user *, iocbpp)
-{
-	return do_io_submit(ctx_id, nr, iocbpp, 0);
-}
-
 #ifdef CONFIG_COMPAT
-static inline long
-copy_iocb(long nr, u32 __user *ptr32, struct iocb __user * __user *ptr64)
-{
-	compat_uptr_t uptr;
-	int i;
-
-	for (i = 0; i < nr; ++i) {
-		if (get_user(uptr, ptr32 + i))
-			return -EFAULT;
-		if (put_user(compat_ptr(uptr), ptr64 + i))
-			return -EFAULT;
-	}
-	return 0;
-}
-
-#define MAX_AIO_SUBMITS 	(PAGE_SIZE/sizeof(struct iocb *))
 
 COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
-		       int, nr, u32 __user *, iocb)
+		       int, nr, compat_uptr_t __user *, iocbpp)
 {
-	struct iocb __user * __user *iocb64;
-	long ret;
+	struct kioctx *ctx;
+	long ret = 0;
+	int i = 0;
+	struct blk_plug plug;
 
 	if (unlikely(nr < 0))
 		return -EINVAL;
 
-	if (nr > MAX_AIO_SUBMITS)
-		nr = MAX_AIO_SUBMITS;
+	ctx = lookup_ioctx(ctx_id);
+	if (unlikely(!ctx)) {
+		pr_debug("EINVAL: invalid context id\n");
+		return -EINVAL;
+	}
+
+	if (nr > ctx->nr_events)
+		nr = ctx->nr_events;
 
-	iocb64 = compat_alloc_user_space(nr * sizeof(*iocb64));
-	ret = copy_iocb(nr, iocb, iocb64);
-	if (!ret)
-		ret = do_io_submit(ctx_id, nr, iocb64, 1);
-	return ret;
+	blk_start_plug(&plug);
+	for (i = 0; i < nr; i++) {
+		compat_uptr_t user_iocb;
+
+		if (unlikely(get_user(user_iocb, iocbpp + i))) {
+			ret = -EFAULT;
+			break;
+		}
+
+		ret = io_submit_one(ctx, compat_ptr(user_iocb), true);
+		if (ret)
+			break;
+	}
+	blk_finish_plug(&plug);
+
+	percpu_ref_put(&ctx->users);
+	return i ? i : ret;
 }
 #endif
 

* Re: [PATCH 1/4] aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way
  2018-05-27 22:28         ` [PATCH 1/4] aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way Al Viro
                             ` (2 preceding siblings ...)
  2018-05-27 22:28           ` [PATCH 4/4] aio: fold do_io_submit() into callers Al Viro
@ 2018-05-28  5:15           ` Christoph Hellwig
  2018-05-28 14:04             ` Al Viro
  3 siblings, 1 reply; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-28  5:15 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Christoph Hellwig

On Sun, May 27, 2018 at 11:28:50PM +0100, Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> ... so just make them return 0 when caller does not need to destroy iocb
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

But I think we really need a better name for aio_rw_ret now.
Unfortunately I can't think of one.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 3/4] aio: shift copyin of iocb into io_submit_one()
  2018-05-27 22:28           ` [PATCH 3/4] aio: shift copyin of iocb into io_submit_one() Al Viro
@ 2018-05-28  5:16             ` Christoph Hellwig
  0 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-28  5:16 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Christoph Hellwig

On Sun, May 27, 2018 at 11:28:52PM +0100, Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 4/4] aio: fold do_io_submit() into callers
  2018-05-27 23:14             ` Al Viro
@ 2018-05-28  5:24               ` Christoph Hellwig
  0 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-28  5:24 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Christoph Hellwig

> -	if (unlikely(nr > LONG_MAX/sizeof(*iocbpp)))
> -		nr = LONG_MAX/sizeof(*iocbpp);

> +	if (nr > ctx->nr_events)
> +		nr = ctx->nr_events;

This seems like a slight behavior change.  What about splitting
that into a separate, properly documented patch?

Otherwise looks fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 1/4] aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way
  2018-05-28  5:15           ` [PATCH 1/4] aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way Christoph Hellwig
@ 2018-05-28 14:04             ` Al Viro
  2018-05-28 17:54               ` Al Viro
  0 siblings, 1 reply; 72+ messages in thread
From: Al Viro @ 2018-05-28 14:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel

On Mon, May 28, 2018 at 07:15:29AM +0200, Christoph Hellwig wrote:
> On Sun, May 27, 2018 at 11:28:50PM +0100, Al Viro wrote:
> > From: Al Viro <viro@zeniv.linux.org.uk>
> > 
> > ... so just make them return 0 when caller does not need to destroy iocb
> > 
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> 
> Looks good,
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> But I think we really need a better name for aio_rw_ret now.
> Unfortunately I can't think of one.

Hell knows...  aio_rw_done(), perhaps?  BTW, I would rather have fput()
in aio_complete_rw() done after ki_list removal - having ->ki_cancel()
callable after fput() is Not Nice(tm).  Consider e.g.
static int ffs_aio_cancel(struct kiocb *kiocb)
{
        struct ffs_io_data *io_data = kiocb->private;
        struct ffs_epfile *epfile = kiocb->ki_filp->private_data;
        int value;

        ENTER();

        spin_lock_irq(&epfile->ffs->eps_lock);

What's to guarantee that kiocb->ki_filp is not already freed and reused by
the time we call that sucker, with its ->private_data pointing to something
completely unrelated?

How about lifting the list removal into aio_complete_rw() and aio_poll_work(),
with WARN_ON() left in its place in aio_complete() itself?  Look:
aio_complete() call chains are
	aio_complete_rw()
	aio_fsync_work()
	__aio_poll_complete()
		aio_poll_work()
		aio_poll_wake()
		aio_poll()

The call in aio_fsync_work() is guaranteed to have iocb not on cancel lists.
The call in aio_poll_wake() *relies* upon aio_complete() not going into
list removal.  The call in aio_poll() is also guaranteed to be not on cancel
list - we get there only if mask != 0 and we add to ->active_reqs only if
mask == 0.

So if we take the list removal into aio_complete_rw() and aio_poll_work() we
should get the right ordering - iocb gets removed from the list before fput()
in all cases.  And aio_complete() locking footprint becomes simpler...  As
a fringe benefit, __aio_poll_complete() becomes simply
	fput(req->file);
	aio_complete(iocb, mangle_poll(mask), 0);
since we don't need to order fput() vs. aio_complete() anymore - the caller
of __aio_poll_complete() has already taken care of ->ki_cancel() possibility.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 1/4] aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way
  2018-05-28 14:04             ` Al Viro
@ 2018-05-28 17:54               ` Al Viro
  2018-05-28 17:57                 ` [PATCH v2 1/6] aio: take list removal to (some) callers of aio_complete() Al Viro
  2018-05-28 22:20                 ` [PATCH 1/4] vmsplice: lift import_iovec() into do_vmsplice() Al Viro
  0 siblings, 2 replies; 72+ messages in thread
From: Al Viro @ 2018-05-28 17:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel

On Mon, May 28, 2018 at 03:04:34PM +0100, Al Viro wrote:

> How about lifting the list removal into aio_complete_rw() and aio_poll_work(),
> with WARN_ON() left in its place in aio_complete() itself?  Look:
> aio_complete() call chains are
> 	aio_complete_rw()
> 	aio_fsync_work()
> 	__aio_poll_complete()
> 		aio_poll_work()
> 		aio_poll_wake()
> 		aio_poll()
> 
> The call in aio_fsync_work() is guaranteed to have iocb not on cancel lists.
> The call in aio_poll_wake() *relies* upon aio_complete() not going into
> list removal.  The call in aio_poll() is also guaranteed to be not on cancel
> list - we get there only if mask != 0 and we add to ->active_reqs only if
> mask == 0.
> 
> So if we take the list removal into aio_complete_rw() and aio_poll_work() we
> should get the right ordering - iocb gets removed from the list before fput()
> in all cases.  And aio_complete() locking footprint becomes simpler...  As
> a fringe benefit, __aio_poll_complete() becomes simply
> 	fput(req->file);
> 	aio_complete(iocb, mangle_poll(mask), 0);
> since we don't need to order fput() vs. aio_complete() anymore - the caller
> of __aio_poll_complete() has already taken care of ->ki_cancel() possibility.

Anyway, what I have in mind is in vfs.git#work.aio; on top of your fix for missing
break it's
      aio: take list removal to (some) callers of aio_complete()
      aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way
      aio_read_events_ring(): make a bit more readable
      aio: shift copyin of iocb into io_submit_one()
      aio: fold do_io_submit() into callers
      aio: sanitize the limit checking in io_submit(2)
(in followups)

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v2 1/6] aio: take list removal to (some) callers of aio_complete()
  2018-05-28 17:54               ` Al Viro
@ 2018-05-28 17:57                 ` Al Viro
  2018-05-28 17:57                   ` [PATCH v2 2/6] aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way Al Viro
                                     ` (5 more replies)
  2018-05-28 22:20                 ` [PATCH 1/4] vmsplice: lift import_iovec() into do_vmsplice() Al Viro
  1 sibling, 6 replies; 72+ messages in thread
From: Al Viro @ 2018-05-28 17:57 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig

From: Al Viro <viro@zeniv.linux.org.uk>

We really want iocb out of io_cancel(2) reach before we start tearing
it down.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 41 ++++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index e0b2f183fa1c..f95b167801c2 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1073,14 +1073,6 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
 	unsigned tail, pos, head;
 	unsigned long	flags;
 
-	if (!list_empty_careful(&iocb->ki_list)) {
-		unsigned long flags;
-
-		spin_lock_irqsave(&ctx->ctx_lock, flags);
-		list_del(&iocb->ki_list);
-		spin_unlock_irqrestore(&ctx->ctx_lock, flags);
-	}
-
 	/*
 	 * Add a completion event to the ring buffer. Must be done holding
 	 * ctx->completion_lock to prevent other code from messing with the tail
@@ -1402,6 +1394,15 @@ static void aio_complete_rw(struct kiocb *kiocb, long res, long res2)
 {
 	struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, rw);
 
+	if (!list_empty_careful(&iocb->ki_list)) {
+		struct kioctx	*ctx = iocb->ki_ctx;
+		unsigned long flags;
+
+		spin_lock_irqsave(&ctx->ctx_lock, flags);
+		list_del(&iocb->ki_list);
+		spin_unlock_irqrestore(&ctx->ctx_lock, flags);
+	}
+
 	if (kiocb->ki_flags & IOCB_WRITE) {
 		struct inode *inode = file_inode(kiocb->ki_filp);
 
@@ -1594,20 +1595,26 @@ static inline bool __aio_poll_remove(struct poll_iocb *req)
 	return true;
 }
 
-static inline void __aio_poll_complete(struct poll_iocb *req, __poll_t mask)
+static inline void __aio_poll_complete(struct aio_kiocb *iocb, __poll_t mask)
 {
-	struct aio_kiocb *iocb = container_of(req, struct aio_kiocb, poll);
-	struct file *file = req->file;
-
+	fput(iocb->poll.file);
 	aio_complete(iocb, mangle_poll(mask), 0);
-	fput(file);
 }
 
 static void aio_poll_work(struct work_struct *work)
 {
-	struct poll_iocb *req = container_of(work, struct poll_iocb, work);
+	struct aio_kiocb *iocb = container_of(work, struct aio_kiocb, poll.work);
+
+	if (!list_empty_careful(&iocb->ki_list)) {
+		struct kioctx	*ctx = iocb->ki_ctx;
+		unsigned long flags;
+
+		spin_lock_irqsave(&ctx->ctx_lock, flags);
+		list_del(&iocb->ki_list);
+		spin_unlock_irqrestore(&ctx->ctx_lock, flags);
+	}
 
-	__aio_poll_complete(req, req->events);
+	__aio_poll_complete(iocb, iocb->poll.events);
 }
 
 static int aio_poll_cancel(struct kiocb *iocb)
@@ -1658,7 +1665,7 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 		list_del_init(&iocb->ki_list);
 		spin_unlock(&iocb->ki_ctx->ctx_lock);
 
-		__aio_poll_complete(req, mask);
+		__aio_poll_complete(iocb, mask);
 	} else {
 		req->events = mask;
 		INIT_WORK(&req->work, aio_poll_work);
@@ -1710,7 +1717,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb)
 	spin_unlock_irq(&ctx->ctx_lock);
 done:
 	if (mask)
-		__aio_poll_complete(req, mask);
+		__aio_poll_complete(aiocb, mask);
 	return -EIOCBQUEUED;
 out_fail:
 	fput(req->file);
-- 
2.11.0

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v2 2/6] aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way
  2018-05-28 17:57                 ` [PATCH v2 1/6] aio: take list removal to (some) callers of aio_complete() Al Viro
@ 2018-05-28 17:57                   ` Al Viro
  2018-05-29  6:08                     ` Christoph Hellwig
  2018-05-28 17:57                   ` [PATCH v2 3/6] aio_read_events_ring(): make a bit more readable Al Viro
                                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 72+ messages in thread
From: Al Viro @ 2018-05-28 17:57 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig

From: Al Viro <viro@zeniv.linux.org.uk>

... so just make them return 0 when caller does not need to destroy iocb

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index f95b167801c2..a8e4353ded2f 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1457,11 +1457,11 @@ static int aio_setup_rw(int rw, struct iocb *iocb, struct iovec **iovec,
 	return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter);
 }
 
-static inline ssize_t aio_rw_ret(struct kiocb *req, ssize_t ret)
+static inline void aio_rw_done(struct kiocb *req, ssize_t ret)
 {
 	switch (ret) {
 	case -EIOCBQUEUED:
-		return ret;
+		break;
 	case -ERESTARTSYS:
 	case -ERESTARTNOINTR:
 	case -ERESTARTNOHAND:
@@ -1474,7 +1474,6 @@ static inline ssize_t aio_rw_ret(struct kiocb *req, ssize_t ret)
 		/*FALLTHRU*/
 	default:
 		aio_complete_rw(req, ret, 0);
-		return 0;
 	}
 }
 
@@ -1503,10 +1502,10 @@ static ssize_t aio_read(struct kiocb *req, struct iocb *iocb, bool vectored,
 		goto out_fput;
 	ret = rw_verify_area(READ, file, &req->ki_pos, iov_iter_count(&iter));
 	if (!ret)
-		ret = aio_rw_ret(req, call_read_iter(file, req, &iter));
+		aio_rw_done(req, call_read_iter(file, req, &iter));
 	kfree(iovec);
 out_fput:
-	if (unlikely(ret && ret != -EIOCBQUEUED))
+	if (unlikely(ret))
 		fput(file);
 	return ret;
 }
@@ -1548,11 +1547,11 @@ static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored,
 			__sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE);
 		}
 		req->ki_flags |= IOCB_WRITE;
-		ret = aio_rw_ret(req, call_write_iter(file, req, &iter));
+		aio_rw_done(req, call_write_iter(file, req, &iter));
 	}
 	kfree(iovec);
 out_fput:
-	if (unlikely(ret && ret != -EIOCBQUEUED))
+	if (unlikely(ret))
 		fput(file);
 	return ret;
 }
@@ -1583,7 +1582,7 @@ static int aio_fsync(struct fsync_iocb *req, struct iocb *iocb, bool datasync)
 	req->datasync = datasync;
 	INIT_WORK(&req->work, aio_fsync_work);
 	schedule_work(&req->work);
-	return -EIOCBQUEUED;
+	return 0;
 }
 
 /* need to use list_del_init so we can check if item was present */
@@ -1718,7 +1717,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb)
 done:
 	if (mask)
 		__aio_poll_complete(aiocb, mask);
-	return -EIOCBQUEUED;
+	return 0;
 out_fail:
 	fput(req->file);
 	return -EINVAL; /* same as no support for IOCB_CMD_POLL */
@@ -1803,12 +1802,11 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	}
 
 	/*
-	 * If ret is -EIOCBQUEUED, ownership of the file reference acquired
-	 * above passed to the file system, which at this point might have
-	 * dropped the reference, so we must be careful to not reference it
-	 * once we have called into the file system.
+	 * If ret is 0, we'd either done aio_complete() ourselves or have
+	 * arranged for that to be done asynchronously.  Anything non-zero
+	 * means that we need to destroy req ourselves.
 	 */
-	if (ret && ret != -EIOCBQUEUED)
+	if (ret)
 		goto out_put_req;
 	return 0;
 out_put_req:
-- 
2.11.0
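
The return convention this patch settles on can be modelled in userspace.
What follows is only a sketch: model_req and model_rw_done() are
hypothetical stand-ins, the numeric codes mirror kernel-internal values
that userspace <errno.h> does not export, and the -ERESTART_RESTARTBLOCK
case and the EINTR substitution are assumed from the part of aio_rw_done()
that the hunk above elides.

```c
#include <assert.h>
#include <stdbool.h>

/* Kernel-internal codes, mirrored by value for this userspace model. */
enum {
	MODEL_EINTR = 4,
	MODEL_ERESTARTSYS = 512,
	MODEL_ERESTARTNOINTR = 513,
	MODEL_ERESTARTNOHAND = 514,
	MODEL_ERESTART_RESTARTBLOCK = 516,
	MODEL_EIOCBQUEUED = 529,
};

/* Hypothetical request: records whether a completion was posted. */
struct model_req {
	bool completed;	/* set once the modelled aio_complete_rw() ran */
	long res;	/* result carried by the completion event */
};

/*
 * Stand-in for aio_rw_done().  It returns nothing: once ->read_iter() or
 * ->write_iter() has been called, the request is either queued (someone
 * else completes it later) or completed right here, so the caller has
 * nothing left to clean up in either case.
 */
static void model_rw_done(struct model_req *req, long ret)
{
	assert(req);

	switch (ret) {
	case -MODEL_EIOCBQUEUED:
		break;			/* completion arrives asynchronously */
	case -MODEL_ERESTARTSYS:
	case -MODEL_ERESTARTNOINTR:
	case -MODEL_ERESTARTNOHAND:
	case -MODEL_ERESTART_RESTARTBLOCK:
		ret = -MODEL_EINTR;	/* cannot restart; report EINTR */
		/* fall through */
	default:
		req->completed = true;
		req->res = ret;
	}
}
```

With that shape, aio_read() and aio_write() only report a non-zero ret for
failures that happen before the file system is called, which is why the
"ret && ret != -EIOCBQUEUED" tests collapse to plain "ret".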

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v2 3/6] aio_read_events_ring(): make a bit more readable
  2018-05-28 17:57                 ` [PATCH v2 1/6] aio: take list removal to (some) callers of aio_complete() Al Viro
  2018-05-28 17:57                   ` [PATCH v2 2/6] aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way Al Viro
@ 2018-05-28 17:57                   ` Al Viro
  2018-05-28 17:57                   ` [PATCH v2 4/6] aio: shift copyin of iocb into io_submit_one() Al Viro
                                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: Al Viro @ 2018-05-28 17:57 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig

From: Al Viro <viro@zeniv.linux.org.uk>

The logic for 'avail' is
	* not past the tail of cyclic buffer
	* no more than asked
	* not past the end of buffer
	* not past the end of a page

Unobfuscate the last part.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index a8e4353ded2f..51843b057841 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1197,14 +1197,13 @@ static long aio_read_events_ring(struct kioctx *ctx,
 		if (head == tail)
 			break;
 
-		avail = min(avail, nr - ret);
-		avail = min_t(long, avail, AIO_EVENTS_PER_PAGE -
-			    ((head + AIO_EVENTS_OFFSET) % AIO_EVENTS_PER_PAGE));
-
 		pos = head + AIO_EVENTS_OFFSET;
 		page = ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE];
 		pos %= AIO_EVENTS_PER_PAGE;
 
+		avail = min(avail, nr - ret);
+		avail = min_t(long, avail, AIO_EVENTS_PER_PAGE - pos);
+
 		ev = kmap(page);
 		copy_ret = copy_to_user(event + ret, ev + pos,
 					sizeof(*ev) * avail);
-- 
2.11.0
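
The four limits listed in the commit message can be written out as a small
userspace function.  This is a sketch under assumptions: model_avail() and
its parameters are hypothetical stand-ins for ctx state and for
AIO_EVENTS_PER_PAGE / AIO_EVENTS_OFFSET, "wanted" plays the role of
nr - ret, and the first step (tail versus end of buffer) is assumed from
context that the hunk does not show.

```c
#include <assert.h>

static long min_long(long a, long b)
{
	return a < b ? a : b;
}

/*
 * How many events can be copied in one go, starting at ring index 'head'.
 * per_page: events per ring page; offset: event slots taken by the ring
 * header at the start of the first page.
 */
static long model_avail(unsigned head, unsigned tail, unsigned nr_events,
			long wanted, unsigned per_page, unsigned offset)
{
	long avail;
	unsigned pos;

	assert(per_page > 0 && head < nr_events && tail < nr_events);

	/* not past the tail, and not past the end of the cyclic buffer */
	avail = (long)(head <= tail ? tail : nr_events) - (long)head;
	if (avail == 0)
		return 0;	/* head == tail: ring is empty */

	/* position of 'head' within its page */
	pos = (head + offset) % per_page;

	/* no more than asked */
	avail = min_long(avail, wanted);
	/* not past the end of the page */
	avail = min_long(avail, (long)(per_page - pos));
	return avail;
}
```

Computing pos first is what lets the last limit read as "per_page - pos"
instead of repeating the modulo expression.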

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v2 4/6] aio: shift copyin of iocb into io_submit_one()
  2018-05-28 17:57                 ` [PATCH v2 1/6] aio: take list removal to (some) callers of aio_complete() Al Viro
  2018-05-28 17:57                   ` [PATCH v2 2/6] aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way Al Viro
  2018-05-28 17:57                   ` [PATCH v2 3/6] aio_read_events_ring(): make a bit more readable Al Viro
@ 2018-05-28 17:57                   ` Al Viro
  2018-05-28 17:57                   ` [PATCH v2 5/6] aio: fold do_io_submit() into callers Al Viro
                                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: Al Viro @ 2018-05-28 17:57 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig

From: Al Viro <viro@zeniv.linux.org.uk>

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 46 ++++++++++++++++++++++------------------------
 1 file changed, 22 insertions(+), 24 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 51843b057841..dca104883f0f 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1723,22 +1723,26 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb)
 }
 
 static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
-			 struct iocb *iocb, bool compat)
+			 bool compat)
 {
 	struct aio_kiocb *req;
+	struct iocb iocb;
 	ssize_t ret;
 
+	if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb))))
+		return -EFAULT;
+
 	/* enforce forwards compatibility on users */
-	if (unlikely(iocb->aio_reserved2)) {
+	if (unlikely(iocb.aio_reserved2)) {
 		pr_debug("EINVAL: reserve field set\n");
 		return -EINVAL;
 	}
 
 	/* prevent overflows */
 	if (unlikely(
-	    (iocb->aio_buf != (unsigned long)iocb->aio_buf) ||
-	    (iocb->aio_nbytes != (size_t)iocb->aio_nbytes) ||
-	    ((ssize_t)iocb->aio_nbytes < 0)
+	    (iocb.aio_buf != (unsigned long)iocb.aio_buf) ||
+	    (iocb.aio_nbytes != (size_t)iocb.aio_nbytes) ||
+	    ((ssize_t)iocb.aio_nbytes < 0)
 	   )) {
 		pr_debug("EINVAL: overflow check\n");
 		return -EINVAL;
@@ -1748,14 +1752,14 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	if (unlikely(!req))
 		return -EAGAIN;
 
-	if (iocb->aio_flags & IOCB_FLAG_RESFD) {
+	if (iocb.aio_flags & IOCB_FLAG_RESFD) {
 		/*
 		 * If the IOCB_FLAG_RESFD flag of aio_flags is set, get an
 		 * instance of the file* now. The file descriptor must be
 		 * an eventfd() fd, and will be signaled for each completed
 		 * event using the eventfd_signal() function.
 		 */
-		req->ki_eventfd = eventfd_ctx_fdget((int) iocb->aio_resfd);
+		req->ki_eventfd = eventfd_ctx_fdget((int) iocb.aio_resfd);
 		if (IS_ERR(req->ki_eventfd)) {
 			ret = PTR_ERR(req->ki_eventfd);
 			req->ki_eventfd = NULL;
@@ -1770,32 +1774,32 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	}
 
 	req->ki_user_iocb = user_iocb;
-	req->ki_user_data = iocb->aio_data;
+	req->ki_user_data = iocb.aio_data;
 
-	switch (iocb->aio_lio_opcode) {
+	switch (iocb.aio_lio_opcode) {
 	case IOCB_CMD_PREAD:
-		ret = aio_read(&req->rw, iocb, false, compat);
+		ret = aio_read(&req->rw, &iocb, false, compat);
 		break;
 	case IOCB_CMD_PWRITE:
-		ret = aio_write(&req->rw, iocb, false, compat);
+		ret = aio_write(&req->rw, &iocb, false, compat);
 		break;
 	case IOCB_CMD_PREADV:
-		ret = aio_read(&req->rw, iocb, true, compat);
+		ret = aio_read(&req->rw, &iocb, true, compat);
 		break;
 	case IOCB_CMD_PWRITEV:
-		ret = aio_write(&req->rw, iocb, true, compat);
+		ret = aio_write(&req->rw, &iocb, true, compat);
 		break;
 	case IOCB_CMD_FSYNC:
-		ret = aio_fsync(&req->fsync, iocb, false);
+		ret = aio_fsync(&req->fsync, &iocb, false);
 		break;
 	case IOCB_CMD_FDSYNC:
-		ret = aio_fsync(&req->fsync, iocb, true);
+		ret = aio_fsync(&req->fsync, &iocb, true);
 		break;
 	case IOCB_CMD_POLL:
-		ret = aio_poll(req, iocb);
+		ret = aio_poll(req, &iocb);
 		break;
 	default:
-		pr_debug("invalid aio operation %d\n", iocb->aio_lio_opcode);
+		pr_debug("invalid aio operation %d\n", iocb.aio_lio_opcode);
 		ret = -EINVAL;
 		break;
 	}
@@ -1848,19 +1852,13 @@ static long do_io_submit(aio_context_t ctx_id, long nr,
 	 */
 	for (i=0; i<nr; i++) {
 		struct iocb __user *user_iocb;
-		struct iocb tmp;
 
 		if (unlikely(__get_user(user_iocb, iocbpp + i))) {
 			ret = -EFAULT;
 			break;
 		}
 
-		if (unlikely(copy_from_user(&tmp, user_iocb, sizeof(tmp)))) {
-			ret = -EFAULT;
-			break;
-		}
-
-		ret = io_submit_one(ctx, user_iocb, &tmp, compat);
+		ret = io_submit_one(ctx, user_iocb, compat);
 		if (ret)
 			break;
 	}
-- 
2.11.0

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v2 5/6] aio: fold do_io_submit() into callers
  2018-05-28 17:57                 ` [PATCH v2 1/6] aio: take list removal to (some) callers of aio_complete() Al Viro
                                     ` (2 preceding siblings ...)
  2018-05-28 17:57                   ` [PATCH v2 4/6] aio: shift copyin of iocb into io_submit_one() Al Viro
@ 2018-05-28 17:57                   ` Al Viro
  2018-05-29  6:10                     ` Christoph Hellwig
  2018-05-28 17:57                   ` [PATCH v2 6/6] aio: sanitize the limit checking in io_submit(2) Al Viro
  2018-05-29  6:08                   ` [PATCH v2 1/6] aio: take list removal to (some) callers of aio_complete() Christoph Hellwig
  5 siblings, 1 reply; 72+ messages in thread
From: Al Viro @ 2018-05-28 17:57 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig

From: Al Viro <viro@zeniv.linux.org.uk>

get rid of insane "copy array of 32bit pointers into an array of
native ones" glue.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 102 ++++++++++++++++++++++++++++++---------------------------------
 1 file changed, 48 insertions(+), 54 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index dca104883f0f..f67b0847ecac 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1821,8 +1821,20 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	return ret;
 }
 
-static long do_io_submit(aio_context_t ctx_id, long nr,
-			  struct iocb __user *__user *iocbpp, bool compat)
+/* sys_io_submit:
+ *	Queue the nr iocbs pointed to by iocbpp for processing.  Returns
+ *	the number of iocbs queued.  May return -EINVAL if the aio_context
+ *	specified by ctx_id is invalid, if nr is < 0, if the iocb at
+ *	*iocbpp[0] is not properly initialized, if the operation specified
+ *	is invalid for the file descriptor in the iocb.  May fail with
+ *	-EFAULT if any of the data structures point to invalid data.  May
+ *	fail with -EBADF if the file descriptor specified in the first
+ *	iocb is invalid.  May fail with -EAGAIN if insufficient resources
+ *	are available to queue any iocbs.  Will return 0 if nr is 0.  Will
+ *	fail with -ENOSYS if not implemented.
+ */
+SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
+		struct iocb __user * __user *, iocbpp)
 {
 	struct kioctx *ctx;
 	long ret = 0;
@@ -1835,9 +1847,6 @@ static long do_io_submit(aio_context_t ctx_id, long nr,
 	if (unlikely(nr > LONG_MAX/sizeof(*iocbpp)))
 		nr = LONG_MAX/sizeof(*iocbpp);
 
-	if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(*iocbpp)))))
-		return -EFAULT;
-
 	ctx = lookup_ioctx(ctx_id);
 	if (unlikely(!ctx)) {
 		pr_debug("EINVAL: invalid context id\n");
@@ -1845,20 +1854,15 @@ static long do_io_submit(aio_context_t ctx_id, long nr,
 	}
 
 	blk_start_plug(&plug);
-
-	/*
-	 * AKPM: should this return a partial result if some of the IOs were
-	 * successfully submitted?
-	 */
-	for (i=0; i<nr; i++) {
+	for (i = 0; i < nr; i++) {
 		struct iocb __user *user_iocb;
 
-		if (unlikely(__get_user(user_iocb, iocbpp + i))) {
+		if (unlikely(get_user(user_iocb, iocbpp + i))) {
 			ret = -EFAULT;
 			break;
 		}
 
-		ret = io_submit_one(ctx, user_iocb, compat);
+		ret = io_submit_one(ctx, user_iocb, false);
 		if (ret)
 			break;
 	}
@@ -1868,47 +1872,16 @@ static long do_io_submit(aio_context_t ctx_id, long nr,
 	return i ? i : ret;
 }
 
-/* sys_io_submit:
- *	Queue the nr iocbs pointed to by iocbpp for processing.  Returns
- *	the number of iocbs queued.  May return -EINVAL if the aio_context
- *	specified by ctx_id is invalid, if nr is < 0, if the iocb at
- *	*iocbpp[0] is not properly initialized, if the operation specified
- *	is invalid for the file descriptor in the iocb.  May fail with
- *	-EFAULT if any of the data structures point to invalid data.  May
- *	fail with -EBADF if the file descriptor specified in the first
- *	iocb is invalid.  May fail with -EAGAIN if insufficient resources
- *	are available to queue any iocbs.  Will return 0 if nr is 0.  Will
- *	fail with -ENOSYS if not implemented.
- */
-SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
-		struct iocb __user * __user *, iocbpp)
-{
-	return do_io_submit(ctx_id, nr, iocbpp, 0);
-}
-
 #ifdef CONFIG_COMPAT
-static inline long
-copy_iocb(long nr, u32 __user *ptr32, struct iocb __user * __user *ptr64)
-{
-	compat_uptr_t uptr;
-	int i;
-
-	for (i = 0; i < nr; ++i) {
-		if (get_user(uptr, ptr32 + i))
-			return -EFAULT;
-		if (put_user(compat_ptr(uptr), ptr64 + i))
-			return -EFAULT;
-	}
-	return 0;
-}
-
 #define MAX_AIO_SUBMITS 	(PAGE_SIZE/sizeof(struct iocb *))
 
 COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
-		       int, nr, u32 __user *, iocb)
+		       int, nr, compat_uptr_t __user *, iocbpp)
 {
-	struct iocb __user * __user *iocb64;
-	long ret;
+	struct kioctx *ctx;
+	long ret = 0;
+	int i = 0;
+	struct blk_plug plug;
 
 	if (unlikely(nr < 0))
 		return -EINVAL;
@@ -1916,11 +1889,32 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
 	if (nr > MAX_AIO_SUBMITS)
 		nr = MAX_AIO_SUBMITS;
 
-	iocb64 = compat_alloc_user_space(nr * sizeof(*iocb64));
-	ret = copy_iocb(nr, iocb, iocb64);
-	if (!ret)
-		ret = do_io_submit(ctx_id, nr, iocb64, 1);
-	return ret;
+	ctx = lookup_ioctx(ctx_id);
+	if (unlikely(!ctx)) {
+		pr_debug("EINVAL: invalid context id\n");
+		return -EINVAL;
+	}
+
+	if (nr > ctx->nr_events)
+		nr = ctx->nr_events;
+
+	blk_start_plug(&plug);
+	for (i = 0; i < nr; i++) {
+		compat_uptr_t user_iocb;
+
+		if (unlikely(get_user(user_iocb, iocbpp + i))) {
+			ret = -EFAULT;
+			break;
+		}
+
+		ret = io_submit_one(ctx, compat_ptr(user_iocb), true);
+		if (ret)
+			break;
+	}
+	blk_finish_plug(&plug);
+
+	percpu_ref_put(&ctx->users);
+	return i ? i : ret;
 }
 #endif
 
-- 
2.11.0
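
Both the native and the compat loop above share the same "return i ? i : ret"
convention: stop at the first failure, report how many iocbs went in, and
only report the error if none did.  A minimal userspace sketch of that
convention, with model_submit(), submit_one_fn and fail_from() all being
hypothetical names introduced for illustration:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-request hook: 0 on success, negative errno on failure. */
typedef int (*submit_one_fn)(long index, void *arg);

/* Submit up to nr requests; mirrors the shape of the loops in the patch. */
static long model_submit(long nr, submit_one_fn submit_one, void *arg)
{
	long ret = 0;
	long i;

	assert(submit_one);

	if (nr < 0)
		return -EINVAL;

	for (i = 0; i < nr; i++) {
		ret = submit_one(i, arg);
		if (ret)
			break;		/* first failure ends the batch */
	}

	/* partial success wins over the error that stopped the loop */
	return i ? i : ret;
}

/* Example hook: fails with -EAGAIN once index reaches *(long *)arg. */
static int fail_from(long index, void *arg)
{
	return index >= *(long *)arg ? -EAGAIN : 0;
}
```

A caller that gets back fewer than nr has to resubmit the remainder to learn
which error stopped the batch.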

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v2 6/6] aio: sanitize the limit checking in io_submit(2)
  2018-05-28 17:57                 ` [PATCH v2 1/6] aio: take list removal to (some) callers of aio_complete() Al Viro
                                     ` (3 preceding siblings ...)
  2018-05-28 17:57                   ` [PATCH v2 5/6] aio: fold do_io_submit() into callers Al Viro
@ 2018-05-28 17:57                   ` Al Viro
  2018-05-29  6:10                     ` Christoph Hellwig
  2018-05-29  6:08                   ` [PATCH v2 1/6] aio: take list removal to (some) callers of aio_complete() Christoph Hellwig
  5 siblings, 1 reply; 72+ messages in thread
From: Al Viro @ 2018-05-28 17:57 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig

From: Al Viro <viro@zeniv.linux.org.uk>

As it is, the logic in native io_submit(2) is "if asked for
more than LONG_MAX/sizeof(pointer) iocbs to submit, don't
bother with more than LONG_MAX/sizeof(pointer)" (i.e.
512M requests on 32bit and 1E requests on 64bit) while
compat io_submit(2) goes with "stop after the first
PAGE_SIZE/sizeof(pointer) iocbs", i.e. 1K or so.  Which is
	* inconsistent
	* *way* too much in native case
	* possibly too little in compat one
and
	* wrong anyway, since the natural point where we
ought to stop bothering is ctx->nr_events

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index f67b0847ecac..aa4071c98335 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1844,15 +1844,15 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
 	if (unlikely(nr < 0))
 		return -EINVAL;
 
-	if (unlikely(nr > LONG_MAX/sizeof(*iocbpp)))
-		nr = LONG_MAX/sizeof(*iocbpp);
-
 	ctx = lookup_ioctx(ctx_id);
 	if (unlikely(!ctx)) {
 		pr_debug("EINVAL: invalid context id\n");
 		return -EINVAL;
 	}
 
+	if (nr > ctx->nr_events)
+		nr = ctx->nr_events;
+
 	blk_start_plug(&plug);
 	for (i = 0; i < nr; i++) {
 		struct iocb __user *user_iocb;
@@ -1873,8 +1873,6 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
 }
 
 #ifdef CONFIG_COMPAT
-#define MAX_AIO_SUBMITS 	(PAGE_SIZE/sizeof(struct iocb *))
-
 COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
 		       int, nr, compat_uptr_t __user *, iocbpp)
 {
@@ -1886,9 +1884,6 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
 	if (unlikely(nr < 0))
 		return -EINVAL;
 
-	if (nr > MAX_AIO_SUBMITS)
-		nr = MAX_AIO_SUBMITS;
-
 	ctx = lookup_ioctx(ctx_id);
 	if (unlikely(!ctx)) {
 		pr_debug("EINVAL: invalid context id\n");
-- 
2.11.0
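
The numbers quoted in the commit message can be reproduced with a few lines
of arithmetic.  The helpers below are hypothetical and take the word size,
pointer size and page size as parameters rather than reading them from the
build machine; the 4096-byte page is an assumption, not something the patch
states.

```c
#include <assert.h>
#include <stdint.h>

/* Old native clamp: LONG_MAX / sizeof(pointer). */
static int64_t old_native_cap(int64_t long_max, int64_t ptr_size)
{
	assert(ptr_size > 0);
	return long_max / ptr_size;
}

/* Old compat clamp: PAGE_SIZE / sizeof(pointer). */
static int64_t old_compat_cap(int64_t page_size, int64_t ptr_size)
{
	assert(ptr_size > 0);
	return page_size / ptr_size;
}

/* New clamp for both paths: never more than the context's nr_events. */
static int64_t new_cap(int64_t nr, uint32_t nr_events)
{
	assert(nr >= 0);
	return nr > (int64_t)nr_events ? (int64_t)nr_events : nr;
}
```

For example, old_native_cap(INT32_MAX, 4) is just under 512M and
old_native_cap(INT64_MAX, 8) is just under 2^60 (1E), while
old_compat_cap(4096, 4) gives 1024 and old_compat_cap(4096, 8) gives 512.
new_cap() depends only on how large a ring the context was set up with.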

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH 1/4] vmsplice: lift import_iovec() into do_vmsplice()
  2018-05-28 17:54               ` Al Viro
  2018-05-28 17:57                 ` [PATCH v2 1/6] aio: take list removal to (some) callers of aio_complete() Al Viro
@ 2018-05-28 22:20                 ` Al Viro
  2018-05-28 22:20                   ` [PATCH 2/4] vmsplice(): lift importing iovec into vmsplice(2) and compat counterpart Al Viro
                                     ` (3 more replies)
  1 sibling, 4 replies; 72+ messages in thread
From: Al Viro @ 2018-05-28 22:20 UTC (permalink / raw)
  To: linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/splice.c | 69 ++++++++++++++++++++++++++-----------------------------------
 1 file changed, 29 insertions(+), 40 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 005d09cf3fa8..920ff0b20e53 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1242,38 +1242,26 @@ static int pipe_to_user(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
  * For lack of a better implementation, implement vmsplice() to userspace
  * as a simple copy of the pipes pages to the user iov.
  */
-static long vmsplice_to_user(struct file *file, const struct iovec __user *uiov,
-			     unsigned long nr_segs, unsigned int flags)
+static long vmsplice_to_user(struct file *file, struct iov_iter *iter,
+			     unsigned int flags)
 {
-	struct pipe_inode_info *pipe;
-	struct splice_desc sd;
-	long ret;
-	struct iovec iovstack[UIO_FASTIOV];
-	struct iovec *iov = iovstack;
-	struct iov_iter iter;
+	struct pipe_inode_info *pipe = get_pipe_info(file);
+	struct splice_desc sd = {
+		.total_len = iov_iter_count(iter),
+		.flags = flags,
+		.u.data = iter
+	};
+	long ret = 0;
 
-	pipe = get_pipe_info(file);
 	if (!pipe)
 		return -EBADF;
 
-	ret = import_iovec(READ, uiov, nr_segs,
-			   ARRAY_SIZE(iovstack), &iov, &iter);
-	if (ret < 0)
-		return ret;
-
-	sd.total_len = iov_iter_count(&iter);
-	sd.len = 0;
-	sd.flags = flags;
-	sd.u.data = &iter;
-	sd.pos = 0;
-
 	if (sd.total_len) {
 		pipe_lock(pipe);
 		ret = __splice_from_pipe(pipe, &sd, pipe_to_user);
 		pipe_unlock(pipe);
 	}
 
-	kfree(iov);
 	return ret;
 }
 
@@ -1282,14 +1270,11 @@ static long vmsplice_to_user(struct file *file, const struct iovec __user *uiov,
  * as splice-from-memory, where the regular splice is splice-from-file (or
  * to file). In both cases the output is a pipe, naturally.
  */
-static long vmsplice_to_pipe(struct file *file, const struct iovec __user *uiov,
-			     unsigned long nr_segs, unsigned int flags)
+static long vmsplice_to_pipe(struct file *file, struct iov_iter *iter,
+			     unsigned int flags)
 {
 	struct pipe_inode_info *pipe;
-	struct iovec iovstack[UIO_FASTIOV];
-	struct iovec *iov = iovstack;
-	struct iov_iter from;
-	long ret;
+	long ret = 0;
 	unsigned buf_flag = 0;
 
 	if (flags & SPLICE_F_GIFT)
@@ -1299,19 +1284,13 @@ static long vmsplice_to_pipe(struct file *file, const struct iovec __user *uiov,
 	if (!pipe)
 		return -EBADF;
 
-	ret = import_iovec(WRITE, uiov, nr_segs,
-			   ARRAY_SIZE(iovstack), &iov, &from);
-	if (ret < 0)
-		return ret;
-
 	pipe_lock(pipe);
 	ret = wait_for_space(pipe, flags);
 	if (!ret)
-		ret = iter_to_pipe(&from, pipe, buf_flag);
+		ret = iter_to_pipe(iter, pipe, buf_flag);
 	pipe_unlock(pipe);
 	if (ret > 0)
 		wakeup_pipe_readers(pipe);
-	kfree(iov);
 	return ret;
 }
 
@@ -1331,29 +1310,39 @@ static long vmsplice_to_pipe(struct file *file, const struct iovec __user *uiov,
  * Currently we punt and implement it as a normal copy, see pipe_to_user().
  *
  */
-static long do_vmsplice(int fd, const struct iovec __user *iov,
+static long do_vmsplice(int fd, const struct iovec __user *uiov,
 			unsigned long nr_segs, unsigned int flags)
 {
+	struct iovec iovstack[UIO_FASTIOV];
+	struct iovec *iov = iovstack;
+	struct iov_iter iter;
 	struct fd f;
 	long error;
 
 	if (unlikely(flags & ~SPLICE_F_ALL))
 		return -EINVAL;
-	if (unlikely(nr_segs > UIO_MAXIOV))
-		return -EINVAL;
-	else if (unlikely(!nr_segs))
+
+	error = import_iovec(READ, uiov, nr_segs,
+			   ARRAY_SIZE(iovstack), &iov, &iter);
+	if (error < 0)
+		return error;
+
+	if (!iov_iter_count(&iter)) {
+		kfree(iov);
 		return 0;
+	}
 
 	error = -EBADF;
 	f = fdget(fd);
 	if (f.file) {
 		if (f.file->f_mode & FMODE_WRITE)
-			error = vmsplice_to_pipe(f.file, iov, nr_segs, flags);
+			error = vmsplice_to_pipe(f.file, &iter, flags);
 		else if (f.file->f_mode & FMODE_READ)
-			error = vmsplice_to_user(f.file, iov, nr_segs, flags);
+			error = vmsplice_to_user(f.file, &iter, flags);
 
 		fdput(f);
 	}
+	kfree(iov);
 
 	return error;
 }
-- 
2.11.0

* [PATCH 2/4] vmsplice(): lift importing iovec into vmsplice(2) and compat counterpart
  2018-05-28 22:20                 ` [PATCH 1/4] vmsplice: lift import_iovec() into do_vmsplice() Al Viro
@ 2018-05-28 22:20                   ` Al Viro
  2018-05-28 22:20                   ` [PATCH 3/4] signalfd: lift sigmask copyin and size checks to callers of do_signalfd4() Al Viro
                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 72+ messages in thread
From: Al Viro @ 2018-05-28 22:20 UTC (permalink / raw)
  To: linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

... getting rid of transformations in the latter - just use
compat_import_iovec().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/splice.c | 59 ++++++++++++++++++++++++++++-------------------------------
 1 file changed, 28 insertions(+), 31 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 920ff0b20e53..ab224fea4760 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1310,67 +1310,64 @@ static long vmsplice_to_pipe(struct file *file, struct iov_iter *iter,
  * Currently we punt and implement it as a normal copy, see pipe_to_user().
  *
  */
-static long do_vmsplice(int fd, const struct iovec __user *uiov,
-			unsigned long nr_segs, unsigned int flags)
+static long do_vmsplice(int fd, struct iov_iter *iter, unsigned int flags)
 {
-	struct iovec iovstack[UIO_FASTIOV];
-	struct iovec *iov = iovstack;
-	struct iov_iter iter;
 	struct fd f;
 	long error;
 
 	if (unlikely(flags & ~SPLICE_F_ALL))
 		return -EINVAL;
 
-	error = import_iovec(READ, uiov, nr_segs,
-			   ARRAY_SIZE(iovstack), &iov, &iter);
-	if (error < 0)
-		return error;
-
-	if (!iov_iter_count(&iter)) {
-		kfree(iov);
+	if (!iov_iter_count(iter))
 		return 0;
-	}
 
 	error = -EBADF;
 	f = fdget(fd);
 	if (f.file) {
 		if (f.file->f_mode & FMODE_WRITE)
-			error = vmsplice_to_pipe(f.file, &iter, flags);
+			error = vmsplice_to_pipe(f.file, iter, flags);
 		else if (f.file->f_mode & FMODE_READ)
-			error = vmsplice_to_user(f.file, &iter, flags);
+			error = vmsplice_to_user(f.file, iter, flags);
 
 		fdput(f);
 	}
-	kfree(iov);
 
 	return error;
 }
 
-SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, iov,
+SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
 		unsigned long, nr_segs, unsigned int, flags)
 {
-	return do_vmsplice(fd, iov, nr_segs, flags);
+	struct iovec iovstack[UIO_FASTIOV];
+	struct iovec *iov = iovstack;
+	struct iov_iter iter;
+	long error;
+
+	error = import_iovec(READ, uiov, nr_segs,
+			   ARRAY_SIZE(iovstack), &iov, &iter);
+	if (!error) {
+		error = do_vmsplice(fd, &iter, flags);
+		kfree(iov);
+	}
+	return error;
 }
 
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE4(vmsplice, int, fd, const struct compat_iovec __user *, iov32,
 		    unsigned int, nr_segs, unsigned int, flags)
 {
-	unsigned i;
-	struct iovec __user *iov;
-	if (nr_segs > UIO_MAXIOV)
-		return -EINVAL;
-	iov = compat_alloc_user_space(nr_segs * sizeof(struct iovec));
-	for (i = 0; i < nr_segs; i++) {
-		struct compat_iovec v;
-		if (get_user(v.iov_base, &iov32[i].iov_base) ||
-		    get_user(v.iov_len, &iov32[i].iov_len) ||
-		    put_user(compat_ptr(v.iov_base), &iov[i].iov_base) ||
-		    put_user(v.iov_len, &iov[i].iov_len))
-			return -EFAULT;
+	struct iovec iovstack[UIO_FASTIOV];
+	struct iovec *iov = iovstack;
+	struct iov_iter iter;
+	long error;
+
+	error = compat_import_iovec(READ, iov32, nr_segs,
+			   ARRAY_SIZE(iovstack), &iov, &iter);
+	if (!error) {
+		error = do_vmsplice(fd, &iter, flags);
+		kfree(iov);
 	}
-	return do_vmsplice(fd, iov, nr_segs, flags);
+	return error;
 }
 #endif
 
-- 
2.11.0
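
For readers following along from userspace: the two vmsplice patches leave
the vmsplice(2) interface unchanged; only the place where the kernel imports
the iovec array moves, from the helpers up to the syscall entry points. A
minimal sketch of a caller, assuming a Linux/glibc system. The helper name
splice_two() is made up for illustration and is not part of the series.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Gather two user buffers into a pipe with a single vmsplice(2) call.
 * The kernel turns this iovec array into an iov_iter (import_iovec())
 * before handing it to the pipe code.
 * splice_two() is a hypothetical helper, not a kernel or libc function.
 */
static ssize_t splice_two(int pipe_wr, const char *a, const char *b)
{
	struct iovec iov[2];

	assert(a && b);
	iov[0].iov_base = (void *)a;
	iov[0].iov_len = strlen(a);
	iov[1].iov_base = (void *)b;
	iov[1].iov_len = strlen(b);

	/* Returns the number of bytes queued in the pipe, or -1 with errno set. */
	return vmsplice(pipe_wr, iov, 2, 0);
}
```

Calling splice_two(p[1], "hello, ", "pipe") on the write end of a fresh pipe
should queue 11 bytes, which a plain read(2) on the other end returns in
order.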

* [PATCH 3/4] signalfd: lift sigmask copyin and size checks to callers of do_signalfd4()
  2018-05-28 22:20                 ` [PATCH 1/4] vmsplice: lift import_iovec() into do_vmsplice() Al Viro
  2018-05-28 22:20                   ` [PATCH 2/4] vmsplice(): lift importing iovec into vmsplice(2) and compat counterpart Al Viro
@ 2018-05-28 22:20                   ` Al Viro
  2018-05-28 22:20                   ` [PATCH 4/4] orangefs: simplify compat ioctl handling Al Viro
  2018-06-06 22:57                   ` [1/4] vmsplice: lift import_iovec() into do_vmsplice() Andrei Vagin
  3 siblings, 0 replies; 72+ messages in thread
From: Al Viro @ 2018-05-28 22:20 UTC (permalink / raw)
  To: linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/signalfd.c | 50 +++++++++++++++++++++++++-------------------------
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/fs/signalfd.c b/fs/signalfd.c
index d2187a813376..46e9de097507 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -256,10 +256,8 @@ static const struct file_operations signalfd_fops = {
 	.llseek		= noop_llseek,
 };
 
-static int do_signalfd4(int ufd, sigset_t __user *user_mask, size_t sizemask,
-			int flags)
+static int do_signalfd4(int ufd, sigset_t *mask, int flags)
 {
-	sigset_t sigmask;
 	struct signalfd_ctx *ctx;
 
 	/* Check the SFD_* constants for consistency.  */
@@ -269,18 +267,15 @@ static int do_signalfd4(int ufd, sigset_t __user *user_mask, size_t sizemask,
 	if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK))
 		return -EINVAL;
 
-	if (sizemask != sizeof(sigset_t) ||
-	    copy_from_user(&sigmask, user_mask, sizeof(sigmask)))
-		return -EINVAL;
-	sigdelsetmask(&sigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
-	signotset(&sigmask);
+	sigdelsetmask(mask, sigmask(SIGKILL) | sigmask(SIGSTOP));
+	signotset(mask);
 
 	if (ufd == -1) {
 		ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
 		if (!ctx)
 			return -ENOMEM;
 
-		ctx->sigmask = sigmask;
+		ctx->sigmask = *mask;
 
 		/*
 		 * When we call this, the initialization must be complete, since
@@ -300,7 +295,7 @@ static int do_signalfd4(int ufd, sigset_t __user *user_mask, size_t sizemask,
 			return -EINVAL;
 		}
 		spin_lock_irq(&current->sighand->siglock);
-		ctx->sigmask = sigmask;
+		ctx->sigmask = *mask;
 		spin_unlock_irq(&current->sighand->siglock);
 
 		wake_up(&current->sighand->signalfd_wqh);
@@ -313,46 +308,51 @@ static int do_signalfd4(int ufd, sigset_t __user *user_mask, size_t sizemask,
 SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
 		size_t, sizemask, int, flags)
 {
-	return do_signalfd4(ufd, user_mask, sizemask, flags);
+	sigset_t mask;
+
+	if (sizemask != sizeof(sigset_t) ||
+	    copy_from_user(&mask, user_mask, sizeof(mask)))
+		return -EINVAL;
+	return do_signalfd4(ufd, &mask, flags);
 }
 
 SYSCALL_DEFINE3(signalfd, int, ufd, sigset_t __user *, user_mask,
 		size_t, sizemask)
 {
-	return do_signalfd4(ufd, user_mask, sizemask, 0);
+	sigset_t mask;
+
+	if (sizemask != sizeof(sigset_t) ||
+	    copy_from_user(&mask, user_mask, sizeof(mask)))
+		return -EINVAL;
+	return do_signalfd4(ufd, &mask, 0);
 }
 
 #ifdef CONFIG_COMPAT
 static long do_compat_signalfd4(int ufd,
-			const compat_sigset_t __user *sigmask,
+			const compat_sigset_t __user *user_mask,
 			compat_size_t sigsetsize, int flags)
 {
-	sigset_t tmp;
-	sigset_t __user *ksigmask;
+	sigset_t mask;
 
 	if (sigsetsize != sizeof(compat_sigset_t))
 		return -EINVAL;
-	if (get_compat_sigset(&tmp, sigmask))
-		return -EFAULT;
-	ksigmask = compat_alloc_user_space(sizeof(sigset_t));
-	if (copy_to_user(ksigmask, &tmp, sizeof(sigset_t)))
+	if (get_compat_sigset(&mask, user_mask))
 		return -EFAULT;
-
-	return do_signalfd4(ufd, ksigmask, sizeof(sigset_t), flags);
+	return do_signalfd4(ufd, &mask, flags);
 }
 
 COMPAT_SYSCALL_DEFINE4(signalfd4, int, ufd,
-		     const compat_sigset_t __user *, sigmask,
+		     const compat_sigset_t __user *, user_mask,
 		     compat_size_t, sigsetsize,
 		     int, flags)
 {
-	return do_compat_signalfd4(ufd, sigmask, sigsetsize, flags);
+	return do_compat_signalfd4(ufd, user_mask, sigsetsize, flags);
 }
 
 COMPAT_SYSCALL_DEFINE3(signalfd, int, ufd,
-		     const compat_sigset_t __user *,sigmask,
+		     const compat_sigset_t __user *, user_mask,
 		     compat_size_t, sigsetsize)
 {
-	return do_compat_signalfd4(ufd, sigmask, sigsetsize, 0);
+	return do_compat_signalfd4(ufd, user_mask, sigsetsize, 0);
 }
 #endif
-- 
2.11.0

* [PATCH 4/4] orangefs: simplify compat ioctl handling
  2018-05-28 22:20                 ` [PATCH 1/4] vmsplice: lift import_iovec() into do_vmsplice() Al Viro
  2018-05-28 22:20                   ` [PATCH 2/4] vmsplice(): lift importing iovec into vmsplice(2) and compat counterpart Al Viro
  2018-05-28 22:20                   ` [PATCH 3/4] signalfd: lift sigmask copyin and size checks to callers of do_signalfd4() Al Viro
@ 2018-05-28 22:20                   ` Al Viro
  2018-05-31 11:11                     ` kbuild test robot
  2018-05-31 20:54                     ` Mike Marshall
  2018-06-06 22:57                   ` [1/4] vmsplice: lift import_iovec() into do_vmsplice() Andrei Vagin
  3 siblings, 2 replies; 72+ messages in thread
From: Al Viro @ 2018-05-28 22:20 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Mike Marshall

From: Al Viro <viro@zeniv.linux.org.uk>

no need to mess with copy_in_user(), etc...

Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/orangefs/devorangefs-req.c | 54 ++++++++++---------------------------------
 1 file changed, 12 insertions(+), 42 deletions(-)

diff --git a/fs/orangefs/devorangefs-req.c b/fs/orangefs/devorangefs-req.c
index 66369ec90020..8581daf19634 100644
--- a/fs/orangefs/devorangefs-req.c
+++ b/fs/orangefs/devorangefs-req.c
@@ -716,37 +716,6 @@ struct ORANGEFS_dev_map_desc32 {
 	__s32 count;
 };
 
-static unsigned long translate_dev_map26(unsigned long args, long *error)
-{
-	struct ORANGEFS_dev_map_desc32 __user *p32 = (void __user *)args;
-	/*
-	 * Depending on the architecture, allocate some space on the
-	 * user-call-stack based on our expected layout.
-	 */
-	struct ORANGEFS_dev_map_desc __user *p =
-	    compat_alloc_user_space(sizeof(*p));
-	compat_uptr_t addr;
-
-	*error = 0;
-	/* get the ptr from the 32 bit user-space */
-	if (get_user(addr, &p32->ptr))
-		goto err;
-	/* try to put that into a 64-bit layout */
-	if (put_user(compat_ptr(addr), &p->ptr))
-		goto err;
-	/* copy the remaining fields */
-	if (copy_in_user(&p->total_size, &p32->total_size, sizeof(__s32)))
-		goto err;
-	if (copy_in_user(&p->size, &p32->size, sizeof(__s32)))
-		goto err;
-	if (copy_in_user(&p->count, &p32->count, sizeof(__s32)))
-		goto err;
-	return (unsigned long)p;
-err:
-	*error = -EFAULT;
-	return 0;
-}
-
 /*
  * 32 bit user-space apps' ioctl handlers when kernel modules
  * is compiled as a 64 bit one
@@ -755,25 +724,26 @@ static long orangefs_devreq_compat_ioctl(struct file *filp, unsigned int cmd,
 				      unsigned long args)
 {
 	long ret;
-	unsigned long arg = args;
 
 	/* Check for properly constructed commands */
 	ret = check_ioctl_command(cmd);
 	if (ret < 0)
 		return ret;
 	if (cmd == ORANGEFS_DEV_MAP) {
-		/*
-		 * convert the arguments to what we expect internally
-		 * in kernel space
-		 */
-		arg = translate_dev_map26(args, &ret);
-		if (ret < 0) {
-			gossip_err("Could not translate dev map\n");
-			return ret;
-		}
+		struct ORANGEFS_dev_map_desc desc;
+		struct ORANGEFS_dev_map_desc32 d32;
+
+		if (copy_from_user(&d32, (void __user *)args, sizeof(d32)))
+			return -EFAULT;
+
+		desc.ptr = compat_ptr(d32.ptr);
+		desc.total_size = d32.total_size;
+		desc.size = d32.size;
+		desc.count = d32.count;
+		return orangefs_bufmap_initialize(&desc);
 	}
 	/* no other ioctl requires translation */
-	return dispatch_ioctl_command(cmd, arg);
+	return dispatch_ioctl_command(cmd, args);
 }
 
 #endif /* CONFIG_COMPAT is in .config */
-- 
2.11.0
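
The shape of the new compat path can be modelled outside the kernel: copy the
whole 32-bit descriptor into a local in one go, then fill the native
descriptor field by field, widening only the pointer. The sketch below is a
userspace model under stated assumptions, not the orangefs code: the struct
and function names are made up, memcpy() stands in for copy_from_user(), and
widen_ptr() stands in for compat_ptr().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the layout a 32-bit caller passes in. */
struct map_desc32 {
	uint32_t ptr;		/* 32-bit user address */
	int32_t total_size;
	int32_t size;
	int32_t count;
};

/* Hypothetical stand-in for the native descriptor. */
struct map_desc {
	void *ptr;
	int32_t total_size;
	int32_t size;
	int32_t count;
};

/* Models compat_ptr(): widen a 32-bit user address to a native pointer. */
static void *widen_ptr(uint32_t uptr)
{
	return (void *)(uintptr_t)uptr;
}

/*
 * One bulk copy of the 32-bit descriptor (memcpy() models
 * copy_from_user()), then a field-by-field fill of the native one.
 * No scratch area in user memory is involved.
 */
static void translate_desc(const void *user_arg, struct map_desc *out)
{
	struct map_desc32 d32;

	assert(user_arg && out);
	memcpy(&d32, user_arg, sizeof(d32));

	out->ptr = widen_ptr(d32.ptr);
	out->total_size = d32.total_size;
	out->size = d32.size;
	out->count = d32.count;
}
```

Compared with the removed translate_dev_map26(), this does a single copy from
the caller and never writes a translated structure back to user memory.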

* Re: [PATCH v2 1/6] aio: take list removal to (some) callers of aio_complete()
  2018-05-28 17:57                 ` [PATCH v2 1/6] aio: take list removal to (some) callers of aio_complete() Al Viro
                                     ` (4 preceding siblings ...)
  2018-05-28 17:57                   ` [PATCH v2 6/6] aio: sanitize the limit checking in io_submit(2) Al Viro
@ 2018-05-29  6:08                   ` Christoph Hellwig
  5 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-29  6:08 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Christoph Hellwig

On Mon, May 28, 2018 at 06:57:02PM +0100, Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> We really want iocb out of io_cancel(2) reach before we start tearing
> it down.

A little helper would be useful, better naming for it welcome:

diff --git a/fs/aio.c b/fs/aio.c
index f95b167801c2..ae5977563b7e 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1390,18 +1390,22 @@ SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
 	return -EINVAL;
 }
 
+static void aio_remove_iocb(struct aio_kiocb *iocb)
+{
+	struct kioctx *ctx = iocb->ki_ctx;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ctx->ctx_lock, flags);
+	list_del(&iocb->ki_list);
+	spin_unlock_irqrestore(&ctx->ctx_lock, flags);
+}
+
 static void aio_complete_rw(struct kiocb *kiocb, long res, long res2)
 {
 	struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, rw);
 
-	if (!list_empty_careful(&iocb->ki_list)) {
-		struct kioctx	*ctx = iocb->ki_ctx;
-		unsigned long flags;
-
-		spin_lock_irqsave(&ctx->ctx_lock, flags);
-		list_del(&iocb->ki_list);
-		spin_unlock_irqrestore(&ctx->ctx_lock, flags);
-	}
+	if (!list_empty_careful(&iocb->ki_list))
+		aio_remove_iocb(iocb);
 
 	if (kiocb->ki_flags & IOCB_WRITE) {
 		struct inode *inode = file_inode(kiocb->ki_filp);
@@ -1605,15 +1609,8 @@ static void aio_poll_work(struct work_struct *work)
 {
 	struct aio_kiocb *iocb = container_of(work, struct aio_kiocb, poll.work);
 
-	if (!list_empty_careful(&iocb->ki_list)) {
-		struct kioctx	*ctx = iocb->ki_ctx;
-		unsigned long flags;
-
-		spin_lock_irqsave(&ctx->ctx_lock, flags);
-		list_del(&iocb->ki_list);
-		spin_unlock_irqrestore(&ctx->ctx_lock, flags);
-	}
-
+	if (!list_empty_careful(&iocb->ki_list))
+		aio_remove_iocb(iocb);
 	__aio_poll_complete(iocb, iocb->poll.events);
 }
 

* Re: [PATCH v2 2/6] aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way
  2018-05-28 17:57                   ` [PATCH v2 2/6] aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way Al Viro
@ 2018-05-29  6:08                     ` Christoph Hellwig
  0 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-29  6:08 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Christoph Hellwig

On Mon, May 28, 2018 at 06:57:03PM +0100, Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> ... so just make them return 0 when caller does not need to destroy iocb
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH v2 5/6] aio: fold do_io_submit() into callers
  2018-05-28 17:57                   ` [PATCH v2 5/6] aio: fold do_io_submit() into callers Al Viro
@ 2018-05-29  6:10                     ` Christoph Hellwig
  0 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-29  6:10 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Christoph Hellwig

On Mon, May 28, 2018 at 06:57:06PM +0100, Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> get rid of insane "copy array of 32bit pointers into an array of
> native ones" glue.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


> +	if (nr > ctx->nr_events)
> +		nr = ctx->nr_events;

Shouldn't this be in the next patch?

Except for that:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH v2 6/6] aio: sanitize the limit checking in io_submit(2)
  2018-05-28 17:57                   ` [PATCH v2 6/6] aio: sanitize the limit checking in io_submit(2) Al Viro
@ 2018-05-29  6:10                     ` Christoph Hellwig
  0 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2018-05-29  6:10 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Christoph Hellwig

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 4/4] orangefs: simplify compat ioctl handling
  2018-05-28 22:20                   ` [PATCH 4/4] orangefs: simplify compat ioctl handling Al Viro
@ 2018-05-31 11:11                     ` kbuild test robot
  2018-05-31 20:54                     ` Mike Marshall
  1 sibling, 0 replies; 72+ messages in thread
From: kbuild test robot @ 2018-05-31 11:11 UTC (permalink / raw)
  To: Al Viro; +Cc: kbuild-all, linux-fsdevel, Mike Marshall

Hi Al,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.17-rc7 next-20180530]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Al-Viro/vmsplice-lift-import_iovec-into-do_vmsplice/20180531-161308
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> fs/orangefs/devorangefs-req.c:739:26: sparse: incorrect type in assignment (different address spaces) @@    expected void *ptr @@    got void [noderef] <asn:1>* @@
   fs/orangefs/devorangefs-req.c:739:26:    expected void *ptr
   fs/orangefs/devorangefs-req.c:739:26:    got void [noderef] <asn:1>*
   fs/orangefs/devorangefs-req.c:158:16: sparse: context imbalance in 'orangefs_devreq_read' - different lock contexts for basic block

vim +739 fs/orangefs/devorangefs-req.c

   718	
   719	/*
   720	 * 32 bit user-space apps' ioctl handlers when kernel modules
   721	 * is compiled as a 64 bit one
   722	 */
   723	static long orangefs_devreq_compat_ioctl(struct file *filp, unsigned int cmd,
   724					      unsigned long args)
   725	{
   726		long ret;
   727	
   728		/* Check for properly constructed commands */
   729		ret = check_ioctl_command(cmd);
   730		if (ret < 0)
   731			return ret;
   732		if (cmd == ORANGEFS_DEV_MAP) {
   733			struct ORANGEFS_dev_map_desc desc;
   734			struct ORANGEFS_dev_map_desc32 d32;
   735	
   736			if (copy_from_user(&d32, (void __user *)args, sizeof(d32)))
   737				return -EFAULT;
   738	
 > 739			desc.ptr = compat_ptr(d32.ptr);
   740			desc.total_size = d32.total_size;
   741			desc.size = d32.size;
   742			desc.count = d32.count;
   743			return orangefs_bufmap_initialize(&desc);
   744		}
   745		/* no other ioctl requires translation */
   746		return dispatch_ioctl_command(cmd, args);
   747	}
   748	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
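
Some background on the "different address spaces" warning in this report:
sparse tracks pointers to user memory in their own address space, so
assigning the result of compat_ptr() (a __user pointer) to a plain void *
field is flagged. A rough sketch of the mechanism, assuming annotation
macros modelled on the kernel's; struct demo_desc and demo_copy_from_user()
are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Under sparse (__CHECKER__ defined) a __user pointer lives in address
 * space 1 and may not be dereferenced directly; an ordinary compiler sees
 * empty macros. Modelled on the kernel's annotations.
 */
#ifdef __CHECKER__
# define __user		__attribute__((noderef, address_space(1)))
# define __force	__attribute__((force))
#else
# define __user
# define __force
#endif

/* Hypothetical descriptor whose pointer field is marked as user memory. */
struct demo_desc {
	void __user *ptr;
	int size;
};

/*
 * Toy stand-in for copy_from_user(): the one place where the annotation
 * is deliberately cast away. Returns the number of bytes NOT copied,
 * following the copy_from_user() convention.
 */
static unsigned long demo_copy_from_user(void *dst, const void __user *src,
					 size_t n)
{
	assert(dst != NULL);
	memcpy(dst, (__force const void *)src, n);
	return 0;
}
```

With the field declared as void __user *, an assignment from another __user
pointer type-checks under sparse, and any attempt to dereference the field
directly is reported instead.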

* Re: [PATCH 4/4] orangefs: simplify compat ioctl handling
  2018-05-28 22:20                   ` [PATCH 4/4] orangefs: simplify compat ioctl handling Al Viro
  2018-05-31 11:11                     ` kbuild test robot
@ 2018-05-31 20:54                     ` Mike Marshall
  2018-05-31 21:03                       ` Al Viro
  1 sibling, 1 reply; 72+ messages in thread
From: Mike Marshall @ 2018-05-31 20:54 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel

Thanks for the cleanup. This runs through xfstests with no regressions
on 4.17-rc7.

I studied what to do about the sparse warning, looked at the code, and
looked for hints from the original authors in the pvfs svn commit messages.

No luck with the old commit messages.

I got the sparse warning to quit with this change, which also runs through
xfstests with no regressions, does it seem OK?


$ git diff
diff --git a/fs/orangefs/protocol.h b/fs/orangefs/protocol.h
index 61ee8d64c842..d403cf29a99b 100644
--- a/fs/orangefs/protocol.h
+++ b/fs/orangefs/protocol.h
@@ -342,7 +342,7 @@ enum {
  * that may be 32 bit!
  */
 struct ORANGEFS_dev_map_desc {
-       void *ptr;
+       void __user *ptr;
        __s32 total_size;
        __s32 size;
        __s32 count;


Please add: Tested-by: Mike Marshall <hubcap@omnibond.com>

-Mike

On Mon, May 28, 2018 at 6:20 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> no need to mess with copy_in_user(), etc...
>
> Cc: Mike Marshall <hubcap@omnibond.com>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/orangefs/devorangefs-req.c | 54 ++++++++++---------------------------------
>  1 file changed, 12 insertions(+), 42 deletions(-)
>
> diff --git a/fs/orangefs/devorangefs-req.c b/fs/orangefs/devorangefs-req.c
> index 66369ec90020..8581daf19634 100644
> --- a/fs/orangefs/devorangefs-req.c
> +++ b/fs/orangefs/devorangefs-req.c
> @@ -716,37 +716,6 @@ struct ORANGEFS_dev_map_desc32 {
>         __s32 count;
>  };
>
> -static unsigned long translate_dev_map26(unsigned long args, long *error)
> -{
> -       struct ORANGEFS_dev_map_desc32 __user *p32 = (void __user *)args;
> -       /*
> -        * Depending on the architecture, allocate some space on the
> -        * user-call-stack based on our expected layout.
> -        */
> -       struct ORANGEFS_dev_map_desc __user *p =
> -           compat_alloc_user_space(sizeof(*p));
> -       compat_uptr_t addr;
> -
> -       *error = 0;
> -       /* get the ptr from the 32 bit user-space */
> -       if (get_user(addr, &p32->ptr))
> -               goto err;
> -       /* try to put that into a 64-bit layout */
> -       if (put_user(compat_ptr(addr), &p->ptr))
> -               goto err;
> -       /* copy the remaining fields */
> -       if (copy_in_user(&p->total_size, &p32->total_size, sizeof(__s32)))
> -               goto err;
> -       if (copy_in_user(&p->size, &p32->size, sizeof(__s32)))
> -               goto err;
> -       if (copy_in_user(&p->count, &p32->count, sizeof(__s32)))
> -               goto err;
> -       return (unsigned long)p;
> -err:
> -       *error = -EFAULT;
> -       return 0;
> -}
> -
>  /*
>   * 32 bit user-space apps' ioctl handlers when kernel modules
>   * is compiled as a 64 bit one
> @@ -755,25 +724,26 @@ static long orangefs_devreq_compat_ioctl(struct file *filp, unsigned int cmd,
>                                       unsigned long args)
>  {
>         long ret;
> -       unsigned long arg = args;
>
>         /* Check for properly constructed commands */
>         ret = check_ioctl_command(cmd);
>         if (ret < 0)
>                 return ret;
>         if (cmd == ORANGEFS_DEV_MAP) {
> -               /*
> -                * convert the arguments to what we expect internally
> -                * in kernel space
> -                */
> -               arg = translate_dev_map26(args, &ret);
> -               if (ret < 0) {
> -                       gossip_err("Could not translate dev map\n");
> -                       return ret;
> -               }
> +               struct ORANGEFS_dev_map_desc desc;
> +               struct ORANGEFS_dev_map_desc32 d32;
> +
> +               if (copy_from_user(&d32, (void __user *)args, sizeof(d32)))
> +                       return -EFAULT;
> +
> +               desc.ptr = compat_ptr(d32.ptr);
> +               desc.total_size = d32.total_size;
> +               desc.size = d32.size;
> +               desc.count = d32.count;
> +               return orangefs_bufmap_initialize(&desc);
>         }
>         /* no other ioctl requires translation */
> -       return dispatch_ioctl_command(cmd, arg);
> +       return dispatch_ioctl_command(cmd, args);
>  }
>
>  #endif /* CONFIG_COMPAT is in .config */
> --
> 2.11.0
>

* Re: [PATCH 4/4] orangefs: simplify compat ioctl handling
  2018-05-31 20:54                     ` Mike Marshall
@ 2018-05-31 21:03                       ` Al Viro
  2018-06-01 21:13                         ` Mike Marshall
  0 siblings, 1 reply; 72+ messages in thread
From: Al Viro @ 2018-05-31 21:03 UTC (permalink / raw)
  To: Mike Marshall; +Cc: linux-fsdevel

On Thu, May 31, 2018 at 04:54:06PM -0400, Mike Marshall wrote:
> Thanks for the cleanup. This runs through xfstests with no regressions
> on 4.17-rc7.
> 
> I studied what to do about the sparse warning, looked at the code, and
> looked for hints from the original authors in the pvfs svn commit messages.
> 
> No luck with the old commit messages.
> 
> I got the sparse warning to quit with this change, which also runs through
> xfstests with no regressions, does it seem OK?
> 
> 
> $ git diff
> diff --git a/fs/orangefs/protocol.h b/fs/orangefs/protocol.h
> index 61ee8d64c842..d403cf29a99b 100644
> --- a/fs/orangefs/protocol.h
> +++ b/fs/orangefs/protocol.h
> @@ -342,7 +342,7 @@ enum {
>   * that may be 32 bit!
>   */
>  struct ORANGEFS_dev_map_desc {
> -       void *ptr;
> +       void __user *ptr;
>         __s32 total_size;
>         __s32 size;
>         __s32 count;

You want more than that -
--- a/fs/orangefs/orangefs-bufmap.c
+++ b/fs/orangefs/orangefs-bufmap.c
@@ -138,7 +138,7 @@ static int get(struct slot_map *m)
 
 /* used to describe mapped buffers */
 struct orangefs_bufmap_desc {
-       void *uaddr;                    /* user space address pointer */
+       void __user *uaddr;             /* user space address pointer */
        struct page **page_array;       /* array of mapped pages */
        int array_count;                /* size of above arrays */
        struct list_head list_link;

to go with it.  FWIW, the following takes care of almost all sparse
warnings in there; up to you whether to split it or not:

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index 26358efbf794..84f44365bfb3 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -544,7 +544,7 @@ static int orangefs_fault(struct vm_fault *vmf)
 	return filemap_fault(vmf);
 }
 
-const struct vm_operations_struct orangefs_file_vm_ops = {
+static const struct vm_operations_struct orangefs_file_vm_ops = {
 	.fault = orangefs_fault,
 	.map_pages = filemap_map_pages,
 	.page_mkwrite = filemap_page_mkwrite,
diff --git a/fs/orangefs/orangefs-bufmap.c b/fs/orangefs/orangefs-bufmap.c
index 4f927023d095..0a29f57d4c8f 100644
--- a/fs/orangefs/orangefs-bufmap.c
+++ b/fs/orangefs/orangefs-bufmap.c
@@ -138,7 +138,7 @@ static int get(struct slot_map *m)
 
 /* used to describe mapped buffers */
 struct orangefs_bufmap_desc {
-	void *uaddr;			/* user space address pointer */
+	void __user *uaddr;		/* user space address pointer */
 	struct page **page_array;	/* array of mapped pages */
 	int array_count;		/* size of above arrays */
 	struct list_head list_link;
@@ -215,19 +215,6 @@ int orangefs_bufmap_shift_query(void)
 static DECLARE_WAIT_QUEUE_HEAD(bufmap_waitq);
 static DECLARE_WAIT_QUEUE_HEAD(readdir_waitq);
 
-/*
- * orangefs_get_bufmap_init
- *
- * If bufmap_init is 1, then the shared memory system, including the
- * buffer_index_array, is available.  Otherwise, it is not.
- *
- * returns the value of bufmap_init
- */
-int orangefs_get_bufmap_init(void)
-{
-	return __orangefs_bufmap ? 1 : 0;
-}
-
 
 static struct orangefs_bufmap *
 orangefs_bufmap_alloc(struct ORANGEFS_dev_map_desc *user_desc)
diff --git a/fs/orangefs/orangefs-debugfs.c b/fs/orangefs/orangefs-debugfs.c
index 6e35f2f3c897..0732cb08173e 100644
--- a/fs/orangefs/orangefs-debugfs.c
+++ b/fs/orangefs/orangefs-debugfs.c
@@ -114,7 +114,7 @@ static const struct seq_operations help_debug_ops = {
 	.show	= help_show,
 };
 
-const struct file_operations debug_help_fops = {
+static const struct file_operations debug_help_fops = {
 	.owner		= THIS_MODULE,
 	.open           = orangefs_debug_help_open,
 	.read           = seq_read,
diff --git a/fs/orangefs/protocol.h b/fs/orangefs/protocol.h
index 61ee8d64c842..d403cf29a99b 100644
--- a/fs/orangefs/protocol.h
+++ b/fs/orangefs/protocol.h
@@ -342,7 +342,7 @@ enum {
  * that may be 32 bit!
  */
 struct ORANGEFS_dev_map_desc {
-	void *ptr;
+	void __user *ptr;
 	__s32 total_size;
 	__s32 size;
 	__s32 count;
diff --git a/fs/orangefs/waitqueue.c b/fs/orangefs/waitqueue.c
index 0577d6dba8c8..3de323f17506 100644
--- a/fs/orangefs/waitqueue.c
+++ b/fs/orangefs/waitqueue.c
@@ -17,8 +17,11 @@
 #include "orangefs-kernel.h"
 #include "orangefs-bufmap.h"
 
-static int wait_for_matching_downcall(struct orangefs_kernel_op_s *, long, bool);
-static void orangefs_clean_up_interrupted_operation(struct orangefs_kernel_op_s *);
+static int wait_for_matching_downcall(struct orangefs_kernel_op_s *op,
+				      long timeout, bool interruptible)
+	__acquires(op->lock);
+static void orangefs_clean_up_interrupted_operation(struct orangefs_kernel_op_s *op)
+	__releases(op->lock);
 
 /*
  * What we do in this function is to walk the list of operations that are
@@ -245,7 +248,8 @@ bool orangefs_cancel_op_in_progress(struct orangefs_kernel_op_s *op)
  * Change an op to the "given up" state and remove it from its list.
  */
 static void
-	orangefs_clean_up_interrupted_operation(struct orangefs_kernel_op_s *op)
+orangefs_clean_up_interrupted_operation(struct orangefs_kernel_op_s *op)
+	__releases(op->lock)
 {
 	/*
 	 * handle interrupted cases depending on what state we were in when
@@ -315,6 +319,7 @@ static void
 static int wait_for_matching_downcall(struct orangefs_kernel_op_s *op,
 				      long timeout,
 				      bool interruptible)
+	__acquires(op->lock)
 {
 	long n;
 

* Re: [PATCH 4/4] orangefs: simplify compat ioctl handling
  2018-05-31 21:03                       ` Al Viro
@ 2018-06-01 21:13                         ` Mike Marshall
  0 siblings, 0 replies; 72+ messages in thread
From: Mike Marshall @ 2018-06-01 21:13 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel

Thanks Al...

I broke those out into individual commits, tested it all and added it to
our for-next over on kernel.org.

-Mike

On Thu, May 31, 2018 at 5:03 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, May 31, 2018 at 04:54:06PM -0400, Mike Marshall wrote:
>> Thanks for the cleanup. This runs through xfstests with no regressions
>> on 4.17-rc7.
>>
>> I studied what to do about the sparse warning, looked at the code, and
>> looked for hints from the original authors in the pvfs svn commit messages.
>>
>> No luck with the old commit messages.
>>
>> I got the sparse warning to quit with this change, which also runs through
>> xfstests with no regressions. Does it seem OK?
>>
>>
>> $ git diff
>> diff --git a/fs/orangefs/protocol.h b/fs/orangefs/protocol.h
>> index 61ee8d64c842..d403cf29a99b 100644
>> --- a/fs/orangefs/protocol.h
>> +++ b/fs/orangefs/protocol.h
>> @@ -342,7 +342,7 @@ enum {
>>   * that may be 32 bit!
>>   */
>>  struct ORANGEFS_dev_map_desc {
>> -       void *ptr;
>> +       void __user *ptr;
>>         __s32 total_size;
>>         __s32 size;
>>         __s32 count;
>
> You want more than that -
> --- a/fs/orangefs/orangefs-bufmap.c
> +++ b/fs/orangefs/orangefs-bufmap.c
> @@ -138,7 +138,7 @@ static int get(struct slot_map *m)
>
>  /* used to describe mapped buffers */
>  struct orangefs_bufmap_desc {
> -       void *uaddr;                    /* user space address pointer */
> +       void __user *uaddr;             /* user space address pointer */
>         struct page **page_array;       /* array of mapped pages */
>         int array_count;                /* size of above arrays */
>         struct list_head list_link;
>
> to go with it.  FWIW, the following takes care of almost all sparse
> warnings in there; up to you whether to split it or not:
>
> diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
> index 26358efbf794..84f44365bfb3 100644
> --- a/fs/orangefs/file.c
> +++ b/fs/orangefs/file.c
> @@ -544,7 +544,7 @@ static int orangefs_fault(struct vm_fault *vmf)
>         return filemap_fault(vmf);
>  }
>
> -const struct vm_operations_struct orangefs_file_vm_ops = {
> +static const struct vm_operations_struct orangefs_file_vm_ops = {
>         .fault = orangefs_fault,
>         .map_pages = filemap_map_pages,
>         .page_mkwrite = filemap_page_mkwrite,
> diff --git a/fs/orangefs/orangefs-bufmap.c b/fs/orangefs/orangefs-bufmap.c
> index 4f927023d095..0a29f57d4c8f 100644
> --- a/fs/orangefs/orangefs-bufmap.c
> +++ b/fs/orangefs/orangefs-bufmap.c
> @@ -138,7 +138,7 @@ static int get(struct slot_map *m)
>
>  /* used to describe mapped buffers */
>  struct orangefs_bufmap_desc {
> -       void *uaddr;                    /* user space address pointer */
> +       void __user *uaddr;             /* user space address pointer */
>         struct page **page_array;       /* array of mapped pages */
>         int array_count;                /* size of above arrays */
>         struct list_head list_link;
> @@ -215,19 +215,6 @@ int orangefs_bufmap_shift_query(void)
>  static DECLARE_WAIT_QUEUE_HEAD(bufmap_waitq);
>  static DECLARE_WAIT_QUEUE_HEAD(readdir_waitq);
>
> -/*
> - * orangefs_get_bufmap_init
> - *
> - * If bufmap_init is 1, then the shared memory system, including the
> - * buffer_index_array, is available.  Otherwise, it is not.
> - *
> - * returns the value of bufmap_init
> - */
> -int orangefs_get_bufmap_init(void)
> -{
> -       return __orangefs_bufmap ? 1 : 0;
> -}
> -
>
>  static struct orangefs_bufmap *
>  orangefs_bufmap_alloc(struct ORANGEFS_dev_map_desc *user_desc)
> diff --git a/fs/orangefs/orangefs-debugfs.c b/fs/orangefs/orangefs-debugfs.c
> index 6e35f2f3c897..0732cb08173e 100644
> --- a/fs/orangefs/orangefs-debugfs.c
> +++ b/fs/orangefs/orangefs-debugfs.c
> @@ -114,7 +114,7 @@ static const struct seq_operations help_debug_ops = {
>         .show   = help_show,
>  };
>
> -const struct file_operations debug_help_fops = {
> +static const struct file_operations debug_help_fops = {
>         .owner          = THIS_MODULE,
>         .open           = orangefs_debug_help_open,
>         .read           = seq_read,
> diff --git a/fs/orangefs/protocol.h b/fs/orangefs/protocol.h
> index 61ee8d64c842..d403cf29a99b 100644
> --- a/fs/orangefs/protocol.h
> +++ b/fs/orangefs/protocol.h
> @@ -342,7 +342,7 @@ enum {
>   * that may be 32 bit!
>   */
>  struct ORANGEFS_dev_map_desc {
> -       void *ptr;
> +       void __user *ptr;
>         __s32 total_size;
>         __s32 size;
>         __s32 count;
> diff --git a/fs/orangefs/waitqueue.c b/fs/orangefs/waitqueue.c
> index 0577d6dba8c8..3de323f17506 100644
> --- a/fs/orangefs/waitqueue.c
> +++ b/fs/orangefs/waitqueue.c
> @@ -17,8 +17,11 @@
>  #include "orangefs-kernel.h"
>  #include "orangefs-bufmap.h"
>
> -static int wait_for_matching_downcall(struct orangefs_kernel_op_s *, long, bool);
> -static void orangefs_clean_up_interrupted_operation(struct orangefs_kernel_op_s *);
> +static int wait_for_matching_downcall(struct orangefs_kernel_op_s *op,
> +                                     long timeout, bool interruptible)
> +       __acquires(op->lock);
> +static void orangefs_clean_up_interrupted_operation(struct orangefs_kernel_op_s *op)
> +       __releases(op->lock);
>
>  /*
>   * What we do in this function is to walk the list of operations that are
> @@ -245,7 +248,8 @@ bool orangefs_cancel_op_in_progress(struct orangefs_kernel_op_s *op)
>   * Change an op to the "given up" state and remove it from its list.
>   */
>  static void
> -       orangefs_clean_up_interrupted_operation(struct orangefs_kernel_op_s *op)
> +orangefs_clean_up_interrupted_operation(struct orangefs_kernel_op_s *op)
> +       __releases(op->lock)
>  {
>         /*
>          * handle interrupted cases depending on what state we were in when
> @@ -315,6 +319,7 @@ static void
>  static int wait_for_matching_downcall(struct orangefs_kernel_op_s *op,
>                                       long timeout,
>                                       bool interruptible)
> +       __acquires(op->lock)
>  {
>         long n;
>
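
For readers unfamiliar with the annotations used in the diff above, here is
a minimal standalone sketch of what they tell sparse. The macro definitions
follow the kernel's compiler headers of this era as far as I know them;
struct demo_map_desc, struct demo_op, demo_wait() and demo_clean_up() are
hypothetical stand-ins, not OrangeFS code. With a normal compiler the
annotations expand to nothing. Under sparse (__CHECKER__ defined), __user
marks a pointer as a user-space address that must not be dereferenced
directly, and __acquires/__releases declare that a function changes the
lock context between entry and exit.

```c
#include <assert.h>

#ifdef __CHECKER__
# define __user		__attribute__((noderef, address_space(1)))
# define __acquires(x)	__attribute__((context(x, 0, 1)))
# define __releases(x)	__attribute__((context(x, 1, 0)))
#else
# define __user
# define __acquires(x)
# define __releases(x)
#endif

/* Hypothetical descriptor: ptr holds a user-space address. */
struct demo_map_desc {
	void __user *ptr;
	int size;
};

/* Hypothetical op; "lock" is a plain flag standing in for a spinlock. */
struct demo_op {
	int lock;
	int given_up;
};

/* The annotation goes on the forward declaration and on the definition. */
int demo_wait(struct demo_op *op)
	__acquires(op->lock);
void demo_clean_up(struct demo_op *op)
	__releases(op->lock);

/* Entered without the lock, returns holding it. */
int demo_wait(struct demo_op *op)
	__acquires(op->lock)
{
	op->lock = 1;
	return 0;
}

/* Must be entered with the lock held; drops it before returning. */
void demo_clean_up(struct demo_op *op)
	__releases(op->lock)
{
	assert(op->lock);
	op->given_up = 1;
	op->lock = 0;
}
```

Without such annotations sparse sees a function that exits with a different
lock count than it entered with and reports a context imbalance, which is
the class of warning the waitqueue.c hunks address.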


* Re: [1/4] vmsplice: lift import_iovec() into do_vmsplice()
  2018-05-28 22:20                 ` [PATCH 1/4] vmsplice: lift import_iovec() into do_vmsplice() Al Viro
                                     ` (2 preceding siblings ...)
  2018-05-28 22:20                   ` [PATCH 4/4] orangefs: simplify compat ioctl handling Al Viro
@ 2018-06-06 22:57                   ` Andrei Vagin
  2018-06-07 17:56                     ` Andrei Vagin
  3 siblings, 1 reply; 72+ messages in thread
From: Andrei Vagin @ 2018-06-06 22:57 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel

On Mon, May 28, 2018 at 11:20:10PM +0100, Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/splice.c | 69 ++++++++++++++++++++++++++-----------------------------------
>  1 file changed, 29 insertions(+), 40 deletions(-)
> 
> diff --git a/fs/splice.c b/fs/splice.c
> index 005d09cf3fa8..920ff0b20e53 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -1242,38 +1242,26 @@ static int pipe_to_user(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
>   * For lack of a better implementation, implement vmsplice() to userspace
>   * as a simple copy of the pipes pages to the user iov.
>   */
> -static long vmsplice_to_user(struct file *file, const struct iovec __user *uiov,
> -			     unsigned long nr_segs, unsigned int flags)
> +static long vmsplice_to_user(struct file *file, struct iov_iter *iter,
> +			     unsigned int flags)
>  {
> -	struct pipe_inode_info *pipe;
> -	struct splice_desc sd;
> -	long ret;
> -	struct iovec iovstack[UIO_FASTIOV];
> -	struct iovec *iov = iovstack;
> -	struct iov_iter iter;
> +	struct pipe_inode_info *pipe = get_pipe_info(file);
> +	struct splice_desc sd = {
> +		.total_len = iov_iter_count(iter),
> +		.flags = flags,
> +		.u.data = iter
> +	};
> +	long ret = 0;
>  
> -	pipe = get_pipe_info(file);
>  	if (!pipe)
>  		return -EBADF;
>  
> -	ret = import_iovec(READ, uiov, nr_segs,
> -			   ARRAY_SIZE(iovstack), &iov, &iter);
> -	if (ret < 0)
> -		return ret;
> -
> -	sd.total_len = iov_iter_count(&iter);
> -	sd.len = 0;
> -	sd.flags = flags;
> -	sd.u.data = &iter;
> -	sd.pos = 0;
> -
>  	if (sd.total_len) {
>  		pipe_lock(pipe);
>  		ret = __splice_from_pipe(pipe, &sd, pipe_to_user);
>  		pipe_unlock(pipe);
>  	}
>  
> -	kfree(iov);
>  	return ret;
>  }
>  
> @@ -1282,14 +1270,11 @@ static long vmsplice_to_user(struct file *file, const struct iovec __user *uiov,
>   * as splice-from-memory, where the regular splice is splice-from-file (or
>   * to file). In both cases the output is a pipe, naturally.
>   */
> -static long vmsplice_to_pipe(struct file *file, const struct iovec __user *uiov,
> -			     unsigned long nr_segs, unsigned int flags)
> +static long vmsplice_to_pipe(struct file *file, struct iov_iter *iter,
> +			     unsigned int flags)
>  {
>  	struct pipe_inode_info *pipe;
> -	struct iovec iovstack[UIO_FASTIOV];
> -	struct iovec *iov = iovstack;
> -	struct iov_iter from;
> -	long ret;
> +	long ret = 0;
>  	unsigned buf_flag = 0;
>  
>  	if (flags & SPLICE_F_GIFT)
> @@ -1299,19 +1284,13 @@ static long vmsplice_to_pipe(struct file *file, const struct iovec __user *uiov,
>  	if (!pipe)
>  		return -EBADF;
>  
> -	ret = import_iovec(WRITE, uiov, nr_segs,
> -			   ARRAY_SIZE(iovstack), &iov, &from);
> -	if (ret < 0)
> -		return ret;
> -
>  	pipe_lock(pipe);
>  	ret = wait_for_space(pipe, flags);
>  	if (!ret)
> -		ret = iter_to_pipe(&from, pipe, buf_flag);
> +		ret = iter_to_pipe(iter, pipe, buf_flag);
>  	pipe_unlock(pipe);
>  	if (ret > 0)
>  		wakeup_pipe_readers(pipe);
> -	kfree(iov);
>  	return ret;
>  }
>  
> @@ -1331,29 +1310,39 @@ static long vmsplice_to_pipe(struct file *file, const struct iovec __user *uiov,
>   * Currently we punt and implement it as a normal copy, see pipe_to_user().
>   *
>   */
> -static long do_vmsplice(int fd, const struct iovec __user *iov,
> +static long do_vmsplice(int fd, const struct iovec __user *uiov,
>  			unsigned long nr_segs, unsigned int flags)
>  {
> +	struct iovec iovstack[UIO_FASTIOV];
> +	struct iovec *iov = iovstack;
> +	struct iov_iter iter;
>  	struct fd f;
>  	long error;
>  
>  	if (unlikely(flags & ~SPLICE_F_ALL))
>  		return -EINVAL;
> -	if (unlikely(nr_segs > UIO_MAXIOV))
> -		return -EINVAL;
> -	else if (unlikely(!nr_segs))
> +
> +	error = import_iovec(READ, uiov, nr_segs,
> +			   ARRAY_SIZE(iovstack), &iov, &iter);

import_iovec() should be called with WRITE if we are going to call
vmsplice_to_pipe().

> +	if (error < 0)
> +		return error;
> +
> +	if (!iov_iter_count(&iter)) {
> +		kfree(iov);
>  		return 0;
> +	}
>  
>  	error = -EBADF;
>  	f = fdget(fd);
>  	if (f.file) {
>  		if (f.file->f_mode & FMODE_WRITE)
> -			error = vmsplice_to_pipe(f.file, iov, nr_segs, flags);
> +			error = vmsplice_to_pipe(f.file, &iter, flags);
>  		else if (f.file->f_mode & FMODE_READ)
> -			error = vmsplice_to_user(f.file, iov, nr_segs, flags);
> +			error = vmsplice_to_user(f.file, &iter, flags);
>  
>  		fdput(f);
>  	}
> +	kfree(iov);
>  
>  	return error;
>  }
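
To make the point above concrete: the iterator direction has to follow the
mode of the pipe end, so it can only be chosen once the file is known. The
following is a user-space sketch of that selection logic only, under stated
assumptions; it is not the fix that was eventually applied. The DEMO_*
names and demo_vmsplice_dir() are hypothetical; the numeric values mirror
the kernel's FMODE_READ/FMODE_WRITE and READ/WRITE as I understand them.

```c
#include <assert.h>
#include <errno.h>

/* Stand-ins for the kernel's FMODE_READ / FMODE_WRITE bits. */
#define DEMO_FMODE_READ		0x1u
#define DEMO_FMODE_WRITE	0x2u

/* Stand-ins for the kernel's READ / WRITE iterator directions. */
enum demo_dir { DEMO_READ = 0, DEMO_WRITE = 1 };

/*
 * Pick the import_iovec() direction from the file mode, using the same
 * precedence as the f_mode checks in do_vmsplice(): a writable pipe end
 * means user memory is the data source (WRITE), otherwise a readable end
 * means user memory is the destination (READ).
 */
int demo_vmsplice_dir(unsigned int f_mode, enum demo_dir *dir)
{
	if (f_mode & DEMO_FMODE_WRITE)
		*dir = DEMO_WRITE;
	else if (f_mode & DEMO_FMODE_READ)
		*dir = DEMO_READ;
	else
		return -EBADF;
	return 0;
}
```

In do_vmsplice() this would mean doing fdget() first, deriving the
direction from f.file->f_mode, and only then calling import_iovec() with
that direction instead of a hard-coded READ.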


* Re: [1/4] vmsplice: lift import_iovec() into do_vmsplice()
  2018-06-06 22:57                   ` [1/4] vmsplice: lift import_iovec() into do_vmsplice() Andrei Vagin
@ 2018-06-07 17:56                     ` Andrei Vagin
  2018-06-11 20:14                       ` Cyrill Gorcunov
  0 siblings, 1 reply; 72+ messages in thread
From: Andrei Vagin @ 2018-06-07 17:56 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel

On Wed, Jun 06, 2018 at 03:57:51PM -0700, Andrei Vagin wrote:
> On Mon, May 28, 2018 at 11:20:10PM +0100, Al Viro wrote:
> > From: Al Viro <viro@zeniv.linux.org.uk>
> > 
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > ---
> >  fs/splice.c | 69 ++++++++++++++++++++++++++-----------------------------------
> >  1 file changed, 29 insertions(+), 40 deletions(-)
> > 
> > diff --git a/fs/splice.c b/fs/splice.c
> > index 005d09cf3fa8..920ff0b20e53 100644
> > --- a/fs/splice.c
> > +++ b/fs/splice.c
> > @@ -1242,38 +1242,26 @@ static int pipe_to_user(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> >   * For lack of a better implementation, implement vmsplice() to userspace
> >   * as a simple copy of the pipes pages to the user iov.
> >   */
> > -static long vmsplice_to_user(struct file *file, const struct iovec __user *uiov,
> > -			     unsigned long nr_segs, unsigned int flags)
> > +static long vmsplice_to_user(struct file *file, struct iov_iter *iter,
> > +			     unsigned int flags)
> >  {
> > -	struct pipe_inode_info *pipe;
> > -	struct splice_desc sd;
> > -	long ret;
> > -	struct iovec iovstack[UIO_FASTIOV];
> > -	struct iovec *iov = iovstack;
> > -	struct iov_iter iter;
> > +	struct pipe_inode_info *pipe = get_pipe_info(file);
> > +	struct splice_desc sd = {
> > +		.total_len = iov_iter_count(iter),
> > +		.flags = flags,
> > +		.u.data = iter
> > +	};
> > +	long ret = 0;
> >  
> > -	pipe = get_pipe_info(file);
> >  	if (!pipe)
> >  		return -EBADF;
> >  
> > -	ret = import_iovec(READ, uiov, nr_segs,
> > -			   ARRAY_SIZE(iovstack), &iov, &iter);
> > -	if (ret < 0)
> > -		return ret;
> > -
> > -	sd.total_len = iov_iter_count(&iter);
> > -	sd.len = 0;
> > -	sd.flags = flags;
> > -	sd.u.data = &iter;
> > -	sd.pos = 0;
> > -
> >  	if (sd.total_len) {
> >  		pipe_lock(pipe);
> >  		ret = __splice_from_pipe(pipe, &sd, pipe_to_user);
> >  		pipe_unlock(pipe);
> >  	}
> >  
> > -	kfree(iov);
> >  	return ret;
> >  }
> >  
> > @@ -1282,14 +1270,11 @@ static long vmsplice_to_user(struct file *file, const struct iovec __user *uiov,
> >   * as splice-from-memory, where the regular splice is splice-from-file (or
> >   * to file). In both cases the output is a pipe, naturally.
> >   */
> > -static long vmsplice_to_pipe(struct file *file, const struct iovec __user *uiov,
> > -			     unsigned long nr_segs, unsigned int flags)
> > +static long vmsplice_to_pipe(struct file *file, struct iov_iter *iter,
> > +			     unsigned int flags)
> >  {
> >  	struct pipe_inode_info *pipe;
> > -	struct iovec iovstack[UIO_FASTIOV];
> > -	struct iovec *iov = iovstack;
> > -	struct iov_iter from;
> > -	long ret;
> > +	long ret = 0;
> >  	unsigned buf_flag = 0;
> >  
> >  	if (flags & SPLICE_F_GIFT)
> > @@ -1299,19 +1284,13 @@ static long vmsplice_to_pipe(struct file *file, const struct iovec __user *uiov,
> >  	if (!pipe)
> >  		return -EBADF;
> >  
> > -	ret = import_iovec(WRITE, uiov, nr_segs,
> > -			   ARRAY_SIZE(iovstack), &iov, &from);
> > -	if (ret < 0)
> > -		return ret;
> > -
> >  	pipe_lock(pipe);
> >  	ret = wait_for_space(pipe, flags);
> >  	if (!ret)
> > -		ret = iter_to_pipe(&from, pipe, buf_flag);
> > +		ret = iter_to_pipe(iter, pipe, buf_flag);
> >  	pipe_unlock(pipe);
> >  	if (ret > 0)
> >  		wakeup_pipe_readers(pipe);
> > -	kfree(iov);
> >  	return ret;
> >  }
> >  
> > @@ -1331,29 +1310,39 @@ static long vmsplice_to_pipe(struct file *file, const struct iovec __user *uiov,
> >   * Currently we punt and implement it as a normal copy, see pipe_to_user().
> >   *
> >   */
> > -static long do_vmsplice(int fd, const struct iovec __user *iov,
> > +static long do_vmsplice(int fd, const struct iovec __user *uiov,
> >  			unsigned long nr_segs, unsigned int flags)
> >  {
> > +	struct iovec iovstack[UIO_FASTIOV];
> > +	struct iovec *iov = iovstack;
> > +	struct iov_iter iter;
> >  	struct fd f;
> >  	long error;
> >  
> >  	if (unlikely(flags & ~SPLICE_F_ALL))
> >  		return -EINVAL;
> > -	if (unlikely(nr_segs > UIO_MAXIOV))
> > -		return -EINVAL;
> > -	else if (unlikely(!nr_segs))
> > +
> > +	error = import_iovec(READ, uiov, nr_segs,
> > +			   ARRAY_SIZE(iovstack), &iov, &iter);
> 
> import_iovec() should be called with WRITE if we are going to call
> vmsplice_to_pipe().

We caught this issue while running the CRIU tests on
https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/log/?h=for-next

CRIU fails with errors like this:

pie: 38: Error (criu/pie/parasite.c:89): Can't splice pages to pipe (-14/2/0)

Thanks,
Andrei

> 
> > +	if (error < 0)
> > +		return error;
> > +
> > +	if (!iov_iter_count(&iter)) {
> > +		kfree(iov);
> >  		return 0;
> > +	}
> >  
> >  	error = -EBADF;
> >  	f = fdget(fd);
> >  	if (f.file) {
> >  		if (f.file->f_mode & FMODE_WRITE)
> > -			error = vmsplice_to_pipe(f.file, iov, nr_segs, flags);
> > +			error = vmsplice_to_pipe(f.file, &iter, flags);
> >  		else if (f.file->f_mode & FMODE_READ)
> > -			error = vmsplice_to_user(f.file, iov, nr_segs, flags);
> > +			error = vmsplice_to_user(f.file, &iter, flags);
> >  
> >  		fdput(f);
> >  	}
> > +	kfree(iov);
> >  
> >  	return error;
> >  }


* Re: [1/4] vmsplice: lift import_iovec() into do_vmsplice()
  2018-06-07 17:56                     ` Andrei Vagin
@ 2018-06-11 20:14                       ` Cyrill Gorcunov
  2018-06-11 20:16                         ` Al Viro
  0 siblings, 1 reply; 72+ messages in thread
From: Cyrill Gorcunov @ 2018-06-11 20:14 UTC (permalink / raw)
  To: Andrey Vagin; +Cc: Al Viro, Linux FS Devel

On Thu, Jun 7, 2018 at 10:07 PM Andrei Vagin <avagin@virtuozzo.com> wrote:
>
...
> > > +
> > > +   error = import_iovec(READ, uiov, nr_segs,
> > > +                      ARRAY_SIZE(iovstack), &iov, &iter);
> >
> > > import_iovec() should be called with WRITE if we are going to call
> > > vmsplice_to_pipe().
> >
> > We caught this issue while running the CRIU tests on
> https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/log/?h=for-next
>
> CRIU fails with errors like this:
>
> pie: 38: Error (criu/pie/parasite.c:89): Can't splice pages to pipe (-14/2/0)
>
Is it already in the master tree, or still in linux-next?


* Re: [1/4] vmsplice: lift import_iovec() into do_vmsplice()
  2018-06-11 20:14                       ` Cyrill Gorcunov
@ 2018-06-11 20:16                         ` Al Viro
  2018-06-11 20:18                           ` Cyrill Gorcunov
  2018-06-14 22:22                           ` Andrey Vagin
  0 siblings, 2 replies; 72+ messages in thread
From: Al Viro @ 2018-06-11 20:16 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Andrey Vagin, Linux FS Devel

On Mon, Jun 11, 2018 at 11:14:55PM +0300, Cyrill Gorcunov wrote:
> On Thu, Jun 7, 2018 at 10:07 PM Andrei Vagin <avagin@virtuozzo.com> wrote:
> >
> ...
> > > > +
> > > > +   error = import_iovec(READ, uiov, nr_segs,
> > > > +                      ARRAY_SIZE(iovstack), &iov, &iter);
> > >
> > > > import_iovec() should be called with WRITE if we are going to call
> > > > vmsplice_to_pipe().
> > >
> > > We caught this issue while running the CRIU tests on
> > https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/log/?h=for-next
> >
> > CRIU fails with errors like this:
> >
> > pie: 38: Error (criu/pie/parasite.c:89): Can't splice pages to pipe (-14/2/0)
> >
> Is it already in the master tree, or still in linux-next?

Still in -next; I have a fixed variant, but haven't pushed it out yet.


* Re: [1/4] vmsplice: lift import_iovec() into do_vmsplice()
  2018-06-11 20:16                         ` Al Viro
@ 2018-06-11 20:18                           ` Cyrill Gorcunov
  2018-06-14 22:22                           ` Andrey Vagin
  1 sibling, 0 replies; 72+ messages in thread
From: Cyrill Gorcunov @ 2018-06-11 20:18 UTC (permalink / raw)
  To: Al Viro; +Cc: Andrey Vagin, Linux FS Devel

On Mon, Jun 11, 2018 at 11:16 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
...
> > Is it already in the master tree, or still in linux-next?
>
> Still in -next; I have a fixed variant, but haven't pushed it out yet.

Thanks a lot, Al!


* Re: [1/4] vmsplice: lift import_iovec() into do_vmsplice()
  2018-06-11 20:16                         ` Al Viro
  2018-06-11 20:18                           ` Cyrill Gorcunov
@ 2018-06-14 22:22                           ` Andrey Vagin
  1 sibling, 0 replies; 72+ messages in thread
From: Andrey Vagin @ 2018-06-14 22:22 UTC (permalink / raw)
  To: Al Viro; +Cc: Cyrill Gorcunov, Linux FS Devel

On Mon, Jun 11, 2018 at 09:16:56PM +0100, Al Viro wrote:
> On Mon, Jun 11, 2018 at 11:14:55PM +0300, Cyrill Gorcunov wrote:
> > On Thu, Jun 7, 2018 at 10:07 PM Andrei Vagin <avagin@virtuozzo.com> wrote:
> > >
> > ...
> > > > > +
> > > > > +   error = import_iovec(READ, uiov, nr_segs,
> > > > > +                      ARRAY_SIZE(iovstack), &iov, &iter);
> > > >
> > > > > import_iovec() should be called with WRITE if we are going to call
> > > > > vmsplice_to_pipe().
> > > >
> > > > We caught this issue while running the CRIU tests on
> > > https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/log/?h=for-next
> > >
> > > CRIU fails with errors like this:
> > >
> > > pie: 38: Error (criu/pie/parasite.c:89): Can't splice pages to pipe (-14/2/0)
> > >
> > Is it already in the master tree, or still in linux-next?
>
> Still in -next; I have a fixed variant, but haven't pushed it out yet.

Al, could you please push the fixed variant to -next, or revert this
one? We have a robot that regularly runs the CRIU tests on -next, and it
is currently blocked by this issue.

Thanks,
Andrei


end of thread, other threads:[~2018-06-14 22:22 UTC | newest]

Thread overview: 72+ messages
2018-05-23 19:19 aio poll and a new in-kernel poll API V13 Christoph Hellwig
2018-05-23 19:19 ` [PATCH 01/33] fix io_destroy()/aio_complete() race Christoph Hellwig
2018-05-23 19:19 ` [PATCH 02/33] uapi: turn __poll_t sparse checkin on by default Christoph Hellwig
2018-05-23 19:19 ` [PATCH 03/33] fs: unexport poll_schedule_timeout Christoph Hellwig
2018-05-23 19:19 ` [PATCH 04/33] fs: cleanup do_pollfd Christoph Hellwig
2018-05-23 19:19 ` [PATCH 05/33] fs: update documentation to mention __poll_t and match the code Christoph Hellwig
2018-05-23 19:19 ` [PATCH 06/33] fs: add new vfs_poll and file_can_poll helpers Christoph Hellwig
2018-05-23 19:19 ` [PATCH 07/33] fs: introduce new ->get_poll_head and ->poll_mask methods Christoph Hellwig
2018-05-23 19:19 ` [PATCH 08/33] aio: simplify KIOCB_KEY handling Christoph Hellwig
2018-05-23 19:19 ` [PATCH 09/33] aio: simplify cancellation Christoph Hellwig
2018-05-23 19:19 ` [PATCH 10/33] aio: implement IOCB_CMD_POLL Christoph Hellwig
2018-05-23 19:20 ` [PATCH 11/33] aio: try to complete poll iocbs without context switch Christoph Hellwig
2018-05-23 19:20 ` [PATCH 12/33] net: refactor socket_poll Christoph Hellwig
2018-05-23 19:20 ` [PATCH 13/33] net: add support for ->poll_mask in proto_ops Christoph Hellwig
2018-05-23 19:20 ` [PATCH 14/33] net: remove sock_no_poll Christoph Hellwig
2018-05-23 19:20 ` [PATCH 15/33] net/tcp: convert to ->poll_mask Christoph Hellwig
2018-05-23 19:20 ` [PATCH 16/33] net/unix: " Christoph Hellwig
2018-05-23 19:20 ` [PATCH 17/33] net: convert datagram_poll users tp ->poll_mask Christoph Hellwig
2018-05-23 19:20 ` [PATCH 18/33] net/dccp: convert to ->poll_mask Christoph Hellwig
2018-05-23 19:20 ` [PATCH 19/33] net/atm: " Christoph Hellwig
2018-05-23 19:20 ` [PATCH 20/33] net/vmw_vsock: " Christoph Hellwig
2018-05-23 19:20 ` [PATCH 21/33] net/tipc: " Christoph Hellwig
2018-05-23 19:20 ` [PATCH 22/33] net/sctp: " Christoph Hellwig
2018-05-23 19:20 ` [PATCH 23/33] net/bluetooth: " Christoph Hellwig
2018-05-23 19:20 ` [PATCH 24/33] net/caif: " Christoph Hellwig
2018-05-23 19:20 ` [PATCH 25/33] net/nfc: " Christoph Hellwig
2018-05-23 19:20 ` [PATCH 26/33] net/phonet: " Christoph Hellwig
2018-05-23 19:20 ` [PATCH 27/33] net/iucv: " Christoph Hellwig
2018-05-23 19:20 ` [PATCH 28/33] net/rxrpc: " Christoph Hellwig
2018-05-23 19:20 ` [PATCH 29/33] crypto: af_alg: " Christoph Hellwig
2018-05-23 19:20 ` [PATCH 30/33] pipe: " Christoph Hellwig
2018-05-23 19:20 ` [PATCH 31/33] eventfd: switch " Christoph Hellwig
2018-05-23 19:20 ` [PATCH 32/33] timerfd: convert " Christoph Hellwig
2018-05-23 19:20 ` [PATCH 33/33] random: " Christoph Hellwig
2018-05-26  0:11 ` aio poll and a new in-kernel poll API V13 Al Viro
2018-05-26  7:09   ` Al Viro
2018-05-26  7:23     ` Christoph Hellwig
2018-05-27 22:27       ` Al Viro
2018-05-27 22:28         ` [PATCH 1/4] aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way Al Viro
2018-05-27 22:28           ` [PATCH 2/4] aio_read_events_ring(): make a bit more readable Al Viro
2018-05-27 22:28           ` [PATCH 3/4] aio: shift copyin of iocb into io_submit_one() Al Viro
2018-05-28  5:16             ` Christoph Hellwig
2018-05-27 22:28           ` [PATCH 4/4] aio: fold do_io_submit() into callers Al Viro
2018-05-27 23:14             ` Al Viro
2018-05-28  5:24               ` Christoph Hellwig
2018-05-28  5:15           ` [PATCH 1/4] aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way Christoph Hellwig
2018-05-28 14:04             ` Al Viro
2018-05-28 17:54               ` Al Viro
2018-05-28 17:57                 ` [PATCH v2 1/6] aio: take list removal to (some) callers of aio_complete() Al Viro
2018-05-28 17:57                   ` [PATCH v2 2/6] aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way Al Viro
2018-05-29  6:08                     ` Christoph Hellwig
2018-05-28 17:57                   ` [PATCH v2 3/6] aio_read_events_ring(): make a bit more readable Al Viro
2018-05-28 17:57                   ` [PATCH v2 4/6] aio: shift copyin of iocb into io_submit_one() Al Viro
2018-05-28 17:57                   ` [PATCH v2 5/6] aio: fold do_io_submit() into callers Al Viro
2018-05-29  6:10                     ` Christoph Hellwig
2018-05-28 17:57                   ` [PATCH v2 6/6] aio: sanitize the limit checking in io_submit(2) Al Viro
2018-05-29  6:10                     ` Christoph Hellwig
2018-05-29  6:08                   ` [PATCH v2 1/6] aio: take list removal to (some) callers of aio_complete() Christoph Hellwig
2018-05-28 22:20                 ` [PATCH 1/4] vmsplice: lift import_iovec() into do_vmsplice() Al Viro
2018-05-28 22:20                   ` [PATCH 2/4] vmsplice(): lift importing iovec into vmsplice(2) and compat counterpart Al Viro
2018-05-28 22:20                   ` [PATCH 3/4] signalfd: lift sigmask copyin and size checks to callers of do_signalfd4() Al Viro
2018-05-28 22:20                   ` [PATCH 4/4] orangefs: simplify compat ioctl handling Al Viro
2018-05-31 11:11                     ` kbuild test robot
2018-05-31 20:54                     ` Mike Marshall
2018-05-31 21:03                       ` Al Viro
2018-06-01 21:13                         ` Mike Marshall
2018-06-06 22:57                   ` [1/4] vmsplice: lift import_iovec() into do_vmsplice() Andrei Vagin
2018-06-07 17:56                     ` Andrei Vagin
2018-06-11 20:14                       ` Cyrill Gorcunov
2018-06-11 20:16                         ` Al Viro
2018-06-11 20:18                           ` Cyrill Gorcunov
2018-06-14 22:22                           ` Andrey Vagin
