linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHSET v2 0/3] Convert fs drivers to ->read_iter()
@ 2024-04-03 14:02 Jens Axboe
  2024-04-03 14:02 ` [PATCH 1/3] timerfd: convert " Jens Axboe
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jens Axboe @ 2024-04-03 14:02 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, linux-kernel

Hi,

There are still a few users of fops->read() in the core parts of the
fs stack. Which is a shame, since it'd be nice to get rid of the
non-iterator parts of down the line, and reclaim that part of the
file_operations struct.

Outside of moving in that direction as a cleanup, using ->read_iter()
enables us to mark them with FMODE_NOWAIT. This is important for users
like io_uring, where per-IO nonblocking hints make a difference in how
efficiently IO can be done.

Those two things are my main motivation for starting this work, with
hopefully more to come down the line.

All patches have been booted and tested, and the corresponding test
cases been run. The timerfd one was sent separately previously, figured
I'd do a few more and make a smaller series out of it.

Since v1:
- Include uio.h in userfaultfd, or it fails to see iov_iter_count()
  on some configs
- Drop (now) unused siginfo pointer in signalfd and the incrementing of
  it, we don't use it anymore
- Add missing put_unused_fd() for userfaultfd error path

 fs/signalfd.c    | 46 ++++++++++++++++++++++++++++------------------
 fs/timerfd.c     | 31 ++++++++++++++++++++++---------
 fs/userfaultfd.c | 44 ++++++++++++++++++++++++++++----------------
 3 files changed, 78 insertions(+), 43 deletions(-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] timerfd: convert to ->read_iter()
  2024-04-03 14:02 [PATCHSET v2 0/3] Convert fs drivers to ->read_iter() Jens Axboe
@ 2024-04-03 14:02 ` Jens Axboe
  2024-04-03 22:40   ` Al Viro
  2024-04-03 14:02 ` [PATCH 2/3] userfaultfd: " Jens Axboe
  2024-04-03 14:02 ` [PATCH 3/3] signalfd: " Jens Axboe
  2 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2024-04-03 14:02 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, linux-kernel, Jens Axboe

Switch timerfd to using fops->read_iter(), so it can support not just
O_NONBLOCK but IOCB_NOWAIT as well. With the latter, users like io_uring
interact with timerfds a lot better, as they can be driven purely
by the poll trigger.

Manually get and install the required fd, so that FMODE_NOWAIT can be
set before the file is installed into the file table.

No functional changes intended in this patch, it's purely a straight
conversion to using the read iterator method.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/timerfd.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index e9c96a0c79f1..b96690b46c1f 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -262,17 +262,18 @@ static __poll_t timerfd_poll(struct file *file, poll_table *wait)
 	return events;
 }
 
-static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
-			    loff_t *ppos)
+static ssize_t timerfd_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
+	struct file *file = iocb->ki_filp;
 	struct timerfd_ctx *ctx = file->private_data;
 	ssize_t res;
 	u64 ticks = 0;
 
-	if (count < sizeof(ticks))
+	if (iov_iter_count(to) < sizeof(ticks))
 		return -EINVAL;
+
 	spin_lock_irq(&ctx->wqh.lock);
-	if (file->f_flags & O_NONBLOCK)
+	if (file->f_flags & O_NONBLOCK || iocb->ki_flags & IOCB_NOWAIT)
 		res = -EAGAIN;
 	else
 		res = wait_event_interruptible_locked_irq(ctx->wqh, ctx->ticks);
@@ -313,7 +314,7 @@ static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
 	}
 	spin_unlock_irq(&ctx->wqh.lock);
 	if (ticks)
-		res = put_user(ticks, (u64 __user *) buf) ? -EFAULT: sizeof(ticks);
+		res = copy_to_iter(&ticks, sizeof(ticks), to);
 	return res;
 }
 
@@ -384,7 +385,7 @@ static long timerfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg
 static const struct file_operations timerfd_fops = {
 	.release	= timerfd_release,
 	.poll		= timerfd_poll,
-	.read		= timerfd_read,
+	.read_iter	= timerfd_read_iter,
 	.llseek		= noop_llseek,
 	.show_fdinfo	= timerfd_show,
 	.unlocked_ioctl	= timerfd_ioctl,
@@ -407,6 +408,7 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
 {
 	int ufd;
 	struct timerfd_ctx *ctx;
+	struct file *file;
 
 	/* Check the TFD_* constants for consistency.  */
 	BUILD_BUG_ON(TFD_CLOEXEC != O_CLOEXEC);
@@ -443,11 +445,22 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
 
 	ctx->moffs = ktime_mono_to_real(0);
 
-	ufd = anon_inode_getfd("[timerfd]", &timerfd_fops, ctx,
-			       O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
-	if (ufd < 0)
+	ufd = get_unused_fd_flags(O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
+	if (ufd < 0) {
 		kfree(ctx);
+		return ufd;
+	}
+
+	file = anon_inode_getfile("[timerfd]", &timerfd_fops, ctx,
+				    O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
+	if (IS_ERR(file)) {
+		put_unused_fd(ufd);
+		kfree(ctx);
+		return PTR_ERR(file);
+	}
 
+	file->f_mode |= FMODE_NOWAIT;
+	fd_install(ufd, file);
 	return ufd;
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] userfaultfd: convert to ->read_iter()
  2024-04-03 14:02 [PATCHSET v2 0/3] Convert fs drivers to ->read_iter() Jens Axboe
  2024-04-03 14:02 ` [PATCH 1/3] timerfd: convert " Jens Axboe
@ 2024-04-03 14:02 ` Jens Axboe
  2024-04-03 22:45   ` Al Viro
  2024-04-03 14:02 ` [PATCH 3/3] signalfd: " Jens Axboe
  2 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2024-04-03 14:02 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, linux-kernel, Jens Axboe

Rather than use the older style ->read() hook, use ->read_iter() so that
userfaultfd can support both O_NONBLOCK and IOCB_NOWAIT for non-blocking
read attempts.

Split the fd setup into two parts, so that userfaultfd can mark the file
mode with FMODE_NOWAIT before installing it into the process table. With
that, we can also defer grabbing the mm until we know the rest will
succeed, as the fd isn't visible before then.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/userfaultfd.c | 44 ++++++++++++++++++++++++++++----------------
 1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 60dcfafdc11a..d2f5409d60b6 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -31,6 +31,7 @@
 #include <linux/hugetlb.h>
 #include <linux/swapops.h>
 #include <linux/miscdevice.h>
+#include <linux/uio.h>
 
 static int sysctl_unprivileged_userfaultfd __read_mostly;
 
@@ -282,7 +283,7 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 /*
  * Verify the pagetables are still not ok after having reigstered into
  * the fault_pending_wqh to avoid userland having to UFFDIO_WAKE any
- * userfault that has already been resolved, if userfaultfd_read and
+ * userfault that has already been resolved, if userfaultfd_read_iter and
  * UFFDIO_COPY|ZEROPAGE are being run simultaneously on two different
  * threads.
  */
@@ -1177,34 +1178,34 @@ static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
 	return ret;
 }
 
-static ssize_t userfaultfd_read(struct file *file, char __user *buf,
-				size_t count, loff_t *ppos)
+static ssize_t userfaultfd_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
+	struct file *file = iocb->ki_filp;
 	struct userfaultfd_ctx *ctx = file->private_data;
 	ssize_t _ret, ret = 0;
 	struct uffd_msg msg;
-	int no_wait = file->f_flags & O_NONBLOCK;
 	struct inode *inode = file_inode(file);
+	bool no_wait;
 
 	if (!userfaultfd_is_initialized(ctx))
 		return -EINVAL;
 
+	no_wait = file->f_flags & O_NONBLOCK || iocb->ki_flags & IOCB_NOWAIT;
 	for (;;) {
-		if (count < sizeof(msg))
+		if (iov_iter_count(to) < sizeof(msg))
 			return ret ? ret : -EINVAL;
 		_ret = userfaultfd_ctx_read(ctx, no_wait, &msg, inode);
 		if (_ret < 0)
 			return ret ? ret : _ret;
-		if (copy_to_user((__u64 __user *) buf, &msg, sizeof(msg)))
+		_ret = copy_to_iter(&msg, sizeof(msg), to);
+		if (_ret < 0)
 			return ret ? ret : -EFAULT;
 		ret += sizeof(msg);
-		buf += sizeof(msg);
-		count -= sizeof(msg);
 		/*
 		 * Allow to read more than one fault at time but only
 		 * block if waiting for the very first one.
 		 */
-		no_wait = O_NONBLOCK;
+		no_wait = true;
 	}
 }
 
@@ -2172,7 +2173,7 @@ static const struct file_operations userfaultfd_fops = {
 #endif
 	.release	= userfaultfd_release,
 	.poll		= userfaultfd_poll,
-	.read		= userfaultfd_read,
+	.read_iter	= userfaultfd_read_iter,
 	.unlocked_ioctl = userfaultfd_ioctl,
 	.compat_ioctl	= compat_ptr_ioctl,
 	.llseek		= noop_llseek,
@@ -2192,6 +2193,7 @@ static void init_once_userfaultfd_ctx(void *mem)
 static int new_userfaultfd(int flags)
 {
 	struct userfaultfd_ctx *ctx;
+	struct file *file;
 	int fd;
 
 	BUG_ON(!current->mm);
@@ -2215,16 +2217,26 @@ static int new_userfaultfd(int flags)
 	init_rwsem(&ctx->map_changing_lock);
 	atomic_set(&ctx->mmap_changing, 0);
 	ctx->mm = current->mm;
-	/* prevent the mm struct to be freed */
-	mmgrab(ctx->mm);
+
+	fd = get_unused_fd_flags(O_RDONLY | (flags & UFFD_SHARED_FCNTL_FLAGS));
+	if (fd < 0)
+		goto err_out;
 
 	/* Create a new inode so that the LSM can block the creation.  */
-	fd = anon_inode_create_getfd("[userfaultfd]", &userfaultfd_fops, ctx,
+	file = anon_inode_create_getfile("[userfaultfd]", &userfaultfd_fops, ctx,
 			O_RDONLY | (flags & UFFD_SHARED_FCNTL_FLAGS), NULL);
-	if (fd < 0) {
-		mmdrop(ctx->mm);
-		kmem_cache_free(userfaultfd_ctx_cachep, ctx);
+	if (IS_ERR(file)) {
+		put_unused_fd(fd);
+		fd = PTR_ERR(file);
+		goto err_out;
 	}
+	/* prevent the mm struct to be freed */
+	mmgrab(ctx->mm);
+	file->f_mode |= FMODE_NOWAIT;
+	fd_install(fd, file);
+	return fd;
+err_out:
+	kmem_cache_free(userfaultfd_ctx_cachep, ctx);
 	return fd;
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] signalfd: convert to ->read_iter()
  2024-04-03 14:02 [PATCHSET v2 0/3] Convert fs drivers to ->read_iter() Jens Axboe
  2024-04-03 14:02 ` [PATCH 1/3] timerfd: convert " Jens Axboe
  2024-04-03 14:02 ` [PATCH 2/3] userfaultfd: " Jens Axboe
@ 2024-04-03 14:02 ` Jens Axboe
  2024-04-03 22:57   ` Al Viro
  2 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2024-04-03 14:02 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, linux-kernel, Jens Axboe

Rather than use the older style ->read() hook, use ->read_iter() so that
signalfd can support both O_NONBLOCK and IOCB_NOWAIT for non-blocking
read attempts.

Split the fd setup into two parts, so that signalfd can mark the file
mode with FMODE_NOWAIT before installing it into the process table.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/signalfd.c | 46 ++++++++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/fs/signalfd.c b/fs/signalfd.c
index e20d1484c663..72b6796a9a40 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -68,8 +68,7 @@ static __poll_t signalfd_poll(struct file *file, poll_table *wait)
 /*
  * Copied from copy_siginfo_to_user() in kernel/signal.c
  */
-static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
-			     kernel_siginfo_t const *kinfo)
+static int signalfd_copyinfo(struct iov_iter *to, kernel_siginfo_t const *kinfo)
 {
 	struct signalfd_siginfo new;
 
@@ -146,10 +145,7 @@ static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
 		break;
 	}
 
-	if (copy_to_user(uinfo, &new, sizeof(struct signalfd_siginfo)))
-		return -EFAULT;
-
-	return sizeof(*uinfo);
+	return copy_to_iter(&new, sizeof(struct signalfd_siginfo), to);
 }
 
 static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, kernel_siginfo_t *info,
@@ -199,28 +195,27 @@ static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, kernel_siginfo_t *info
  * error code. The "count" parameter must be at least the size of a
  * "struct signalfd_siginfo".
  */
-static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
-			     loff_t *ppos)
+static ssize_t signalfd_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
+	struct file *file = iocb->ki_filp;
 	struct signalfd_ctx *ctx = file->private_data;
-	struct signalfd_siginfo __user *siginfo;
-	int nonblock = file->f_flags & O_NONBLOCK;
+	size_t count = iov_iter_count(to);
 	ssize_t ret, total = 0;
 	kernel_siginfo_t info;
+	bool nonblock;
 
 	count /= sizeof(struct signalfd_siginfo);
 	if (!count)
 		return -EINVAL;
 
-	siginfo = (struct signalfd_siginfo __user *) buf;
+	nonblock = file->f_flags & O_NONBLOCK || iocb->ki_flags & IOCB_NOWAIT;
 	do {
 		ret = signalfd_dequeue(ctx, &info, nonblock);
 		if (unlikely(ret <= 0))
 			break;
-		ret = signalfd_copyinfo(siginfo, &info);
+		ret = signalfd_copyinfo(to, &info);
 		if (ret < 0)
 			break;
-		siginfo++;
 		total += ret;
 		nonblock = 1;
 	} while (--count);
@@ -246,7 +241,7 @@ static const struct file_operations signalfd_fops = {
 #endif
 	.release	= signalfd_release,
 	.poll		= signalfd_poll,
-	.read		= signalfd_read,
+	.read_iter	= signalfd_read_iter,
 	.llseek		= noop_llseek,
 };
 
@@ -265,20 +260,35 @@ static int do_signalfd4(int ufd, sigset_t *mask, int flags)
 	signotset(mask);
 
 	if (ufd == -1) {
+		struct file *file;
+
 		ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
 		if (!ctx)
 			return -ENOMEM;
 
 		ctx->sigmask = *mask;
 
+		ufd = get_unused_fd_flags(O_RDWR |
+					  (flags & (O_CLOEXEC | O_NONBLOCK)));
+		if (ufd < 0) {
+			kfree(ctx);
+			return ufd;
+		}
+
+		file = anon_inode_getfile("[signalfd]", &signalfd_fops, ctx,
+				       O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK)));
+		if (IS_ERR(file)) {
+			put_unused_fd(ufd);
+			kfree(ctx);
+			return ufd;
+		}
+		file->f_mode |= FMODE_NOWAIT;
+
 		/*
 		 * When we call this, the initialization must be complete, since
 		 * anon_inode_getfd() will install the fd.
 		 */
-		ufd = anon_inode_getfd("[signalfd]", &signalfd_fops, ctx,
-				       O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK)));
-		if (ufd < 0)
-			kfree(ctx);
+		fd_install(ufd, file);
 	} else {
 		struct fd f = fdget(ufd);
 		if (!f.file)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] timerfd: convert to ->read_iter()
  2024-04-03 14:02 ` [PATCH 1/3] timerfd: convert " Jens Axboe
@ 2024-04-03 22:40   ` Al Viro
  2024-04-04 12:51     ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Al Viro @ 2024-04-03 22:40 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, brauner, linux-kernel

On Wed, Apr 03, 2024 at 08:02:52AM -0600, Jens Axboe wrote:

> -		res = put_user(ticks, (u64 __user *) buf) ? -EFAULT: sizeof(ticks);
> +		res = copy_to_iter(&ticks, sizeof(ticks), to);

Umm...  That's not an equivalent transformation - different behaviour on
short copy; try to call it via read(fd, unmapped_buffer, 8) and see what
happens.

copy_to_iter() returns the amount copied; no data copied => return 0, not -EFAULT.

> +	ufd = get_unused_fd_flags(O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));

You do realize that get_unused_fd_flags() ignores O_RDWR (or O_NDELAY), right?
Mixing those with O_CLOEXEC makes sense for anon_inode_getfd(), but here you
have separate calls of get_unused_fd_flags() and anon_inode_getfile(), so...

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] userfaultfd: convert to ->read_iter()
  2024-04-03 14:02 ` [PATCH 2/3] userfaultfd: " Jens Axboe
@ 2024-04-03 22:45   ` Al Viro
  0 siblings, 0 replies; 10+ messages in thread
From: Al Viro @ 2024-04-03 22:45 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, brauner, linux-kernel

On Wed, Apr 03, 2024 at 08:02:53AM -0600, Jens Axboe wrote:
> -		if (count < sizeof(msg))
> +		if (iov_iter_count(to) < sizeof(msg))
>  			return ret ? ret : -EINVAL;
>  		_ret = userfaultfd_ctx_read(ctx, no_wait, &msg, inode);
>  		if (_ret < 0)
>  			return ret ? ret : _ret;
> -		if (copy_to_user((__u64 __user *) buf, &msg, sizeof(msg)))
> +		_ret = copy_to_iter(&msg, sizeof(msg), to);
> +		if (_ret < 0)
>  			return ret ? ret : -EFAULT;

Take a look in uio.h - you'll see

size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)

in there.  See the problem?

> +	fd = get_unused_fd_flags(O_RDONLY | (flags & UFFD_SHARED_FCNTL_FLAGS));
> +	if (fd < 0)
> +		goto err_out;

Same comment as for the previous patch.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/3] signalfd: convert to ->read_iter()
  2024-04-03 14:02 ` [PATCH 3/3] signalfd: " Jens Axboe
@ 2024-04-03 22:57   ` Al Viro
  2024-04-04 12:52     ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Al Viro @ 2024-04-03 22:57 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, brauner, linux-kernel

On Wed, Apr 03, 2024 at 08:02:54AM -0600, Jens Axboe wrote:
> Rather than use the older style ->read() hook, use ->read_iter() so that
> signalfd can support both O_NONBLOCK and IOCB_NOWAIT for non-blocking
> read attempts.
> 
> Split the fd setup into two parts, so that signalfd can mark the file
> mode with FMODE_NOWAIT before installing it into the process table.

Same issue with copy_to_iter() calling conventions; what's more, userland
really does not expect partial copies here, so it might be worth adding

static inline
bool copy_to_iter_full(void *addr, size_t bytes, struct iov_iter *i)
{
        size_t copied = copy_to_iter(addr, bytes, i);
	if (likely(copied == bytes))
		return true;
	iov_iter_revert(i, copied);
	return false;
}

to include/linux/uio.h for the sake of those suckers.  Then
they could go for
        return copy_to_iter_full(&new, sizeof(new), to) ? sizeof(new) : -EFAULT;
and similar in other two.

NOTE: the userland ABI is somewhat sucky here - if the buffer goes
unmapped (or r/o) at the offset that is *not* a multiple of
sizeof(struct signalfd_siginfo), you get an event quietly lost.
Not sure what can be done with that - it is a user-visible ABI.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] timerfd: convert to ->read_iter()
  2024-04-03 22:40   ` Al Viro
@ 2024-04-04 12:51     ` Jens Axboe
  0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2024-04-04 12:51 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, brauner, linux-kernel

On 4/3/24 4:40 PM, Al Viro wrote:
> On Wed, Apr 03, 2024 at 08:02:52AM -0600, Jens Axboe wrote:
> 
>> -		res = put_user(ticks, (u64 __user *) buf) ? -EFAULT: sizeof(ticks);
>> +		res = copy_to_iter(&ticks, sizeof(ticks), to);
> 
> Umm...  That's not an equivalent transformation - different behaviour on
> short copy; try to call it via read(fd, unmapped_buffer, 8) and see what
> happens.
> 
> copy_to_iter() returns the amount copied; no data copied => return 0, not -EFAULT.

Gah yes, ironically I did a bunch of conversions yesterday and it's all
fine. Not sure wha thappened here. I'll fix it up.

>> +	ufd = get_unused_fd_flags(O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
> 
> You do realize that get_unused_fd_flags() ignores O_RDWR (or
> O_NDELAY), right? Mixing those with O_CLOEXEC makes sense for
> anon_inode_getfd(), but here you have separate calls of
> get_unused_fd_flags() and anon_inode_getfile(), so...

I do, but figured it was cleaner that way. But I can change the flag
passing, ditto for the other ones.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/3] signalfd: convert to ->read_iter()
  2024-04-03 22:57   ` Al Viro
@ 2024-04-04 12:52     ` Jens Axboe
  0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2024-04-04 12:52 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, brauner, linux-kernel

On 4/3/24 4:57 PM, Al Viro wrote:
> On Wed, Apr 03, 2024 at 08:02:54AM -0600, Jens Axboe wrote:
>> Rather than use the older style ->read() hook, use ->read_iter() so that
>> signalfd can support both O_NONBLOCK and IOCB_NOWAIT for non-blocking
>> read attempts.
>>
>> Split the fd setup into two parts, so that signalfd can mark the file
>> mode with FMODE_NOWAIT before installing it into the process table.
> 
> Same issue with copy_to_iter() calling conventions; what's more, userland
> really does not expect partial copies here, so it might be worth adding
> 
> static inline
> bool copy_to_iter_full(void *addr, size_t bytes, struct iov_iter *i)
> {
>         size_t copied = copy_to_iter(addr, bytes, i);
> 	if (likely(copied == bytes))
> 		return true;
> 	iov_iter_revert(i, copied);
> 	return false;
> }
> 
> to include/linux/uio.h for the sake of those suckers.  Then
> they could go for
>         return copy_to_iter_full(&new, sizeof(new), to) ? sizeof(new) : -EFAULT;
> and similar in other two.

That's a good idea, makes the transformations easier too and avoids
dealine with the partial case.

> NOTE: the userland ABI is somewhat sucky here - if the buffer goes
> unmapped (or r/o) at the offset that is *not* a multiple of
> sizeof(struct signalfd_siginfo), you get an event quietly lost.
> Not sure what can be done with that - it is a user-visible ABI.

The ABI could be nicer, but isn't this a case of "well don't do that
then"?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 3/3] signalfd: convert to ->read_iter()
  2024-04-02 20:18 [PATCHSET 0/3] Convert fs drivers " Jens Axboe
@ 2024-04-02 20:18 ` Jens Axboe
  0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2024-04-02 20:18 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, linux-kernel, Jens Axboe

Rather than use the older style ->read() hook, use ->read_iter() so that
signalfd can support both O_NONBLOCK and IOCB_NOWAIT for non-blocking
read attempts.

Split the fd setup into two parts, so that signalfd can mark the file
mode with FMODE_NOWAIT before installing it into the process table.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/signalfd.c | 44 ++++++++++++++++++++++++++++----------------
 1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/fs/signalfd.c b/fs/signalfd.c
index e20d1484c663..9b4ff83d816d 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -68,8 +68,7 @@ static __poll_t signalfd_poll(struct file *file, poll_table *wait)
 /*
  * Copied from copy_siginfo_to_user() in kernel/signal.c
  */
-static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
-			     kernel_siginfo_t const *kinfo)
+static int signalfd_copyinfo(struct iov_iter *to, kernel_siginfo_t const *kinfo)
 {
 	struct signalfd_siginfo new;
 
@@ -146,10 +145,7 @@ static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
 		break;
 	}
 
-	if (copy_to_user(uinfo, &new, sizeof(struct signalfd_siginfo)))
-		return -EFAULT;
-
-	return sizeof(*uinfo);
+	return copy_to_iter(&new, sizeof(struct signalfd_siginfo), to);
 }
 
 static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, kernel_siginfo_t *info,
@@ -199,25 +195,26 @@ static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, kernel_siginfo_t *info
  * error code. The "count" parameter must be at least the size of a
  * "struct signalfd_siginfo".
  */
-static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
-			     loff_t *ppos)
+static ssize_t signalfd_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
+	struct file *file = iocb->ki_filp;
 	struct signalfd_ctx *ctx = file->private_data;
 	struct signalfd_siginfo __user *siginfo;
-	int nonblock = file->f_flags & O_NONBLOCK;
+	size_t count = iov_iter_count(to);
 	ssize_t ret, total = 0;
 	kernel_siginfo_t info;
+	bool nonblock;
 
 	count /= sizeof(struct signalfd_siginfo);
 	if (!count)
 		return -EINVAL;
 
-	siginfo = (struct signalfd_siginfo __user *) buf;
+	nonblock = file->f_flags & O_NONBLOCK || iocb->ki_flags & IOCB_NOWAIT;
 	do {
 		ret = signalfd_dequeue(ctx, &info, nonblock);
 		if (unlikely(ret <= 0))
 			break;
-		ret = signalfd_copyinfo(siginfo, &info);
+		ret = signalfd_copyinfo(to, &info);
 		if (ret < 0)
 			break;
 		siginfo++;
@@ -246,7 +243,7 @@ static const struct file_operations signalfd_fops = {
 #endif
 	.release	= signalfd_release,
 	.poll		= signalfd_poll,
-	.read		= signalfd_read,
+	.read_iter	= signalfd_read_iter,
 	.llseek		= noop_llseek,
 };
 
@@ -265,20 +262,35 @@ static int do_signalfd4(int ufd, sigset_t *mask, int flags)
 	signotset(mask);
 
 	if (ufd == -1) {
+		struct file *file;
+
 		ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
 		if (!ctx)
 			return -ENOMEM;
 
 		ctx->sigmask = *mask;
 
+		ufd = get_unused_fd_flags(O_RDWR |
+					  (flags & (O_CLOEXEC | O_NONBLOCK)));
+		if (ufd < 0) {
+			kfree(ctx);
+			return ufd;
+		}
+
+		file = anon_inode_getfile("[signalfd]", &signalfd_fops, ctx,
+				       O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK)));
+		if (IS_ERR(file)) {
+			put_unused_fd(ufd);
+			kfree(ctx);
+			return ufd;
+		}
+		file->f_mode |= FMODE_NOWAIT;
+
 		/*
 		 * When we call this, the initialization must be complete, since
 		 * anon_inode_getfd() will install the fd.
 		 */
-		ufd = anon_inode_getfd("[signalfd]", &signalfd_fops, ctx,
-				       O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK)));
-		if (ufd < 0)
-			kfree(ctx);
+		fd_install(ufd, file);
 	} else {
 		struct fd f = fdget(ufd);
 		if (!f.file)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-04-04 12:52 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-03 14:02 [PATCHSET v2 0/3] Convert fs drivers to ->read_iter() Jens Axboe
2024-04-03 14:02 ` [PATCH 1/3] timerfd: convert " Jens Axboe
2024-04-03 22:40   ` Al Viro
2024-04-04 12:51     ` Jens Axboe
2024-04-03 14:02 ` [PATCH 2/3] userfaultfd: " Jens Axboe
2024-04-03 22:45   ` Al Viro
2024-04-03 14:02 ` [PATCH 3/3] signalfd: " Jens Axboe
2024-04-03 22:57   ` Al Viro
2024-04-04 12:52     ` Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2024-04-02 20:18 [PATCHSET 0/3] Convert fs drivers " Jens Axboe
2024-04-02 20:18 ` [PATCH 3/3] signalfd: convert " Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).