Re: [PATCH v3 2/2] io_uring: add support for IORING_OP_GETDENTS

From: Lennert Buytenhek <buytenh@wantstofly.org>
To: Pavel Begunkov <asml.silence@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>, Al Viro <viro@zeniv.linux.org.uk>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	io-uring@vger.kernel.org, David Laight <David.Laight@aculab.com>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH v3 2/2] io_uring: add support for IORING_OP_GETDENTS
Date: Fri, 19 Feb 2021 20:06:37 +0200	[thread overview]
Message-ID: <20210219180637.GC342512@wantstofly.org> (raw)
In-Reply-To: <9a6fb59b-be85-c36b-3c83-26cff37bcb87@gmail.com>

On Fri, Feb 19, 2021 at 12:05:58PM +0000, Pavel Begunkov wrote:

> > IORING_OP_GETDENTS behaves much like getdents64(2) and takes the same
> > arguments, but with a small twist: it takes an additional offset
> > argument, and reading from the specified directory starts at the given
> > offset.
> > 
> > For the first IORING_OP_GETDENTS call on a directory, the offset
> > parameter can be set to zero, and for subsequent calls, it can be
> > set to the ->d_off field of the last struct linux_dirent64 returned
> > by the previous IORING_OP_GETDENTS call.
> > 
> > Internally, if necessary, IORING_OP_GETDENTS will vfs_llseek() to
> > the right directory position before calling vfs_getdents().
> > 
> > IORING_OP_GETDENTS may or may not update the specified directory's
> > file offset, and the file offset should not be relied upon having
> > any particular value during or after an IORING_OP_GETDENTS call.
> > 
> > Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
> > ---
> >  fs/io_uring.c                 | 73 +++++++++++++++++++++++++++++++++++
> >  include/uapi/linux/io_uring.h |  1 +
> >  2 files changed, 74 insertions(+)
> > 
> > diff --git a/fs/io_uring.c b/fs/io_uring.c
> > index 056bd4c90ade..6853bf48369a 100644
> > --- a/fs/io_uring.c
> > +++ b/fs/io_uring.c
> > @@ -635,6 +635,13 @@ struct io_mkdir {
> >  	struct filename			*filename;
> >  };
> >  
> [...]
> > +static int io_getdents(struct io_kiocb *req, unsigned int issue_flags)
> > +{
> > +	struct io_getdents *getdents = &req->getdents;
> > +	bool pos_unlock = false;
> > +	int ret = 0;
> > +
> > +	/* getdents always requires a blocking context */
> > +	if (issue_flags & IO_URING_F_NONBLOCK)
> > +		return -EAGAIN;
> > +
> > +	/* for vfs_llseek and to serialize ->iterate_shared() on this file */
> > +	if (file_count(req->file) > 1) {
> 
> Looks racy, is it safe? E.g. can be concurrently dupped and used, or
> just several similar IORING_OP_GETDENTS requests.

I thought that it was safe, but I thought about it a bit more, and it
seems that it is unsafe -- if you IORING_REGISTER_FILES to register the
dirfd and then close the dirfd, you'll get a file_count of 1, while you
can submit concurrent operations.  So I'll remove the conditional
locking.  Thanks!

(If not for IORING_REGISTER_FILES, it seems safe, because then
io_file_get() will hold a(t least one) reference on the file while the
operation is in flight, so then if file_count(req->file) == 1 here,
then it means that the file is no longer referenced by any fdtable,
and nobody else should be able to get a reference to it -- but that's
a bit of a useless optimization.)

(Logic was taken from __fdget_pos, where it is safe for a different
reason, i.e. __fget_light will not bump the refcount iff current->files
is unshared.)

> > +		pos_unlock = true;
> > +		mutex_lock(&req->file->f_pos_lock);
> > +	}
> > +
> > +	if (req->file->f_pos != getdents->pos) {
> > +		loff_t res = vfs_llseek(req->file, getdents->pos, SEEK_SET);
> 
> I may be missing the previous discussions, but can this ever become
> stateless, like passing an offset? Including readdir.c and beyond. 

My aim was to only make the minimally required change initially, but
to make that optimization possible in the future (e.g. by reserving the
right to either update or not update the file position) -- but I'll
try doing the optimization now.

> > +		if (res < 0)
> > +			ret = res;
> > +	}
> > +
> > +	if (ret == 0) {
> > +		ret = vfs_getdents(req->file, getdents->dirent,
> > +				   getdents->count);
> > +	}
> > +
> > +	if (pos_unlock)
> > +		mutex_unlock(&req->file->f_pos_lock);
> > +
> > +	if (ret < 0) {
> > +		if (ret == -ERESTARTSYS)
> > +			ret = -EINTR;
> > +		req_set_fail_links(req);
> > +	}
> > +	io_req_complete(req, ret);
> > +	return 0;
> > +}
> [...]
> 
> -- 
> Pavel Begunkov