io-uring.vger.kernel.org archive mirror
* [RFC 0/1] Is register file feature hard to use ?
@ 2021-10-12  8:48 Xiaoguang Wang
  2021-10-12  8:48 ` [RFC 1/1] io_uring: improve register file feature's usability Xiaoguang Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Xiaoguang Wang @ 2021-10-12  8:48 UTC (permalink / raw)
  To: io-uring; +Cc: axboe, asml.silence

While trying to use the registered file feature, I found it hard to use;
see patch 1's commit message for more info.

In this RFC patch I only propose a preliminary implementation that doesn't
consider tags or compatibility issues yet, sorry. If we come to an agreement
that it's the right direction, I'll refine it as soon as possible.

Also, I saw Pavel has written "io_uring: openat directly into fixed fd table",
which requires the user to pass a file_slot. I think that's inconvenient for
user apps. We may still rely on __get_unused_fd_flags() to allocate an fd
and use it as the slot info.

Xiaoguang Wang (1):
  io_uring: improve register file feature's usability

 fs/io_uring.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 53 insertions(+), 8 deletions(-)

-- 
2.14.4.44.g2045bb6



* [RFC 1/1] io_uring: improve register file feature's usability
  2021-10-12  8:48 [RFC 0/1] Is register file feature hard to use ? Xiaoguang Wang
@ 2021-10-12  8:48 ` Xiaoguang Wang
  2021-10-12 11:10   ` Pavel Begunkov
  0 siblings, 1 reply; 9+ messages in thread
From: Xiaoguang Wang @ 2021-10-12  8:48 UTC (permalink / raw)
  To: io-uring; +Cc: axboe, asml.silence

The idea behind the registered file feature is good and straightforward, but
there is a big issue: it's hard for user apps to use. User apps need to
bind slot info to each file descriptor. For example, when a user app wants
to register a file, it first needs to find a free slot in the registered
file infrastructure, which means the app has to maintain slot info in
userspace, an obvious burden for userspace developers.

Actually, the file descriptor is a good candidate for slot info. If an app
wants to register a file, it can use the file's fd as the slot; there will
definitely be no conflicts, and it is very easy for user apps.

To support passing an fd as slot info, we need to automatically resize
io_file_table if the passed fd is greater than the current io_file_table
size, just like how the fd table grows.

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
---
 fs/io_uring.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 53 insertions(+), 8 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 73135c5c6168..be7abd89c0b0 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7768,6 +7768,21 @@ static bool io_alloc_file_tables(struct io_file_table *table, unsigned nr_files)
 	return !!table->files;
 }
 
+static int io_resize_file_tables(struct io_ring_ctx *ctx, unsigned old_files,
+				 unsigned new_files)
+{
+	size_t oldsize = sizeof(ctx->file_table.files[0]) * old_files;
+	size_t newsize = sizeof(ctx->file_table.files[0]) * new_files;
+
+	ctx->file_table.files = kvrealloc(ctx->file_table.files, oldsize, newsize,
+					   GFP_KERNEL_ACCOUNT);
+	if (!ctx->file_table.files)
+		return -ENOMEM;
+
+	ctx->nr_user_files = new_files;
+	return 0;
+}
+
 static void io_free_file_tables(struct io_file_table *table)
 {
 	kvfree(table->files);
@@ -8147,6 +8162,25 @@ static void io_rsrc_put_work(struct work_struct *work)
 	}
 }
 
+static inline int io_calc_file_tables_size(__s32 __user *fds, unsigned nr_files)
+{
+	int i, fd, max_fd = 0;
+
+	for (i = 0; i < nr_files; i++) {
+		if (copy_from_user(&fd, &fds[i], sizeof(fd)))
+			return -EFAULT;
+		if (fd == -1)
+			continue;
+		if (fd > max_fd)
+			max_fd = fd;
+	}
+
+	max_fd++;
+	if (max_fd < nr_files)
+		max_fd = nr_files;
+	return max_fd;
+}
+
 static int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
 				 unsigned nr_args, u64 __user *tags)
 {
@@ -8154,6 +8188,7 @@ static int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
 	struct file *file;
 	int fd, ret;
 	unsigned i;
+	int num_files;
 
 	if (ctx->file_data)
 		return -EBUSY;
@@ -8171,8 +8206,12 @@ static int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
 	if (ret)
 		return ret;
 
+	num_files = io_calc_file_tables_size(fds, nr_args);
+	if (num_files < 0)
+		goto out_free;
+
 	ret = -ENOMEM;
-	if (!io_alloc_file_tables(&ctx->file_table, nr_args))
+	if (!io_alloc_file_tables(&ctx->file_table, num_files))
 		goto out_free;
 
 	for (i = 0; i < nr_args; i++, ctx->nr_user_files++) {
@@ -8204,7 +8243,7 @@ static int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
 			fput(file);
 			goto out_fput;
 		}
-		io_fixed_file_set(io_fixed_file_slot(&ctx->file_table, i), file);
+		io_fixed_file_set(io_fixed_file_slot(&ctx->file_table, fd), file);
 	}
 
 	ret = io_sqe_files_scm(ctx);
@@ -8390,15 +8429,22 @@ static int __io_sqe_files_update(struct io_ring_ctx *ctx,
 	struct io_rsrc_data *data = ctx->file_data;
 	struct io_fixed_file *file_slot;
 	struct file *file;
-	int fd, i, err = 0;
+	int fd, err = 0;
 	unsigned int done;
 	bool needs_switch = false;
+	int num_files;
 
 	if (!ctx->file_data)
 		return -ENXIO;
 	if (up->offset + nr_args > ctx->nr_user_files)
 		return -EINVAL;
 
+	num_files = io_calc_file_tables_size(fds, nr_args);
+	if (num_files < 0)
+		return -EFAULT;
+	if (io_resize_file_tables(ctx, ctx->nr_user_files, num_files) < 0)
+		return -ENOMEM;
+
 	for (done = 0; done < nr_args; done++) {
 		u64 tag = 0;
 
@@ -8414,12 +8460,11 @@ static int __io_sqe_files_update(struct io_ring_ctx *ctx,
 		if (fd == IORING_REGISTER_FILES_SKIP)
 			continue;
 
-		i = array_index_nospec(up->offset + done, ctx->nr_user_files);
-		file_slot = io_fixed_file_slot(&ctx->file_table, i);
+		file_slot = io_fixed_file_slot(&ctx->file_table, fd);
 
 		if (file_slot->file_ptr) {
 			file = (struct file *)(file_slot->file_ptr & FFS_MASK);
-			err = io_queue_rsrc_removal(data, up->offset + done,
+			err = io_queue_rsrc_removal(data, fd,
 						    ctx->rsrc_node, file);
 			if (err)
 				break;
@@ -8445,9 +8490,9 @@ static int __io_sqe_files_update(struct io_ring_ctx *ctx,
 				err = -EBADF;
 				break;
 			}
-			*io_get_tag_slot(data, up->offset + done) = tag;
+			*io_get_tag_slot(data, fd) = tag;
 			io_fixed_file_set(file_slot, file);
-			err = io_sqe_file_register(ctx, file, i);
+			err = io_sqe_file_register(ctx, file, fd);
 			if (err) {
 				file_slot->file_ptr = 0;
 				fput(file);
-- 
2.14.4.44.g2045bb6
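
The sizing rule that io_calc_file_tables_size() applies in the patch above
can be tried out in userspace. The sketch below is a standalone model, not
kernel code, and the name calc_table_entries is made up for illustration.
It shows that the table size is driven by the largest fd rather than by the
number of files.

```c
#include <stddef.h>

/*
 * Userspace model of the patch's sizing rule: the table must be able to
 * hold index max_fd, and never has fewer than nr entries. An fd of -1
 * marks an empty slot and is skipped.
 */
static size_t calc_table_entries(const int *fds, size_t nr)
{
	int max_fd = 0;
	size_t i, entries;

	for (i = 0; i < nr; i++) {
		if (fds[i] == -1)
			continue;
		if (fds[i] > max_fd)
			max_fd = fds[i];
	}

	entries = (size_t)max_fd + 1;
	return entries < nr ? nr : entries;
}
```

With this rule, registering the single fd 100000 asks for 100001 entries,
while registering fds 3, 4 and 5 asks for 6.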



* Re: [RFC 1/1] io_uring: improve register file feature's usability
  2021-10-12  8:48 ` [RFC 1/1] io_uring: improve register file feature's usability Xiaoguang Wang
@ 2021-10-12 11:10   ` Pavel Begunkov
  2021-10-12 13:11     ` Xiaoguang Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Pavel Begunkov @ 2021-10-12 11:10 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: axboe

On 10/12/21 09:48, Xiaoguang Wang wrote:
> The idea behind register file feature is good and straightforward, but
> there is a very big issue that it's hard to use for user apps. User apps
> need to bind slot info to file descriptor. For example, user app wants
> to register a file, then it first needs to find a free slot in register
> file infrastructure, that means user app needs to maintain slot info in
> userspace, which is a obvious burden for userspace developers.

Slot allocation is deliberately left entirely to userspace: userspace
has more info and can use it more efficiently, e.g. if there is only a
small managed set of registered files it can always have O(1) slot
"lookup", and there are a couple more use cases.

If userspace wants to mirror the fdtable in io_uring's registered table,
that's possible as is, without extra fdtable tracking:

fd = open();
io_uring_update_slot(off=fd, fd=fd);

As for the dual case of wanting an fd both in the normal fdtable and in the
fixed table with the same index, I'm not sure how viable that is, but
"direct open" can be extended if needed.


> Actually, file descriptor can be a good candidate slot info. If app wants
> to register a file, it can use this file's fd as valid slot, there'll
> definitely be no conflicts and very easy for user apps.
> 
> To support to pass fd as slot info, we'll need to automatically resize
> io_file_table if passed fd is greater than current io_file_table size,
> just like how fd table extends.
> 
> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
> ---
>   fs/io_uring.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++--------
>   1 file changed, 53 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 73135c5c6168..be7abd89c0b0 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -7768,6 +7768,21 @@ static bool io_alloc_file_tables(struct io_file_table *table, unsigned nr_files)
>   	return !!table->files;
>   }
>   
> +static int io_resize_file_tables(struct io_ring_ctx *ctx, unsigned old_files,
> +				 unsigned new_files)
> +{
> +	size_t oldsize = sizeof(ctx->file_table.files[0]) * old_files;
> +	size_t newsize = sizeof(ctx->file_table.files[0]) * new_files;
> +
> +	ctx->file_table.files = kvrealloc(ctx->file_table.files, oldsize, newsize,
> +					   GFP_KERNEL_ACCOUNT);
> +	if (!ctx->file_table.files)
> +		return -ENOMEM;
> +
> +	ctx->nr_user_files = new_files;
> +	return 0;
> +}
> +
>   static void io_free_file_tables(struct io_file_table *table)
>   {
>   	kvfree(table->files);
> @@ -8147,6 +8162,25 @@ static void io_rsrc_put_work(struct work_struct *work)
>   	}
>   }
>   
> +static inline int io_calc_file_tables_size(__s32 __user *fds, unsigned nr_files)
> +{
> +	int i, fd, max_fd = 0;
> +
> +	for (i = 0; i < nr_files; i++) {
> +		if (copy_from_user(&fd, &fds[i], sizeof(fd)))
> +			return -EFAULT;
> +		if (fd == -1)
> +			continue;
> +		if (fd > max_fd)
> +			max_fd = fd;
> +	}
> +
> +	max_fd++;
> +	if (max_fd < nr_files)
> +		max_fd = nr_files;
> +	return max_fd;
> +}
> +
>   static int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
>   				 unsigned nr_args, u64 __user *tags)
>   {
> @@ -8154,6 +8188,7 @@ static int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
>   	struct file *file;
>   	int fd, ret;
>   	unsigned i;
> +	int num_files;
>   
>   	if (ctx->file_data)
>   		return -EBUSY;
> @@ -8171,8 +8206,12 @@ static int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
>   	if (ret)
>   		return ret;
>   
> +	num_files = io_calc_file_tables_size(fds, nr_args);
> +	if (num_files < 0)
> +		goto out_free;
> +
>   	ret = -ENOMEM;
> -	if (!io_alloc_file_tables(&ctx->file_table, nr_args))
> +	if (!io_alloc_file_tables(&ctx->file_table, num_files))
>   		goto out_free;
>   
>   	for (i = 0; i < nr_args; i++, ctx->nr_user_files++) {
> @@ -8204,7 +8243,7 @@ static int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
>   			fput(file);
>   			goto out_fput;
>   		}
> -		io_fixed_file_set(io_fixed_file_slot(&ctx->file_table, i), file);
> +		io_fixed_file_set(io_fixed_file_slot(&ctx->file_table, fd), file);
>   	}
>   
>   	ret = io_sqe_files_scm(ctx);
> @@ -8390,15 +8429,22 @@ static int __io_sqe_files_update(struct io_ring_ctx *ctx,
>   	struct io_rsrc_data *data = ctx->file_data;
>   	struct io_fixed_file *file_slot;
>   	struct file *file;
> -	int fd, i, err = 0;
> +	int fd, err = 0;
>   	unsigned int done;
>   	bool needs_switch = false;
> +	int num_files;
>   
>   	if (!ctx->file_data)
>   		return -ENXIO;
>   	if (up->offset + nr_args > ctx->nr_user_files)
>   		return -EINVAL;
>   
> +	num_files = io_calc_file_tables_size(fds, nr_args);
> +	if (num_files < 0)
> +		return -EFAULT;
> +	if (io_resize_file_tables(ctx, ctx->nr_user_files, num_files) < 0)
> +		return -ENOMEM;
> +
>   	for (done = 0; done < nr_args; done++) {
>   		u64 tag = 0;
>   
> @@ -8414,12 +8460,11 @@ static int __io_sqe_files_update(struct io_ring_ctx *ctx,
>   		if (fd == IORING_REGISTER_FILES_SKIP)
>   			continue;
>   
> -		i = array_index_nospec(up->offset + done, ctx->nr_user_files);
> -		file_slot = io_fixed_file_slot(&ctx->file_table, i);
> +		file_slot = io_fixed_file_slot(&ctx->file_table, fd);
>   
>   		if (file_slot->file_ptr) {
>   			file = (struct file *)(file_slot->file_ptr & FFS_MASK);
> -			err = io_queue_rsrc_removal(data, up->offset + done,
> +			err = io_queue_rsrc_removal(data, fd,
>   						    ctx->rsrc_node, file);
>   			if (err)
>   				break;
> @@ -8445,9 +8490,9 @@ static int __io_sqe_files_update(struct io_ring_ctx *ctx,
>   				err = -EBADF;
>   				break;
>   			}
> -			*io_get_tag_slot(data, up->offset + done) = tag;
> +			*io_get_tag_slot(data, fd) = tag;
>   			io_fixed_file_set(file_slot, file);
> -			err = io_sqe_file_register(ctx, file, i);
> +			err = io_sqe_file_register(ctx, file, fd);
>   			if (err) {
>   				file_slot->file_ptr = 0;
>   				fput(file);
> 

-- 
Pavel Begunkov
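
One way userspace could get the O(1) slot "lookup" mentioned above is a
stack of free slot indexes. This is only a sketch under assumptions: the
names slot_pool, slot_get and slot_put and the table size NR_SLOTS are
invented here, and the code is single-threaded, so a multi-threaded app
would need its own synchronization around it.

```c
#include <stddef.h>

#define NR_SLOTS 8	/* illustrative table size, not from the thread */

/* Free slots are kept on a stack, so get and put are both O(1). */
struct slot_pool {
	unsigned free[NR_SLOTS];
	size_t nr_free;
};

static void slot_pool_init(struct slot_pool *p)
{
	size_t i;

	/* Push in reverse order so that slot 0 is handed out first. */
	for (i = 0; i < NR_SLOTS; i++)
		p->free[i] = (unsigned)(NR_SLOTS - 1 - i);
	p->nr_free = NR_SLOTS;
}

/* Returns a free slot index, or -1 if every slot is in use. */
static int slot_get(struct slot_pool *p)
{
	if (p->nr_free == 0)
		return -1;
	return (int)p->free[--p->nr_free];
}

/* Returns a slot to the pool; the caller must currently own it. */
static void slot_put(struct slot_pool *p, unsigned slot)
{
	p->free[p->nr_free++] = slot;
}
```

The index returned by slot_get() would be used as the offset of a file
update, and slot_put() would be called once the file is unregistered.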


* Re: [RFC 1/1] io_uring: improve register file feature's usability
  2021-10-12 11:10   ` Pavel Begunkov
@ 2021-10-12 13:11     ` Xiaoguang Wang
  2021-10-12 14:33       ` Pavel Begunkov
  0 siblings, 1 reply; 9+ messages in thread
From: Xiaoguang Wang @ 2021-10-12 13:11 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: axboe

hi,


> On 10/12/21 09:48, Xiaoguang Wang wrote:
>> The idea behind register file feature is good and straightforward, but
>> there is a very big issue that it's hard to use for user apps. User apps
>> need to bind slot info to file descriptor. For example, user app wants
>> to register a file, then it first needs to find a free slot in register
>> file infrastructure, that means user app needs to maintain slot info in
>> userspace, which is a obvious burden for userspace developers.
>
> Slot allocation is specifically entirely given away to the userspace,
> the userspace has more info and can use it more efficiently, e.g.
> if there is only a small managed set of registered files they can
> always have O(1) slot "lookup", and a couple of more use cases.

Can you explain more about what slot "lookup" means? Thanks. To me, it seems
that using the fd as the slot is the simplest and most efficient way: the
user does not need to manage slot info in userspace at all.


>
> If userspace wants to mimic a fdtable into io_uring's registered table,
> it's possible to do as is and without extra fdtable tracking
>
> fd = open();
> io_uring_update_slot(off=fd, fd=fd);

No, currently it's hard to do the above unless we register a big
number of files initially.

Say we call IORING_REGISTER_FILES to register 1000 files initially. Then
the io_uring io_file_table only supports 1000 files, and an fd of 1000 or
more cannot be registered at the slot equal to its value.

To be safe, you may need to register getrlimit(RLIMIT_NOFILE) entries
initially, but that may also fail, and the user may change RLIMIT_NOFILE
later too. This is why I introduce an io_file_table resize feature. I
agree this method may waste memory, though: for example, a user app may
want only one file registered, but that file's fd is very large.


Regards,

Xiaoguang Wang
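
To get a feel for the cost of registering RLIMIT_NOFILE entries up front,
the sketch below reads the soft limit and estimates the table size. It
assumes one pointer-sized entry per slot, which is a simplification made
for illustration; the helper names are hypothetical.

```c
#include <sys/resource.h>

/* Bytes needed for a table with one pointer-sized entry per slot
 * (the entry size is an assumption made for this estimate). */
static unsigned long long table_bytes(unsigned long long nr_entries)
{
	return nr_entries * sizeof(void *);
}

/* Soft RLIMIT_NOFILE, or 0 on error or if the limit is unlimited. */
static unsigned long long nofile_soft_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
		return 0;
	return (unsigned long long)rl.rlim_cur;
}
```

Calling table_bytes(nofile_soft_limit()) gives the estimate for the current
process. Since the limit can be raised later, a table sized this way can
still turn out to be too small.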

>
> For the dual wanting an fd both in the normal fdtable and fixed table
> with same indexes, not sure how viable that is but "direct open" can
> be extended if needed.
>
>


* Re: [RFC 1/1] io_uring: improve register file feature's usability
  2021-10-12 13:11     ` Xiaoguang Wang
@ 2021-10-12 14:33       ` Pavel Begunkov
  2021-10-13  3:32         ` Xiaoguang Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Pavel Begunkov @ 2021-10-12 14:33 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: axboe

On 10/12/21 14:11, Xiaoguang Wang wrote:
>> On 10/12/21 09:48, Xiaoguang Wang wrote:
>>> The idea behind register file feature is good and straightforward, but
>>> there is a very big issue that it's hard to use for user apps. User apps
>>> need to bind slot info to file descriptor. For example, user app wants
>>> to register a file, then it first needs to find a free slot in register
>>> file infrastructure, that means user app needs to maintain slot info in
>>> userspace, which is a obvious burden for userspace developers.
>>
>> Slot allocation is specifically entirely given away to the userspace,
>> the userspace has more info and can use it more efficiently, e.g.
>> if there is only a small managed set of registered files they can
>> always have O(1) slot "lookup", and a couple of more use cases.
> 
> Can you explain more what is slot "lookup", thanks. For me, it seems that

I wasn't referring to anything in particular, just the way userspace finds
a new index; it can be round robin or "index==fd".

> use fd as slot is the simplest and most efficient way, user does not need to
> mange slot info at all in userspace.

As mentioned, a small table should be slightly more efficient because of
fewer cache misses. Also, it's allocated with kvcalloc(), so if physically
contiguous memory can't be allocated it falls back to virtually contiguous
memory.

So, if userspace has some other way of indexing files, small tables
are preferred: for instance, if it operates on 1-2 files, or stores files
in an array whose index can serve the purpose, or any other
way. A bigger table also costs additional memory, for those who care.

>> If userspace wants to mimic a fdtable into io_uring's registered table,
>> it's possible to do as is and without extra fdtable tracking
>>
>> fd = open();
>> io_uring_update_slot(off=fd, fd=fd);
> 
> No, currently it's hard to do above work, unless we register a big number of files initially.

If they intend to use a big number of files, that's the way to go. They
can unregister/register if needed; the usual growth factor of 2 should
make it workable.

We may consider fast growing as a separate feature if really needed,
either the way you did it or, even better, done explicitly and separately
from updates.


> Say we call IORING_REGISTER_FILES to register 1000 files initially, then
> the io_uring io_file_table only supports 1000 files, fd which is greater
> than 1000 will be not able to be registered.
>
> For safety, you may need to register the number of
> getrlimit(RLIMIT_NOFILE) initially, but it also may fail, user may change
> RLIMIT_NOFILE too. This is why I introduce a io_uring io_file_table
> resize feature, but I agree this method may waste memory, for example,
> user app only wants one file registered, but this file's fd is very large.

That's fine as long as it's optional

-- 
Pavel Begunkov
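
The "grow factor=2" idea above can be sketched as a capacity calculation
that userspace runs before it unregisters and re-registers a larger table.
The function name grow_capacity is hypothetical, and the sketch only
computes the new size; the unregister/register calls are left out.

```c
#include <stddef.h>

/*
 * Returns the smallest capacity >= needed that is reached by repeatedly
 * doubling cur (starting from 1 if cur is 0). Returns 0 if doubling
 * would overflow size_t.
 */
static size_t grow_capacity(size_t cur, size_t needed)
{
	size_t cap = cur ? cur : 1;

	while (cap < needed) {
		if (cap > (size_t)-1 / 2)
			return 0;
		cap *= 2;
	}
	return cap;
}
```

Because the capacity doubles each time, growing from 1000 slots to more than
4000 takes three re-registrations (2000, 4000, 8000) if done one step at a
time, or a single one if the target size is known in advance.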


* Re: [RFC 1/1] io_uring: improve register file feature's usability
  2021-10-12 14:33       ` Pavel Begunkov
@ 2021-10-13  3:32         ` Xiaoguang Wang
  2021-10-14  9:43           ` Pavel Begunkov
  0 siblings, 1 reply; 9+ messages in thread
From: Xiaoguang Wang @ 2021-10-13  3:32 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: axboe

hi,


> On 10/12/21 14:11, Xiaoguang Wang wrote:
>>> On 10/12/21 09:48, Xiaoguang Wang wrote:
>>>> The idea behind register file feature is good and straightforward, but
>>>> there is a very big issue that it's hard to use for user apps. User apps
>>>> need to bind slot info to file descriptor. For example, user app wants
>>>> to register a file, then it first needs to find a free slot in register
>>>> file infrastructure, that means user app needs to maintain slot info in
>>>> userspace, which is a obvious burden for userspace developers.
>>>
>>> Slot allocation is specifically entirely given away to the userspace,
>>> the userspace has more info and can use it more efficiently, e.g.
>>> if there is only a small managed set of registered files they can
>>> always have O(1) slot "lookup", and a couple of more use cases.
>>
>> Can you explain more what is slot "lookup", thanks. For me, it seems that
>
> I referred to nothing particular, just a way userspace finds a new index,
> can be round robin or "index==fd".
>
>> use fd as slot is the simplest and most efficient way, user does not need to
>> mange slot info at all in userspace.
>
> As mentioned, it should be slightly more efficient to have a small table,
> cache misses. Also, it's allocated with kvcalloc() so if it can't be
> allocate physically contig memory it will set up virtual memory.
>
> So, if the userspace has some other way of indexing files, small tables
> are preferred. For instance if it operates with 1-2 files, or stores files
> in an array and the index in the array may serve the purpose, or any other
> way. Also, additional memory for those who care.

Yeah, I agree that for small tables the current implementation seems good.
If a user app registers just a small number of files, it can handle that
well. But imagine netty, nginx or other network apps that open thousands
of socket files: managing these sockets' slot info will be an obvious
burden for developers, and such apps may need to develop a private
component to record used and free slots. Especially in a high-concurrency
scenario where sockets are frequently opened and closed, this private
component may need locks for protection, which means it will introduce
overhead too.

For an fd, the VFS layer already ensures its uniqueness.

>
>>> If userspace wants to mimic a fdtable into io_uring's registered table,
>>> it's possible to do as is and without extra fdtable tracking
>>>
>>> fd = open();
>>> io_uring_update_slot(off=fd, fd=fd);
>>
>> No, currently it's hard to do above work, unless we register a big 
>> number of files initially.
>
> If they intend to use a big number of files that's the way to go. They
> can unregister/register if needed, usual grow factor=2  should make
> it workable.

I'm not sure unregister/register is appropriate. Say an app has registered
1000 files; it first needs to unregister those 1000 files, and it's
doubtful whether it can do that if some of the files are being used by
other threads that continually submit sqes with the FIXED_FILE flag, so
the unregistration needs to synchronize with the threads that are
submitting requests. Later the app needs to prepare a new files array
holding the current 1000 files plus the new files. To me, it can work, but
it's not efficient and somewhat hard to use :)

What I want to express here is that there are many factors to consider
carefully when using the file registration feature; that's why I say it's
somewhat hard to use :)


Do you know of any popular io_uring-based apps that use the file
registration feature? netty
(https://github.com/netty/netty-incubator-transport-io_uring.git) has
io_uring support but does not use file registration, and recently we'd
like to add file registration to it.


Regards,

Xiaoguang Wang

>
> We may consider fast growing as a separate feature if really needed,
> either as you did it, or even better doing it explicitly and separately
> from updates.
>
>
>> Say we call IORING_REGISTER_FILES to register 1000 files initially, then
>> the io_uring io_file_table only supports 1000 files, fd which is greater
>> than 1000 will be not able to be registered.
>>
>> For safety, you may need to register the number of
>> getrlimit(RLIMIT_NOFILE) initially, but it also may fail, user may change
>> RLIMIT_NOFILE too. This is why I introduce a io_uring io_file_table
>> resize feature, but I agree this method may waste memory, for example,
>> user app only wants one file registered, but this file's fd is very large.
>
> That's fine as long as it's optional
>


* Re: [RFC 1/1] io_uring: improve register file feature's usability
  2021-10-13  3:32         ` Xiaoguang Wang
@ 2021-10-14  9:43           ` Pavel Begunkov
  2021-10-21  8:40             ` Xiaoguang Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Pavel Begunkov @ 2021-10-14  9:43 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: axboe

On 10/13/21 04:32, Xiaoguang Wang wrote:
> hi,
>> On 10/12/21 14:11, Xiaoguang Wang wrote:
>>>> On 10/12/21 09:48, Xiaoguang Wang wrote:
>>>>> The idea behind register file feature is good and straightforward, but
>>>>> there is a very big issue that it's hard to use for user apps. User apps
>>>>> need to bind slot info to file descriptor. For example, user app wants
>>>>> to register a file, then it first needs to find a free slot in register
>>>>> file infrastructure, that means user app needs to maintain slot info in
>>>>> userspace, which is a obvious burden for userspace developers.
>>>>
>>>> Slot allocation is specifically entirely given away to the userspace,
>>>> the userspace has more info and can use it more efficiently, e.g.
>>>> if there is only a small managed set of registered files they can
>>>> always have O(1) slot "lookup", and a couple of more use cases.
>>>
>>> Can you explain more what is slot "lookup", thanks. For me, it seems that
>>
>> I referred to nothing particular, just a way userspace finds a new index,
>> can be round robin or "index==fd".
>>
>>> use fd as slot is the simplest and most efficient way, user does not need to
>>> mange slot info at all in userspace.
>>
>> As mentioned, it should be slightly more efficient to have a small table,
>> cache misses. Also, it's allocated with kvcalloc() so if it can't be
>> allocate physically contig memory it will set up virtual memory.
>>
>> So, if the userspace has some other way of indexing files, small tables
>> are preferred. For instance if it operates with 1-2 files, or stores files
>> in an array and the index in the array may serve the purpose, or any other
>> way. Also, additional memory for those who care.
> 
> Yeah, I agree with you that for small tables, current implementation
> seems good, If user app just registers a small number of files, it may
> handle it well, but imagine how netty, nginx or other network apps which
> will open thousands of socket files, manage these socket files' slot info
> will be a obvious burden to developer, these apps may need to develop a
> private component to record used or free slot. Especially in a high
> concurrency scenario, frequent sockes opened or closed, this private
> component may need locks to protect, that means this private component
> will introduce overhead too.
>
> For a fd, vfs layer has already ensure its unique.
> 
>>
>>>> If userspace wants to mimic a fdtable into io_uring's registered table,
>>>> it's possible to do as is and without extra fdtable tracking
>>>>
>>>> fd = open();
>>>> io_uring_update_slot(off=fd, fd=fd);
>>>
>>> No, currently it's hard to do above work, unless we register a big number of files initially.
>>
>> If they intend to use a big number of files that's the way to go. They
>> can unregister/register if needed, usual grow factor=2  should make
>> it workable.
> 
> I'm not sure unregister/register is appropriate. Say an app registers 1000
> files; it then needs to unregister those 1000 files first. It is doubtful
> whether this unregistration can be done if some of these files are used by
> other threads, which continually submit sqes with the FIXED_FILE flag, so
> the unregistration needs to synchronize with the threads that are
> submitting requests. Later the app needs to prepare a new files array,
> saving the current 1000 files and the new files' info to it. For me, it
> can work, but it's not efficient and somewhat hard to use :)

Sounds reasonable. What I oppose is wiring it solely based on fd. On the
other hand, it sounds like what you need is a "grow table" feature.

We can also think about adding a new format: instead of an array of fds,
pass an array of pairs {offset, fd}.
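
To make the "grow table" idea concrete, here is a minimal sketch of how
userspace could pick a new table size so that slot == fd still fits, using
the usual doubling. grown_capacity() is a hypothetical helper made up for
this illustration, not an existing io_uring or liburing call; it only
computes a size and does not touch the registered table.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of io_uring or liburing): return the
 * smallest capacity, reached by doubling `current`, that has room for
 * slot index `fd`. A negative fd leaves the capacity unchanged.
 */
size_t grown_capacity(size_t current, int fd)
{
    size_t needed;
    size_t cap;

    if (fd < 0)
        return current;

    needed = (size_t)fd + 1;        /* slot index fd needs fd + 1 entries */
    cap = current ? current : 1;    /* start from 1 if the table is empty */
    while (cap < needed)
        cap *= 2;                   /* grow factor = 2 */
    return cap;
}
```

With a 1000-entry table, fd 999 still fits, while fd 1000 would ask for a
2000-entry table.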

-- 
Pavel Begunkov

* Re: [RFC 1/1] io_uring: improve register file feature's usability
  2021-10-14  9:43           ` Pavel Begunkov
@ 2021-10-21  8:40             ` Xiaoguang Wang
  2021-10-25  9:43               ` Pavel Begunkov
  0 siblings, 1 reply; 9+ messages in thread
From: Xiaoguang Wang @ 2021-10-21  8:40 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: axboe

hi,


> On 10/13/21 04:32, Xiaoguang Wang wrote:
>> hi,
>>> On 10/12/21 14:11, Xiaoguang Wang wrote:
>>>>> On 10/12/21 09:48, Xiaoguang Wang wrote:
>>>>>> The idea behind register file feature is good and straightforward, but
>>>>>> there is a very big issue that it's hard to use for user apps. User apps
>>>>>> need to bind slot info to file descriptor. For example, user app wants
>>>>>> to register a file, then it first needs to find a free slot in register
>>>>>> file infrastructure, that means user app needs to maintain slot info in
>>>>>> userspace, which is a obvious burden for userspace developers.
>>>>>
>>>>> Slot allocation is specifically entirely given away to the userspace,
>>>>> the userspace has more info and can use it more efficiently, e.g.
>>>>> if there is only a small managed set of registered files they can
>>>>> always have O(1) slot "lookup", and a couple of more use cases.
>>>>
>>>> Can you explain more what is slot "lookup", thanks. For me, it seems that
>>>
>>> I referred to nothing particular, just a way userspace finds a new index,
>>> can be round robin or "index==fd".
>>>
>>>> use fd as slot is the simplest and most efficient way, user does not need to
>>>> manage slot info at all in userspace.
>>>
>>> As mentioned, it should be slightly more efficient to have a small table,
>>> cache misses. Also, it's allocated with kvcalloc() so if it can't be
>>> allocate physically contig memory it will set up virtual memory.
>>>
>>> So, if the userspace has some other way of indexing files, small tables
>>> are preferred. For instance if it operates with 1-2 files, or stores files
>>> in an array and the index in the array may serve the purpose, or any other
>>> way. Also, additional memory for those who care.
>>
>> Yeah, I agree with you that for small tables the current implementation
>> seems good. If a user app registers just a small number of files, it may
>> handle that well. But imagine netty, nginx or other network apps that open
>> thousands of socket files: managing these socket files' slot info will be
>> an obvious burden to developers, and these apps may need to develop a
>> private component to record used and free slots. Especially in a high
>> concurrency scenario, with sockets frequently opened and closed, this
>> private component may need locks for protection, which means it will
>> introduce overhead too. For an fd, the vfs layer already ensures that it
>> is unique.
>>
>>>
>>>>> If userspace wants to mimic a fdtable into io_uring's registered table,
>>>>> it's possible to do as is and without extra fdtable tracking
>>>>>
>>>>> fd = open();
>>>>> io_uring_update_slot(off=fd, fd=fd);
>>>>
>>>> No, currently it's hard to do above work, unless we register a big number of files initially.
>>>
>>> If they intend to use a big number of files that's the way to go. They
>>> can unregister/register if needed, usual grow factor=2  should make
>>> it workable.
>>
>> I'm not sure unregister/register is appropriate. Say an app registers 1000
>> files; it then needs to unregister those 1000 files first. It is doubtful
>> whether this unregistration can be done if some of these files are used by
>> other threads, which continually submit sqes with the FIXED_FILE flag, so
>> the unregistration needs to synchronize with the threads that are
>> submitting requests. Later the app needs to prepare a new files array,
>> saving the current 1000 files and the new files' info to it. For me, it
>> can work, but it's not efficient and somewhat hard to use :)
>
> Sounds reasonable. What I oppose is wiring it solely based on fd. On the

Is your main concern the possibly large memory consumption, with memory
that also may not be physically contiguous? If a user app opens thousands
of files but registers only a small set of them, this method is indeed not
good.


What about adding a new flag, like IORING_SETUP_REGISTER_FILES_BY_FD? If the
user creates a uring instance with this flag, we'll support registering
files by fd. Apps that register most of their opened files will benefit from
this feature, as they no longer need to maintain slot offset info.


Considering the future, once io_uring becomes the main programming
interface, every file may be opened by io_uring, so we could register every
file opened by io_uring; after all, file registration gives performance
improvements. In this scenario, this new registration method seems the
simplest.


> other hand, it sounds what you need is a "grow table" feature.

No, it's just a result. What I want is to be able to use the fd as slot
info when registering files. Once a new fd is returned by open(2), the slot
indexed by this fd in io_uring's io_file_table can be updated safely, which
is convenient for the user app. The "grow table" feature is just used to
implement this support.
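
For illustration, a minimal sketch contrasting the two approaches: a small
free-slot pool of the kind an app has to maintain today, next to the
fd-as-slot mapping, which needs no state at all. All names here
(slot_pool, slot_alloc, slot_free, slot_from_fd, NR_SLOTS) are hypothetical
and exist only in this example. The pool is single-threaded; a
multi-threaded app would additionally need a lock around it.

```c
#include <stddef.h>

#define NR_SLOTS 8  /* illustrative table size, not an io_uring limit */

/* Hypothetical userspace bookkeeping: a stack of currently free slots. */
struct slot_pool {
    int free_stack[NR_SLOTS];
    size_t nr_free;
};

void slot_pool_init(struct slot_pool *p)
{
    /* Push slots in reverse order so that slot 0 is handed out first. */
    for (size_t i = 0; i < NR_SLOTS; i++)
        p->free_stack[i] = (int)(NR_SLOTS - 1 - i);
    p->nr_free = NR_SLOTS;
}

/* Return a free slot index, or -1 if the table is full. */
int slot_alloc(struct slot_pool *p)
{
    if (p->nr_free == 0)
        return -1;
    return p->free_stack[--p->nr_free];
}

/* Give a slot back; the caller must not free the same slot twice. */
void slot_free(struct slot_pool *p, int slot)
{
    p->free_stack[p->nr_free++] = slot;
}

/* fd-as-slot: the fd already is a unique index, so no state is kept. */
int slot_from_fd(int fd)
{
    return fd;
}
```

The pool has to be consulted on every open and close, while slot_from_fd()
is a pure mapping.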


>
> We can also think about adding new format, instead of array of fds, add
> passing an array of pairs {offset, fd}.

Can you explain more about this format? Will it simplify the user apps'
burden of maintaining slot info?


Regards,

Xiaoguang Wang


* Re: [RFC 1/1] io_uring: improve register file feature's usability
  2021-10-21  8:40             ` Xiaoguang Wang
@ 2021-10-25  9:43               ` Pavel Begunkov
  0 siblings, 0 replies; 9+ messages in thread
From: Pavel Begunkov @ 2021-10-25  9:43 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: axboe

On 10/21/21 09:40, Xiaoguang Wang wrote:
>> On 10/13/21 04:32, Xiaoguang Wang wrote:
>>> hi,
>>>> On 10/12/21 14:11, Xiaoguang Wang wrote:
>>>>>> On 10/12/21 09:48, Xiaoguang Wang wrote:
>>>>>>> The idea behind register file feature is good and straightforward, but
>>>>>>> there is a very big issue that it's hard to use for user apps. User apps
>>>>>>> need to bind slot info to file descriptor. For example, user app wants
>>>>>>> to register a file, then it first needs to find a free slot in register
>>>>>>> file infrastructure, that means user app needs to maintain slot info in
>>>>>>> userspace, which is a obvious burden for userspace developers.
>>>>>>
>>>>>> Slot allocation is specifically entirely given away to the userspace,
>>>>>> the userspace has more info and can use it more efficiently, e.g.
>>>>>> if there is only a small managed set of registered files they can
>>>>>> always have O(1) slot "lookup", and a couple of more use cases.
>>>>>
>>>>> Can you explain more what is slot "lookup", thanks. For me, it seems that
>>>>
>>>> I referred to nothing particular, just a way userspace finds a new index,
>>>> can be round robin or "index==fd".
>>>>
>>>>> use fd as slot is the simplest and most efficient way, user does not need to
>>>>> manage slot info at all in userspace.
>>>>
>>>> As mentioned, it should be slightly more efficient to have a small table,
>>>> cache misses. Also, it's allocated with kvcalloc() so if it can't be
>>>> allocate physically contig memory it will set up virtual memory.
>>>>
>>>> So, if the userspace has some other way of indexing files, small tables
>>>> are preferred. For instance if it operates with 1-2 files, or stores files
>>>> in an array and the index in the array may serve the purpose, or any other
>>>> way. Also, additional memory for those who care.
>>>
>>> Yeah, I agree with you that for small tables the current implementation
>>> seems good. If a user app registers just a small number of files, it may
>>> handle that well. But imagine netty, nginx or other network apps that open
>>> thousands of socket files: managing these socket files' slot info will be
>>> an obvious burden to developers, and these apps may need to develop a
>>> private component to record used and free slots. Especially in a high
>>> concurrency scenario, with sockets frequently opened and closed, this
>>> private component may need locks for protection, which means it will
>>> introduce overhead too. For an fd, the vfs layer already ensures that it
>>> is unique.
>>>
>>>>
>>>>>> If userspace wants to mimic a fdtable into io_uring's registered table,
>>>>>> it's possible to do as is and without extra fdtable tracking
>>>>>>
>>>>>> fd = open();
>>>>>> io_uring_update_slot(off=fd, fd=fd);
>>>>>
>>>>> No, currently it's hard to do above work, unless we register a big number of files initially.
>>>>
>>>> If they intend to use a big number of files that's the way to go. They
>>>> can unregister/register if needed, usual grow factor=2  should make
>>>> it workable.
>>>
>>> I'm not sure unregister/register is appropriate. Say an app registers 1000
>>> files; it then needs to unregister those 1000 files first. It is doubtful
>>> whether this unregistration can be done if some of these files are used by
>>> other threads, which continually submit sqes with the FIXED_FILE flag, so
>>> the unregistration needs to synchronize with the threads that are
>>> submitting requests. Later the app needs to prepare a new files array,
>>> saving the current 1000 files and the new files' info to it. For me, it
>>> can work, but it's not efficient and somewhat hard to use :)
>>
>> Sounds reasonable. What I oppose is wiring it solely based on fd. On the
> 
> Is your main concern the possibly large memory consumption, with memory
> that also may not be physically contiguous? If a user app opens thousands
> of files but registers only a small set of them, this method is indeed not
> good.
>
> What about adding a new flag, like IORING_SETUP_REGISTER_FILES_BY_FD? If the
> user creates a uring instance with this flag, we'll support registering
> files by fd. Apps that register most of their opened files will benefit from
> this feature, as they no longer need to maintain slot offset info.
>
> Considering the future, once io_uring becomes the main programming
> interface, every file may be opened by io_uring, so we could register every
> file opened by io_uring; after all, file registration gives performance
> improvements. In this scenario, this new registration method seems the
> simplest.
>
>> other hand, it sounds what you need is a "grow table" feature.
>
> No, it's just a result. What I want is to be able to use the fd as slot
> info when registering files. Once a new fd is returned by open(2), the slot
> indexed by this fd in io_uring's io_file_table can be updated safely, which
> is convenient for the user app. The "grow table" feature is just used to
> implement this support.

You may put it this way. But if the same can be done with smaller features
that can also be used for other purposes, that's preferable. A growing
table may be useful for others who don't have problems with fds.

>> We can also think about adding new format, instead of array of fds, add
>> passing an array of pairs {offset, fd}.
> 
> Can you explain more about this format, or does this will simply user apps' slot info maintain burden?

Currently, if you're updating slots 1 and 1000 in one operation, you'd
need to pass an array of 1000 elements with -1 between the indexes.
With the mentioned format it would be an array of 2 pairs
{{offset=1, fd1}, {offset=1000, fd2}}

If that's not a problem in your case (e.g. updating only one slot at
a time), then we can just forget about it.
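
A minimal sketch of the difference, assuming a hypothetical pair type
struct slot_update and a hypothetical helper build_dense() (neither exists
in io_uring or liburing). It expands a sorted list of {offset, fd} pairs
into the dense array that the current format needs. FILLER_FD follows the
"-1 between the indexes" description above; the exact meaning of filler
values should be checked against io_uring_register(2).

```c
#include <stddef.h>
#include <stdlib.h>

#define FILLER_FD (-1)  /* filler for untouched slots, as described above */

/* Hypothetical pair format: one entry per slot that actually changes. */
struct slot_update {
    unsigned int offset;
    int fd;
};

/*
 * Expand `n` pairs, sorted by offset, into a dense fd array covering
 * [upd[0].offset, upd[n-1].offset]. Returns a malloc()ed array that the
 * caller frees and stores its length in *len_out; returns NULL on bad
 * input or allocation failure.
 */
int *build_dense(const struct slot_update *upd, size_t n, size_t *len_out)
{
    unsigned int base, last;
    size_t len;
    int *fds;

    if (n == 0 || upd[n - 1].offset < upd[0].offset)
        return NULL;

    base = upd[0].offset;
    last = upd[n - 1].offset;
    len = (size_t)(last - base) + 1;

    fds = malloc(len * sizeof(*fds));
    if (!fds)
        return NULL;

    for (size_t i = 0; i < len; i++)
        fds[i] = FILLER_FD;             /* slots in between stay filler */

    for (size_t i = 0; i < n; i++) {
        if (upd[i].offset < base || upd[i].offset > last) {
            free(fds);                  /* input was not sorted */
            return NULL;
        }
        fds[upd[i].offset - base] = upd[i].fd;
    }

    *len_out = len;
    return fds;
}
```

For the two pairs {1, fd1} and {1000, fd2}, the dense array has 1000
entries, of which only the first and the last carry a real fd.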

-- 
Pavel Begunkov

Thread overview: 9+ messages
2021-10-12  8:48 [RFC 0/1] Is register file feature hard to use ? Xiaoguang Wang
2021-10-12  8:48 ` [RFC 1/1] io_uring: improve register file feature's usability Xiaoguang Wang
2021-10-12 11:10   ` Pavel Begunkov
2021-10-12 13:11     ` Xiaoguang Wang
2021-10-12 14:33       ` Pavel Begunkov
2021-10-13  3:32         ` Xiaoguang Wang
2021-10-14  9:43           ` Pavel Begunkov
2021-10-21  8:40             ` Xiaoguang Wang
2021-10-25  9:43               ` Pavel Begunkov
