io-uring.vger.kernel.org archive mirror
* [RFC 0/2] io_uring: disallow overlapping ranges for buffer registration
@ 2020-06-12  2:23 Bijan Mottahedeh
  2020-06-12  2:23 ` [RFC 1/2] " Bijan Mottahedeh
  2020-06-12  2:23 ` [RFC 2/2] io_uring: report pinned memory usage Bijan Mottahedeh
  0 siblings, 2 replies; 11+ messages in thread
From: Bijan Mottahedeh @ 2020-06-12  2:23 UTC (permalink / raw)
  To: axboe; +Cc: io-uring

This patch set has a couple of RFCs.

The first patch aims to disallow overlapping user buffer ranges at
registration time.  For example, right now it is possible to register the
identical buffer many times with the same system call, which doesn't really
make sense.  I'm not sure if there is a valid use case for overlapping
buffers.  I think this check by itself might not be sufficient and more
restrictions may be needed, but I figured I'd send this out for feedback and
to check whether overlapping buffers should be allowed in the first place.
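
For illustration, a registration like the following (rough liburing
sketch, untested) is accepted today and would be rejected by this patch:

/* sketch: register the same 1MB buffer twice in a single call */
#include <stdlib.h>
#include <sys/uio.h>
#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	struct iovec iovs[2];
	void *buf;

	if (io_uring_queue_init(8, &ring, 0) < 0)
		return 1;
	if (posix_memalign(&buf, 4096, 1 << 20))
		return 1;

	iovs[0].iov_base = buf;
	iovs[0].iov_len = 1 << 20;
	iovs[1] = iovs[0];	/* identical range, fully overlapping */

	/* currently succeeds; each entry is pinned and accounted separately */
	return io_uring_register_buffers(&ring, iovs, 2);
}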

The second patch aims to separate reporting and enforcing of pinned
memory usage.  Reporting the usage seems like a good idea even if no
limit is enforced and ctx->account_mem is zero.

However, I'm not clear on the proper setting and usage of
	user->locked_vm
	mm->locked_vm
	mm->pinned_vm

Looking at some other pin_user_pages() callers, it seems mm->pinned_vm
should be used, but I'm not sure.

Bijan Mottahedeh (2):
  io_uring: disallow overlapping ranges for buffer registration
  io_uring: report pinned memory usage

 fs/io_uring.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC 1/2] io_uring: disallow overlapping ranges for buffer registration
  2020-06-12  2:23 [RFC 0/2] io_uring: disallow overlapping ranges for buffer registration Bijan Mottahedeh
@ 2020-06-12  2:23 ` Bijan Mottahedeh
  2020-06-12 15:16   ` Jens Axboe
  2020-06-12  2:23 ` [RFC 2/2] io_uring: report pinned memory usage Bijan Mottahedeh
  1 sibling, 1 reply; 11+ messages in thread
From: Bijan Mottahedeh @ 2020-06-12  2:23 UTC (permalink / raw)
  To: axboe; +Cc: io-uring

Buffer registration is expensive in terms of cpu/mem overhead and there
seems no good reason to allow overlapping ranges.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
---
 fs/io_uring.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 9158130..4248726 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7211,6 +7211,12 @@ static int io_copy_iov(struct io_ring_ctx *ctx, struct iovec *dst,
 	return 0;
 }
 
+static inline int iov_overlap(struct iovec *v1, struct iovec *v2)
+{
+	return (v1->iov_base <= (v2->iov_base + v2->iov_len - 1) &&
+		v2->iov_base <= (v1->iov_base + v1->iov_len - 1));
+}
+
 static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
 				  unsigned nr_args)
 {
@@ -7233,7 +7239,7 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
 		struct io_mapped_ubuf *imu = &ctx->user_bufs[i];
 		unsigned long off, start, end, ubuf;
 		int pret, nr_pages;
-		struct iovec iov;
+		struct iovec iov, prv_iov;
 		size_t size;
 
 		ret = io_copy_iov(ctx, &iov, arg, i);
@@ -7258,6 +7264,12 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
 		start = ubuf >> PAGE_SHIFT;
 		nr_pages = end - start;
 
+		/* disallow overlapping buffers */
+		if (i > 0 && iov_overlap(&prv_iov, &iov))
+			goto err;
+
+		prv_iov = iov;
+
 		if (ctx->account_mem) {
 			ret = io_account_mem(ctx->user, nr_pages);
 			if (ret)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC 2/2] io_uring: report pinned memory usage
  2020-06-12  2:23 [RFC 0/2] io_uring: disallow overlapping ranges for buffer registration Bijan Mottahedeh
  2020-06-12  2:23 ` [RFC 1/2] " Bijan Mottahedeh
@ 2020-06-12  2:23 ` Bijan Mottahedeh
  2020-06-12 15:16   ` Jens Axboe
  1 sibling, 1 reply; 11+ messages in thread
From: Bijan Mottahedeh @ 2020-06-12  2:23 UTC (permalink / raw)
  To: axboe; +Cc: io-uring

Long term, it makes sense to separate reporting and enforcing of pinned
memory usage.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>

It is useful to view
---
 fs/io_uring.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 4248726..cf3acaa 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7080,6 +7080,8 @@ static int io_sq_offload_start(struct io_ring_ctx *ctx,
 static void io_unaccount_mem(struct user_struct *user, unsigned long nr_pages)
 {
 	atomic_long_sub(nr_pages, &user->locked_vm);
+	if (current->mm)
+		atomic_long_sub(nr_pages, &current->mm->pinned_vm);
 }
 
 static int io_account_mem(struct user_struct *user, unsigned long nr_pages)
@@ -7096,6 +7098,8 @@ static int io_account_mem(struct user_struct *user, unsigned long nr_pages)
 			return -ENOMEM;
 	} while (atomic_long_cmpxchg(&user->locked_vm, cur_pages,
 					new_pages) != cur_pages);
+	if (current->mm)
+		atomic_long_add(nr_pages, &current->mm->pinned_vm);
 
 	return 0;
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC 1/2] io_uring: disallow overlapping ranges for buffer registration
  2020-06-12  2:23 ` [RFC 1/2] " Bijan Mottahedeh
@ 2020-06-12 15:16   ` Jens Axboe
  2020-06-12 18:22     ` Bijan Mottahedeh
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2020-06-12 15:16 UTC (permalink / raw)
  To: Bijan Mottahedeh; +Cc: io-uring

On 6/11/20 8:23 PM, Bijan Mottahedeh wrote:
> Buffer registration is expensive in terms of cpu/mem overhead and there
> seems no good reason to allow overlapping ranges.

There's also no good reason to disallow it imho, so not sure we should
be doing that.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 2/2] io_uring: report pinned memory usage
  2020-06-12  2:23 ` [RFC 2/2] io_uring: report pinned memory usage Bijan Mottahedeh
@ 2020-06-12 15:16   ` Jens Axboe
  2020-06-12 15:19     ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2020-06-12 15:16 UTC (permalink / raw)
  To: Bijan Mottahedeh; +Cc: io-uring

On 6/11/20 8:23 PM, Bijan Mottahedeh wrote:
> Long term, it makes sense to separate reporting and enforcing of pinned
> memory usage.
> 
> Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
> 
> It is useful to view
> ---
>  fs/io_uring.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 4248726..cf3acaa 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -7080,6 +7080,8 @@ static int io_sq_offload_start(struct io_ring_ctx *ctx,
>  static void io_unaccount_mem(struct user_struct *user, unsigned long nr_pages)
>  {
>  	atomic_long_sub(nr_pages, &user->locked_vm);
> +	if (current->mm)
> +		atomic_long_sub(nr_pages, &current->mm->pinned_vm);
>  }
>  
>  static int io_account_mem(struct user_struct *user, unsigned long nr_pages)
> @@ -7096,6 +7098,8 @@ static int io_account_mem(struct user_struct *user, unsigned long nr_pages)
>  			return -ENOMEM;
>  	} while (atomic_long_cmpxchg(&user->locked_vm, cur_pages,
>  					new_pages) != cur_pages);
> +	if (current->mm)
> +		atomic_long_add(nr_pages, &current->mm->pinned_vm);
>  
>  	return 0;
>  }

current->mm should always be valid for these, so I think you can skip the
checking of that and just make it unconditional.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 2/2] io_uring: report pinned memory usage
  2020-06-12 15:16   ` Jens Axboe
@ 2020-06-12 15:19     ` Jens Axboe
  2020-06-12 15:50       ` Jens Axboe
  2020-06-13  4:43       ` Bijan Mottahedeh
  0 siblings, 2 replies; 11+ messages in thread
From: Jens Axboe @ 2020-06-12 15:19 UTC (permalink / raw)
  To: Bijan Mottahedeh; +Cc: io-uring

On 6/12/20 9:16 AM, Jens Axboe wrote:
> On 6/11/20 8:23 PM, Bijan Mottahedeh wrote:
>> Long term, it makes sense to separate reporting and enforcing of pinned
>> memory usage.
>>
>> Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
>>
>> It is useful to view
>> ---
>>  fs/io_uring.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 4248726..cf3acaa 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -7080,6 +7080,8 @@ static int io_sq_offload_start(struct io_ring_ctx *ctx,
>>  static void io_unaccount_mem(struct user_struct *user, unsigned long nr_pages)
>>  {
>>  	atomic_long_sub(nr_pages, &user->locked_vm);
>> +	if (current->mm)
>> +		atomic_long_sub(nr_pages, &current->mm->pinned_vm);
>>  }
>>  
>>  static int io_account_mem(struct user_struct *user, unsigned long nr_pages)
>> @@ -7096,6 +7098,8 @@ static int io_account_mem(struct user_struct *user, unsigned long nr_pages)
>>  			return -ENOMEM;
>>  	} while (atomic_long_cmpxchg(&user->locked_vm, cur_pages,
>>  					new_pages) != cur_pages);
>> +	if (current->mm)
>> +		atomic_long_add(nr_pages, &current->mm->pinned_vm);
>>  
>>  	return 0;
>>  }
> 
> current->mm should always be valid for these, so I think you can skip the
> checking of that and just make it unconditional.

Two other issues with this:

- It's an atomic64, so seems more appropriate to use the atomic64 helpers
  for this one.
- The unaccount could potentially be a different mm, if the ring is shared
  and one task sets it up while another tears it down. So we'd need something
  to ensure consistency here.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 2/2] io_uring: report pinned memory usage
  2020-06-12 15:19     ` Jens Axboe
@ 2020-06-12 15:50       ` Jens Axboe
  2020-06-13  4:43       ` Bijan Mottahedeh
  1 sibling, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2020-06-12 15:50 UTC (permalink / raw)
  To: Bijan Mottahedeh; +Cc: io-uring

On 6/12/20 9:19 AM, Jens Axboe wrote:
> On 6/12/20 9:16 AM, Jens Axboe wrote:
>> On 6/11/20 8:23 PM, Bijan Mottahedeh wrote:
>>> Long term, it makes sense to separate reporting and enforcing of pinned
>>> memory usage.
>>>
>>> Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
>>>
>>> It is useful to view
>>> ---
>>>  fs/io_uring.c | 4 ++++
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 4248726..cf3acaa 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -7080,6 +7080,8 @@ static int io_sq_offload_start(struct io_ring_ctx *ctx,
>>>  static void io_unaccount_mem(struct user_struct *user, unsigned long nr_pages)
>>>  {
>>>  	atomic_long_sub(nr_pages, &user->locked_vm);
>>> +	if (current->mm)
>>> +		atomic_long_sub(nr_pages, &current->mm->pinned_vm);
>>>  }
>>>  
>>>  static int io_account_mem(struct user_struct *user, unsigned long nr_pages)
>>> @@ -7096,6 +7098,8 @@ static int io_account_mem(struct user_struct *user, unsigned long nr_pages)
>>>  			return -ENOMEM;
>>>  	} while (atomic_long_cmpxchg(&user->locked_vm, cur_pages,
>>>  					new_pages) != cur_pages);
>>> +	if (current->mm)
>>> +		atomic_long_add(nr_pages, &current->mm->pinned_vm);
>>>  
>>>  	return 0;
>>>  }
>>
>> current->mm should always be valid for these, so I think you can skip the
>> checking of that and just make it unconditional.
> 
> Two other issues with this:
> 
> - It's an atomic64, so seems more appropriate to use the atomic64 helpers
>   for this one.
> - The unaccount could potentially be a different mm, if the ring is shared
>   and one task sets it up while another tears it down. So we'd need something
>   to ensure consistency here.

IOW, this should just use ctx->sqo_mm, as that's grabbed and valid. Just need
to ensure it's done correctly in terms of ordering for setup.
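
Untested sketch of what I mean; it changes the helpers to take the ctx so
they can get at ctx->sqo_mm, assumes sqo_mm is set up before the first
accounting call, and keeps the rest roughly as the current code:

static void io_unaccount_mem(struct io_ring_ctx *ctx, unsigned long nr_pages)
{
	atomic_long_sub(nr_pages, &ctx->user->locked_vm);
	if (ctx->sqo_mm)
		atomic64_sub(nr_pages, &ctx->sqo_mm->pinned_vm);
}

static int io_account_mem(struct io_ring_ctx *ctx, unsigned long nr_pages)
{
	unsigned long page_limit, cur_pages, new_pages;

	/* Don't allow more pages than we can safely lock */
	page_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;

	do {
		cur_pages = atomic_long_read(&ctx->user->locked_vm);
		new_pages = cur_pages + nr_pages;
		if (new_pages > page_limit)
			return -ENOMEM;
	} while (atomic_long_cmpxchg(&ctx->user->locked_vm, cur_pages,
					new_pages) != cur_pages);

	/* pinned_vm is an atomic64_t, so use the atomic64 helpers */
	if (ctx->sqo_mm)
		atomic64_add(nr_pages, &ctx->sqo_mm->pinned_vm);

	return 0;
}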

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 1/2] io_uring: disallow overlapping ranges for buffer registration
  2020-06-12 15:16   ` Jens Axboe
@ 2020-06-12 18:22     ` Bijan Mottahedeh
  2020-06-14 15:56       ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Bijan Mottahedeh @ 2020-06-12 18:22 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring

On 6/12/2020 8:16 AM, Jens Axboe wrote:
> On 6/11/20 8:23 PM, Bijan Mottahedeh wrote:
>> Buffer registration is expensive in terms of cpu/mem overhead and there
>> seems no good reason to allow overlapping ranges.
> There's also no good reason to disallow it imho, so not sure we should
> be doing that.
>

My concern was about a malicious user without CAP_IPC_LOCK abusing its 
memlock limit by repeatedly registering the same buffer, since that would 
tie up cpu/mem resources to pin up to 1TB of memory, but maybe the argument 
is that the user should not have been granted that large a memlock 
limit?  Also, without any restrictions, there are a huge number of ways 
overlapping ranges could be specified, creating a very large validation 
matrix.  What the use cases for that flexibility are, though, I don't know.
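
For instance, a complete check, rather than just comparing against the
previous iovec as in patch 1, would have to compare every pair of iovecs.
Something like the untested sketch below, which also assumes all the
iovecs have been copied in up front (io_sqe_buffer_register() currently
copies them one at a time):

/* rough sketch: full pairwise overlap check, O(n^2) over the iovecs */
static bool iovs_overlap(struct iovec *iovs, unsigned nr)
{
	unsigned i, j;

	for (i = 0; i < nr; i++)
		for (j = i + 1; j < nr; j++)
			if (iov_overlap(&iovs[i], &iovs[j]))
				return true;
	return false;
}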

--bijan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 2/2] io_uring: report pinned memory usage
  2020-06-12 15:19     ` Jens Axboe
  2020-06-12 15:50       ` Jens Axboe
@ 2020-06-13  4:43       ` Bijan Mottahedeh
  2020-06-14 15:57         ` Jens Axboe
  1 sibling, 1 reply; 11+ messages in thread
From: Bijan Mottahedeh @ 2020-06-13  4:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring

On 6/12/2020 8:19 AM, Jens Axboe wrote:
> On 6/12/20 9:16 AM, Jens Axboe wrote:
>> On 6/11/20 8:23 PM, Bijan Mottahedeh wrote:
>>> Long term, it makes sense to separate reporting and enforcing of pinned
>>> memory usage.
>>>
>>> Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
>>>
>>> It is useful to view
>>> ---
>>>   fs/io_uring.c | 4 ++++
>>>   1 file changed, 4 insertions(+)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 4248726..cf3acaa 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -7080,6 +7080,8 @@ static int io_sq_offload_start(struct io_ring_ctx *ctx,
>>>   static void io_unaccount_mem(struct user_struct *user, unsigned long nr_pages)
>>>   {
>>>   	atomic_long_sub(nr_pages, &user->locked_vm);
>>> +	if (current->mm)
>>> +		atomic_long_sub(nr_pages, &current->mm->pinned_vm);
>>>   }
>>>   
>>>   static int io_account_mem(struct user_struct *user, unsigned long nr_pages)
>>> @@ -7096,6 +7098,8 @@ static int io_account_mem(struct user_struct *user, unsigned long nr_pages)
>>>   			return -ENOMEM;
>>>   	} while (atomic_long_cmpxchg(&user->locked_vm, cur_pages,
>>>   					new_pages) != cur_pages);
>>> +	if (current->mm)
>>> +		atomic_long_add(nr_pages, &current->mm->pinned_vm);
>>>   
>>>   	return 0;
>>>   }
>> current->mm should always be valid for these, so I think you can skip the
>> checking of that and just make it unconditional.
> Two other issues with this:
>
> - It's an atomic64, so seems more appropriate to use the atomic64 helpers
>    for this one.
> - The unaccount could potentially be a different mm, if the ring is shared
>    and one task sets it up while another tears it down. So we'd need something
>    to ensure consistency here.
>
Are you referring to a case where one process creates a ring and sends 
the ring fd to another process?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 1/2] io_uring: disallow overlapping ranges for buffer registration
  2020-06-12 18:22     ` Bijan Mottahedeh
@ 2020-06-14 15:56       ` Jens Axboe
  0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2020-06-14 15:56 UTC (permalink / raw)
  To: Bijan Mottahedeh; +Cc: io-uring

On 6/12/20 12:22 PM, Bijan Mottahedeh wrote:
> On 6/12/2020 8:16 AM, Jens Axboe wrote:
>> On 6/11/20 8:23 PM, Bijan Mottahedeh wrote:
>>> Buffer registration is expensive in terms of cpu/mem overhead and there
>>> seems no good reason to allow overlapping ranges.
>> There's also no good reason to disallow it imho, so not sure we should
>> be doing that.
>>
> 
> My concern was about a malicious user without CAP_IPC_LOCK abusing its 
> memlock limit and repeatedly register the same buffer as that would tie 
> up cpu/mem resources to pin up to 1TB of memory, but maybe the argument 
> is that the user should have not been granted that large of memlock 
> limit?  Also, without any restrictions, there are a huge number of ways 
> overlapping ranges could be specified, creating a very large validation 
> matrix.  What the use cases for that flexibility are though, I don't know.

Not sure I follow, so I must be missing something. We're accounting each
buffer as it is, so how are we abusing the limit?

My concern on the overlaps is mainly that someone could be
(inadvertently, perhaps) doing it already, and now we're breaking that.
Or maybe they are doing it purposely, for some reason I just don't know
about.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 2/2] io_uring: report pinned memory usage
  2020-06-13  4:43       ` Bijan Mottahedeh
@ 2020-06-14 15:57         ` Jens Axboe
  0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2020-06-14 15:57 UTC (permalink / raw)
  To: Bijan Mottahedeh; +Cc: io-uring

On 6/12/20 10:43 PM, Bijan Mottahedeh wrote:
> On 6/12/2020 8:19 AM, Jens Axboe wrote:
>> On 6/12/20 9:16 AM, Jens Axboe wrote:
>>> On 6/11/20 8:23 PM, Bijan Mottahedeh wrote:
>>>> Long term, it makes sense to separate reporting and enforcing of pinned
>>>> memory usage.
>>>>
>>>> Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
>>>>
>>>> It is useful to view
>>>> ---
>>>>   fs/io_uring.c | 4 ++++
>>>>   1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 4248726..cf3acaa 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -7080,6 +7080,8 @@ static int io_sq_offload_start(struct io_ring_ctx *ctx,
>>>>   static void io_unaccount_mem(struct user_struct *user, unsigned long nr_pages)
>>>>   {
>>>>   	atomic_long_sub(nr_pages, &user->locked_vm);
>>>> +	if (current->mm)
>>>> +		atomic_long_sub(nr_pages, &current->mm->pinned_vm);
>>>>   }
>>>>   
>>>>   static int io_account_mem(struct user_struct *user, unsigned long nr_pages)
>>>> @@ -7096,6 +7098,8 @@ static int io_account_mem(struct user_struct *user, unsigned long nr_pages)
>>>>   			return -ENOMEM;
>>>>   	} while (atomic_long_cmpxchg(&user->locked_vm, cur_pages,
>>>>   					new_pages) != cur_pages);
>>>> +	if (current->mm)
>>>> +		atomic_long_add(nr_pages, &current->mm->pinned_vm);
>>>>   
>>>>   	return 0;
>>>>   }
>>> current->mm should always be valid for these, so I think you can skip the
>>> checking of that and just make it unconditional.
>> Two other issues with this:
>>
>> - It's an atomic64, so seems more appropriate to use the atomic64 helpers
>>    for this one.
>> - The unaccount could potentially be a different mm, if the ring is shared
>>    and one task sets it up while another tears it down. So we'd need something
>>    to ensure consistency here.
>>
> Are you referring to a case where one process creates a ring and sends 
> the ring fd to another process?

Or a simpler case, where someone has submissions and completions running
on separate threads, and it just so happens that the completion side is
the one to exit the ring.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-06-14 15:57 UTC | newest]

Thread overview: 11+ messages
2020-06-12  2:23 [RFC 0/2] io_uring: disallow overlapping ranges for buffer registration Bijan Mottahedeh
2020-06-12  2:23 ` [RFC 1/2] " Bijan Mottahedeh
2020-06-12 15:16   ` Jens Axboe
2020-06-12 18:22     ` Bijan Mottahedeh
2020-06-14 15:56       ` Jens Axboe
2020-06-12  2:23 ` [RFC 2/2] io_uring: report pinned memory usage Bijan Mottahedeh
2020-06-12 15:16   ` Jens Axboe
2020-06-12 15:19     ` Jens Axboe
2020-06-12 15:50       ` Jens Axboe
2020-06-13  4:43       ` Bijan Mottahedeh
2020-06-14 15:57         ` Jens Axboe
