[PATCH v4 0/3] aio: ctx->dead cleanups

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v4 0/3] aio: ctx->dead cleanups
@ 2015-06-18 18:39 Oleg Nesterov
  2015-06-18 18:40 ` [PATCH v4 1/3] aio_ring_remap: kill the bogus ctx->dead check Oleg Nesterov
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Oleg Nesterov @ 2015-06-18 18:39 UTC (permalink / raw)
  To: Al Viro, Andrew Morton, Benjamin LaHaise, Jeff Moyer
  Cc: linux-aio, linux-kernel

> As Jeff suggested, 1/3 was changed to simply remove the ->dead check.
> I also updated the changelog.
>
> Jeff, I preserved your acks in 2-3.

also remove ctx->dead setting in ioctx_alloc().

Oleg.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v4 1/3] aio_ring_remap: kill the bogus ctx->dead check
  2015-06-18 18:39 [PATCH v4 0/3] aio: ctx->dead cleanups Oleg Nesterov
@ 2015-06-18 18:40 ` Oleg Nesterov
  2015-06-18 20:01   ` Jeff Moyer
  2015-06-18 18:40 ` [PATCH v4 2/3] aio: make aio_ring->dead boolean Oleg Nesterov
  2015-06-18 18:40 ` [PATCH v4 3/3] aio_free_ring: don't do page_count(NULL) Oleg Nesterov
  2 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2015-06-18 18:40 UTC (permalink / raw)
  To: Al Viro, Andrew Morton, Benjamin LaHaise, Jeff Moyer
  Cc: linux-aio, linux-kernel

kill_ioctx() sets ctx->dead and removes ctx from ->ioctx_table
"atomically" under mm->ioctx_lock, so aio_ring_remap() can never
see a dead ctx.

And even -EINVAL doesn't look necessary. Yes, if mremap() races
with kill_ioctx() vm_munmap(ctx->mmap_base, ctx->mmap_size) can
unmap the wrong region. In this case the buggy application should
blame itself. And there are other reasons why that vm_munmap() can
be wrong. Say, an application can mremap() the part of aio region
and then do io_destroy(). We could change aio_ring_remap() to
verify vma->that vma_end - vma->vma_start == ctx->mmap_size but
this won't help if the application does munmap() instead.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/aio.c |    9 +++------
 1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 480440f..893d300 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -325,14 +325,11 @@ static int aio_ring_remap(struct file *file, struct vm_area_struct *vma)
 	rcu_read_lock();
 	table = rcu_dereference(mm->ioctx_table);
 	for (i = 0; i < table->nr; i++) {
-		struct kioctx *ctx;
+		struct kioctx *ctx = table->table[i];

-		ctx = table->table[i];
 		if (ctx && ctx->aio_ring_file == file) {
-			if (!atomic_read(&ctx->dead)) {
-				ctx->user_id = ctx->mmap_base = vma->vm_start;
-				res = 0;
-			}
+			ctx->user_id = ctx->mmap_base = vma->vm_start;
+			res = 0;
 			break;
 		}
 	}
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v4 2/3] aio: make aio_ring->dead boolean
  2015-06-18 18:39 [PATCH v4 0/3] aio: ctx->dead cleanups Oleg Nesterov
  2015-06-18 18:40 ` [PATCH v4 1/3] aio_ring_remap: kill the bogus ctx->dead check Oleg Nesterov
@ 2015-06-18 18:40 ` Oleg Nesterov
  2015-06-18 18:40 ` [PATCH v4 3/3] aio_free_ring: don't do page_count(NULL) Oleg Nesterov
  2 siblings, 0 replies; 7+ messages in thread
From: Oleg Nesterov @ 2015-06-18 18:40 UTC (permalink / raw)
  To: Al Viro, Andrew Morton, Benjamin LaHaise, Jeff Moyer
  Cc: linux-aio, linux-kernel

"atomic_t dead" makes no sense. atomic_read() is the plain LOAD,
it doesn't have some "additional" synchronization with xchg().

And now that kill_ioctx() sets "dead" under mm->ioctx_lock we do
not even need xchg().

Also, the error path in ioctx_alloc() sets ->dead for no reason,
remove this.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
---
 fs/aio.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 893d300..4a360be 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -84,7 +84,7 @@ struct ctx_rq_wait {
 
 struct kioctx {
 	struct percpu_ref	users;
-	atomic_t		dead;
+	bool			dead;
 
 	struct percpu_ref	reqs;
 
@@ -765,7 +765,6 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
 err_cleanup:
 	aio_nr_sub(ctx->max_reqs);
 err_ctx:
-	atomic_set(&ctx->dead, 1);
 	if (ctx->mmap_size)
 		vm_munmap(ctx->mmap_base, ctx->mmap_size);
 	aio_free_ring(ctx);
@@ -790,11 +789,12 @@ static int kill_ioctx(struct mm_struct *mm, struct kioctx *ctx,
 	struct kioctx_table *table;
 
 	spin_lock(&mm->ioctx_lock);
-	if (atomic_xchg(&ctx->dead, 1)) {
+	if (unlikely(ctx->dead)) {
 		spin_unlock(&mm->ioctx_lock);
 		return -EINVAL;
 	}
 
+	ctx->dead = true;
 	table = rcu_dereference_raw(mm->ioctx_table);
 	WARN_ON(ctx != table->table[ctx->id]);
 	table->table[ctx->id] = NULL;
@@ -1236,7 +1236,7 @@ static bool aio_read_events(struct kioctx *ctx, long min_nr, long nr,
 	if (ret > 0)
 		*i += ret;
 
-	if (unlikely(atomic_read(&ctx->dead)))
+	if (unlikely(ctx->dead))
 		ret = -EINVAL;
 
 	if (!*i)
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v4 3/3] aio_free_ring: don't do page_count(NULL)
  2015-06-18 18:39 [PATCH v4 0/3] aio: ctx->dead cleanups Oleg Nesterov
  2015-06-18 18:40 ` [PATCH v4 1/3] aio_ring_remap: kill the bogus ctx->dead check Oleg Nesterov
  2015-06-18 18:40 ` [PATCH v4 2/3] aio: make aio_ring->dead boolean Oleg Nesterov
@ 2015-06-18 18:40 ` Oleg Nesterov
  2 siblings, 0 replies; 7+ messages in thread
From: Oleg Nesterov @ 2015-06-18 18:40 UTC (permalink / raw)
  To: Al Viro, Andrew Morton, Benjamin LaHaise, Jeff Moyer
  Cc: linux-aio, linux-kernel

aio_free_ring() can actually see the NULL page in ->ring_pages[],
this can happen if aio_setup_ring() fails.

And in this case page_count(ctx->ring_pages[i]) can OOPS.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
---
 fs/aio.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 4a360be..9bc1335 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -292,12 +292,12 @@ static void aio_free_ring(struct kioctx *ctx)
 	put_aio_ring_file(ctx);
 
 	for (i = 0; i < ctx->nr_pages; i++) {
-		struct page *page;
-		pr_debug("pid(%d) [%d] page->count=%d\n", current->pid, i,
-				page_count(ctx->ring_pages[i]));
-		page = ctx->ring_pages[i];
+		struct page *page = ctx->ring_pages[i];
 		if (!page)
 			continue;
+
+		pr_debug("pid(%d) [%d] page->count=%d\n",
+				current->pid, i, page_count(page));
 		ctx->ring_pages[i] = NULL;
 		put_page(page);
 	}
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 1/3] aio_ring_remap: kill the bogus ctx->dead check
  2015-06-18 18:40 ` [PATCH v4 1/3] aio_ring_remap: kill the bogus ctx->dead check Oleg Nesterov
@ 2015-06-18 20:01   ` Jeff Moyer
  2015-06-19 17:55     ` Oleg Nesterov
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Moyer @ 2015-06-18 20:01 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Al Viro, Andrew Morton, Benjamin LaHaise, linux-aio, linux-kernel

Oleg Nesterov <oleg@redhat.com> writes:

> kill_ioctx() sets ctx->dead and removes ctx from ->ioctx_table
> "atomically" under mm->ioctx_lock, so aio_ring_remap() can never
> see a dead ctx.
>
> And even -EINVAL doesn't look necessary. Yes, if mremap() races
> with kill_ioctx() vm_munmap(ctx->mmap_base, ctx->mmap_size) can
> unmap the wrong region. In this case the buggy application should
> blame itself. And there are other reasons why that vm_munmap() can
> be wrong. Say, an application can mremap() the part of aio region
> and then do io_destroy(). We could change aio_ring_remap() to
> verify vma->that vma_end - vma->vma_start == ctx->mmap_size but
> this won't help if the application does munmap() instead.

I don't think this paragraph really fits with the patch.  It's
interesting commentary, but if you feel strongly enough about it, send a
patch that undoes b2edffdd912b.  ;-)

So, drop that paragraph from the commit message, and you can add my
reviewed by.  The patch itself looks good to me.

Cheers,
Jeff

>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  fs/aio.c |    9 +++------
>  1 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index 480440f..893d300 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -325,14 +325,11 @@ static int aio_ring_remap(struct file *file, struct vm_area_struct *vma)
>  	rcu_read_lock();
>  	table = rcu_dereference(mm->ioctx_table);
>  	for (i = 0; i < table->nr; i++) {
> -		struct kioctx *ctx;
> +		struct kioctx *ctx = table->table[i];
>  
> -		ctx = table->table[i];
>  		if (ctx && ctx->aio_ring_file == file) {
> -			if (!atomic_read(&ctx->dead)) {
> -				ctx->user_id = ctx->mmap_base = vma->vm_start;
> -				res = 0;
> -			}
> +			ctx->user_id = ctx->mmap_base = vma->vm_start;
> +			res = 0;
>  			break;
>  		}
>  	}

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 1/3] aio_ring_remap: kill the bogus ctx->dead check
  2015-06-18 20:01   ` Jeff Moyer
@ 2015-06-19 17:55     ` Oleg Nesterov
  2015-06-19 18:46       ` Jeff Moyer
  0 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2015-06-19 17:55 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Al Viro, Andrew Morton, Benjamin LaHaise, linux-aio, linux-kernel

On 06/18, Jeff Moyer wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > kill_ioctx() sets ctx->dead and removes ctx from ->ioctx_table
> > "atomically" under mm->ioctx_lock, so aio_ring_remap() can never
> > see a dead ctx.
> >
> > And even -EINVAL doesn't look necessary. Yes, if mremap() races
> > with kill_ioctx() vm_munmap(ctx->mmap_base, ctx->mmap_size) can
> > unmap the wrong region. In this case the buggy application should
> > blame itself. And there are other reasons why that vm_munmap() can
> > be wrong. Say, an application can mremap() the part of aio region
> > and then do io_destroy(). We could change aio_ring_remap() to
> > verify vma->that vma_end - vma->vma_start == ctx->mmap_size but
> > this won't help if the application does munmap() instead.
>
> I don't think this paragraph really fits with the patch.  It's
> interesting commentary,

Well, this time I disagree. It would be better to add a comment, but the
changelog can help too to understand the code and potential problems if
it races with kill_ioctx().

> but if you feel strongly enough about it, send a
> patch that undoes b2edffdd912b.  ;-)

No, I think that commit makes sense. Just it was wrong and we need to
fix it. And, in particular, its changelog was wrong (at least confusing),
it looks as if there are strong reasons to prevent this race. So the
changelog above can unconfuse the git-log reader.


But. I never argue with maintainers about non-technical issues ;)

> So, drop that paragraph from the commit message, and you can add my
> reviewed by.  The patch itself looks good to me.

OK, I am sending v5 with your reviewed-by.

Thanks!

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 1/3] aio_ring_remap: kill the bogus ctx->dead check
  2015-06-19 17:55     ` Oleg Nesterov
@ 2015-06-19 18:46       ` Jeff Moyer
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff Moyer @ 2015-06-19 18:46 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Al Viro, Andrew Morton, Benjamin LaHaise, linux-aio, linux-kernel

Oleg Nesterov <oleg@redhat.com> writes:

> On 06/18, Jeff Moyer wrote:
>>
>> Oleg Nesterov <oleg@redhat.com> writes:
>>
>> > kill_ioctx() sets ctx->dead and removes ctx from ->ioctx_table
>> > "atomically" under mm->ioctx_lock, so aio_ring_remap() can never
>> > see a dead ctx.
>> >
>> > And even -EINVAL doesn't look necessary. Yes, if mremap() races
>> > with kill_ioctx() vm_munmap(ctx->mmap_base, ctx->mmap_size) can
>> > unmap the wrong region. In this case the buggy application should
>> > blame itself. And there are other reasons why that vm_munmap() can
>> > be wrong. Say, an application can mremap() the part of aio region
>> > and then do io_destroy(). We could change aio_ring_remap() to
>> > verify vma->that vma_end - vma->vma_start == ctx->mmap_size but
>> > this won't help if the application does munmap() instead.
>>
>> I don't think this paragraph really fits with the patch.  It's
>> interesting commentary,
>
> Well, this time I disagree. It would be better to add a comment, but the
> changelog can help too to understand the code and potential problems if
> it races with kill_ioctx().
>
>> but if you feel strongly enough about it, send a
>> patch that undoes b2edffdd912b.  ;-)
>
> No, I think that commit makes sense. Just it was wrong and we need to
> fix it. And, in particular, its changelog was wrong (at least confusing),
> it looks as if there are strong reasons to prevent this race. So the
> changelog above can unconfuse the git-log reader.
>
>
> But. I never argue with maintainers about non-technical issues ;)

Well, then we can have Ben weigh in.  I'm not trying to make a huge deal
out of it.

Cheers,
Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-06-19 18:46 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-18 18:39 [PATCH v4 0/3] aio: ctx->dead cleanups Oleg Nesterov
2015-06-18 18:40 ` [PATCH v4 1/3] aio_ring_remap: kill the bogus ctx->dead check Oleg Nesterov
2015-06-18 20:01   ` Jeff Moyer
2015-06-19 17:55     ` Oleg Nesterov
2015-06-19 18:46       ` Jeff Moyer
2015-06-18 18:40 ` [PATCH v4 2/3] aio: make aio_ring->dead boolean Oleg Nesterov
2015-06-18 18:40 ` [PATCH v4 3/3] aio_free_ring: don't do page_count(NULL) Oleg Nesterov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.