* [PATCH 0/3] aio: ctx->dead cleanups
@ 2015-06-16 23:04 Oleg Nesterov
  2015-06-16 23:04 ` [PATCH 1/3] aio_ring_remap: turn the ctx->dead check into WARN_ON() Oleg Nesterov
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Oleg Nesterov @ 2015-06-16 23:04 UTC (permalink / raw)
  To: Al Viro, Andrew Morton, Benjamin LaHaise, Jeff Moyer
  Cc: linux-aio, linux-kernel

Al, please help. We are trying to backport some aio fixes and I am
absolutely confused by your b2edffdd912b "fix mremap() vs. ioctx_kill()
race".


Firstly, I simply can't understand what exactly it tries to fix. OK,
aio_free_ring() can race with kill and we can remap the soon-to-be-killed
ctx. So what? kill_ioctx() will see the correct (already re-mapped)
ctx->mmap_base after it drops mm->ioctx_lock.

So it seems to me we only need this change to ensure that move_vma() can
not succeed if ctx was already removed from ->ioctx_table, or, if we race
with ioctx_alloc(), it was not added to ->ioctx_table. IOW, we need to
ensure that move_vma()->aio_ring_remap() can not race with
vm_munmap(ctx->mmap_base) in kill_ioctx() or ioctx_alloc(). And this race
doesn't look really bad. The kernel can't crash, just the application can
fool itself.

But I guess I missed something, and I'd like to know what I have missed.
Could you explain?


Also. The change in move_vma() looks "obviously wrong". Don't we need
something like the patch at the end to ensure we do not "leak" new_vma,
or am I totally confused?


But to me the main problem is atomic_read(ctx->dead) in aio_ring_remap().
I mean, it complicates the backporting, and it looks unnecessary and
confusing. See the 1st patch.

Please review, I do not know how to test this.

Oleg.

--- x/mm/mremap.c
+++ x/mm/mremap.c
@@ -275,6 +275,8 @@ static unsigned long move_vma(struct vm_
 	moved_len = move_page_tables(vma, old_addr, new_vma, new_addr, old_len,
 				     need_rmap_locks);
 	if (moved_len < old_len) {
+		err = -ENOMEM;
+xxx:
 		/*
 		 * On error, move entries back from new area to old,
 		 * which will succeed since page tables still there,
@@ -285,14 +287,11 @@ static unsigned long move_vma(struct vm_
 		vma = new_vma;
 		old_len = new_len;
 		old_addr = new_addr;
-		new_addr = -ENOMEM;
+		new_addr = err;
 	} else if (vma->vm_file && vma->vm_file->f_op->mremap) {
 		err = vma->vm_file->f_op->mremap(vma->vm_file, new_vma);
-		if (err < 0) {
-			move_page_tables(new_vma, new_addr, vma, old_addr,
-					 moved_len, true);
-			return err;
-		}
+		if (err < 0)
+			goto xxx;
 	}
 
 	/* Conceal VM_ACCOUNT so old reservation is not undone */


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/3] aio_ring_remap: turn the ctx->dead check into WARN_ON()
  2015-06-16 23:04 [PATCH 0/3] aio: ctx->dead cleanups Oleg Nesterov
@ 2015-06-16 23:04 ` Oleg Nesterov
  2015-06-17 17:50   ` Jeff Moyer
  2015-06-16 23:04 ` [PATCH 2/3] aio: make aio_ring->dead boolean Oleg Nesterov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2015-06-16 23:04 UTC (permalink / raw)
  To: Al Viro, Andrew Morton, Benjamin LaHaise, Jeff Moyer
  Cc: linux-aio, linux-kernel

atomic_read(&ctx->dead) in aio_ring_remap() looks confusing.
kill_ioctx() sets ctx->dead and removes ctx from ->ioctx_table
"atomically" under mm->ioctx_lock, so aio_ring_remap() can never
see a dead ctx.

If we really want this check, we should put it under WARN_ON()
and it should not depend on aio_ring_file == file.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/aio.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 480440f..0693333 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -325,14 +325,14 @@ static int aio_ring_remap(struct file *file, struct vm_area_struct *vma)
 	rcu_read_lock();
 	table = rcu_dereference(mm->ioctx_table);
 	for (i = 0; i < table->nr; i++) {
-		struct kioctx *ctx;
-
-		ctx = table->table[i];
-		if (ctx && ctx->aio_ring_file == file) {
-			if (!atomic_read(&ctx->dead)) {
-				ctx->user_id = ctx->mmap_base = vma->vm_start;
-				res = 0;
-			}
+		struct kioctx *ctx = table->table[i];
+
+		if (!ctx || WARN_ON(atomic_read(&ctx->dead)))
+			continue;
+
+		if (ctx->aio_ring_file == file) {
+			ctx->user_id = ctx->mmap_base = vma->vm_start;
+			res = 0;
 			break;
 		}
 	}
-- 
1.5.5.1



* [PATCH 2/3] aio: make aio_ring->dead boolean
  2015-06-16 23:04 [PATCH 0/3] aio: ctx->dead cleanups Oleg Nesterov
  2015-06-16 23:04 ` [PATCH 1/3] aio_ring_remap: turn the ctx->dead check into WARN_ON() Oleg Nesterov
@ 2015-06-16 23:04 ` Oleg Nesterov
  2015-06-16 23:50   ` [PATCH v2 " Oleg Nesterov
  2015-06-16 23:04 ` [PATCH 3/3] aio_free_ring: don't do page_count(NULL) Oleg Nesterov
  2015-06-17  0:39 ` [PATCH 0/3] aio: ctx->dead cleanups Al Viro
  3 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2015-06-16 23:04 UTC (permalink / raw)
  To: Al Viro, Andrew Morton, Benjamin LaHaise, Jeff Moyer
  Cc: linux-aio, linux-kernel

"atomic_t dead" makes no sense. atomic_read() is a plain LOAD,
it doesn't have any "additional" synchronization with xchg().

And now that kill_ioctx() sets "dead" under mm->ioctx_lock we do
not even need xchg().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/aio.c |   11 ++++++-----
 1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 0693333..b605ab2 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -84,7 +84,7 @@ struct ctx_rq_wait {
 
 struct kioctx {
 	struct percpu_ref	users;
-	atomic_t		dead;
+	bool			dead;
 
 	struct percpu_ref	reqs;
 
@@ -327,7 +327,7 @@ static int aio_ring_remap(struct file *file, struct vm_area_struct *vma)
 	for (i = 0; i < table->nr; i++) {
 		struct kioctx *ctx = table->table[i];
 
-		if (!ctx || WARN_ON(atomic_read(&ctx->dead)))
+		if (!ctx || ctx->dead)
 			continue;
 
 		if (ctx->aio_ring_file == file) {
@@ -768,7 +768,7 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
 err_cleanup:
 	aio_nr_sub(ctx->max_reqs);
 err_ctx:
-	atomic_set(&ctx->dead, 1);
+	ctx->dead = true; /* unneeded */
 	if (ctx->mmap_size)
 		vm_munmap(ctx->mmap_base, ctx->mmap_size);
 	aio_free_ring(ctx);
@@ -793,11 +793,12 @@ static int kill_ioctx(struct mm_struct *mm, struct kioctx *ctx,
 	struct kioctx_table *table;
 
 	spin_lock(&mm->ioctx_lock);
-	if (atomic_xchg(&ctx->dead, 1)) {
+	if (unlikely(ctx->dead)) {
 		spin_unlock(&mm->ioctx_lock);
 		return -EINVAL;
 	}
 
+	ctx->dead = true;
 	table = rcu_dereference_raw(mm->ioctx_table);
 	WARN_ON(ctx != table->table[ctx->id]);
 	table->table[ctx->id] = NULL;
@@ -1239,7 +1240,7 @@ static bool aio_read_events(struct kioctx *ctx, long min_nr, long nr,
 	if (ret > 0)
 		*i += ret;
 
-	if (unlikely(atomic_read(&ctx->dead)))
+	if (unlikely(ctx->dead))
 		ret = -EINVAL;
 
 	if (!*i)
-- 
1.5.5.1



* [PATCH 3/3] aio_free_ring: don't do page_count(NULL)
  2015-06-16 23:04 [PATCH 0/3] aio: ctx->dead cleanups Oleg Nesterov
  2015-06-16 23:04 ` [PATCH 1/3] aio_ring_remap: turn the ctx->dead check into WARN_ON() Oleg Nesterov
  2015-06-16 23:04 ` [PATCH 2/3] aio: make aio_ring->dead boolean Oleg Nesterov
@ 2015-06-16 23:04 ` Oleg Nesterov
  2015-06-17 18:26   ` Jeff Moyer
  2015-06-17  0:39 ` [PATCH 0/3] aio: ctx->dead cleanups Al Viro
  3 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2015-06-16 23:04 UTC (permalink / raw)
  To: Al Viro, Andrew Morton, Benjamin LaHaise, Jeff Moyer
  Cc: linux-aio, linux-kernel

aio_free_ring() can actually see a NULL page in ->ring_pages[];
this can happen if aio_setup_ring() fails.

And in this case page_count(ctx->ring_pages[i]) can OOPS.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/aio.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index b605ab2..666fbb8 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -292,12 +292,12 @@ static void aio_free_ring(struct kioctx *ctx)
 	put_aio_ring_file(ctx);
 
 	for (i = 0; i < ctx->nr_pages; i++) {
-		struct page *page;
-		pr_debug("pid(%d) [%d] page->count=%d\n", current->pid, i,
-				page_count(ctx->ring_pages[i]));
-		page = ctx->ring_pages[i];
+		struct page *page = ctx->ring_pages[i];
 		if (!page)
 			continue;
+
+		pr_debug("pid(%d) [%d] page->count=%d\n",
+				current->pid, i, page_count(page));
 		ctx->ring_pages[i] = NULL;
 		put_page(page);
 	}
-- 
1.5.5.1



* [PATCH v2 2/3] aio: make aio_ring->dead boolean
  2015-06-16 23:04 ` [PATCH 2/3] aio: make aio_ring->dead boolean Oleg Nesterov
@ 2015-06-16 23:50   ` Oleg Nesterov
  2015-06-17 19:33     ` Jeff Moyer
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2015-06-16 23:50 UTC (permalink / raw)
  To: Al Viro, Andrew Morton, Benjamin LaHaise, Jeff Moyer
  Cc: linux-aio, linux-kernel

On 06/17, Oleg Nesterov wrote:
>
> @@ -327,7 +327,7 @@ static int aio_ring_remap(struct file *file, struct vm_area_struct *vma)
>  	for (i = 0; i < table->nr; i++) {
>  		struct kioctx *ctx = table->table[i];
>
> -		if (!ctx || WARN_ON(atomic_read(&ctx->dead)))
> +		if (!ctx || ctx->dead)

Argh, sorry, I removed WARN_ON() by accident.

------------------------------------------------------------------------------
Subject: [PATCH v2 2/3] aio: make aio_ring->dead boolean

"atomic_t dead" makes no sense. atomic_read() is a plain LOAD,
it doesn't have any "additional" synchronization with xchg().

And now that kill_ioctx() sets "dead" under mm->ioctx_lock we do
not even need xchg().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/aio.c |   11 ++++++-----
 1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 0693333..b22c20a 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -84,7 +84,7 @@ struct ctx_rq_wait {
 
 struct kioctx {
 	struct percpu_ref	users;
-	atomic_t		dead;
+	bool			dead;
 
 	struct percpu_ref	reqs;
 
@@ -327,7 +327,7 @@ static int aio_ring_remap(struct file *file, struct vm_area_struct *vma)
 	for (i = 0; i < table->nr; i++) {
 		struct kioctx *ctx = table->table[i];
 
-		if (!ctx || WARN_ON(atomic_read(&ctx->dead)))
+		if (!ctx || WARN_ON(ctx->dead))
 			continue;
 
 		if (ctx->aio_ring_file == file) {
@@ -768,7 +768,7 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
 err_cleanup:
 	aio_nr_sub(ctx->max_reqs);
 err_ctx:
-	atomic_set(&ctx->dead, 1);
+	ctx->dead = true; /* unneeded */
 	if (ctx->mmap_size)
 		vm_munmap(ctx->mmap_base, ctx->mmap_size);
 	aio_free_ring(ctx);
@@ -793,11 +793,12 @@ static int kill_ioctx(struct mm_struct *mm, struct kioctx *ctx,
 	struct kioctx_table *table;
 
 	spin_lock(&mm->ioctx_lock);
-	if (atomic_xchg(&ctx->dead, 1)) {
+	if (unlikely(ctx->dead)) {
 		spin_unlock(&mm->ioctx_lock);
 		return -EINVAL;
 	}
 
+	ctx->dead = true;
 	table = rcu_dereference_raw(mm->ioctx_table);
 	WARN_ON(ctx != table->table[ctx->id]);
 	table->table[ctx->id] = NULL;
@@ -1239,7 +1240,7 @@ static bool aio_read_events(struct kioctx *ctx, long min_nr, long nr,
 	if (ret > 0)
 		*i += ret;
 
-	if (unlikely(atomic_read(&ctx->dead)))
+	if (unlikely(ctx->dead))
 		ret = -EINVAL;
 
 	if (!*i)
-- 
1.5.5.1




* Re: [PATCH 0/3] aio: ctx->dead cleanups
  2015-06-16 23:04 [PATCH 0/3] aio: ctx->dead cleanups Oleg Nesterov
                   ` (2 preceding siblings ...)
  2015-06-16 23:04 ` [PATCH 3/3] aio_free_ring: don't do page_count(NULL) Oleg Nesterov
@ 2015-06-17  0:39 ` Al Viro
  2015-06-17  0:50   ` Al Viro
  2015-06-17  1:05   ` Oleg Nesterov
  3 siblings, 2 replies; 16+ messages in thread
From: Al Viro @ 2015-06-17  0:39 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Benjamin LaHaise, Jeff Moyer, linux-aio, linux-kernel

On Wed, Jun 17, 2015 at 01:04:14AM +0200, Oleg Nesterov wrote:
> Al, please help. We are trying to backport some aio fixes and I am
> absolutely confused by your b2edffdd912b "fix mremap() vs. ioctx_kill()
> race".
> 
> 
> Firstly, I simply can't understand what exactly it tries to fix. OK,
> aio_free_ring() can race with kill and we can remap the soon-to-be-killed
> ctx. So what? kill_ioctx() will see the correct (already re-mapped)
> ctx->mmap_base after it drops mm->ioctx_lock.

Huh?  kill_ioctx() picks ctx->mmap_base and passes it to vm_munmap().
Which tries to grab mmap_sem, blocks for mremap() from another thread
and waits for it to drop mmap_sem.  By that time ctx->mmap_base has
nothing whatsoever to do with the argument we'd passed to vm_munmap().  Sure,
it had been recalculated by aio_ring_remap(), but it's too late for
us - we'd already fetched the old value.


* Re: [PATCH 0/3] aio: ctx->dead cleanups
  2015-06-17  0:39 ` [PATCH 0/3] aio: ctx->dead cleanups Al Viro
@ 2015-06-17  0:50   ` Al Viro
  2015-06-17  1:22     ` Oleg Nesterov
  2015-06-17  1:05   ` Oleg Nesterov
  1 sibling, 1 reply; 16+ messages in thread
From: Al Viro @ 2015-06-17  0:50 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Benjamin LaHaise, Jeff Moyer, linux-aio, linux-kernel

On Wed, Jun 17, 2015 at 01:39:06AM +0100, Al Viro wrote:

> Huh?  kill_ioctx() picks ctx->mmap_base and passes it to vm_munmap().
> Which tries to grab mmap_sem, blocks for mremap() from another thread
> and waits for it to drop mmap_sem.  By that time ctx->mmap_base has
> nothing whatsoever to do with the argument we'd passed to vm_munmap().  Sure,
> it had been recalculated by aio_ring_remap(), but it's too late for
> us - we'd already fetched the old value.

And yes, the leak you've spotted is real, but I would very much prefer
to avoid that goto - something like this instead:

diff --git a/mm/mremap.c b/mm/mremap.c
index 034e2d3..b36b530 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -291,7 +291,10 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 		if (err < 0) {
 			move_page_tables(new_vma, new_addr, vma, old_addr,
 					 moved_len, true);
-			return err;
+			vma = new_vma;
+			old_len = new_len;
+			old_addr = new_addr;
+			new_addr = err;
 		}
 	}
 


* Re: [PATCH 0/3] aio: ctx->dead cleanups
  2015-06-17  0:39 ` [PATCH 0/3] aio: ctx->dead cleanups Al Viro
  2015-06-17  0:50   ` Al Viro
@ 2015-06-17  1:05   ` Oleg Nesterov
  2015-06-17  1:14     ` Oleg Nesterov
  1 sibling, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2015-06-17  1:05 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, Benjamin LaHaise, Jeff Moyer, linux-aio, linux-kernel

On 06/17, Al Viro wrote:
>
> On Wed, Jun 17, 2015 at 01:04:14AM +0200, Oleg Nesterov wrote:
> > Al, please help. We are trying to backport some aio fixes and I am
> > absolutely confused by your b2edffdd912b "fix mremap() vs. ioctx_kill()
> > race".
> >
> >
> > Firstly, I simply can't understand what exactly it tries to fix. OK,
> > aio_free_ring() can race with kill and we can remap the soon-to-be-killed
> > ctx. So what? kill_ioctx() will see the correct (already re-mapped)
> > ctx->mmap_base after it drops mm->ioctx_lock.
>
> Huh?  kill_ioctx() picks ctx->mmap_base and passes it to vm_munmap().
> Which tries to grab mmap_sem, blocks for mremap() from another thread
> and waits for it to drop mmap_sem.  By that time ctx->mmap_base has
> > nothing whatsoever to do with the argument we'd passed to vm_munmap().

Yes. But it seems that you missed another part of my email:

	So it seems to me we only need this change to ensure that move_vma() can
	not succeed if ctx was already removed from ->ioctx_table, or, if we race
	with ioctx_alloc(), it was not added to ->ioctx_table. IOW, we need to
	ensure that move_vma()->aio_ring_remap() can not race with
	vm_munmap(ctx->mmap_base) in kill_ioctx() or ioctx_alloc(). And this race
	doesn't look really bad. The kernel can't crash, just the application can
	fool itself.

So once again, could you explain why we really need to prevent this?
Afaics, if the application is stupid, it can only fool itself.

And please note that ctx->mmap_base or/and ctx->mmap_size can be wrong
anyway. Say, an application can munmap() this vma, or munmap() the part
of this vma.

Oleg.



* Re: [PATCH 0/3] aio: ctx->dead cleanups
  2015-06-17  1:05   ` Oleg Nesterov
@ 2015-06-17  1:14     ` Oleg Nesterov
  2015-06-17  1:32       ` Oleg Nesterov
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2015-06-17  1:14 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, Benjamin LaHaise, Jeff Moyer, linux-aio, linux-kernel

On 06/17, Oleg Nesterov wrote:
>
> On 06/17, Al Viro wrote:
> >
> > On Wed, Jun 17, 2015 at 01:04:14AM +0200, Oleg Nesterov wrote:
> > > Al, please help. We are trying to backport some aio fixes and I am
> > > absolutely confused by your b2edffdd912b "fix mremap() vs. ioctx_kill()
> > > race".
> > >
> > >
> > > Firstly, I simply can't understand what exactly it tries to fix. OK,
> > > aio_free_ring() can race with kill and we can remap the soon-to-be-killed
> > > ctx. So what? kill_ioctx() will see the correct (already re-mapped)
> > > ctx->mmap_base after it drops mm->ioctx_lock.
> >
> > Huh?  kill_ioctx() picks ctx->mmap_base and passes it to vm_munmap().
> > Which tries to grab mmap_sem, blocks for mremap() from another thread
> > and waits for it to drop mmap_sem.  By that time ctx->mmap_base has
> > > nothing whatsoever to do with the argument we'd passed to vm_munmap().
>
> Yes. But it seems that you missed another part of my email:
>
> 	So it seems to me we only need this change to ensure that move_vma() can
> 	not succeed if ctx was already removed from ->ioctx_table, or, if we race
> 	with ioctx_alloc(), it was not added to ->ioctx_table. IOW, we need to
> 	ensure that move_vma()->aio_ring_remap() can not race with
> 	vm_munmap(ctx->mmap_base) in kill_ioctx() or ioctx_alloc(). And this race
> 	doesn't look really bad. The kernel can't crash, just the application can
> 	fool itself.
>
> So once again, could you explain why we really need to prevent this?
> Afaics, if the application is stupid, it can only fool itself.
>
> And please note that ctx->mmap_base or/and ctx->mmap_size can be wrong
> anyway. Say, an application can munmap() this vma, or munmap() the part
> of this vma.

And speaking of aio_ring_remap() it can "corrupt" ->mmap_base even with
this patch. Just you need to mremap() the tail of aio-mapped memory.

No?

Oleg.



* Re: [PATCH 0/3] aio: ctx->dead cleanups
  2015-06-17  0:50   ` Al Viro
@ 2015-06-17  1:22     ` Oleg Nesterov
  2015-06-18 16:08       ` Oleg Nesterov
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2015-06-17  1:22 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, Benjamin LaHaise, Jeff Moyer, linux-aio, linux-kernel

On 06/17, Al Viro wrote:
>
> On Wed, Jun 17, 2015 at 01:39:06AM +0100, Al Viro wrote:
>
> > Huh?  kill_ioctx() picks ctx->mmap_base and passes it to vm_munmap().
> > Which tries to grab mmap_sem, blocks for mremap() from another thread
> > and waits for it to drop mmap_sem.  By that time ctx->mmap_base has
> > nothing whatsoever to do with the argument we'd passed to vm_munmap().  Sure,
> > it had been recalculated by aio_ring_remap(), but it's too late for
> > us - we'd already fetched the old value.
>
> And yes, the leak you've spotted is real, but I would very much prefer
> to avoid that goto - something like this instead:
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 034e2d3..b36b530 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -291,7 +291,10 @@ static unsigned long move_vma(struct vm_area_struct *vma,
>  		if (err < 0) {
>  			move_page_tables(new_vma, new_addr, vma, old_addr,
>  					 moved_len, true);
> -			return err;
> +			vma = new_vma;
> +			old_len = new_len;
> +			old_addr = new_addr;
> +			new_addr = err;

Personally, I'd really prefer to factor out at least this
move_page_tables() with six args. Although I agree, "goto previous_if"
doesn't look nice either; this needs cleanup.

But this is minor. I am already sleeping, most probably I misread
this code. But it seems that there is another bug with VM_ACCOUNT.

I'll recheck tomorrow and write another email.

Oleg.



* Re: [PATCH 0/3] aio: ctx->dead cleanups
  2015-06-17  1:14     ` Oleg Nesterov
@ 2015-06-17  1:32       ` Oleg Nesterov
  0 siblings, 0 replies; 16+ messages in thread
From: Oleg Nesterov @ 2015-06-17  1:32 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, Benjamin LaHaise, Jeff Moyer, linux-aio, linux-kernel

Damn, sorry for noise, forgot to mention.

And whatever I missed, this ctx->dead check in aio_ring_remap()
looks wrong anyway. Please correct me.

On 06/17, Oleg Nesterov wrote:
>
> On 06/17, Oleg Nesterov wrote:
> >
> > On 06/17, Al Viro wrote:
> > >
> > > On Wed, Jun 17, 2015 at 01:04:14AM +0200, Oleg Nesterov wrote:
> > > > Al, please help. We are trying to backport some aio fixes and I am
> > > > absolutely confused by your b2edffdd912b "fix mremap() vs. ioctx_kill()
> > > > race".
> > > >
> > > >
> > > > Firstly, I simply can't understand what exactly it tries to fix. OK,
> > > > aio_free_ring() can race with kill and we can remap the soon-to-be-killed
> > > > ctx. So what? kill_ioctx() will see the correct (already re-mapped)
> > > > ctx->mmap_base after it drops mm->ioctx_lock.
> > >
> > > Huh?  kill_ioctx() picks ctx->mmap_base and passes it to vm_munmap().
> > > Which tries to grab mmap_sem, blocks for mremap() from another thread
> > > and waits for it to drop mmap_sem.  By that time ctx->mmap_base has
> > > > nothing whatsoever to do with the argument we'd passed to vm_munmap().
> >
> > Yes. But it seems that you missed another part of my email:
> >
> > 	So it seems to me we only need this change to ensure that move_vma() can
> > 	not succeed if ctx was already removed from ->ioctx_table, or, if we race
> > 	with ioctx_alloc(), it was not added to ->ioctx_table. IOW, we need to
> > 	ensure that move_vma()->aio_ring_remap() can not race with
> > 	vm_munmap(ctx->mmap_base) in kill_ioctx() or ioctx_alloc(). And this race
> > 	doesn't look really bad. The kernel can't crash, just the application can
> > 	fool itself.
> >
> > So once again, could you explain why we really need to prevent this?
> > Afaics, if the application is stupid, it can only fool itself.
> >
> > And please note that ctx->mmap_base or/and ctx->mmap_size can be wrong
> > anyway. Say, an application can munmap() this vma, or munmap() the part
> > of this vma.
> 
> And speaking of aio_ring_remap() it can "corrupt" ->mmap_base even with
> this patch. Just you need to mremap() the tail of aio-mapped memory.
> 
> No?
> 
> Oleg.



* Re: [PATCH 1/3] aio_ring_remap: turn the ctx->dead check into WARN_ON()
  2015-06-16 23:04 ` [PATCH 1/3] aio_ring_remap: turn the ctx->dead check into WARN_ON() Oleg Nesterov
@ 2015-06-17 17:50   ` Jeff Moyer
  2015-06-17 18:42     ` Oleg Nesterov
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Moyer @ 2015-06-17 17:50 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Al Viro, Andrew Morton, Benjamin LaHaise, linux-aio, linux-kernel

Oleg Nesterov <oleg@redhat.com> writes:

> atomic_read(&ctx->dead) in aio_ring_remap() looks confusing.
> kill_ioctx() sets ctx->dead and removes ctx from ->ioctx_table
> "atomically" under mm->ioctx_lock, so aio_ring_remap() can never
> see a dead ctx.
>
> If we really want this check, we should put it under WARN_ON()
> and it should not depend on aio_ring_file == file.

Just get rid of the check, it doesn't make any sense.

-Jeff

>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  fs/aio.c |   16 ++++++++--------
>  1 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index 480440f..0693333 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -325,14 +325,14 @@ static int aio_ring_remap(struct file *file, struct vm_area_struct *vma)
>  	rcu_read_lock();
>  	table = rcu_dereference(mm->ioctx_table);
>  	for (i = 0; i < table->nr; i++) {
> -		struct kioctx *ctx;
> -
> -		ctx = table->table[i];
> -		if (ctx && ctx->aio_ring_file == file) {
> -			if (!atomic_read(&ctx->dead)) {
> -				ctx->user_id = ctx->mmap_base = vma->vm_start;
> -				res = 0;
> -			}
> +		struct kioctx *ctx = table->table[i];
> +
> +		if (!ctx || WARN_ON(atomic_read(&ctx->dead)))
> +			continue;
> +
> +		if (ctx->aio_ring_file == file) {
> +			ctx->user_id = ctx->mmap_base = vma->vm_start;
> +			res = 0;
>  			break;
>  		}
>  	}


* Re: [PATCH 3/3] aio_free_ring: don't do page_count(NULL)
  2015-06-16 23:04 ` [PATCH 3/3] aio_free_ring: don't do page_count(NULL) Oleg Nesterov
@ 2015-06-17 18:26   ` Jeff Moyer
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Moyer @ 2015-06-17 18:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Al Viro, Andrew Morton, Benjamin LaHaise, linux-aio, linux-kernel

Oleg Nesterov <oleg@redhat.com> writes:

> aio_free_ring() can actually see a NULL page in ->ring_pages[];
> this can happen if aio_setup_ring() fails.
>
> And in this case page_count(ctx->ring_pages[i]) can OOPS.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

I'm not sure the reformatting was necessary, but whatever.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>

> ---
>  fs/aio.c |    8 ++++----
>  1 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index b605ab2..666fbb8 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -292,12 +292,12 @@ static void aio_free_ring(struct kioctx *ctx)
>  	put_aio_ring_file(ctx);
>  
>  	for (i = 0; i < ctx->nr_pages; i++) {
> -		struct page *page;
> -		pr_debug("pid(%d) [%d] page->count=%d\n", current->pid, i,
> -				page_count(ctx->ring_pages[i]));
> -		page = ctx->ring_pages[i];
> +		struct page *page = ctx->ring_pages[i];
>  		if (!page)
>  			continue;
> +
> +		pr_debug("pid(%d) [%d] page->count=%d\n",
> +				current->pid, i, page_count(page));
>  		ctx->ring_pages[i] = NULL;
>  		put_page(page);
>  	}


* Re: [PATCH 1/3] aio_ring_remap: turn the ctx->dead check into WARN_ON()
  2015-06-17 17:50   ` Jeff Moyer
@ 2015-06-17 18:42     ` Oleg Nesterov
  0 siblings, 0 replies; 16+ messages in thread
From: Oleg Nesterov @ 2015-06-17 18:42 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Al Viro, Andrew Morton, Benjamin LaHaise, linux-aio, linux-kernel

On 06/17, Jeff Moyer wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > atomic_read(&ctx->dead) in aio_ring_remap() looks confusing.
> > kill_ioctx() sets ctx->dead and removes ctx from ->ioctx_table
> > "atomically" under mm->ioctx_lock, so aio_ring_remap() can never
> > see a dead ctx.
> >
> > If we really want this check, we should put it under WARN_ON()
> > and it should not depend on aio_ring_file == file.
>
> Just get rid of the check, it doesn't make any sense.

Yes, agreed, will redo/resend tomorrow.

Thanks!

> -Jeff
> 
> >
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> > ---
> >  fs/aio.c |   16 ++++++++--------
> >  1 files changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/aio.c b/fs/aio.c
> > index 480440f..0693333 100644
> > --- a/fs/aio.c
> > +++ b/fs/aio.c
> > @@ -325,14 +325,14 @@ static int aio_ring_remap(struct file *file, struct vm_area_struct *vma)
> >  	rcu_read_lock();
> >  	table = rcu_dereference(mm->ioctx_table);
> >  	for (i = 0; i < table->nr; i++) {
> > -		struct kioctx *ctx;
> > -
> > -		ctx = table->table[i];
> > -		if (ctx && ctx->aio_ring_file == file) {
> > -			if (!atomic_read(&ctx->dead)) {
> > -				ctx->user_id = ctx->mmap_base = vma->vm_start;
> > -				res = 0;
> > -			}
> > +		struct kioctx *ctx = table->table[i];
> > +
> > +		if (!ctx || WARN_ON(atomic_read(&ctx->dead)))
> > +			continue;
> > +
> > +		if (ctx->aio_ring_file == file) {
> > +			ctx->user_id = ctx->mmap_base = vma->vm_start;
> > +			res = 0;
> >  			break;
> >  		}
> >  	}



* Re: [PATCH v2 2/3] aio: make aio_ring->dead boolean
  2015-06-16 23:50   ` [PATCH v2 " Oleg Nesterov
@ 2015-06-17 19:33     ` Jeff Moyer
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Moyer @ 2015-06-17 19:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Al Viro, Andrew Morton, Benjamin LaHaise, linux-aio, linux-kernel

Oleg Nesterov <oleg@redhat.com> writes:

> On 06/17, Oleg Nesterov wrote:
>>
>> @@ -327,7 +327,7 @@ static int aio_ring_remap(struct file *file, struct vm_area_struct *vma)
>>  	for (i = 0; i < table->nr; i++) {
>>  		struct kioctx *ctx = table->table[i];
>>
>> -		if (!ctx || WARN_ON(atomic_read(&ctx->dead)))
>> +		if (!ctx || ctx->dead)
>
> Argh, sorry, I removed WARN_ON() by accident.
>
> ------------------------------------------------------------------------------
> Subject: [PATCH v2 2/3] aio: make aio_ring->dead boolean
>
> "atomic_t dead" makes no sense. atomic_read() is a plain LOAD,
> it doesn't have any "additional" synchronization with xchg().
>
> And now that kill_ioctx() sets "dead" under mm->ioctx_lock we do
> not even need xchg().

I think this makes sense and is safe.  The key for the reader is that it
will see the updated ctx->dead after it's been woken up.

> @@ -768,7 +768,7 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
>  err_cleanup:
>  	aio_nr_sub(ctx->max_reqs);
>  err_ctx:
> -	atomic_set(&ctx->dead, 1);
> +	ctx->dead = true; /* unneeded */

And I agree that this can be nuked.  You can add my "Reviewed-by: Jeff
Moyer <jmoyer@redhat.com>" to your v3 posting.

Cheers,
Jeff


* Re: [PATCH 0/3] aio: ctx->dead cleanups
  2015-06-17  1:22     ` Oleg Nesterov
@ 2015-06-18 16:08       ` Oleg Nesterov
  0 siblings, 0 replies; 16+ messages in thread
From: Oleg Nesterov @ 2015-06-18 16:08 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, Benjamin LaHaise, Jeff Moyer, linux-aio, linux-kernel

On 06/17, Oleg Nesterov wrote:
>
> On 06/17, Al Viro wrote:
> >
> > On Wed, Jun 17, 2015 at 01:39:06AM +0100, Al Viro wrote:
> >
> > And yes, the leak you've spotted is real, but I would very much prefer
> > to avoid that goto - something like this instead:
> >
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index 034e2d3..b36b530 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -291,7 +291,10 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> >  		if (err < 0) {
> >  			move_page_tables(new_vma, new_addr, vma, old_addr,
> >  					 moved_len, true);
> > -			return err;
> > +			vma = new_vma;
> > +			old_len = new_len;
> > +			old_addr = new_addr;
> > +			new_addr = err;
>
> Personally, I'd really prefer to factor out at least this
> move_page_tables() with six args. Although I agree, "goto previous_if"
> doesn't look nice either; this needs cleanup.
>
> But this is minor. I am already sleeping, most probably I misread
> this code. But it seems that there is another bug with VM_ACCOUNT.
>
> I'll recheck tomorrow and write another email.

Yes, this looks wrong. At least we shouldn't set *locked on failure.
mm->locked_vm += new_len is probably fine, but it doesn't look really
nice because ->locked_vm can underflow in do_munmap() before that.

I'll send the fixes, but also I'll try to cleanup this code. Not
sure I will succeed ;)

Oleg.



Thread overview: 16+ messages
2015-06-16 23:04 [PATCH 0/3] aio: ctx->dead cleanups Oleg Nesterov
2015-06-16 23:04 ` [PATCH 1/3] aio_ring_remap: turn the ctx->dead check into WARN_ON() Oleg Nesterov
2015-06-17 17:50   ` Jeff Moyer
2015-06-17 18:42     ` Oleg Nesterov
2015-06-16 23:04 ` [PATCH 2/3] aio: make aio_ring->dead boolean Oleg Nesterov
2015-06-16 23:50   ` [PATCH v2 " Oleg Nesterov
2015-06-17 19:33     ` Jeff Moyer
2015-06-16 23:04 ` [PATCH 3/3] aio_free_ring: don't do page_count(NULL) Oleg Nesterov
2015-06-17 18:26   ` Jeff Moyer
2015-06-17  0:39 ` [PATCH 0/3] aio: ctx->dead cleanups Al Viro
2015-06-17  0:50   ` Al Viro
2015-06-17  1:22     ` Oleg Nesterov
2015-06-18 16:08       ` Oleg Nesterov
2015-06-17  1:05   ` Oleg Nesterov
2015-06-17  1:14     ` Oleg Nesterov
2015-06-17  1:32       ` Oleg Nesterov
