All of lore.kernel.org
 help / color / mirror / Atom feed
* [RESEND v2 PATCH 0/2] Bug fix in aio ring page migration
@ 2014-03-10  8:15 ` Tang Chen
  0 siblings, 0 replies; 24+ messages in thread
From: Tang Chen @ 2014-03-10  8:15 UTC (permalink / raw)
  To: viro, bcrl, jmoyer, kosaki.motohiro, kosaki.motohiro, isimatu.yasuaki
  Cc: guz.fnst, linux-fsdevel, linux-aio, linux-kernel, miaox

This patch-set fixes the following two problems:

1. Need to use ctx->completion_lock to protect ring pages
   from being mis-written while migration.

2. Need memory barrier to ensure memory copy is done before
   ctx->ring_pages[] is updated.

NOTE: AIO ring page migration was implemented since Linux 3.12.
      So we need to merge these two patches into 3.12 stable tree.

Tang Chen (2):
  aio, memory-hotplug: Fix confliction when migrating and accessing
    ring pages.
  aio, mem-hotplug: Add memory barrier to aio ring page migration.

 fs/aio.c |   42 ++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 42 insertions(+), 0 deletions(-)

-- 
1.7.7


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RESEND v2 PATCH 0/2] Bug fix in aio ring page migration
@ 2014-03-10  8:15 ` Tang Chen
  0 siblings, 0 replies; 24+ messages in thread
From: Tang Chen @ 2014-03-10  8:15 UTC (permalink / raw)
  To: viro, bcrl, jmoyer, kosaki.motohiro, kosaki.motohiro, isimatu.yasuaki
  Cc: guz.fnst, linux-fsdevel, linux-aio, linux-kernel, miaox

This patch-set fixes the following two problems:

1. Need to use ctx->completion_lock to protect ring pages
   from being mis-written while migration.

2. Need memory barrier to ensure memory copy is done before
   ctx->ring_pages[] is updated.

NOTE: AIO ring page migration was implemented since Linux 3.12.
      So we need to merge these two patches into 3.12 stable tree.

Tang Chen (2):
  aio, memory-hotplug: Fix confliction when migrating and accessing
    ring pages.
  aio, mem-hotplug: Add memory barrier to aio ring page migration.

 fs/aio.c |   42 ++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 42 insertions(+), 0 deletions(-)

-- 
1.7.7

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RESEND v2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
  2014-03-10  8:15 ` Tang Chen
@ 2014-03-10  8:15   ` Tang Chen
  -1 siblings, 0 replies; 24+ messages in thread
From: Tang Chen @ 2014-03-10  8:15 UTC (permalink / raw)
  To: viro, bcrl, jmoyer, kosaki.motohiro, kosaki.motohiro, isimatu.yasuaki
  Cc: guz.fnst, linux-fsdevel, linux-aio, linux-kernel, miaox

IO ring page migration has been implemented by the following patch:

        https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/aio.c?id=36bc08cc01709b4a9bb563b35aa530241ddc63e3

In this patch, ctx->completion_lock is used to prevent other processes
from accessing the ring page being migrated.

But in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring(),
when writing to the ring page, they didn't take ctx->completion_lock.

As a result, for example, we have the following problem:

            thread 1                      |              thread 2
                                          |
aio_migratepage()                         |
 |-> take ctx->completion_lock            |
 |-> migrate_page_copy(new, old)          |
 |   *NOW*, ctx->ring_pages[idx] == old   |
                                          |
                                          |    *NOW*, ctx->ring_pages[idx] == old
                                          |    aio_read_events_ring()
                                          |     |-> ring = kmap_atomic(ctx->ring_pages[0])
                                          |     |-> ring->head = head;          *HERE, write to the old ring page*
                                          |     |-> kunmap_atomic(ring);
                                          |
 |-> ctx->ring_pages[idx] = new           |
 |   *BUT NOW*, the content of            |
 |    ring_pages[idx] is old.             |
 |-> release ctx->completion_lock         |

As above, the new ring page will not be updated.

The solution is taking ctx->completion_lock in thread 2, which means,
in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring() when
writing to ring pages.

Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
v2:
  Use spin_lock_irq rather than spin_lock_irqsave as Jeff suggested.
---
 fs/aio.c |   28 ++++++++++++++++++++++++++++
 1 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 062a5f6..dc70246 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -437,6 +437,14 @@ static int aio_setup_ring(struct kioctx *ctx)
 	ctx->user_id = ctx->mmap_base;
 	ctx->nr_events = nr_events; /* trusted copy */
 
+	/*
+	 * The aio ring pages are user space pages, so they can be migrated.
+	 * When writing to an aio ring page, we should ensure the page is not
+	 * being migrated. Aio page migration procedure is protected by
+	 * ctx->completion_lock, so we add this lock here.
+	 */
+	spin_lock_irq(&ctx->completion_lock);
+
 	ring = kmap_atomic(ctx->ring_pages[0]);
 	ring->nr = nr_events;	/* user copy */
 	ring->id = ~0U;
@@ -448,6 +456,8 @@ static int aio_setup_ring(struct kioctx *ctx)
 	kunmap_atomic(ring);
 	flush_dcache_page(ctx->ring_pages[0]);
 
+	spin_unlock_irq(&ctx->completion_lock);
+
 	return 0;
 }
 
@@ -556,9 +566,17 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
 					rcu_read_unlock();
 					spin_unlock(&mm->ioctx_lock);
 
+					/*
+					 * Accessing ring pages must be done
+					 * holding ctx->completion_lock to
+					 * prevent aio ring page migration
+					 * procedure from migrating ring pages.
+					 */
+					spin_lock_irq(&ctx->completion_lock);
 					ring = kmap_atomic(ctx->ring_pages[0]);
 					ring->id = ctx->id;
 					kunmap_atomic(ring);
+					spin_unlock_irq(&ctx->completion_lock);
 					return 0;
 				}
 
@@ -1066,11 +1084,21 @@ static long aio_read_events_ring(struct kioctx *ctx,
 		head %= ctx->nr_events;
 	}
 
+	/*
+	 * The aio ring pages are user space pages, so they can be migrated.
+	 * When writing to an aio ring page, we should ensure the page is not
+	 * being migrated. Aio page migration procedure is protected by
+	 * ctx->completion_lock, so we add this lock here.
+	 */
+	spin_lock_irq(&ctx->completion_lock);
+
 	ring = kmap_atomic(ctx->ring_pages[0]);
 	ring->head = head;
 	kunmap_atomic(ring);
 	flush_dcache_page(ctx->ring_pages[0]);
 
+	spin_unlock_irq(&ctx->completion_lock);
+
 	pr_debug("%li  h%u t%u\n", ret, head, tail);
 
 	put_reqs_available(ctx, ret);
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RESEND v2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
@ 2014-03-10  8:15   ` Tang Chen
  0 siblings, 0 replies; 24+ messages in thread
From: Tang Chen @ 2014-03-10  8:15 UTC (permalink / raw)
  To: viro, bcrl, jmoyer, kosaki.motohiro, kosaki.motohiro, isimatu.yasuaki
  Cc: guz.fnst, linux-fsdevel, linux-aio, linux-kernel, miaox

IO ring page migration has been implemented by the following patch:

        https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/aio.c?id=36bc08cc01709b4a9bb563b35aa530241ddc63e3

In this patch, ctx->completion_lock is used to prevent other processes
from accessing the ring page being migrated.

But in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring(),
when writing to the ring page, they didn't take ctx->completion_lock.

As a result, for example, we have the following problem:

            thread 1                      |              thread 2
                                          |
aio_migratepage()                         |
 |-> take ctx->completion_lock            |
 |-> migrate_page_copy(new, old)          |
 |   *NOW*, ctx->ring_pages[idx] == old   |
                                          |
                                          |    *NOW*, ctx->ring_pages[idx] == old
                                          |    aio_read_events_ring()
                                          |     |-> ring = kmap_atomic(ctx->ring_pages[0])
                                          |     |-> ring->head = head;          *HERE, write to the old ring page*
                                          |     |-> kunmap_atomic(ring);
                                          |
 |-> ctx->ring_pages[idx] = new           |
 |   *BUT NOW*, the content of            |
 |    ring_pages[idx] is old.             |
 |-> release ctx->completion_lock         |

As above, the new ring page will not be updated.

The solution is taking ctx->completion_lock in thread 2, which means,
in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring() when
writing to ring pages.

Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
v2:
  Use spin_lock_irq rather than spin_lock_irqsave as Jeff suggested.
---
 fs/aio.c |   28 ++++++++++++++++++++++++++++
 1 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 062a5f6..dc70246 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -437,6 +437,14 @@ static int aio_setup_ring(struct kioctx *ctx)
 	ctx->user_id = ctx->mmap_base;
 	ctx->nr_events = nr_events; /* trusted copy */
 
+	/*
+	 * The aio ring pages are user space pages, so they can be migrated.
+	 * When writing to an aio ring page, we should ensure the page is not
+	 * being migrated. Aio page migration procedure is protected by
+	 * ctx->completion_lock, so we add this lock here.
+	 */
+	spin_lock_irq(&ctx->completion_lock);
+
 	ring = kmap_atomic(ctx->ring_pages[0]);
 	ring->nr = nr_events;	/* user copy */
 	ring->id = ~0U;
@@ -448,6 +456,8 @@ static int aio_setup_ring(struct kioctx *ctx)
 	kunmap_atomic(ring);
 	flush_dcache_page(ctx->ring_pages[0]);
 
+	spin_unlock_irq(&ctx->completion_lock);
+
 	return 0;
 }
 
@@ -556,9 +566,17 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
 					rcu_read_unlock();
 					spin_unlock(&mm->ioctx_lock);
 
+					/*
+					 * Accessing ring pages must be done
+					 * holding ctx->completion_lock to
+					 * prevent aio ring page migration
+					 * procedure from migrating ring pages.
+					 */
+					spin_lock_irq(&ctx->completion_lock);
 					ring = kmap_atomic(ctx->ring_pages[0]);
 					ring->id = ctx->id;
 					kunmap_atomic(ring);
+					spin_unlock_irq(&ctx->completion_lock);
 					return 0;
 				}
 
@@ -1066,11 +1084,21 @@ static long aio_read_events_ring(struct kioctx *ctx,
 		head %= ctx->nr_events;
 	}
 
+	/*
+	 * The aio ring pages are user space pages, so they can be migrated.
+	 * When writing to an aio ring page, we should ensure the page is not
+	 * being migrated. Aio page migration procedure is protected by
+	 * ctx->completion_lock, so we add this lock here.
+	 */
+	spin_lock_irq(&ctx->completion_lock);
+
 	ring = kmap_atomic(ctx->ring_pages[0]);
 	ring->head = head;
 	kunmap_atomic(ring);
 	flush_dcache_page(ctx->ring_pages[0]);
 
+	spin_unlock_irq(&ctx->completion_lock);
+
 	pr_debug("%li  h%u t%u\n", ret, head, tail);
 
 	put_reqs_available(ctx, ret);
-- 
1.7.7

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RESEND v2 PATCH 2/2] aio, mem-hotplug: Add memory barrier to aio ring page migration.
  2014-03-10  8:15 ` Tang Chen
@ 2014-03-10  8:15   ` Tang Chen
  -1 siblings, 0 replies; 24+ messages in thread
From: Tang Chen @ 2014-03-10  8:15 UTC (permalink / raw)
  To: viro, bcrl, jmoyer, kosaki.motohiro, kosaki.motohiro, isimatu.yasuaki
  Cc: guz.fnst, linux-fsdevel, linux-aio, linux-kernel, miaox

When doing aio ring page migration, we migrated the page, and update
ctx->ring_pages[]. Like the following:

aio_migratepage()
 |-> migrate_page_copy(new, old)
 |   ......				/* Need barrier here */
 |-> ctx->ring_pages[idx] = new

Actually, we need a memory barrier between these two operations.
Otherwise, if ctx->ring_pages[] is updated before memory copy due to
the compiler optimization, other processes may have an opportunity
to access to the not fully initialized new ring page.

So add a wmb and rmb to synchronize them.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
v2: change smp_rmb() to smp_read_barrier_depends(). Thanks Miao.
---
 fs/aio.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index dc70246..4133ba9 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -327,6 +327,14 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
 		pgoff_t idx;
 		spin_lock_irqsave(&ctx->completion_lock, flags);
 		migrate_page_copy(new, old);
+
+		/*
+		 * Ensure memory copy is finished before updating
+		 * ctx->ring_pages[]. Otherwise other processes may access to
+		 * new ring pages which are not fully initialized.
+		 */
+		smp_wmb();
+
 		idx = old->index;
 		if (idx < (pgoff_t)ctx->nr_pages) {
 			/* And only do the move if things haven't changed */
@@ -1069,6 +1077,12 @@ static long aio_read_events_ring(struct kioctx *ctx,
 		page = ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE];
 		pos %= AIO_EVENTS_PER_PAGE;
 
+		/*
+		 * Ensure that the page's data was copied from old one by
+		 * aio_migratepage().
+		 */
+		smp_read_barrier_depends();
+
 		ev = kmap(page);
 		copy_ret = copy_to_user(event + ret, ev + pos,
 					sizeof(*ev) * avail);
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RESEND v2 PATCH 2/2] aio, mem-hotplug: Add memory barrier to aio ring page migration.
@ 2014-03-10  8:15   ` Tang Chen
  0 siblings, 0 replies; 24+ messages in thread
From: Tang Chen @ 2014-03-10  8:15 UTC (permalink / raw)
  To: viro, bcrl, jmoyer, kosaki.motohiro, kosaki.motohiro, isimatu.yasuaki
  Cc: guz.fnst, linux-fsdevel, linux-aio, linux-kernel, miaox

When doing aio ring page migration, we migrated the page, and update
ctx->ring_pages[]. Like the following:

aio_migratepage()
 |-> migrate_page_copy(new, old)
 |   ......				/* Need barrier here */
 |-> ctx->ring_pages[idx] = new

Actually, we need a memory barrier between these two operations.
Otherwise, if ctx->ring_pages[] is updated before memory copy due to
the compiler optimization, other processes may have an opportunity
to access to the not fully initialized new ring page.

So add a wmb and rmb to synchronize them.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
v2: change smp_rmb() to smp_read_barrier_depends(). Thanks Miao.
---
 fs/aio.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index dc70246..4133ba9 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -327,6 +327,14 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
 		pgoff_t idx;
 		spin_lock_irqsave(&ctx->completion_lock, flags);
 		migrate_page_copy(new, old);
+
+		/*
+		 * Ensure memory copy is finished before updating
+		 * ctx->ring_pages[]. Otherwise other processes may access to
+		 * new ring pages which are not fully initialized.
+		 */
+		smp_wmb();
+
 		idx = old->index;
 		if (idx < (pgoff_t)ctx->nr_pages) {
 			/* And only do the move if things haven't changed */
@@ -1069,6 +1077,12 @@ static long aio_read_events_ring(struct kioctx *ctx,
 		page = ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE];
 		pos %= AIO_EVENTS_PER_PAGE;
 
+		/*
+		 * Ensure that the page's data was copied from old one by
+		 * aio_migratepage().
+		 */
+		smp_read_barrier_depends();
+
 		ev = kmap(page);
 		copy_ret = copy_to_user(event + ret, ev + pos,
 					sizeof(*ev) * avail);
-- 
1.7.7

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RESEND v2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
  2014-03-10  8:15   ` Tang Chen
@ 2014-03-11 18:46     ` Benjamin LaHaise
  -1 siblings, 0 replies; 24+ messages in thread
From: Benjamin LaHaise @ 2014-03-11 18:46 UTC (permalink / raw)
  To: Tang Chen
  Cc: viro, jmoyer, kosaki.motohiro, kosaki.motohiro, isimatu.yasuaki,
	guz.fnst, linux-fsdevel, linux-aio, linux-kernel, miaox

On Mon, Mar 10, 2014 at 04:15:33PM +0800, Tang Chen wrote:
> IO ring page migration has been implemented by the following patch:
> 
>         https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/aio.c?id=36bc08cc01709b4a9bb563b35aa530241ddc63e3
> 
> In this patch, ctx->completion_lock is used to prevent other processes
> from accessing the ring page being migrated.
> 
> But in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring(),
> when writing to the ring page, they didn't take ctx->completion_lock.

> As a result, for example, we have the following problem:
...
> As above, the new ring page will not be updated.
> 
> The solution is taking ctx->completion_lock in thread 2, which means,
> in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring() when
> writing to ring pages.

Upon review, there are still two accesses of ->ring_pages that are not 
protected by any spinlocks which could potentially race with migration.  One 
is in aio_setup_ring(), which can be easily resolved by moving the assignment 
of ->ring_pages above the unlock_page().  Another spot is in 
aio_read_events_ring() where head and tail are fetched from the ring without 
any locking.  I also fear we'll be introducing new performance issues with 
all the additonal spinlock bouncing, despite the fact that is only ever 
needed for migration.  I'm going to continue looking into this today and 
will try to send out a followup to this email later.

		-ben
-- 
"Thought is the essence of where you are now."

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RESEND v2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
@ 2014-03-11 18:46     ` Benjamin LaHaise
  0 siblings, 0 replies; 24+ messages in thread
From: Benjamin LaHaise @ 2014-03-11 18:46 UTC (permalink / raw)
  To: Tang Chen
  Cc: viro, jmoyer, kosaki.motohiro, kosaki.motohiro, isimatu.yasuaki,
	guz.fnst, linux-fsdevel, linux-aio, linux-kernel, miaox

On Mon, Mar 10, 2014 at 04:15:33PM +0800, Tang Chen wrote:
> IO ring page migration has been implemented by the following patch:
> 
>         https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/aio.c?id=36bc08cc01709b4a9bb563b35aa530241ddc63e3
> 
> In this patch, ctx->completion_lock is used to prevent other processes
> from accessing the ring page being migrated.
> 
> But in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring(),
> when writing to the ring page, they didn't take ctx->completion_lock.

> As a result, for example, we have the following problem:
...
> As above, the new ring page will not be updated.
> 
> The solution is taking ctx->completion_lock in thread 2, which means,
> in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring() when
> writing to ring pages.

Upon review, there are still two accesses of ->ring_pages that are not 
protected by any spinlocks which could potentially race with migration.  One 
is in aio_setup_ring(), which can be easily resolved by moving the assignment 
of ->ring_pages above the unlock_page().  Another spot is in 
aio_read_events_ring() where head and tail are fetched from the ring without 
any locking.  I also fear we'll be introducing new performance issues with 
all the additonal spinlock bouncing, despite the fact that is only ever 
needed for migration.  I'm going to continue looking into this today and 
will try to send out a followup to this email later.

		-ben
-- 
"Thought is the essence of where you are now."

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RESEND v2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
  2014-03-11 18:46     ` Benjamin LaHaise
@ 2014-03-12  5:25       ` Tang Chen
  -1 siblings, 0 replies; 24+ messages in thread
From: Tang Chen @ 2014-03-12  5:25 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: viro, jmoyer, kosaki.motohiro, kosaki.motohiro, isimatu.yasuaki,
	guz.fnst, linux-fsdevel, linux-aio, linux-kernel, miaox


Hi Ben,

Sorry for the delay.

On 03/12/2014 02:46 AM, Benjamin LaHaise wrote:
> On Mon, Mar 10, 2014 at 04:15:33PM +0800, Tang Chen wrote:
>> IO ring page migration has been implemented by the following patch:
>>
>>          https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/aio.c?id=36bc08cc01709b4a9bb563b35aa530241ddc63e3
>>
>> In this patch, ctx->completion_lock is used to prevent other processes
>> from accessing the ring page being migrated.
>>
>> But in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring(),
>> when writing to the ring page, they didn't take ctx->completion_lock.
>
>> As a result, for example, we have the following problem:
> ...
>> As above, the new ring page will not be updated.
>>
>> The solution is taking ctx->completion_lock in thread 2, which means,
>> in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring() when
>> writing to ring pages.
>
> Upon review, there are still two accesses of ->ring_pages that are not
> protected by any spinlocks which could potentially race with migration.  One
> is in aio_setup_ring(), which can be easily resolved by moving the assignment
> of ->ring_pages above the unlock_page().

Yes, you are right. Followed, thanks.

>Another spot is in
> aio_read_events_ring() where head and tail are fetched from the ring without
> any locking.  I also fear we'll be introducing new performance issues with
> all the additonal spinlock bouncing, despite the fact that is only ever
> needed for migration.  I'm going to continue looking into this today and
> will try to send out a followup to this email later.

In the beginning of aio_read_events_ring(), it reads head and tail, not 
write.
So even if ring pages are migrated, the contents of the pages will not 
be changed.
So reading it is OK, from old page or from the new page, I think.


>
> 		-ben

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RESEND v2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
@ 2014-03-12  5:25       ` Tang Chen
  0 siblings, 0 replies; 24+ messages in thread
From: Tang Chen @ 2014-03-12  5:25 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: viro, jmoyer, kosaki.motohiro, kosaki.motohiro, isimatu.yasuaki,
	guz.fnst, linux-fsdevel, linux-aio, linux-kernel, miaox


Hi Ben,

Sorry for the delay.

On 03/12/2014 02:46 AM, Benjamin LaHaise wrote:
> On Mon, Mar 10, 2014 at 04:15:33PM +0800, Tang Chen wrote:
>> IO ring page migration has been implemented by the following patch:
>>
>>          https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/aio.c?id=36bc08cc01709b4a9bb563b35aa530241ddc63e3
>>
>> In this patch, ctx->completion_lock is used to prevent other processes
>> from accessing the ring page being migrated.
>>
>> But in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring(),
>> when writing to the ring page, they didn't take ctx->completion_lock.
>
>> As a result, for example, we have the following problem:
> ...
>> As above, the new ring page will not be updated.
>>
>> The solution is taking ctx->completion_lock in thread 2, which means,
>> in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring() when
>> writing to ring pages.
>
> Upon review, there are still two accesses of ->ring_pages that are not
> protected by any spinlocks which could potentially race with migration.  One
> is in aio_setup_ring(), which can be easily resolved by moving the assignment
> of ->ring_pages above the unlock_page().

Yes, you are right. Followed, thanks.

>Another spot is in
> aio_read_events_ring() where head and tail are fetched from the ring without
> any locking.  I also fear we'll be introducing new performance issues with
> all the additonal spinlock bouncing, despite the fact that is only ever
> needed for migration.  I'm going to continue looking into this today and
> will try to send out a followup to this email later.

In the beginning of aio_read_events_ring(), it reads head and tail, not 
write.
So even if ring pages are migrated, the contents of the pages will not 
be changed.
So reading it is OK, from old page or from the new page, I think.


>
> 		-ben

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RESEND v2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
  2014-03-12  5:25       ` Tang Chen
@ 2014-03-12 22:17         ` Benjamin LaHaise
  -1 siblings, 0 replies; 24+ messages in thread
From: Benjamin LaHaise @ 2014-03-12 22:17 UTC (permalink / raw)
  To: Tang Chen
  Cc: viro, jmoyer, kosaki.motohiro, kosaki.motohiro, isimatu.yasuaki,
	guz.fnst, linux-fsdevel, linux-aio, linux-kernel, miaox

Hello Tang,

On Wed, Mar 12, 2014 at 01:25:26PM +0800, Tang Chen wrote:
... <snip> ...

> >Another spot is in
> >aio_read_events_ring() where head and tail are fetched from the ring 
> >without
> >any locking.  I also fear we'll be introducing new performance issues with
> >all the additonal spinlock bouncing, despite the fact that is only ever
> >needed for migration.  I'm going to continue looking into this today and
> >will try to send out a followup to this email later.
> 
> In the beginning of aio_read_events_ring(), it reads head and tail, not 
> write.
> So even if ring pages are migrated, the contents of the pages will not 
> be changed.
> So reading it is OK, from old page or from the new page, I think.

Your assumption that reading it is okay is incorrect.  Since we do not have 
a reference on the page at that point, it is possible that the read of the 
page takes place after the page has been freed and allocated to another part 
of the kernel.  This would result in the read returning invalid information.

		-ben
-- 
"Thought is the essence of where you are now."

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RESEND v2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
@ 2014-03-12 22:17         ` Benjamin LaHaise
  0 siblings, 0 replies; 24+ messages in thread
From: Benjamin LaHaise @ 2014-03-12 22:17 UTC (permalink / raw)
  To: Tang Chen
  Cc: viro, jmoyer, kosaki.motohiro, kosaki.motohiro, isimatu.yasuaki,
	guz.fnst, linux-fsdevel, linux-aio, linux-kernel, miaox

Hello Tang,

On Wed, Mar 12, 2014 at 01:25:26PM +0800, Tang Chen wrote:
... <snip> ...

> >Another spot is in
> >aio_read_events_ring() where head and tail are fetched from the ring 
> >without
> >any locking.  I also fear we'll be introducing new performance issues with
> >all the additonal spinlock bouncing, despite the fact that is only ever
> >needed for migration.  I'm going to continue looking into this today and
> >will try to send out a followup to this email later.
> 
> In the beginning of aio_read_events_ring(), it reads head and tail, not 
> write.
> So even if ring pages are migrated, the contents of the pages will not 
> be changed.
> So reading it is OK, from old page or from the new page, I think.

Your assumption that reading it is okay is incorrect.  Since we do not have 
a reference on the page at that point, it is possible that the read of the 
page takes place after the page has been freed and allocated to another part 
of the kernel.  This would result in the read returning invalid information.

		-ben
-- 
"Thought is the essence of where you are now."

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RESEND v2 PATCH 0/2] Bug fix in aio ring page migration
  2014-03-10  8:15 ` Tang Chen
@ 2014-03-13  9:45   ` Gu Zheng
  -1 siblings, 0 replies; 24+ messages in thread
From: Gu Zheng @ 2014-03-13  9:45 UTC (permalink / raw)
  To: stable
  Cc: Greg Kroah-Hartman, viro, bcrl, jmoyer, kosaki.motohiro,
	kosaki.motohiro, isimatu.yasuaki, linux-fsdevel, linux-aio,
	linux-kernel, miaox, tangchen, guz.fnst

This patchset has been applied to linux-next, and these problems also exist
in 3.12.y and 3.13.y stable tree.
So please merge this patchset to 3.12.y and 3.13.y stable tree.

commit 692c9b8c5ee8d263bb8348171f0bebd3d84eb2c1
Author: Tang Chen <tangchen@cn.fujitsu.com>
Date:   Mon Mar 10 16:15:33 2014 +0800

    aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.

commit e0f5e0add36d2e3c456cf2f4283673ed834b3c24
Author: Tang Chen <tangchen@cn.fujitsu.com>
Date:   Mon Mar 10 16:15:34 2014 +0800

    aio, mem-hotplug: Add memory barrier to aio ring page migration.

https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/fs/aio.c?id=692c9b8c5ee8d263bb8348171f0bebd3d84eb2c1
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/fs/aio.c?id=e0f5e0add36d2e3c456cf2f4283673ed834b3c24

https://lkml.org/lkml/2014/3/10/56
https://lkml.org/lkml/2014/3/10/58


On 03/10/2014 04:15 PM, Tang Chen wrote:

> This patch-set fixes the following two problems:
> 
> 1. Need to use ctx->completion_lock to protect ring pages
>    from being mis-written while migration.
> 
> 2. Need memory barrier to ensure memory copy is done before
>    ctx->ring_pages[] is updated.
> 
> NOTE: AIO ring page migration was implemented since Linux 3.12.
>       So we need to merge these two patches into 3.12 stable tree.
> 
> Tang Chen (2):
>   aio, memory-hotplug: Fix confliction when migrating and accessing
>     ring pages.
>   aio, mem-hotplug: Add memory barrier to aio ring page migration.
> 
>  fs/aio.c |   42 ++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 42 insertions(+), 0 deletions(-)
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RESEND v2 PATCH 0/2] Bug fix in aio ring page migration
@ 2014-03-13  9:45   ` Gu Zheng
  0 siblings, 0 replies; 24+ messages in thread
From: Gu Zheng @ 2014-03-13  9:45 UTC (permalink / raw)
  To: stable
  Cc: Greg Kroah-Hartman, viro, bcrl, jmoyer, kosaki.motohiro,
	kosaki.motohiro, isimatu.yasuaki, linux-fsdevel, linux-aio,
	linux-kernel, miaox, tangchen, guz.fnst

This patchset has been applied to linux-next, and these problems also exist
in 3.12.y and 3.13.y stable tree.
So please merge this patchset to 3.12.y and 3.13.y stable tree.

commit 692c9b8c5ee8d263bb8348171f0bebd3d84eb2c1
Author: Tang Chen <tangchen@cn.fujitsu.com>
Date:   Mon Mar 10 16:15:33 2014 +0800

    aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.

commit e0f5e0add36d2e3c456cf2f4283673ed834b3c24
Author: Tang Chen <tangchen@cn.fujitsu.com>
Date:   Mon Mar 10 16:15:34 2014 +0800

    aio, mem-hotplug: Add memory barrier to aio ring page migration.

https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/fs/aio.c?id=692c9b8c5ee8d263bb8348171f0bebd3d84eb2c1
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/fs/aio.c?id=e0f5e0add36d2e3c456cf2f4283673ed834b3c24

https://lkml.org/lkml/2014/3/10/56
https://lkml.org/lkml/2014/3/10/58


On 03/10/2014 04:15 PM, Tang Chen wrote:

> This patch-set fixes the following two problems:
> 
> 1. Need to use ctx->completion_lock to protect ring pages
>    from being mis-written while migration.
> 
> 2. Need memory barrier to ensure memory copy is done before
>    ctx->ring_pages[] is updated.
> 
> NOTE: AIO ring page migration was implemented since Linux 3.12.
>       So we need to merge these two patches into 3.12 stable tree.
> 
> Tang Chen (2):
>   aio, memory-hotplug: Fix confliction when migrating and accessing
>     ring pages.
>   aio, mem-hotplug: Add memory barrier to aio ring page migration.
> 
>  fs/aio.c |   42 ++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 42 insertions(+), 0 deletions(-)
> 


--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RESEND v2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
  2014-03-12 22:17         ` Benjamin LaHaise
@ 2014-03-14 10:25           ` Gu Zheng
  -1 siblings, 0 replies; 24+ messages in thread
From: Gu Zheng @ 2014-03-14 10:25 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Tang Chen, viro, jmoyer, kosaki.motohiro, kosaki.motohiro,
	isimatu.yasuaki, linux-fsdevel, linux-aio, linux-kernel, miaox

Hi Ben,
On 03/13/2014 06:17 AM, Benjamin LaHaise wrote:

> Hello Tang,
> 
> On Wed, Mar 12, 2014 at 01:25:26PM +0800, Tang Chen wrote:
> ... <snip> ...
> 
>>> Another spot is in
>>> aio_read_events_ring() where head and tail are fetched from the ring 
>>> without
>>> any locking.  I also fear we'll be introducing new performance issues with
>>> all the additonal spinlock bouncing, despite the fact that is only ever
>>> needed for migration.  I'm going to continue looking into this today and
>>> will try to send out a followup to this email later.
>>
>> In the beginning of aio_read_events_ring(), it reads head and tail, not 
>> write.
>> So even if ring pages are migrated, the contents of the pages will not 
>> be changed.
>> So reading it is OK, from old page or from the new page, I think.
> 
> Your assumption that reading it is okay is incorrect.  Since we do not have 
> a reference on the page at that point, it is possible that the read of the 
> page takes place after the page has been freed and allocated to another part 
> of the kernel.  This would result in the read returning invalid information.

What about the following patch? It adds additional reference to protect the page
avoid being freed when we reading it.
ps.It is applied on linux-next(3-13).

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
---
 fs/aio.c |   16 ++++++++++++----
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 4133ba9..a4f3a4f 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -283,7 +283,7 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
 {
 	struct kioctx *ctx;
 	unsigned long flags;
-	int rc;
+	int rc, extra_count;
 
 	rc = 0;
 
@@ -311,7 +311,10 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
 	BUG_ON(PageWriteback(old));
 	get_page(new);
 
-	rc = migrate_page_move_mapping(mapping, new, old, NULL, mode, 1);
+	extra_count = page_count(old) - page_has_private(old) - 2;
+
+	rc = migrate_page_move_mapping(mapping, new, old,
+						NULL, mode, extra_count);
 	if (rc != MIGRATEPAGE_SUCCESS) {
 		put_page(new);
 		return rc;
@@ -1047,13 +1050,17 @@ static long aio_read_events_ring(struct kioctx *ctx,
 	unsigned head, tail, pos;
 	long ret = 0;
 	int copy_ret;
+	struct page *page;
 
 	mutex_lock(&ctx->ring_lock);
 
-	ring = kmap_atomic(ctx->ring_pages[0]);
+	page = ctx->ring_pages[0];
+	get_page(page);
+	ring = kmap_atomic(page);
 	head = ring->head;
 	tail = ring->tail;
 	kunmap_atomic(ring);
+	put_page(page);
 
 	pr_debug("h%u t%u m%u\n", head, tail, ctx->nr_events);
 
@@ -1063,7 +1070,6 @@ static long aio_read_events_ring(struct kioctx *ctx,
 	while (ret < nr) {
 		long avail;
 		struct io_event *ev;
-		struct page *page;
 
 		avail = (head <= tail ?  tail : ctx->nr_events) - head;
 		if (head == tail)
@@ -1075,6 +1081,7 @@ static long aio_read_events_ring(struct kioctx *ctx,
 
 		pos = head + AIO_EVENTS_OFFSET;
 		page = ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE];
+		get_page(page);
 		pos %= AIO_EVENTS_PER_PAGE;
 
 		/*
@@ -1087,6 +1094,7 @@ static long aio_read_events_ring(struct kioctx *ctx,
 		copy_ret = copy_to_user(event + ret, ev + pos,
 					sizeof(*ev) * avail);
 		kunmap(page);
+		put_page(page);
 
 		if (unlikely(copy_ret)) {
 			ret = -EFAULT;
-- 
1.7.7


> 
> 		-ben



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RESEND v2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
@ 2014-03-14 10:25           ` Gu Zheng
  0 siblings, 0 replies; 24+ messages in thread
From: Gu Zheng @ 2014-03-14 10:25 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Tang Chen, viro, jmoyer, kosaki.motohiro, kosaki.motohiro,
	isimatu.yasuaki, linux-fsdevel, linux-aio, linux-kernel, miaox

Hi Ben,
On 03/13/2014 06:17 AM, Benjamin LaHaise wrote:

> Hello Tang,
> 
> On Wed, Mar 12, 2014 at 01:25:26PM +0800, Tang Chen wrote:
> ... <snip> ...
> 
>>> Another spot is in
>>> aio_read_events_ring() where head and tail are fetched from the ring 
>>> without
>>> any locking.  I also fear we'll be introducing new performance issues with
>>> all the additonal spinlock bouncing, despite the fact that is only ever
>>> needed for migration.  I'm going to continue looking into this today and
>>> will try to send out a followup to this email later.
>>
>> In the beginning of aio_read_events_ring(), it reads head and tail, not 
>> write.
>> So even if ring pages are migrated, the contents of the pages will not 
>> be changed.
>> So reading it is OK, from old page or from the new page, I think.
> 
> Your assumption that reading it is okay is incorrect.  Since we do not have 
> a reference on the page at that point, it is possible that the read of the 
> page takes place after the page has been freed and allocated to another part 
> of the kernel.  This would result in the read returning invalid information.

What about the following patch? It adds additional reference to protect the page
avoid being freed when we reading it.
ps.It is applied on linux-next(3-13).

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
---
 fs/aio.c |   16 ++++++++++++----
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 4133ba9..a4f3a4f 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -283,7 +283,7 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
 {
 	struct kioctx *ctx;
 	unsigned long flags;
-	int rc;
+	int rc, extra_count;
 
 	rc = 0;
 
@@ -311,7 +311,10 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
 	BUG_ON(PageWriteback(old));
 	get_page(new);
 
-	rc = migrate_page_move_mapping(mapping, new, old, NULL, mode, 1);
+	extra_count = page_count(old) - page_has_private(old) - 2;
+
+	rc = migrate_page_move_mapping(mapping, new, old,
+						NULL, mode, extra_count);
 	if (rc != MIGRATEPAGE_SUCCESS) {
 		put_page(new);
 		return rc;
@@ -1047,13 +1050,17 @@ static long aio_read_events_ring(struct kioctx *ctx,
 	unsigned head, tail, pos;
 	long ret = 0;
 	int copy_ret;
+	struct page *page;
 
 	mutex_lock(&ctx->ring_lock);
 
-	ring = kmap_atomic(ctx->ring_pages[0]);
+	page = ctx->ring_pages[0];
+	get_page(page);
+	ring = kmap_atomic(page);
 	head = ring->head;
 	tail = ring->tail;
 	kunmap_atomic(ring);
+	put_page(page);
 
 	pr_debug("h%u t%u m%u\n", head, tail, ctx->nr_events);
 
@@ -1063,7 +1070,6 @@ static long aio_read_events_ring(struct kioctx *ctx,
 	while (ret < nr) {
 		long avail;
 		struct io_event *ev;
-		struct page *page;
 
 		avail = (head <= tail ?  tail : ctx->nr_events) - head;
 		if (head == tail)
@@ -1075,6 +1081,7 @@ static long aio_read_events_ring(struct kioctx *ctx,
 
 		pos = head + AIO_EVENTS_OFFSET;
 		page = ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE];
+		get_page(page);
 		pos %= AIO_EVENTS_PER_PAGE;
 
 		/*
@@ -1087,6 +1094,7 @@ static long aio_read_events_ring(struct kioctx *ctx,
 		copy_ret = copy_to_user(event + ret, ev + pos,
 					sizeof(*ev) * avail);
 		kunmap(page);
+		put_page(page);
 
 		if (unlikely(copy_ret)) {
 			ret = -EFAULT;
-- 
1.7.7


> 
> 		-ben


--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RESEND v2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
  2014-03-14 10:25           ` Gu Zheng
@ 2014-03-14 15:14             ` Benjamin LaHaise
  -1 siblings, 0 replies; 24+ messages in thread
From: Benjamin LaHaise @ 2014-03-14 15:14 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Tang Chen, viro, jmoyer, kosaki.motohiro, kosaki.motohiro,
	isimatu.yasuaki, linux-fsdevel, linux-aio, linux-kernel, miaox

Hi Gu,

On Fri, Mar 14, 2014 at 06:25:16PM +0800, Gu Zheng wrote:
> Hi Ben,
> On 03/13/2014 06:17 AM, Benjamin LaHaise wrote:
> 
> > Hello Tang,
> > 
> > On Wed, Mar 12, 2014 at 01:25:26PM +0800, Tang Chen wrote:
> > ... <snip> ...
> > 
> >>> Another spot is in
> >>> aio_read_events_ring() where head and tail are fetched from the ring 
> >>> without
> >>> any locking.  I also fear we'll be introducing new performance issues with
> >>> all the additonal spinlock bouncing, despite the fact that is only ever
> >>> needed for migration.  I'm going to continue looking into this today and
> >>> will try to send out a followup to this email later.
> >>
> >> In the beginning of aio_read_events_ring(), it reads head and tail, not 
> >> write.
> >> So even if ring pages are migrated, the contents of the pages will not 
> >> be changed.
> >> So reading it is OK, from old page or from the new page, I think.
> > 
> > Your assumption that reading it is okay is incorrect.  Since we do not have 
> > a reference on the page at that point, it is possible that the read of the 
> > page takes place after the page has been freed and allocated to another part 
> > of the kernel.  This would result in the read returning invalid information.
> 
> What about the following patch? It adds additional reference to protect the page
> avoid being freed when we reading it.
> ps.It is applied on linux-next(3-13).

I think that's even worse than the spinlock approach since we'll end up 
bouncing around the struct page's cacheline in addition to spinlock we're 
going to end up taking anyways.

		-ben
-- 
"Thought is the essence of where you are now."

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RESEND v2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
@ 2014-03-14 15:14             ` Benjamin LaHaise
  0 siblings, 0 replies; 24+ messages in thread
From: Benjamin LaHaise @ 2014-03-14 15:14 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Tang Chen, viro, jmoyer, kosaki.motohiro, kosaki.motohiro,
	isimatu.yasuaki, linux-fsdevel, linux-aio, linux-kernel, miaox

Hi Gu,

On Fri, Mar 14, 2014 at 06:25:16PM +0800, Gu Zheng wrote:
> Hi Ben,
> On 03/13/2014 06:17 AM, Benjamin LaHaise wrote:
> 
> > Hello Tang,
> > 
> > On Wed, Mar 12, 2014 at 01:25:26PM +0800, Tang Chen wrote:
> > ... <snip> ...
> > 
> >>> Another spot is in
> >>> aio_read_events_ring() where head and tail are fetched from the ring 
> >>> without
> >>> any locking.  I also fear we'll be introducing new performance issues with
> >>> all the additonal spinlock bouncing, despite the fact that is only ever
> >>> needed for migration.  I'm going to continue looking into this today and
> >>> will try to send out a followup to this email later.
> >>
> >> In the beginning of aio_read_events_ring(), it reads head and tail, not 
> >> write.
> >> So even if ring pages are migrated, the contents of the pages will not 
> >> be changed.
> >> So reading it is OK, from old page or from the new page, I think.
> > 
> > Your assumption that reading it is okay is incorrect.  Since we do not have 
> > a reference on the page at that point, it is possible that the read of the 
> > page takes place after the page has been freed and allocated to another part 
> > of the kernel.  This would result in the read returning invalid information.
> 
> What about the following patch? It adds additional reference to protect the page
> avoid being freed when we reading it.
> ps.It is applied on linux-next(3-13).

I think that's even worse than the spinlock approach since we'll end up 
bouncing around the struct page's cacheline in addition to spinlock we're 
going to end up taking anyways.

		-ben
-- 
"Thought is the essence of where you are now."

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RESEND v2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
  2014-03-14 15:14             ` Benjamin LaHaise
@ 2014-03-16  2:06               ` Gu Zheng
  -1 siblings, 0 replies; 24+ messages in thread
From: Gu Zheng @ 2014-03-16  2:06 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Tang Chen, viro, jmoyer, kosaki.motohiro, kosaki.motohiro,
	isimatu.yasuaki, linux-fsdevel, linux-aio, linux-kernel, miaox,
	Andrew Morton

Hi Ben,
Sorry for late.
On 03/14/2014 11:14 PM, Benjamin LaHaise wrote:

> Hi Gu,
> 
> On Fri, Mar 14, 2014 at 06:25:16PM +0800, Gu Zheng wrote:
>> Hi Ben,
>> On 03/13/2014 06:17 AM, Benjamin LaHaise wrote:
>>
>>> Hello Tang,
>>>
>>> On Wed, Mar 12, 2014 at 01:25:26PM +0800, Tang Chen wrote:
>>> ... <snip> ...
>>>
>>>>> Another spot is in
>>>>> aio_read_events_ring() where head and tail are fetched from the ring 
>>>>> without
>>>>> any locking.  I also fear we'll be introducing new performance issues with
>>>>> all the additonal spinlock bouncing, despite the fact that is only ever
>>>>> needed for migration.  I'm going to continue looking into this today and
>>>>> will try to send out a followup to this email later.
>>>>
>>>> In the beginning of aio_read_events_ring(), it reads head and tail, not 
>>>> write.
>>>> So even if ring pages are migrated, the contents of the pages will not 
>>>> be changed.
>>>> So reading it is OK, from old page or from the new page, I think.
>>>
>>> Your assumption that reading it is okay is incorrect.  Since we do not have 
>>> a reference on the page at that point, it is possible that the read of the 
>>> page takes place after the page has been freed and allocated to another part 
>>> of the kernel.  This would result in the read returning invalid information.
>>
>> What about the following patch? It adds additional reference to protect the page
>> avoid being freed when we reading it.
>> ps.It is applied on linux-next(3-13).
> 
> I think that's even worse than the spinlock approach since we'll end up 
> bouncing around the struct page's cacheline in addition to spinlock we're 
> going to end up taking anyways.

But we can not use spinlock approach to avoid this issue in aio_read_events_ring(),
because we need to copy events to user space. And on the other side, it will break
the concurrency of aio_read_events_ring() and aio_complete().
Besides, IMHO, the problem you mentioned above is almost insignificant when reading
events.
Any better solution? Other guys?

Regards,
Gu 

> 
> 		-ben



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RESEND v2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
@ 2014-03-16  2:06               ` Gu Zheng
  0 siblings, 0 replies; 24+ messages in thread
From: Gu Zheng @ 2014-03-16  2:06 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Tang Chen, viro, jmoyer, kosaki.motohiro, kosaki.motohiro,
	isimatu.yasuaki, linux-fsdevel, linux-aio, linux-kernel, miaox,
	Andrew Morton

Hi Ben,
Sorry for late.
On 03/14/2014 11:14 PM, Benjamin LaHaise wrote:

> Hi Gu,
> 
> On Fri, Mar 14, 2014 at 06:25:16PM +0800, Gu Zheng wrote:
>> Hi Ben,
>> On 03/13/2014 06:17 AM, Benjamin LaHaise wrote:
>>
>>> Hello Tang,
>>>
>>> On Wed, Mar 12, 2014 at 01:25:26PM +0800, Tang Chen wrote:
>>> ... <snip> ...
>>>
>>>>> Another spot is in
>>>>> aio_read_events_ring() where head and tail are fetched from the ring 
>>>>> without
>>>>> any locking.  I also fear we'll be introducing new performance issues with
>>>>> all the additonal spinlock bouncing, despite the fact that is only ever
>>>>> needed for migration.  I'm going to continue looking into this today and
>>>>> will try to send out a followup to this email later.
>>>>
>>>> In the beginning of aio_read_events_ring(), it reads head and tail, not 
>>>> write.
>>>> So even if ring pages are migrated, the contents of the pages will not 
>>>> be changed.
>>>> So reading it is OK, from old page or from the new page, I think.
>>>
>>> Your assumption that reading it is okay is incorrect.  Since we do not have 
>>> a reference on the page at that point, it is possible that the read of the 
>>> page takes place after the page has been freed and allocated to another part 
>>> of the kernel.  This would result in the read returning invalid information.
>>
>> What about the following patch? It adds additional reference to protect the page
>> avoid being freed when we reading it.
>> ps.It is applied on linux-next(3-13).
> 
> I think that's even worse than the spinlock approach since we'll end up 
> bouncing around the struct page's cacheline in addition to spinlock we're 
> going to end up taking anyways.

But we can not use spinlock approach to avoid this issue in aio_read_events_ring(),
because we need to copy events to user space. And on the other side, it will break
the concurrency of aio_read_events_ring() and aio_complete().
Besides, IMHO, the problem you mentioned above is almost insignificant when reading
events.
Any better solution? Other guys?

Regards,
Gu 

> 
> 		-ben


--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RESEND v2 PATCH 0/2] Bug fix in aio ring page migration
  2014-03-13  9:45   ` Gu Zheng
  (?)
@ 2014-03-16 21:21   ` Ben Hutchings
  -1 siblings, 0 replies; 24+ messages in thread
From: Ben Hutchings @ 2014-03-16 21:21 UTC (permalink / raw)
  To: Gu Zheng
  Cc: stable, Greg Kroah-Hartman, viro, bcrl, jmoyer, kosaki.motohiro,
	kosaki.motohiro, isimatu.yasuaki, linux-fsdevel, linux-aio,
	linux-kernel, miaox, tangchen

[-- Attachment #1: Type: text/plain, Size: 852 bytes --]

On Thu, 2014-03-13 at 17:45 +0800, Gu Zheng wrote:
> This patchset has been applied to linux-next, and these problems also exist
> in 3.12.y and 3.13.y stable tree.
> So please merge this patchset to 3.12.y and 3.13.y stable tree.

They must be in Linus's tree before they are acceptable for stable.

Ben.

> commit 692c9b8c5ee8d263bb8348171f0bebd3d84eb2c1
> Author: Tang Chen <tangchen@cn.fujitsu.com>
> Date:   Mon Mar 10 16:15:33 2014 +0800
> 
>     aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
> 
> commit e0f5e0add36d2e3c456cf2f4283673ed834b3c24
> Author: Tang Chen <tangchen@cn.fujitsu.com>
> Date:   Mon Mar 10 16:15:34 2014 +0800
> 
>     aio, mem-hotplug: Add memory barrier to aio ring page migration.
[...]

-- 
Ben Hutchings
Computers are not intelligent.	They only think they are.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RESEND v2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
  2014-03-14 15:14             ` Benjamin LaHaise
  (?)
  (?)
@ 2014-03-17  6:50             ` Tang Chen
  -1 siblings, 0 replies; 24+ messages in thread
From: Tang Chen @ 2014-03-17  6:50 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Gu Zheng, viro, jmoyer, kosaki.motohiro, kosaki.motohiro,
	isimatu.yasuaki, linux-fsdevel, linux-aio, linux-kernel, miaox

On 03/14/2014 11:14 PM, Benjamin LaHaise wrote:
......
>> What about the following patch? It adds additional reference to protect the page
>> avoid being freed when we reading it.
>> ps.It is applied on linux-next(3-13).
>
> I think that's even worse than the spinlock approach since we'll end up
> bouncing around the struct page's cacheline in addition to spinlock we're
> going to end up taking anyways.
>

Hi Benjamin,

I'm sorry, I don't quite understand the cacheline problem you mentioned 
above.
Would you please explain more ?

Thanks.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RESEND v2 PATCH 0/2] Bug fix in aio ring page migration
  2014-03-13  9:45   ` Gu Zheng
  (?)
  (?)
@ 2014-05-13 23:58   ` Greg Kroah-Hartman
  2014-05-14  1:13     ` Gu Zheng
  -1 siblings, 1 reply; 24+ messages in thread
From: Greg Kroah-Hartman @ 2014-05-13 23:58 UTC (permalink / raw)
  To: Gu Zheng
  Cc: stable, viro, bcrl, jmoyer, kosaki.motohiro, kosaki.motohiro,
	isimatu.yasuaki, linux-fsdevel, linux-aio, linux-kernel, miaox,
	tangchen

On Thu, Mar 13, 2014 at 05:45:42PM +0800, Gu Zheng wrote:
> This patchset has been applied to linux-next, and these problems also exist
> in 3.12.y and 3.13.y stable tree.
> So please merge this patchset to 3.12.y and 3.13.y stable tree.
> 
> commit 692c9b8c5ee8d263bb8348171f0bebd3d84eb2c1
> Author: Tang Chen <tangchen@cn.fujitsu.com>
> Date:   Mon Mar 10 16:15:33 2014 +0800
> 
>     aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
> 
> commit e0f5e0add36d2e3c456cf2f4283673ed834b3c24
> Author: Tang Chen <tangchen@cn.fujitsu.com>
> Date:   Mon Mar 10 16:15:34 2014 +0800
> 
>     aio, mem-hotplug: Add memory barrier to aio ring page migration.
> 
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/fs/aio.c?id=692c9b8c5ee8d263bb8348171f0bebd3d84eb2c1
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/fs/aio.c?id=e0f5e0add36d2e3c456cf2f4283673ed834b3c24
> 
> https://lkml.org/lkml/2014/3/10/56
> https://lkml.org/lkml/2014/3/10/58

I don't see these in Linus's tree, what happened?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RESEND v2 PATCH 0/2] Bug fix in aio ring page migration
  2014-05-13 23:58   ` Greg Kroah-Hartman
@ 2014-05-14  1:13     ` Gu Zheng
  0 siblings, 0 replies; 24+ messages in thread
From: Gu Zheng @ 2014-05-14  1:13 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: stable, viro, bcrl, jmoyer, kosaki.motohiro, kosaki.motohiro,
	isimatu.yasuaki, linux-fsdevel, linux-aio, linux-kernel, miaox,
	tangchen

Hi Greg,
Thanks for your attention on this thread.
On 05/14/2014 07:58 AM, Greg Kroah-Hartman wrote:

> On Thu, Mar 13, 2014 at 05:45:42PM +0800, Gu Zheng wrote:
>> This patchset has been applied to linux-next, and these problems also exist
>> in 3.12.y and 3.13.y stable tree.
>> So please merge this patchset to 3.12.y and 3.13.y stable tree.
>>
>> commit 692c9b8c5ee8d263bb8348171f0bebd3d84eb2c1
>> Author: Tang Chen <tangchen@cn.fujitsu.com>
>> Date:   Mon Mar 10 16:15:33 2014 +0800
>>
>>     aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
>>
>> commit e0f5e0add36d2e3c456cf2f4283673ed834b3c24
>> Author: Tang Chen <tangchen@cn.fujitsu.com>
>> Date:   Mon Mar 10 16:15:34 2014 +0800
>>
>>     aio, mem-hotplug: Add memory barrier to aio ring page migration.
>>
>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/fs/aio.c?id=692c9b8c5ee8d263bb8348171f0bebd3d84eb2c1
>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/fs/aio.c?id=e0f5e0add36d2e3c456cf2f4283673ed834b3c24
>>
>> https://lkml.org/lkml/2014/3/10/56
>> https://lkml.org/lkml/2014/3/10/58
> 
> I don't see these in Linus's tree, what happened?

These two patches were replaced by the better one from Benjamin, and it has been
merged into upstream.
commit fa8a53c39f3fdde98c9eace6a9b412143f0f6ed6
Author: Benjamin LaHaise <bcrl@kvack.org>
Date:   Fri Mar 28 10:14:45 2014 -0400

    aio: v4 ensure access to ctx->ring_pages is correctly serialised for migration
And I have already received the notify that it has been applied into stable tree.

Best regards,
Gu

> 
> thanks,
> 
> greg k-h
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-aio' in
> the body to majordomo@kvack.org.  For more info on Linux AIO,
> see: http://www.kvack.org/aio/
> Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
> .
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2014-05-14  1:23 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-10  8:15 [RESEND v2 PATCH 0/2] Bug fix in aio ring page migration Tang Chen
2014-03-10  8:15 ` Tang Chen
2014-03-10  8:15 ` [RESEND v2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages Tang Chen
2014-03-10  8:15   ` Tang Chen
2014-03-11 18:46   ` Benjamin LaHaise
2014-03-11 18:46     ` Benjamin LaHaise
2014-03-12  5:25     ` Tang Chen
2014-03-12  5:25       ` Tang Chen
2014-03-12 22:17       ` Benjamin LaHaise
2014-03-12 22:17         ` Benjamin LaHaise
2014-03-14 10:25         ` Gu Zheng
2014-03-14 10:25           ` Gu Zheng
2014-03-14 15:14           ` Benjamin LaHaise
2014-03-14 15:14             ` Benjamin LaHaise
2014-03-16  2:06             ` Gu Zheng
2014-03-16  2:06               ` Gu Zheng
2014-03-17  6:50             ` Tang Chen
2014-03-10  8:15 ` [RESEND v2 PATCH 2/2] aio, mem-hotplug: Add memory barrier to aio ring page migration Tang Chen
2014-03-10  8:15   ` Tang Chen
2014-03-13  9:45 ` [RESEND v2 PATCH 0/2] Bug fix in " Gu Zheng
2014-03-13  9:45   ` Gu Zheng
2014-03-16 21:21   ` Ben Hutchings
2014-05-13 23:58   ` Greg Kroah-Hartman
2014-05-14  1:13     ` Gu Zheng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.