All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] THP: Use explicit memory barrier
@ 2013-03-31 23:45 ` Minchan Kim
  0 siblings, 0 replies; 20+ messages in thread
From: Minchan Kim @ 2013-03-31 23:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Minchan Kim, Mel Gorman,
	Andrea Arcangeli, Hugh Dickins

__do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
spinlock for making sure that clear_huge_page write become visible
after set set_pmd_at() write.

But lru_cache_add_lru uses pagevec so it could miss spinlock
easily so above rule was broken so user may see inconsistent data.

This patch fixes it with using explict barrier rather than depending
on lru spinlock.

Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/huge_memory.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bfa142e..fad800e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -725,11 +725,10 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
 		pmd_t entry;
 		entry = mk_huge_pmd(page, vma);
 		/*
-		 * The spinlocking to take the lru_lock inside
-		 * page_add_new_anon_rmap() acts as a full memory
-		 * barrier to be sure clear_huge_page writes become
-		 * visible after the set_pmd_at() write.
+		 * clear_huge_page write become visible after the
+		 * set_pmd_at() write.
 		 */
+		smp_wmb();
 		page_add_new_anon_rmap(page, vma, haddr);
 		set_pmd_at(mm, haddr, pmd, entry);
 		pgtable_trans_huge_deposit(mm, pgtable);
-- 
1.8.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH] THP: Use explicit memory barrier
@ 2013-03-31 23:45 ` Minchan Kim
  0 siblings, 0 replies; 20+ messages in thread
From: Minchan Kim @ 2013-03-31 23:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Minchan Kim, Mel Gorman,
	Andrea Arcangeli, Hugh Dickins

__do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
spinlock for making sure that clear_huge_page write become visible
after set set_pmd_at() write.

But lru_cache_add_lru uses pagevec so it could miss spinlock
easily so above rule was broken so user may see inconsistent data.

This patch fixes it with using explict barrier rather than depending
on lru spinlock.

Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/huge_memory.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bfa142e..fad800e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -725,11 +725,10 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
 		pmd_t entry;
 		entry = mk_huge_pmd(page, vma);
 		/*
-		 * The spinlocking to take the lru_lock inside
-		 * page_add_new_anon_rmap() acts as a full memory
-		 * barrier to be sure clear_huge_page writes become
-		 * visible after the set_pmd_at() write.
+		 * clear_huge_page write become visible after the
+		 * set_pmd_at() write.
 		 */
+		smp_wmb();
 		page_add_new_anon_rmap(page, vma, haddr);
 		set_pmd_at(mm, haddr, pmd, entry);
 		pgtable_trans_huge_deposit(mm, pgtable);
-- 
1.8.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] THP: Use explicit memory barrier
  2013-03-31 23:45 ` Minchan Kim
@ 2013-04-01  1:01   ` Kamezawa Hiroyuki
  -1 siblings, 0 replies; 20+ messages in thread
From: Kamezawa Hiroyuki @ 2013-04-01  1:01 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman,
	Andrea Arcangeli, Hugh Dickins

(2013/04/01 8:45), Minchan Kim wrote:
> __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> spinlock for making sure that clear_huge_page write become visible
> after set set_pmd_at() write.
> 
> But lru_cache_add_lru uses pagevec so it could miss spinlock
> easily so above rule was broken so user may see inconsistent data.
> This patch fixes it with using explict barrier rather than depending
> on lru spinlock.
> 

Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers.
Users can see non-zero value after page fault in theory ?

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] THP: Use explicit memory barrier
@ 2013-04-01  1:01   ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 20+ messages in thread
From: Kamezawa Hiroyuki @ 2013-04-01  1:01 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman,
	Andrea Arcangeli, Hugh Dickins

(2013/04/01 8:45), Minchan Kim wrote:
> __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> spinlock for making sure that clear_huge_page write become visible
> after set set_pmd_at() write.
> 
> But lru_cache_add_lru uses pagevec so it could miss spinlock
> easily so above rule was broken so user may see inconsistent data.
> This patch fixes it with using explict barrier rather than depending
> on lru spinlock.
> 

Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers.
Users can see non-zero value after page fault in theory ?

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] THP: Use explicit memory barrier
  2013-04-01  1:01   ` Kamezawa Hiroyuki
@ 2013-04-01  4:45     ` Minchan Kim
  -1 siblings, 0 replies; 20+ messages in thread
From: Minchan Kim @ 2013-04-01  4:45 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman,
	Andrea Arcangeli, Hugh Dickins

Hey Kame,

On Mon, Apr 01, 2013 at 10:01:49AM +0900, Kamezawa Hiroyuki wrote:
> (2013/04/01 8:45), Minchan Kim wrote:
> > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> > spinlock for making sure that clear_huge_page write become visible
> > after set set_pmd_at() write.
> > 
> > But lru_cache_add_lru uses pagevec so it could miss spinlock
> > easily so above rule was broken so user may see inconsistent data.
> > This patch fixes it with using explict barrier rather than depending
> > on lru spinlock.
> > 
> 
> Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers.
> Users can see non-zero value after page fault in theory ?

Maybe, but as you know well, we didn't see any report about that until now
so I'm not sure. Ccing people in this thread could have clear answer rather
than me.

In THP, I just found inconsistency between Andrea's comment and code.
That's why I sent a patch.

If your concern turns out truth, Ya, you might find ancient bug. :)
Thanks.

> 
> Thanks,
> -Kame
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] THP: Use explicit memory barrier
@ 2013-04-01  4:45     ` Minchan Kim
  0 siblings, 0 replies; 20+ messages in thread
From: Minchan Kim @ 2013-04-01  4:45 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman,
	Andrea Arcangeli, Hugh Dickins

Hey Kame,

On Mon, Apr 01, 2013 at 10:01:49AM +0900, Kamezawa Hiroyuki wrote:
> (2013/04/01 8:45), Minchan Kim wrote:
> > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> > spinlock for making sure that clear_huge_page write become visible
> > after set set_pmd_at() write.
> > 
> > But lru_cache_add_lru uses pagevec so it could miss spinlock
> > easily so above rule was broken so user may see inconsistent data.
> > This patch fixes it with using explict barrier rather than depending
> > on lru spinlock.
> > 
> 
> Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers.
> Users can see non-zero value after page fault in theory ?

Maybe, but as you know well, we didn't see any report about that until now
so I'm not sure. Ccing people in this thread could have clear answer rather
than me.

In THP, I just found inconsistency between Andrea's comment and code.
That's why I sent a patch.

If your concern turns out truth, Ya, you might find ancient bug. :)
Thanks.

> 
> Thanks,
> -Kame
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] THP: Use explicit memory barrier
  2013-03-31 23:45 ` Minchan Kim
@ 2013-04-01 23:35   ` David Rientjes
  -1 siblings, 0 replies; 20+ messages in thread
From: David Rientjes @ 2013-04-01 23:35 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman,
	Andrea Arcangeli, Hugh Dickins

On Mon, 1 Apr 2013, Minchan Kim wrote:

> __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> spinlock for making sure that clear_huge_page write become visible
> after set set_pmd_at() write.
> 
> But lru_cache_add_lru uses pagevec so it could miss spinlock
> easily so above rule was broken so user may see inconsistent data.
> 
> This patch fixes it with using explict barrier rather than depending
> on lru spinlock.
> 

Is this the same issue that Andrea responded to in the "thp and memory 
barrier assumptions" thread at http://marc.info/?t=134333512700004 ?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] THP: Use explicit memory barrier
@ 2013-04-01 23:35   ` David Rientjes
  0 siblings, 0 replies; 20+ messages in thread
From: David Rientjes @ 2013-04-01 23:35 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman,
	Andrea Arcangeli, Hugh Dickins

On Mon, 1 Apr 2013, Minchan Kim wrote:

> __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> spinlock for making sure that clear_huge_page write become visible
> after set set_pmd_at() write.
> 
> But lru_cache_add_lru uses pagevec so it could miss spinlock
> easily so above rule was broken so user may see inconsistent data.
> 
> This patch fixes it with using explict barrier rather than depending
> on lru spinlock.
> 

Is this the same issue that Andrea responded to in the "thp and memory 
barrier assumptions" thread at http://marc.info/?t=134333512700004 ?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] THP: Use explicit memory barrier
  2013-04-01 23:35   ` David Rientjes
@ 2013-04-02  0:37     ` Minchan Kim
  -1 siblings, 0 replies; 20+ messages in thread
From: Minchan Kim @ 2013-04-02  0:37 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman,
	Andrea Arcangeli, Hugh Dickins

On Mon, Apr 01, 2013 at 04:35:38PM -0700, David Rientjes wrote:
> On Mon, 1 Apr 2013, Minchan Kim wrote:
> 
> > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> > spinlock for making sure that clear_huge_page write become visible
> > after set set_pmd_at() write.
> > 
> > But lru_cache_add_lru uses pagevec so it could miss spinlock
> > easily so above rule was broken so user may see inconsistent data.
> > 
> > This patch fixes it with using explict barrier rather than depending
> > on lru spinlock.
> > 
> 
> Is this the same issue that Andrea responded to in the "thp and memory 
> barrier assumptions" thread at http://marc.info/?t=134333512700004 ?

Yes and Peter pointed out further step.
Thanks for pointing out.
Not that I know that Andrea alreay noticed it, I don't care about this
patch.

Remaining question is Kame's one.
Isn't there anyone could answer it?

> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] THP: Use explicit memory barrier
@ 2013-04-02  0:37     ` Minchan Kim
  0 siblings, 0 replies; 20+ messages in thread
From: Minchan Kim @ 2013-04-02  0:37 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman,
	Andrea Arcangeli, Hugh Dickins

On Mon, Apr 01, 2013 at 04:35:38PM -0700, David Rientjes wrote:
> On Mon, 1 Apr 2013, Minchan Kim wrote:
> 
> > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> > spinlock for making sure that clear_huge_page write become visible
> > after set set_pmd_at() write.
> > 
> > But lru_cache_add_lru uses pagevec so it could miss spinlock
> > easily so above rule was broken so user may see inconsistent data.
> > 
> > This patch fixes it with using explict barrier rather than depending
> > on lru spinlock.
> > 
> 
> Is this the same issue that Andrea responded to in the "thp and memory 
> barrier assumptions" thread at http://marc.info/?t=134333512700004 ?

Yes and Peter pointed out further step.
Thanks for pointing out.
Not that I know that Andrea alreay noticed it, I don't care about this
patch.

Remaining question is Kame's one.
Isn't there anyone could answer it?

> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] THP: Use explicit memory barrier
  2013-04-02  0:37     ` Minchan Kim
@ 2013-04-02 19:20       ` David Rientjes
  -1 siblings, 0 replies; 20+ messages in thread
From: David Rientjes @ 2013-04-02 19:20 UTC (permalink / raw)
  To: Minchan Kim, Andrea Arcangeli
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Hugh Dickins

On Tue, 2 Apr 2013, Minchan Kim wrote:

> Yes and Peter pointed out further step.
> Thanks for pointing out.
> Not that I know that Andrea alreay noticed it, I don't care about this
> patch.
> 

Andrea, do you have time to send

c08e0c9ee786 ("thp: add memory barrier to __do_huge_pmd_anonymous_page")
b08d75a595ec ("thp: document barrier() in wrprotect THP fault path")

from the master branch of aa.git to Andrew?  I would do it, but one isn't 
signed-off in your tree.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] THP: Use explicit memory barrier
@ 2013-04-02 19:20       ` David Rientjes
  0 siblings, 0 replies; 20+ messages in thread
From: David Rientjes @ 2013-04-02 19:20 UTC (permalink / raw)
  To: Minchan Kim, Andrea Arcangeli
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Hugh Dickins

On Tue, 2 Apr 2013, Minchan Kim wrote:

> Yes and Peter pointed out further step.
> Thanks for pointing out.
> Not that I know that Andrea alreay noticed it, I don't care about this
> patch.
> 

Andrea, do you have time to send

c08e0c9ee786 ("thp: add memory barrier to __do_huge_pmd_anonymous_page")
b08d75a595ec ("thp: document barrier() in wrprotect THP fault path")

from the master branch of aa.git to Andrew?  I would do it, but one isn't 
signed-off in your tree.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] THP: Use explicit memory barrier
  2013-04-02  0:37     ` Minchan Kim
@ 2013-04-02 19:30       ` Hugh Dickins
  -1 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2013-04-02 19:30 UTC (permalink / raw)
  To: Minchan Kim
  Cc: David Rientjes, Andrew Morton, linux-kernel, linux-mm,
	Mel Gorman, Andrea Arcangeli, Kamezawa Hiroyuki

On Tue, 2 Apr 2013, Minchan Kim wrote:
> On Mon, Apr 01, 2013 at 04:35:38PM -0700, David Rientjes wrote:
> > On Mon, 1 Apr 2013, Minchan Kim wrote:
> > 
> > > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> > > spinlock for making sure that clear_huge_page write become visible
> > > after set set_pmd_at() write.
> > > 
> > > But lru_cache_add_lru uses pagevec so it could miss spinlock
> > > easily so above rule was broken so user may see inconsistent data.
> > > 
> > > This patch fixes it with using explict barrier rather than depending
> > > on lru spinlock.
> > > 
> > 
> > Is this the same issue that Andrea responded to in the "thp and memory 
> > barrier assumptions" thread at http://marc.info/?t=134333512700004 ?
> 
> Yes and Peter pointed out further step.
> Thanks for pointing out.
> Not that I know that Andrea alreay noticed it, I don't care about this
> patch.
> 
> Remaining question is Kame's one.
> > Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers.
> > Users can see non-zero value after page fault in theory ?
> Isn't there anyone could answer it?

See Nick's 2008 0ed361dec "mm: fix PageUptodate data race", which gave us

static inline void __SetPageUptodate(struct page *page)
{
	smp_wmb();
	__set_bit(PG_uptodate, &(page)->flags);
}

So both do_anonymous_page() and __do_huge_pmd_anonymous_page() look safe
to me already, though the huge_memory one could do with a fixed comment.

Hugh

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] THP: Use explicit memory barrier
@ 2013-04-02 19:30       ` Hugh Dickins
  0 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2013-04-02 19:30 UTC (permalink / raw)
  To: Minchan Kim
  Cc: David Rientjes, Andrew Morton, linux-kernel, linux-mm,
	Mel Gorman, Andrea Arcangeli, Kamezawa Hiroyuki

On Tue, 2 Apr 2013, Minchan Kim wrote:
> On Mon, Apr 01, 2013 at 04:35:38PM -0700, David Rientjes wrote:
> > On Mon, 1 Apr 2013, Minchan Kim wrote:
> > 
> > > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> > > spinlock for making sure that clear_huge_page write become visible
> > > after set set_pmd_at() write.
> > > 
> > > But lru_cache_add_lru uses pagevec so it could miss spinlock
> > > easily so above rule was broken so user may see inconsistent data.
> > > 
> > > This patch fixes it with using explict barrier rather than depending
> > > on lru spinlock.
> > > 
> > 
> > Is this the same issue that Andrea responded to in the "thp and memory 
> > barrier assumptions" thread at http://marc.info/?t=134333512700004 ?
> 
> Yes and Peter pointed out further step.
> Thanks for pointing out.
> Not that I know that Andrea alreay noticed it, I don't care about this
> patch.
> 
> Remaining question is Kame's one.
> > Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers.
> > Users can see non-zero value after page fault in theory ?
> Isn't there anyone could answer it?

See Nick's 2008 0ed361dec "mm: fix PageUptodate data race", which gave us

static inline void __SetPageUptodate(struct page *page)
{
	smp_wmb();
	__set_bit(PG_uptodate, &(page)->flags);
}

So both do_anonymous_page() and __do_huge_pmd_anonymous_page() look safe
to me already, though the huge_memory one could do with a fixed comment.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] THP: Use explicit memory barrier
  2013-04-02 19:30       ` Hugh Dickins
@ 2013-04-03  0:14         ` Minchan Kim
  -1 siblings, 0 replies; 20+ messages in thread
From: Minchan Kim @ 2013-04-03  0:14 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: David Rientjes, Andrew Morton, linux-kernel, linux-mm,
	Mel Gorman, Andrea Arcangeli, Kamezawa Hiroyuki, Peter Zijlstra

On Tue, Apr 02, 2013 at 12:30:15PM -0700, Hugh Dickins wrote:
> On Tue, 2 Apr 2013, Minchan Kim wrote:
> > On Mon, Apr 01, 2013 at 04:35:38PM -0700, David Rientjes wrote:
> > > On Mon, 1 Apr 2013, Minchan Kim wrote:
> > > 
> > > > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> > > > spinlock for making sure that clear_huge_page write become visible
> > > > after set set_pmd_at() write.
> > > > 
> > > > But lru_cache_add_lru uses pagevec so it could miss spinlock
> > > > easily so above rule was broken so user may see inconsistent data.
> > > > 
> > > > This patch fixes it with using explict barrier rather than depending
> > > > on lru spinlock.
> > > > 
> > > 
> > > Is this the same issue that Andrea responded to in the "thp and memory 
> > > barrier assumptions" thread at http://marc.info/?t=134333512700004 ?
> > 
> > Yes and Peter pointed out further step.
> > Thanks for pointing out.
> > Not that I know that Andrea alreay noticed it, I don't care about this
> > patch.
> > 
> > Remaining question is Kame's one.
> > > Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers.
> > > Users can see non-zero value after page fault in theory ?
> > Isn't there anyone could answer it?
> 
> See Nick's 2008 0ed361dec "mm: fix PageUptodate data race", which gave us
> 
> static inline void __SetPageUptodate(struct page *page)
> {
> 	smp_wmb();
> 	__set_bit(PG_uptodate, &(page)->flags);
> }
> 
> So both do_anonymous_page() and __do_huge_pmd_anonymous_page() look safe
> to me already, though the huge_memory one could do with a fixed comment.

Thanks you very much!
That's one everybody are really missing.

Here it goes!

==================== 8< =====================

>From fb0b9f3df698547bfb70f81d85e0d1e00f19e1fc Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Wed, 3 Apr 2013 08:53:27 +0900
Subject: [PATCH] THP: fix comment about memory barrier

Now, memory barrier in __do_huge_pmd_anonymous_page doesn't work.
Because lru_cache_add_lru uses pagevec so it could miss spinlock
easily so above rule was broken so user might see inconsistent data.

I was not first person who pointed out the problem. Mel and Peter
pointed out a few months ago and Peter pointed out further that
even spin_lock/unlock can't make sure it.
http://marc.info/?t=134333512700004

	In particular:

        	*A = a;
        	LOCK
        	UNLOCK
        	*B = b;

	may occur as:

        	LOCK, STORE *B, STORE *A, UNLOCK

At last, Hugh pointed out that even we don't need memory barrier
in there because __SetPageUpdate already have done it from
Nick's [1] explicitly.

So this patch fixes comment on THP and adds same comment for
do_anonymous_page, too because everybody except Hugh was missing
that. It means we needs COMMENT about that.

[1] 0ed361dec "mm: fix PageUptodate data race"

Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/huge_memory.c | 11 +++++------
 mm/memory.c      |  5 +++++
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e2f7f5aa..f2f17ff 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -713,6 +713,11 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
 		return VM_FAULT_OOM;
 
 	clear_huge_page(page, haddr, HPAGE_PMD_NR);
+	/*
+	 * The memory barrier inside __SetPageUptodate makes sure that
+	 * clear_huge_page writes become visible after the set_pmd_at()
+	 * write.
+	 */
 	__SetPageUptodate(page);
 
 	spin_lock(&mm->page_table_lock);
@@ -724,12 +729,6 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
 	} else {
 		pmd_t entry;
 		entry = mk_huge_pmd(page, vma);
-		/*
-		 * The spinlocking to take the lru_lock inside
-		 * page_add_new_anon_rmap() acts as a full memory
-		 * barrier to be sure clear_huge_page writes become
-		 * visible after the set_pmd_at() write.
-		 */
 		page_add_new_anon_rmap(page, vma, haddr);
 		set_pmd_at(mm, haddr, pmd, entry);
 		pgtable_trans_huge_deposit(mm, pgtable);
diff --git a/mm/memory.c b/mm/memory.c
index 494526a..d0da51e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3196,6 +3196,11 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	page = alloc_zeroed_user_highpage_movable(vma, address);
 	if (!page)
 		goto oom;
+	/*
+	 * The memory barrier inside __SetPageUptodate makes sure that
+	 * preceeding stores to the page contents become visible after
+	 * the set_pte_at() write.
+	 */
 	__SetPageUptodate(page);
 
 	if (mem_cgroup_newpage_charge(page, mm, GFP_KERNEL))
-- 
1.8.2

-- 
Kind regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] THP: Use explicit memory barrier
@ 2013-04-03  0:14         ` Minchan Kim
  0 siblings, 0 replies; 20+ messages in thread
From: Minchan Kim @ 2013-04-03  0:14 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: David Rientjes, Andrew Morton, linux-kernel, linux-mm,
	Mel Gorman, Andrea Arcangeli, Kamezawa Hiroyuki, Peter Zijlstra

On Tue, Apr 02, 2013 at 12:30:15PM -0700, Hugh Dickins wrote:
> On Tue, 2 Apr 2013, Minchan Kim wrote:
> > On Mon, Apr 01, 2013 at 04:35:38PM -0700, David Rientjes wrote:
> > > On Mon, 1 Apr 2013, Minchan Kim wrote:
> > > 
> > > > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> > > > spinlock for making sure that clear_huge_page write become visible
> > > > after set set_pmd_at() write.
> > > > 
> > > > But lru_cache_add_lru uses pagevec so it could miss spinlock
> > > > easily so above rule was broken so user may see inconsistent data.
> > > > 
> > > > This patch fixes it with using explict barrier rather than depending
> > > > on lru spinlock.
> > > > 
> > > 
> > > Is this the same issue that Andrea responded to in the "thp and memory 
> > > barrier assumptions" thread at http://marc.info/?t=134333512700004 ?
> > 
> > Yes and Peter pointed out further step.
> > Thanks for pointing out.
> > Not that I know that Andrea alreay noticed it, I don't care about this
> > patch.
> > 
> > Remaining question is Kame's one.
> > > Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers.
> > > Users can see non-zero value after page fault in theory ?
> > Isn't there anyone could answer it?
> 
> See Nick's 2008 0ed361dec "mm: fix PageUptodate data race", which gave us
> 
> static inline void __SetPageUptodate(struct page *page)
> {
> 	smp_wmb();
> 	__set_bit(PG_uptodate, &(page)->flags);
> }
> 
> So both do_anonymous_page() and __do_huge_pmd_anonymous_page() look safe
> to me already, though the huge_memory one could do with a fixed comment.

Thanks you very much!
That's one everybody are really missing.

Here it goes!

==================== 8< =====================

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] THP: Use explicit memory barrier
  2013-03-31 23:45 ` Minchan Kim
@ 2013-04-04  2:45   ` Simon Jeons
  -1 siblings, 0 replies; 20+ messages in thread
From: Simon Jeons @ 2013-04-04  2:45 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman,
	Andrea Arcangeli, Hugh Dickins

Hi Minchan,
On 04/01/2013 07:45 AM, Minchan Kim wrote:
> __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> spinlock for making sure that clear_huge_page write become visible
> after set set_pmd_at() write.

1. There are no pte modify, why take page_table_lock here?
2. What's the meaning of "clear_huge_page write become visible after set 
set_pmd_at() write"?

>
> But lru_cache_add_lru uses pagevec so it could miss spinlock
> easily so above rule was broken so user may see inconsistent data.
>
> This patch fixes it with using explict barrier rather than depending
> on lru spinlock.
>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Hugh Dickins <hughd@google.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>   mm/huge_memory.c | 7 +++----
>   1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index bfa142e..fad800e 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -725,11 +725,10 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
>   		pmd_t entry;
>   		entry = mk_huge_pmd(page, vma);
>   		/*
> -		 * The spinlocking to take the lru_lock inside
> -		 * page_add_new_anon_rmap() acts as a full memory
> -		 * barrier to be sure clear_huge_page writes become
> -		 * visible after the set_pmd_at() write.
> +		 * clear_huge_page write become visible after the
> +		 * set_pmd_at() write.
>   		 */
> +		smp_wmb();
>   		page_add_new_anon_rmap(page, vma, haddr);
>   		set_pmd_at(mm, haddr, pmd, entry);
>   		pgtable_trans_huge_deposit(mm, pgtable);


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] THP: Use explicit memory barrier
@ 2013-04-04  2:45   ` Simon Jeons
  0 siblings, 0 replies; 20+ messages in thread
From: Simon Jeons @ 2013-04-04  2:45 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman,
	Andrea Arcangeli, Hugh Dickins

Hi Minchan,
On 04/01/2013 07:45 AM, Minchan Kim wrote:
> __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> spinlock for making sure that clear_huge_page write become visible
> after set set_pmd_at() write.

1. There are no pte modify, why take page_table_lock here?
2. What's the meaning of "clear_huge_page write become visible after set 
set_pmd_at() write"?

>
> But lru_cache_add_lru uses pagevec so it could miss spinlock
> easily so above rule was broken so user may see inconsistent data.
>
> This patch fixes it with using explict barrier rather than depending
> on lru spinlock.
>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Hugh Dickins <hughd@google.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>   mm/huge_memory.c | 7 +++----
>   1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index bfa142e..fad800e 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -725,11 +725,10 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
>   		pmd_t entry;
>   		entry = mk_huge_pmd(page, vma);
>   		/*
> -		 * The spinlocking to take the lru_lock inside
> -		 * page_add_new_anon_rmap() acts as a full memory
> -		 * barrier to be sure clear_huge_page writes become
> -		 * visible after the set_pmd_at() write.
> +		 * clear_huge_page write become visible after the
> +		 * set_pmd_at() write.
>   		 */
> +		smp_wmb();
>   		page_add_new_anon_rmap(page, vma, haddr);
>   		set_pmd_at(mm, haddr, pmd, entry);
>   		pgtable_trans_huge_deposit(mm, pgtable);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] THP: Use explicit memory barrier
  2013-04-03  0:14         ` Minchan Kim
@ 2013-04-04 13:45           ` Andrea Arcangeli
  -1 siblings, 0 replies; 20+ messages in thread
From: Andrea Arcangeli @ 2013-04-04 13:45 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Hugh Dickins, David Rientjes, Andrew Morton, linux-kernel,
	linux-mm, Mel Gorman, Kamezawa Hiroyuki, Peter Zijlstra

On Wed, Apr 03, 2013 at 09:14:01AM +0900, Minchan Kim wrote:
>  	clear_huge_page(page, haddr, HPAGE_PMD_NR);
> +	/*
> +	 * The memory barrier inside __SetPageUptodate makes sure that
> +	 * clear_huge_page writes become visible after the set_pmd_at()

s/after/before/

> +	 * write.
> +	 */
>  	__SetPageUptodate(page);
>  
>  	spin_lock(&mm->page_table_lock);
> @@ -724,12 +729,6 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
>  	} else {
>  		pmd_t entry;
>  		entry = mk_huge_pmd(page, vma);
> -		/*
> -		 * The spinlocking to take the lru_lock inside
> -		 * page_add_new_anon_rmap() acts as a full memory
> -		 * barrier to be sure clear_huge_page writes become
> -		 * visible after the set_pmd_at() write.
> -		 */
>  		page_add_new_anon_rmap(page, vma, haddr);
>  		set_pmd_at(mm, haddr, pmd, entry);
>  		pgtable_trans_huge_deposit(mm, pgtable);
> diff --git a/mm/memory.c b/mm/memory.c
> index 494526a..d0da51e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3196,6 +3196,11 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  	page = alloc_zeroed_user_highpage_movable(vma, address);
>  	if (!page)
>  		goto oom;
> +	/*
> +	 * The memory barrier inside __SetPageUptodate makes sure that
> +	 * preceeding stores to the page contents become visible after
> +	 * the set_pte_at() write.
> +	 */

s/after/before/

After the above correction it looks nice cleanup, thanks!

Acked-by: Andrea Arcangeli <aarcange@redhat.com>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] THP: Use explicit memory barrier
@ 2013-04-04 13:45           ` Andrea Arcangeli
  0 siblings, 0 replies; 20+ messages in thread
From: Andrea Arcangeli @ 2013-04-04 13:45 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Hugh Dickins, David Rientjes, Andrew Morton, linux-kernel,
	linux-mm, Mel Gorman, Kamezawa Hiroyuki, Peter Zijlstra

On Wed, Apr 03, 2013 at 09:14:01AM +0900, Minchan Kim wrote:
>  	clear_huge_page(page, haddr, HPAGE_PMD_NR);
> +	/*
> +	 * The memory barrier inside __SetPageUptodate makes sure that
> +	 * clear_huge_page writes become visible after the set_pmd_at()

s/after/before/

> +	 * write.
> +	 */
>  	__SetPageUptodate(page);
>  
>  	spin_lock(&mm->page_table_lock);
> @@ -724,12 +729,6 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
>  	} else {
>  		pmd_t entry;
>  		entry = mk_huge_pmd(page, vma);
> -		/*
> -		 * The spinlocking to take the lru_lock inside
> -		 * page_add_new_anon_rmap() acts as a full memory
> -		 * barrier to be sure clear_huge_page writes become
> -		 * visible after the set_pmd_at() write.
> -		 */
>  		page_add_new_anon_rmap(page, vma, haddr);
>  		set_pmd_at(mm, haddr, pmd, entry);
>  		pgtable_trans_huge_deposit(mm, pgtable);
> diff --git a/mm/memory.c b/mm/memory.c
> index 494526a..d0da51e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3196,6 +3196,11 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  	page = alloc_zeroed_user_highpage_movable(vma, address);
>  	if (!page)
>  		goto oom;
> +	/*
> +	 * The memory barrier inside __SetPageUptodate makes sure that
> +	 * preceeding stores to the page contents become visible after
> +	 * the set_pte_at() write.
> +	 */

s/after/before/

After the above correction it looks nice cleanup, thanks!

Acked-by: Andrea Arcangeli <aarcange@redhat.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2013-04-04 13:46 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-31 23:45 [PATCH] THP: Use explicit memory barrier Minchan Kim
2013-03-31 23:45 ` Minchan Kim
2013-04-01  1:01 ` Kamezawa Hiroyuki
2013-04-01  1:01   ` Kamezawa Hiroyuki
2013-04-01  4:45   ` Minchan Kim
2013-04-01  4:45     ` Minchan Kim
2013-04-01 23:35 ` David Rientjes
2013-04-01 23:35   ` David Rientjes
2013-04-02  0:37   ` Minchan Kim
2013-04-02  0:37     ` Minchan Kim
2013-04-02 19:20     ` David Rientjes
2013-04-02 19:20       ` David Rientjes
2013-04-02 19:30     ` Hugh Dickins
2013-04-02 19:30       ` Hugh Dickins
2013-04-03  0:14       ` Minchan Kim
2013-04-03  0:14         ` Minchan Kim
2013-04-04 13:45         ` Andrea Arcangeli
2013-04-04 13:45           ` Andrea Arcangeli
2013-04-04  2:45 ` Simon Jeons
2013-04-04  2:45   ` Simon Jeons

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.