* [PATCH] THP: Use explicit memory barrier @ 2013-03-31 23:45 ` Minchan Kim 0 siblings, 0 replies; 20+ messages in thread From: Minchan Kim @ 2013-03-31 23:45 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, linux-mm, Minchan Kim, Mel Gorman, Andrea Arcangeli, Hugh Dickins __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's spinlock for making sure that clear_huge_page write become visible after set set_pmd_at() write. But lru_cache_add_lru uses pagevec so it could miss spinlock easily so above rule was broken so user may see inconsistent data. This patch fixes it with using explict barrier rather than depending on lru spinlock. Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Minchan Kim <minchan@kernel.org> --- mm/huge_memory.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index bfa142e..fad800e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -725,11 +725,10 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm, pmd_t entry; entry = mk_huge_pmd(page, vma); /* - * The spinlocking to take the lru_lock inside - * page_add_new_anon_rmap() acts as a full memory - * barrier to be sure clear_huge_page writes become - * visible after the set_pmd_at() write. + * clear_huge_page write become visible after the + * set_pmd_at() write. */ + smp_wmb(); page_add_new_anon_rmap(page, vma, haddr); set_pmd_at(mm, haddr, pmd, entry); pgtable_trans_huge_deposit(mm, pgtable); -- 1.8.2 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH] THP: Use explicit memory barrier @ 2013-03-31 23:45 ` Minchan Kim 0 siblings, 0 replies; 20+ messages in thread From: Minchan Kim @ 2013-03-31 23:45 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, linux-mm, Minchan Kim, Mel Gorman, Andrea Arcangeli, Hugh Dickins __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's spinlock for making sure that clear_huge_page write become visible after set set_pmd_at() write. But lru_cache_add_lru uses pagevec so it could miss spinlock easily so above rule was broken so user may see inconsistent data. This patch fixes it with using explict barrier rather than depending on lru spinlock. Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Minchan Kim <minchan@kernel.org> --- mm/huge_memory.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index bfa142e..fad800e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -725,11 +725,10 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm, pmd_t entry; entry = mk_huge_pmd(page, vma); /* - * The spinlocking to take the lru_lock inside - * page_add_new_anon_rmap() acts as a full memory - * barrier to be sure clear_huge_page writes become - * visible after the set_pmd_at() write. + * clear_huge_page write become visible after the + * set_pmd_at() write. */ + smp_wmb(); page_add_new_anon_rmap(page, vma, haddr); set_pmd_at(mm, haddr, pmd, entry); pgtable_trans_huge_deposit(mm, pgtable); -- 1.8.2 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] THP: Use explicit memory barrier 2013-03-31 23:45 ` Minchan Kim @ 2013-04-01 1:01 ` Kamezawa Hiroyuki -1 siblings, 0 replies; 20+ messages in thread From: Kamezawa Hiroyuki @ 2013-04-01 1:01 UTC (permalink / raw) To: Minchan Kim Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Andrea Arcangeli, Hugh Dickins (2013/04/01 8:45), Minchan Kim wrote: > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's > spinlock for making sure that clear_huge_page write become visible > after set set_pmd_at() write. > > But lru_cache_add_lru uses pagevec so it could miss spinlock > easily so above rule was broken so user may see inconsistent data. > This patch fixes it with using explict barrier rather than depending > on lru spinlock. > Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers. Users can see non-zero value after page fault in theory ? Thanks, -Kame ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] THP: Use explicit memory barrier @ 2013-04-01 1:01 ` Kamezawa Hiroyuki 0 siblings, 0 replies; 20+ messages in thread From: Kamezawa Hiroyuki @ 2013-04-01 1:01 UTC (permalink / raw) To: Minchan Kim Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Andrea Arcangeli, Hugh Dickins (2013/04/01 8:45), Minchan Kim wrote: > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's > spinlock for making sure that clear_huge_page write become visible > after set set_pmd_at() write. > > But lru_cache_add_lru uses pagevec so it could miss spinlock > easily so above rule was broken so user may see inconsistent data. > This patch fixes it with using explict barrier rather than depending > on lru spinlock. > Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers. Users can see non-zero value after page fault in theory ? Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] THP: Use explicit memory barrier 2013-04-01 1:01 ` Kamezawa Hiroyuki @ 2013-04-01 4:45 ` Minchan Kim -1 siblings, 0 replies; 20+ messages in thread From: Minchan Kim @ 2013-04-01 4:45 UTC (permalink / raw) To: Kamezawa Hiroyuki Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Andrea Arcangeli, Hugh Dickins Hey Kame, On Mon, Apr 01, 2013 at 10:01:49AM +0900, Kamezawa Hiroyuki wrote: > (2013/04/01 8:45), Minchan Kim wrote: > > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's > > spinlock for making sure that clear_huge_page write become visible > > after set set_pmd_at() write. > > > > But lru_cache_add_lru uses pagevec so it could miss spinlock > > easily so above rule was broken so user may see inconsistent data. > > This patch fixes it with using explict barrier rather than depending > > on lru spinlock. > > > > Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers. > Users can see non-zero value after page fault in theory ? Maybe, but as you know well, we didn't see any report about that until now so I'm not sure. Ccing people in this thread could have clear answer rather than me. In THP, I just found inconsistency between Andrea's comment and code. That's why I sent a patch. If your concern turns out truth, Ya, you might find ancient bug. :) Thanks. > > Thanks, > -Kame > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] THP: Use explicit memory barrier @ 2013-04-01 4:45 ` Minchan Kim 0 siblings, 0 replies; 20+ messages in thread From: Minchan Kim @ 2013-04-01 4:45 UTC (permalink / raw) To: Kamezawa Hiroyuki Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Andrea Arcangeli, Hugh Dickins Hey Kame, On Mon, Apr 01, 2013 at 10:01:49AM +0900, Kamezawa Hiroyuki wrote: > (2013/04/01 8:45), Minchan Kim wrote: > > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's > > spinlock for making sure that clear_huge_page write become visible > > after set set_pmd_at() write. > > > > But lru_cache_add_lru uses pagevec so it could miss spinlock > > easily so above rule was broken so user may see inconsistent data. > > This patch fixes it with using explict barrier rather than depending > > on lru spinlock. > > > > Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers. > Users can see non-zero value after page fault in theory ? Maybe, but as you know well, we didn't see any report about that until now so I'm not sure. Ccing people in this thread could have clear answer rather than me. In THP, I just found inconsistency between Andrea's comment and code. That's why I sent a patch. If your concern turns out truth, Ya, you might find ancient bug. :) Thanks. > > Thanks, > -Kame > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] THP: Use explicit memory barrier 2013-03-31 23:45 ` Minchan Kim @ 2013-04-01 23:35 ` David Rientjes -1 siblings, 0 replies; 20+ messages in thread From: David Rientjes @ 2013-04-01 23:35 UTC (permalink / raw) To: Minchan Kim Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Andrea Arcangeli, Hugh Dickins On Mon, 1 Apr 2013, Minchan Kim wrote: > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's > spinlock for making sure that clear_huge_page write become visible > after set set_pmd_at() write. > > But lru_cache_add_lru uses pagevec so it could miss spinlock > easily so above rule was broken so user may see inconsistent data. > > This patch fixes it with using explict barrier rather than depending > on lru spinlock. > Is this the same issue that Andrea responded to in the "thp and memory barrier assumptions" thread at http://marc.info/?t=134333512700004 ? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] THP: Use explicit memory barrier @ 2013-04-01 23:35 ` David Rientjes 0 siblings, 0 replies; 20+ messages in thread From: David Rientjes @ 2013-04-01 23:35 UTC (permalink / raw) To: Minchan Kim Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Andrea Arcangeli, Hugh Dickins On Mon, 1 Apr 2013, Minchan Kim wrote: > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's > spinlock for making sure that clear_huge_page write become visible > after set set_pmd_at() write. > > But lru_cache_add_lru uses pagevec so it could miss spinlock > easily so above rule was broken so user may see inconsistent data. > > This patch fixes it with using explict barrier rather than depending > on lru spinlock. > Is this the same issue that Andrea responded to in the "thp and memory barrier assumptions" thread at http://marc.info/?t=134333512700004 ? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] THP: Use explicit memory barrier 2013-04-01 23:35 ` David Rientjes @ 2013-04-02 0:37 ` Minchan Kim -1 siblings, 0 replies; 20+ messages in thread From: Minchan Kim @ 2013-04-02 0:37 UTC (permalink / raw) To: David Rientjes Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Andrea Arcangeli, Hugh Dickins On Mon, Apr 01, 2013 at 04:35:38PM -0700, David Rientjes wrote: > On Mon, 1 Apr 2013, Minchan Kim wrote: > > > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's > > spinlock for making sure that clear_huge_page write become visible > > after set set_pmd_at() write. > > > > But lru_cache_add_lru uses pagevec so it could miss spinlock > > easily so above rule was broken so user may see inconsistent data. > > > > This patch fixes it with using explict barrier rather than depending > > on lru spinlock. > > > > Is this the same issue that Andrea responded to in the "thp and memory > barrier assumptions" thread at http://marc.info/?t=134333512700004 ? Yes and Peter pointed out further step. Thanks for pointing out. Not that I know that Andrea alreay noticed it, I don't care about this patch. Remaining question is Kame's one. Isn't there anyone could answer it? > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] THP: Use explicit memory barrier @ 2013-04-02 0:37 ` Minchan Kim 0 siblings, 0 replies; 20+ messages in thread From: Minchan Kim @ 2013-04-02 0:37 UTC (permalink / raw) To: David Rientjes Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Andrea Arcangeli, Hugh Dickins On Mon, Apr 01, 2013 at 04:35:38PM -0700, David Rientjes wrote: > On Mon, 1 Apr 2013, Minchan Kim wrote: > > > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's > > spinlock for making sure that clear_huge_page write become visible > > after set set_pmd_at() write. > > > > But lru_cache_add_lru uses pagevec so it could miss spinlock > > easily so above rule was broken so user may see inconsistent data. > > > > This patch fixes it with using explict barrier rather than depending > > on lru spinlock. > > > > Is this the same issue that Andrea responded to in the "thp and memory > barrier assumptions" thread at http://marc.info/?t=134333512700004 ? Yes and Peter pointed out further step. Thanks for pointing out. Not that I know that Andrea alreay noticed it, I don't care about this patch. Remaining question is Kame's one. Isn't there anyone could answer it? > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] THP: Use explicit memory barrier 2013-04-02 0:37 ` Minchan Kim @ 2013-04-02 19:20 ` David Rientjes -1 siblings, 0 replies; 20+ messages in thread From: David Rientjes @ 2013-04-02 19:20 UTC (permalink / raw) To: Minchan Kim, Andrea Arcangeli Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Hugh Dickins On Tue, 2 Apr 2013, Minchan Kim wrote: > Yes and Peter pointed out further step. > Thanks for pointing out. > Not that I know that Andrea alreay noticed it, I don't care about this > patch. > Andrea, do you have time to send c08e0c9ee786 ("thp: add memory barrier to __do_huge_pmd_anonymous_page") b08d75a595ec ("thp: document barrier() in wrprotect THP fault path") from the master branch of aa.git to Andrew? I would do it, but one isn't signed-off in your tree. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] THP: Use explicit memory barrier @ 2013-04-02 19:20 ` David Rientjes 0 siblings, 0 replies; 20+ messages in thread From: David Rientjes @ 2013-04-02 19:20 UTC (permalink / raw) To: Minchan Kim, Andrea Arcangeli Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Hugh Dickins On Tue, 2 Apr 2013, Minchan Kim wrote: > Yes and Peter pointed out further step. > Thanks for pointing out. > Not that I know that Andrea alreay noticed it, I don't care about this > patch. > Andrea, do you have time to send c08e0c9ee786 ("thp: add memory barrier to __do_huge_pmd_anonymous_page") b08d75a595ec ("thp: document barrier() in wrprotect THP fault path") from the master branch of aa.git to Andrew? I would do it, but one isn't signed-off in your tree. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] THP: Use explicit memory barrier 2013-04-02 0:37 ` Minchan Kim @ 2013-04-02 19:30 ` Hugh Dickins -1 siblings, 0 replies; 20+ messages in thread From: Hugh Dickins @ 2013-04-02 19:30 UTC (permalink / raw) To: Minchan Kim Cc: David Rientjes, Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Andrea Arcangeli, Kamezawa Hiroyuki On Tue, 2 Apr 2013, Minchan Kim wrote: > On Mon, Apr 01, 2013 at 04:35:38PM -0700, David Rientjes wrote: > > On Mon, 1 Apr 2013, Minchan Kim wrote: > > > > > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's > > > spinlock for making sure that clear_huge_page write become visible > > > after set set_pmd_at() write. > > > > > > But lru_cache_add_lru uses pagevec so it could miss spinlock > > > easily so above rule was broken so user may see inconsistent data. > > > > > > This patch fixes it with using explict barrier rather than depending > > > on lru spinlock. > > > > > > > Is this the same issue that Andrea responded to in the "thp and memory > > barrier assumptions" thread at http://marc.info/?t=134333512700004 ? > > Yes and Peter pointed out further step. > Thanks for pointing out. > Not that I know that Andrea alreay noticed it, I don't care about this > patch. > > Remaining question is Kame's one. > > Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers. > > Users can see non-zero value after page fault in theory ? > Isn't there anyone could answer it? See Nick's 2008 0ed361dec "mm: fix PageUptodate data race", which gave us static inline void __SetPageUptodate(struct page *page) { smp_wmb(); __set_bit(PG_uptodate, &(page)->flags); } So both do_anonymous_page() and __do_huge_pmd_anonymous_page() look safe to me already, though the huge_memory one could do with a fixed comment. Hugh ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] THP: Use explicit memory barrier @ 2013-04-02 19:30 ` Hugh Dickins 0 siblings, 0 replies; 20+ messages in thread From: Hugh Dickins @ 2013-04-02 19:30 UTC (permalink / raw) To: Minchan Kim Cc: David Rientjes, Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Andrea Arcangeli, Kamezawa Hiroyuki On Tue, 2 Apr 2013, Minchan Kim wrote: > On Mon, Apr 01, 2013 at 04:35:38PM -0700, David Rientjes wrote: > > On Mon, 1 Apr 2013, Minchan Kim wrote: > > > > > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's > > > spinlock for making sure that clear_huge_page write become visible > > > after set set_pmd_at() write. > > > > > > But lru_cache_add_lru uses pagevec so it could miss spinlock > > > easily so above rule was broken so user may see inconsistent data. > > > > > > This patch fixes it with using explict barrier rather than depending > > > on lru spinlock. > > > > > > > Is this the same issue that Andrea responded to in the "thp and memory > > barrier assumptions" thread at http://marc.info/?t=134333512700004 ? > > Yes and Peter pointed out further step. > Thanks for pointing out. > Not that I know that Andrea alreay noticed it, I don't care about this > patch. > > Remaining question is Kame's one. > > Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers. > > Users can see non-zero value after page fault in theory ? > Isn't there anyone could answer it? See Nick's 2008 0ed361dec "mm: fix PageUptodate data race", which gave us static inline void __SetPageUptodate(struct page *page) { smp_wmb(); __set_bit(PG_uptodate, &(page)->flags); } So both do_anonymous_page() and __do_huge_pmd_anonymous_page() look safe to me already, though the huge_memory one could do with a fixed comment. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] THP: Use explicit memory barrier 2013-04-02 19:30 ` Hugh Dickins @ 2013-04-03 0:14 ` Minchan Kim -1 siblings, 0 replies; 20+ messages in thread From: Minchan Kim @ 2013-04-03 0:14 UTC (permalink / raw) To: Hugh Dickins Cc: David Rientjes, Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Andrea Arcangeli, Kamezawa Hiroyuki, Peter Zijlstra On Tue, Apr 02, 2013 at 12:30:15PM -0700, Hugh Dickins wrote: > On Tue, 2 Apr 2013, Minchan Kim wrote: > > On Mon, Apr 01, 2013 at 04:35:38PM -0700, David Rientjes wrote: > > > On Mon, 1 Apr 2013, Minchan Kim wrote: > > > > > > > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's > > > > spinlock for making sure that clear_huge_page write become visible > > > > after set set_pmd_at() write. > > > > > > > > But lru_cache_add_lru uses pagevec so it could miss spinlock > > > > easily so above rule was broken so user may see inconsistent data. > > > > > > > > This patch fixes it with using explict barrier rather than depending > > > > on lru spinlock. > > > > > > > > > > Is this the same issue that Andrea responded to in the "thp and memory > > > barrier assumptions" thread at http://marc.info/?t=134333512700004 ? > > > > Yes and Peter pointed out further step. > > Thanks for pointing out. > > Not that I know that Andrea alreay noticed it, I don't care about this > > patch. > > > > Remaining question is Kame's one. > > > Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers. > > > Users can see non-zero value after page fault in theory ? > > Isn't there anyone could answer it? > > See Nick's 2008 0ed361dec "mm: fix PageUptodate data race", which gave us > > static inline void __SetPageUptodate(struct page *page) > { > smp_wmb(); > __set_bit(PG_uptodate, &(page)->flags); > } > > So both do_anonymous_page() and __do_huge_pmd_anonymous_page() look safe > to me already, though the huge_memory one could do with a fixed comment. Thanks you very much! That's one everybody are really missing. Here it goes! ==================== 8< ===================== >From fb0b9f3df698547bfb70f81d85e0d1e00f19e1fc Mon Sep 17 00:00:00 2001 From: Minchan Kim <minchan@kernel.org> Date: Wed, 3 Apr 2013 08:53:27 +0900 Subject: [PATCH] THP: fix comment about memory barrier Now, memory barrier in __do_huge_pmd_anonymous_page doesn't work. Because lru_cache_add_lru uses pagevec so it could miss spinlock easily so above rule was broken so user might see inconsistent data. I was not first person who pointed out the problem. Mel and Peter pointed out a few months ago and Peter pointed out further that even spin_lock/unlock can't make sure it. http://marc.info/?t=134333512700004 In particular: *A = a; LOCK UNLOCK *B = b; may occur as: LOCK, STORE *B, STORE *A, UNLOCK At last, Hugh pointed out that even we don't need memory barrier in there because __SetPageUpdate already have done it from Nick's [1] explicitly. So this patch fixes comment on THP and adds same comment for do_anonymous_page, too because everybody except Hugh was missing that. It means we needs COMMENT about that. [1] 0ed361dec "mm: fix PageUptodate data race" Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Minchan Kim <minchan@kernel.org> --- mm/huge_memory.c | 11 +++++------ mm/memory.c | 5 +++++ 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index e2f7f5aa..f2f17ff 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -713,6 +713,11 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm, return VM_FAULT_OOM; clear_huge_page(page, haddr, HPAGE_PMD_NR); + /* + * The memory barrier inside __SetPageUptodate makes sure that + * clear_huge_page writes become visible after the set_pmd_at() + * write. + */ __SetPageUptodate(page); spin_lock(&mm->page_table_lock); @@ -724,12 +729,6 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm, } else { pmd_t entry; entry = mk_huge_pmd(page, vma); - /* - * The spinlocking to take the lru_lock inside - * page_add_new_anon_rmap() acts as a full memory - * barrier to be sure clear_huge_page writes become - * visible after the set_pmd_at() write. - */ page_add_new_anon_rmap(page, vma, haddr); set_pmd_at(mm, haddr, pmd, entry); pgtable_trans_huge_deposit(mm, pgtable); diff --git a/mm/memory.c b/mm/memory.c index 494526a..d0da51e 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3196,6 +3196,11 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma, page = alloc_zeroed_user_highpage_movable(vma, address); if (!page) goto oom; + /* + * The memory barrier inside __SetPageUptodate makes sure that + * preceeding stores to the page contents become visible after + * the set_pte_at() write. + */ __SetPageUptodate(page); if (mem_cgroup_newpage_charge(page, mm, GFP_KERNEL)) -- 1.8.2 -- Kind regards, Minchan Kim ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] THP: Use explicit memory barrier @ 2013-04-03 0:14 ` Minchan Kim 0 siblings, 0 replies; 20+ messages in thread From: Minchan Kim @ 2013-04-03 0:14 UTC (permalink / raw) To: Hugh Dickins Cc: David Rientjes, Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Andrea Arcangeli, Kamezawa Hiroyuki, Peter Zijlstra On Tue, Apr 02, 2013 at 12:30:15PM -0700, Hugh Dickins wrote: > On Tue, 2 Apr 2013, Minchan Kim wrote: > > On Mon, Apr 01, 2013 at 04:35:38PM -0700, David Rientjes wrote: > > > On Mon, 1 Apr 2013, Minchan Kim wrote: > > > > > > > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's > > > > spinlock for making sure that clear_huge_page write become visible > > > > after set set_pmd_at() write. > > > > > > > > But lru_cache_add_lru uses pagevec so it could miss spinlock > > > > easily so above rule was broken so user may see inconsistent data. > > > > > > > > This patch fixes it with using explict barrier rather than depending > > > > on lru spinlock. > > > > > > > > > > Is this the same issue that Andrea responded to in the "thp and memory > > > barrier assumptions" thread at http://marc.info/?t=134333512700004 ? > > > > Yes and Peter pointed out further step. > > Thanks for pointing out. > > Not that I know that Andrea alreay noticed it, I don't care about this > > patch. > > > > Remaining question is Kame's one. > > > Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers. > > > Users can see non-zero value after page fault in theory ? > > Isn't there anyone could answer it? > > See Nick's 2008 0ed361dec "mm: fix PageUptodate data race", which gave us > > static inline void __SetPageUptodate(struct page *page) > { > smp_wmb(); > __set_bit(PG_uptodate, &(page)->flags); > } > > So both do_anonymous_page() and __do_huge_pmd_anonymous_page() look safe > to me already, though the huge_memory one could do with a fixed comment. Thanks you very much! That's one everybody are really missing. Here it goes! ==================== 8< ===================== ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] THP: Use explicit memory barrier 2013-04-03 0:14 ` Minchan Kim @ 2013-04-04 13:45 ` Andrea Arcangeli -1 siblings, 0 replies; 20+ messages in thread From: Andrea Arcangeli @ 2013-04-04 13:45 UTC (permalink / raw) To: Minchan Kim Cc: Hugh Dickins, David Rientjes, Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Kamezawa Hiroyuki, Peter Zijlstra On Wed, Apr 03, 2013 at 09:14:01AM +0900, Minchan Kim wrote: > clear_huge_page(page, haddr, HPAGE_PMD_NR); > + /* > + * The memory barrier inside __SetPageUptodate makes sure that > + * clear_huge_page writes become visible after the set_pmd_at() s/after/before/ > + * write. > + */ > __SetPageUptodate(page); > > spin_lock(&mm->page_table_lock); > @@ -724,12 +729,6 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm, > } else { > pmd_t entry; > entry = mk_huge_pmd(page, vma); > - /* > - * The spinlocking to take the lru_lock inside > - * page_add_new_anon_rmap() acts as a full memory > - * barrier to be sure clear_huge_page writes become > - * visible after the set_pmd_at() write. > - */ > page_add_new_anon_rmap(page, vma, haddr); > set_pmd_at(mm, haddr, pmd, entry); > pgtable_trans_huge_deposit(mm, pgtable); > diff --git a/mm/memory.c b/mm/memory.c > index 494526a..d0da51e 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -3196,6 +3196,11 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma, > page = alloc_zeroed_user_highpage_movable(vma, address); > if (!page) > goto oom; > + /* > + * The memory barrier inside __SetPageUptodate makes sure that > + * preceeding stores to the page contents become visible after > + * the set_pte_at() write. > + */ s/after/before/ After the above correction it looks nice cleanup, thanks! Acked-by: Andrea Arcangeli <aarcange@redhat.com> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] THP: Use explicit memory barrier @ 2013-04-04 13:45 ` Andrea Arcangeli 0 siblings, 0 replies; 20+ messages in thread From: Andrea Arcangeli @ 2013-04-04 13:45 UTC (permalink / raw) To: Minchan Kim Cc: Hugh Dickins, David Rientjes, Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Kamezawa Hiroyuki, Peter Zijlstra On Wed, Apr 03, 2013 at 09:14:01AM +0900, Minchan Kim wrote: > clear_huge_page(page, haddr, HPAGE_PMD_NR); > + /* > + * The memory barrier inside __SetPageUptodate makes sure that > + * clear_huge_page writes become visible after the set_pmd_at() s/after/before/ > + * write. > + */ > __SetPageUptodate(page); > > spin_lock(&mm->page_table_lock); > @@ -724,12 +729,6 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm, > } else { > pmd_t entry; > entry = mk_huge_pmd(page, vma); > - /* > - * The spinlocking to take the lru_lock inside > - * page_add_new_anon_rmap() acts as a full memory > - * barrier to be sure clear_huge_page writes become > - * visible after the set_pmd_at() write. > - */ > page_add_new_anon_rmap(page, vma, haddr); > set_pmd_at(mm, haddr, pmd, entry); > pgtable_trans_huge_deposit(mm, pgtable); > diff --git a/mm/memory.c b/mm/memory.c > index 494526a..d0da51e 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -3196,6 +3196,11 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma, > page = alloc_zeroed_user_highpage_movable(vma, address); > if (!page) > goto oom; > + /* > + * The memory barrier inside __SetPageUptodate makes sure that > + * preceeding stores to the page contents become visible after > + * the set_pte_at() write. > + */ s/after/before/ After the above correction it looks nice cleanup, thanks! Acked-by: Andrea Arcangeli <aarcange@redhat.com> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] THP: Use explicit memory barrier 2013-03-31 23:45 ` Minchan Kim @ 2013-04-04 2:45 ` Simon Jeons -1 siblings, 0 replies; 20+ messages in thread From: Simon Jeons @ 2013-04-04 2:45 UTC (permalink / raw) To: Minchan Kim Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Andrea Arcangeli, Hugh Dickins Hi Minchan, On 04/01/2013 07:45 AM, Minchan Kim wrote: > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's > spinlock for making sure that clear_huge_page write become visible > after set set_pmd_at() write. 1. There are no pte modify, why take page_table_lock here? 2. What's the meaning of "clear_huge_page write become visible after set set_pmd_at() write"? > > But lru_cache_add_lru uses pagevec so it could miss spinlock > easily so above rule was broken so user may see inconsistent data. > > This patch fixes it with using explict barrier rather than depending > on lru spinlock. > > Cc: Mel Gorman <mgorman@suse.de> > Cc: Andrea Arcangeli <aarcange@redhat.com> > Cc: Hugh Dickins <hughd@google.com> > Signed-off-by: Minchan Kim <minchan@kernel.org> > --- > mm/huge_memory.c | 7 +++---- > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index bfa142e..fad800e 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -725,11 +725,10 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm, > pmd_t entry; > entry = mk_huge_pmd(page, vma); > /* > - * The spinlocking to take the lru_lock inside > - * page_add_new_anon_rmap() acts as a full memory > - * barrier to be sure clear_huge_page writes become > - * visible after the set_pmd_at() write. > + * clear_huge_page write become visible after the > + * set_pmd_at() write. > */ > + smp_wmb(); > page_add_new_anon_rmap(page, vma, haddr); > set_pmd_at(mm, haddr, pmd, entry); > pgtable_trans_huge_deposit(mm, pgtable); ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] THP: Use explicit memory barrier @ 2013-04-04 2:45 ` Simon Jeons 0 siblings, 0 replies; 20+ messages in thread From: Simon Jeons @ 2013-04-04 2:45 UTC (permalink / raw) To: Minchan Kim Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Andrea Arcangeli, Hugh Dickins Hi Minchan, On 04/01/2013 07:45 AM, Minchan Kim wrote: > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's > spinlock for making sure that clear_huge_page write become visible > after set set_pmd_at() write. 1. There are no pte modify, why take page_table_lock here? 2. What's the meaning of "clear_huge_page write become visible after set set_pmd_at() write"? > > But lru_cache_add_lru uses pagevec so it could miss spinlock > easily so above rule was broken so user may see inconsistent data. > > This patch fixes it with using explict barrier rather than depending > on lru spinlock. > > Cc: Mel Gorman <mgorman@suse.de> > Cc: Andrea Arcangeli <aarcange@redhat.com> > Cc: Hugh Dickins <hughd@google.com> > Signed-off-by: Minchan Kim <minchan@kernel.org> > --- > mm/huge_memory.c | 7 +++---- > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index bfa142e..fad800e 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -725,11 +725,10 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm, > pmd_t entry; > entry = mk_huge_pmd(page, vma); > /* > - * The spinlocking to take the lru_lock inside > - * page_add_new_anon_rmap() acts as a full memory > - * barrier to be sure clear_huge_page writes become > - * visible after the set_pmd_at() write. > + * clear_huge_page write become visible after the > + * set_pmd_at() write. > */ > + smp_wmb(); > page_add_new_anon_rmap(page, vma, haddr); > set_pmd_at(mm, haddr, pmd, entry); > pgtable_trans_huge_deposit(mm, pgtable); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2013-04-04 13:46 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-03-31 23:45 [PATCH] THP: Use explicit memory barrier Minchan Kim 2013-03-31 23:45 ` Minchan Kim 2013-04-01 1:01 ` Kamezawa Hiroyuki 2013-04-01 1:01 ` Kamezawa Hiroyuki 2013-04-01 4:45 ` Minchan Kim 2013-04-01 4:45 ` Minchan Kim 2013-04-01 23:35 ` David Rientjes 2013-04-01 23:35 ` David Rientjes 2013-04-02 0:37 ` Minchan Kim 2013-04-02 0:37 ` Minchan Kim 2013-04-02 19:20 ` David Rientjes 2013-04-02 19:20 ` David Rientjes 2013-04-02 19:30 ` Hugh Dickins 2013-04-02 19:30 ` Hugh Dickins 2013-04-03 0:14 ` Minchan Kim 2013-04-03 0:14 ` Minchan Kim 2013-04-04 13:45 ` Andrea Arcangeli 2013-04-04 13:45 ` Andrea Arcangeli 2013-04-04 2:45 ` Simon Jeons 2013-04-04 2:45 ` Simon Jeons
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.