All of lore.kernel.org
 help / color / mirror / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Mel Gorman <mgorman@suse.de>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [PATCH] THP: Use explicit memory barrier
Date: Wed, 3 Apr 2013 09:14:01 +0900	[thread overview]
Message-ID: <20130403001401.GC16026@blaptop> (raw)
In-Reply-To: <alpine.LNX.2.00.1304021221240.5808@eggly.anvils>

On Tue, Apr 02, 2013 at 12:30:15PM -0700, Hugh Dickins wrote:
> On Tue, 2 Apr 2013, Minchan Kim wrote:
> > On Mon, Apr 01, 2013 at 04:35:38PM -0700, David Rientjes wrote:
> > > On Mon, 1 Apr 2013, Minchan Kim wrote:
> > > 
> > > > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> > > > spinlock for making sure that clear_huge_page write become visible
> > > > after set set_pmd_at() write.
> > > > 
> > > > But lru_cache_add_lru uses pagevec so it could miss spinlock
> > > > easily so above rule was broken so user may see inconsistent data.
> > > > 
> > > > This patch fixes it with using explict barrier rather than depending
> > > > on lru spinlock.
> > > > 
> > > 
> > > Is this the same issue that Andrea responded to in the "thp and memory 
> > > barrier assumptions" thread at http://marc.info/?t=134333512700004 ?
> > 
> > Yes and Peter pointed out further step.
> > Thanks for pointing out.
> > Not that I know that Andrea alreay noticed it, I don't care about this
> > patch.
> > 
> > Remaining question is Kame's one.
> > > Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers.
> > > Users can see non-zero value after page fault in theory ?
> > Isn't there anyone could answer it?
> 
> See Nick's 2008 0ed361dec "mm: fix PageUptodate data race", which gave us
> 
> static inline void __SetPageUptodate(struct page *page)
> {
> 	smp_wmb();
> 	__set_bit(PG_uptodate, &(page)->flags);
> }
> 
> So both do_anonymous_page() and __do_huge_pmd_anonymous_page() look safe
> to me already, though the huge_memory one could do with a fixed comment.

Thanks you very much!
That's one everybody are really missing.

Here it goes!

==================== 8< =====================

>From fb0b9f3df698547bfb70f81d85e0d1e00f19e1fc Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Wed, 3 Apr 2013 08:53:27 +0900
Subject: [PATCH] THP: fix comment about memory barrier

Now, memory barrier in __do_huge_pmd_anonymous_page doesn't work.
Because lru_cache_add_lru uses pagevec so it could miss spinlock
easily so above rule was broken so user might see inconsistent data.

I was not first person who pointed out the problem. Mel and Peter
pointed out a few months ago and Peter pointed out further that
even spin_lock/unlock can't make sure it.
http://marc.info/?t=134333512700004

	In particular:

        	*A = a;
        	LOCK
        	UNLOCK
        	*B = b;

	may occur as:

        	LOCK, STORE *B, STORE *A, UNLOCK

At last, Hugh pointed out that even we don't need memory barrier
in there because __SetPageUpdate already have done it from
Nick's [1] explicitly.

So this patch fixes comment on THP and adds same comment for
do_anonymous_page, too because everybody except Hugh was missing
that. It means we needs COMMENT about that.

[1] 0ed361dec "mm: fix PageUptodate data race"

Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/huge_memory.c | 11 +++++------
 mm/memory.c      |  5 +++++
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e2f7f5aa..f2f17ff 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -713,6 +713,11 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
 		return VM_FAULT_OOM;
 
 	clear_huge_page(page, haddr, HPAGE_PMD_NR);
+	/*
+	 * The memory barrier inside __SetPageUptodate makes sure that
+	 * clear_huge_page writes become visible after the set_pmd_at()
+	 * write.
+	 */
 	__SetPageUptodate(page);
 
 	spin_lock(&mm->page_table_lock);
@@ -724,12 +729,6 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
 	} else {
 		pmd_t entry;
 		entry = mk_huge_pmd(page, vma);
-		/*
-		 * The spinlocking to take the lru_lock inside
-		 * page_add_new_anon_rmap() acts as a full memory
-		 * barrier to be sure clear_huge_page writes become
-		 * visible after the set_pmd_at() write.
-		 */
 		page_add_new_anon_rmap(page, vma, haddr);
 		set_pmd_at(mm, haddr, pmd, entry);
 		pgtable_trans_huge_deposit(mm, pgtable);
diff --git a/mm/memory.c b/mm/memory.c
index 494526a..d0da51e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3196,6 +3196,11 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	page = alloc_zeroed_user_highpage_movable(vma, address);
 	if (!page)
 		goto oom;
+	/*
+	 * The memory barrier inside __SetPageUptodate makes sure that
+	 * preceeding stores to the page contents become visible after
+	 * the set_pte_at() write.
+	 */
 	__SetPageUptodate(page);
 
 	if (mem_cgroup_newpage_charge(page, mm, GFP_KERNEL))
-- 
1.8.2

-- 
Kind regards,
Minchan Kim

WARNING: multiple messages have this Message-ID (diff)
From: Minchan Kim <minchan@kernel.org>
To: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Mel Gorman <mgorman@suse.de>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [PATCH] THP: Use explicit memory barrier
Date: Wed, 3 Apr 2013 09:14:01 +0900	[thread overview]
Message-ID: <20130403001401.GC16026@blaptop> (raw)
In-Reply-To: <alpine.LNX.2.00.1304021221240.5808@eggly.anvils>

On Tue, Apr 02, 2013 at 12:30:15PM -0700, Hugh Dickins wrote:
> On Tue, 2 Apr 2013, Minchan Kim wrote:
> > On Mon, Apr 01, 2013 at 04:35:38PM -0700, David Rientjes wrote:
> > > On Mon, 1 Apr 2013, Minchan Kim wrote:
> > > 
> > > > __do_huge_pmd_anonymous_page depends on page_add_new_anon_rmap's
> > > > spinlock for making sure that clear_huge_page write become visible
> > > > after set set_pmd_at() write.
> > > > 
> > > > But lru_cache_add_lru uses pagevec so it could miss spinlock
> > > > easily so above rule was broken so user may see inconsistent data.
> > > > 
> > > > This patch fixes it with using explict barrier rather than depending
> > > > on lru spinlock.
> > > > 
> > > 
> > > Is this the same issue that Andrea responded to in the "thp and memory 
> > > barrier assumptions" thread at http://marc.info/?t=134333512700004 ?
> > 
> > Yes and Peter pointed out further step.
> > Thanks for pointing out.
> > Not that I know that Andrea alreay noticed it, I don't care about this
> > patch.
> > 
> > Remaining question is Kame's one.
> > > Hmm...how about do_anonymous_page() ? there are no comments/locks/barriers.
> > > Users can see non-zero value after page fault in theory ?
> > Isn't there anyone could answer it?
> 
> See Nick's 2008 0ed361dec "mm: fix PageUptodate data race", which gave us
> 
> static inline void __SetPageUptodate(struct page *page)
> {
> 	smp_wmb();
> 	__set_bit(PG_uptodate, &(page)->flags);
> }
> 
> So both do_anonymous_page() and __do_huge_pmd_anonymous_page() look safe
> to me already, though the huge_memory one could do with a fixed comment.

Thanks you very much!
That's one everybody are really missing.

Here it goes!

==================== 8< =====================

  reply	other threads:[~2013-04-03  0:14 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-31 23:45 [PATCH] THP: Use explicit memory barrier Minchan Kim
2013-03-31 23:45 ` Minchan Kim
2013-04-01  1:01 ` Kamezawa Hiroyuki
2013-04-01  1:01   ` Kamezawa Hiroyuki
2013-04-01  4:45   ` Minchan Kim
2013-04-01  4:45     ` Minchan Kim
2013-04-01 23:35 ` David Rientjes
2013-04-01 23:35   ` David Rientjes
2013-04-02  0:37   ` Minchan Kim
2013-04-02  0:37     ` Minchan Kim
2013-04-02 19:20     ` David Rientjes
2013-04-02 19:20       ` David Rientjes
2013-04-02 19:30     ` Hugh Dickins
2013-04-02 19:30       ` Hugh Dickins
2013-04-03  0:14       ` Minchan Kim [this message]
2013-04-03  0:14         ` Minchan Kim
2013-04-04 13:45         ` Andrea Arcangeli
2013-04-04 13:45           ` Andrea Arcangeli
2013-04-04  2:45 ` Simon Jeons
2013-04-04  2:45   ` Simon Jeons

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130403001401.GC16026@blaptop \
    --to=minchan@kernel.org \
    --cc=a.p.zijlstra@chello.nl \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.