* [PATCH] ia64: race flushing icache in COW path
@ 2006-07-13 17:00 ` Luck, Tony, Anil Keshavamurthy
  0 siblings, 0 replies; 15+ messages in thread
From: Luck, Tony @ 2006-07-13 17:00 UTC (permalink / raw)
  To: torvalds, akpm; +Cc: linux-ia64, linux-kernel, linux-mm

From: Anil Keshavamurthy <anil.s.keshavamurthy@intel.com>

There is a race condition that showed up in a threaded JIT environment. The
situation is that a process with a JIT code page forks, so the page is marked
read-only, then some threads are created in the child.  One of the threads
attempts to add a new code block to the JIT page, so a copy-on-write fault is
taken, and the kernel allocates a new page, copies the data, installs the new
pte, and then calls lazy_mmu_prot_update() to flush caches to make sure that
the icache and dcache are in sync.  Unfortunately, the other thread runs right
after the new pte is installed, but before the caches have been flushed. It
tries to execute some old JIT code that was already in this page, but it sees
some garbage in the i-cache from the previous users of the new physical page.

Fix: we must make the caches consistent before installing the pte. This is
an ia64 only fix because lazy_mmu_prot_update() is a no-op on all other
architectures.

Signed-off-by: Anil Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

---

diff --git a/mm/memory.c b/mm/memory.c
index dc0d82c..de8bc85 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1549,9 +1549,9 @@ gotten:
 		flush_cache_page(vma, address, pte_pfn(orig_pte));
 		entry = mk_pte(new_page, vma->vm_page_prot);
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+		lazy_mmu_prot_update(entry);
 		ptep_establish(vma, address, page_table, entry);
 		update_mmu_cache(vma, address, entry);
-		lazy_mmu_prot_update(entry);
 		lru_cache_add_active(new_page);
 		page_add_new_anon_rmap(new_page, vma, address);
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread
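
[Editor's note, not part of the original mail: the sketch below restates the
pre-patch ordering from the hunk above, with the race window marked.  The
code lines are exactly those shown in the diff; only the comments are added.]

	entry = mk_pte(new_page, vma->vm_page_prot);
	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
	ptep_establish(vma, address, page_table, entry);
			/* the new pte is now visible to other threads */
	update_mmu_cache(vma, address, entry);
			/*
			 * <-- another thread may branch into the JIT page
			 *     here and fetch stale i-cache lines for the
			 *     new physical page
			 */
	lazy_mmu_prot_update(entry);
			/* i-cache made coherent with d-cache -- too late */

Moving lazy_mmu_prot_update() in front of ptep_establish(), as the patch
does, makes the caches coherent before the new pte can ever be used.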

* Re: [PATCH] ia64: race flushing icache in COW path
  2006-07-13 17:00 ` Luck, Tony, Anil Keshavamurthy
@ 2006-07-13 19:16   ` Jason Baron
  -1 siblings, 0 replies; 15+ messages in thread
From: Jason Baron @ 2006-07-13 19:16 UTC (permalink / raw)
  To: Luck, Tony; +Cc: torvalds, akpm, linux-ia64, linux-kernel, linux-mm


On Thu, 13 Jul 2006, Luck, Tony wrote:

> From: Anil Keshavamurthy <anil.s.keshavamurthy@intel.com>
> 
> There is a race condition that showed up in a threaded JIT environment. The
> situation is that a process with a JIT code page forks, so the page is marked
> read-only, then some threads are created in the child.  One of the threads
> attempts to add a new code block to the JIT page, so a copy-on-write fault is
> taken, and the kernel allocates a new page, copies the data, installs the new
> pte, and then calls lazy_mmu_prot_update() to flush caches to make sure that
> the icache and dcache are in sync.  Unfortunately, the other thread runs right
> after the new pte is installed, but before the caches have been flushed. It
> tries to execute some old JIT code that was already in this page, but it sees
> some garbage in the i-cache from the previous users of the new physical page.
> 
> Fix: we must make the caches consistent before installing the pte. This is
> an ia64 only fix because lazy_mmu_prot_update() is a no-op on all other
> architectures.
> 
> Signed-off-by: Anil Keshavamurthy <anil.s.keshavamurthy@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> 
> ---
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index dc0d82c..de8bc85 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1549,9 +1549,9 @@ gotten:
>  		flush_cache_page(vma, address, pte_pfn(orig_pte));
>  		entry = mk_pte(new_page, vma->vm_page_prot);
>  		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> +		lazy_mmu_prot_update(entry);
>  		ptep_establish(vma, address, page_table, entry);
>  		update_mmu_cache(vma, address, entry);
> -		lazy_mmu_prot_update(entry);
>  		lru_cache_add_active(new_page);
>  		page_add_new_anon_rmap(new_page, vma, address);
>  


lazy_mmu_prot_update() is used in a number of other places *after* the pte 
is established. An explanation as to why this case is different would be 
interesting.

thanks,

-Jason 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH] ia64: race flushing icache in COW path
@ 2006-07-13 20:37 ` Luck, Tony
  0 siblings, 0 replies; 15+ messages in thread
From: Luck, Tony @ 2006-07-13 20:37 UTC (permalink / raw)
  To: Jason Baron; +Cc: torvalds, akpm, linux-ia64, linux-kernel, linux-mm

> lazy_mmu_prot_update() is used in a number of other places *after* the pte 
> is established. An explanation as to why this case is different would be 
> interesting.

The other places do need a close look; it seems that some of
them may not be needed (e.g. the one inside "if (reuse) { }" at
the top of do_wp_page() ... at the moment I'm struggling to see
what it manages to achieve).

Most of the rest are in cases where we are adding a new virtual
page (comments like "No need to invalidate - it was non-present
before").  These may also need to have the order shuffled, but
they seem unlikely to be the cause of a bug (it is unlikely
that an application has threads that branch to new anonymous
pages as they are being attached to the process).

So you are right that there may be some more work here, but
I wanted to get the one-liner that is a clear and obvious
bugfix posted without being cluttered with some less obvious
fixes.

-Tony

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH] ia64: race flushing icache in COW path
  2006-07-13 20:37 ` Luck, Tony
@ 2006-07-14  3:11   ` Peter Zijlstra
  -1 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2006-07-14  3:11 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Jason Baron, torvalds, akpm, linux-ia64, linux-kernel, linux-mm

On Thu, 2006-07-13 at 13:37 -0700, Luck, Tony wrote:
> > lazy_mmu_prot_update() is used in a number of other places *after* the pte 
> > is established. An explanation as to why this case is different would be 
> > interesting.
> 
> The other places do need a close look; it seems that some of
> them may not be needed (e.g. the one inside "if (reuse) { }" at
> the top of do_wp_page() ... at the moment I'm struggling to see
> what it manages to achieve).
> 
> Most of the rest are in cases where we are adding a new virtual
> page (comments like "No need to invalidate - it was non-present
> before").  These may also need to have the order shuffled, but
> they seem unlikely to be the cause of a bug (it is unlikely
> that an application has threads that branch to new anonymous
> pages as they are being attached to the process).
> 
> So you are right that there may be some more work here, but
> I wanted to get the one-liner that is a clear and obvious
> bugfix posted without being cluttered with some less obvious
> fixes.

How about something like this?
---

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

Since calling lazy_mmu_prot_update() while the PTE is already set
could open up a race window w.r.t. icache coherency, make sure all
invocations are done before setting the PTE.

These call sites are not currently problematic; the one that was has
been addressed by a previous patch. This patch just tidies up the
call semantics for lazy_mmu_prot_update().

Signed-Off-By: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 mm/hugetlb.c  |    4 ++--
 mm/memory.c   |   10 +++++-----
 mm/mprotect.c |    2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

Index: linux-2.6/mm/hugetlb.c
===================================================================
--- linux-2.6.orig/mm/hugetlb.c	2006-07-14 04:54:48.000000000 +0200
+++ linux-2.6/mm/hugetlb.c	2006-07-14 05:04:58.000000000 +0200
@@ -313,9 +313,9 @@ static void set_huge_ptep_writable(struc
 	pte_t entry;
 
 	entry = pte_mkwrite(pte_mkdirty(*ptep));
+	lazy_mmu_prot_update(entry);
 	ptep_set_access_flags(vma, address, ptep, entry, 1);
 	update_mmu_cache(vma, address, entry);
-	lazy_mmu_prot_update(entry);
 }
 
 
@@ -634,8 +634,8 @@ void hugetlb_change_protection(struct vm
 		if (!pte_none(*ptep)) {
 			pte = huge_ptep_get_and_clear(mm, address, ptep);
 			pte = pte_mkhuge(pte_modify(pte, newprot));
-			set_huge_pte_at(mm, address, ptep, pte);
 			lazy_mmu_prot_update(pte);
+			set_huge_pte_at(mm, address, ptep, pte);
 		}
 	}
 	spin_unlock(&mm->page_table_lock);
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c	2006-07-14 05:00:06.000000000 +0200
+++ linux-2.6/mm/memory.c	2006-07-14 05:02:07.000000000 +0200
@@ -1506,9 +1506,9 @@ static int do_wp_page(struct mm_struct *
 		flush_cache_page(vma, address, pte_pfn(orig_pte));
 		entry = pte_mkyoung(orig_pte);
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+		lazy_mmu_prot_update(entry);
 		ptep_set_access_flags(vma, address, page_table, entry, 1);
 		update_mmu_cache(vma, address, entry);
-		lazy_mmu_prot_update(entry);
 		ret |= VM_FAULT_WRITE;
 		goto unlock;
 	}
@@ -1980,6 +1980,7 @@ static int do_swap_page(struct mm_struct
 	}
 
 	flush_icache_page(vma, page);
+	lazy_mmu_prot_update(pte);
 	set_pte_at(mm, address, page_table, pte);
 	page_add_anon_rmap(page, vma, address);
 
@@ -1997,7 +1998,6 @@ static int do_swap_page(struct mm_struct
 
 	/* No need to invalidate - it was non-present before */
 	update_mmu_cache(vma, address, pte);
-	lazy_mmu_prot_update(pte);
 unlock:
 	pte_unmap_unlock(page_table, ptl);
 out:
@@ -2055,11 +2055,11 @@ static int do_anonymous_page(struct mm_s
 		page_add_file_rmap(page);
 	}
 
+	lazy_mmu_prot_update(entry);
 	set_pte_at(mm, address, page_table, entry);
 
 	/* No need to invalidate - it was non-present before */
 	update_mmu_cache(vma, address, entry);
-	lazy_mmu_prot_update(entry);
 unlock:
 	pte_unmap_unlock(page_table, ptl);
 	return VM_FAULT_MINOR;
@@ -2180,6 +2180,7 @@ retry:
 		entry = mk_pte(new_page, vma->vm_page_prot);
 		if (write_access)
 			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+		lazy_mmu_prot_update(entry);
 		set_pte_at(mm, address, page_table, entry);
 		if (anon) {
 			inc_mm_counter(mm, anon_rss);
@@ -2197,7 +2198,6 @@ retry:
 
 	/* no need to invalidate: a not-present page shouldn't be cached */
 	update_mmu_cache(vma, address, entry);
-	lazy_mmu_prot_update(entry);
 unlock:
 	pte_unmap_unlock(page_table, ptl);
 	return ret;
@@ -2293,9 +2293,9 @@ static inline int handle_pte_fault(struc
 	}
 	entry = pte_mkyoung(entry);
 	if (!pte_same(old_entry, entry)) {
+		lazy_mmu_prot_update(entry);
 		ptep_set_access_flags(vma, address, pte, entry, write_access);
 		update_mmu_cache(vma, address, entry);
-		lazy_mmu_prot_update(entry);
 	} else {
 		/*
 		 * This is needed only for protection faults but the arch code
Index: linux-2.6/mm/mprotect.c
===================================================================
--- linux-2.6.orig/mm/mprotect.c	2006-07-14 04:54:48.000000000 +0200
+++ linux-2.6/mm/mprotect.c	2006-07-14 05:03:37.000000000 +0200
@@ -43,8 +43,8 @@ static void change_pte_range(struct mm_s
 			 * into place.
 			 */
 			ptent = pte_modify(ptep_get_and_clear(mm, addr, pte), newprot);
-			set_pte_at(mm, addr, pte, ptent);
 			lazy_mmu_prot_update(ptent);
+			set_pte_at(mm, addr, pte, ptent);
 #ifdef CONFIG_MIGRATION
 		} else if (!pte_file(oldpte)) {
 			swp_entry_t entry = pte_to_swp_entry(oldpte);



^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH] ia64: race flushing icache in COW path
@ 2006-07-14 17:11 ` Luck, Tony
  0 siblings, 0 replies; 15+ messages in thread
From: Luck, Tony @ 2006-07-14 17:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Baron, torvalds, akpm, linux-ia64, linux-kernel, linux-mm

@@ -1980,6 +1980,7 @@ static int do_swap_page(struct mm_struct
 	}
 
 	flush_icache_page(vma, page);
+	lazy_mmu_prot_update(pte);
 	set_pte_at(mm, address, page_table, pte);
 	page_add_anon_rmap(page, vma, address);
 

At first sight, this looks redundant ... but then I saw that
on ia64 "flush_icache_page()" is actually a no-op.  Perhaps
we can enter this in the next obfuscated C competition.

-Tony

^ permalink raw reply	[flat|nested] 15+ messages in thread
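
[Editor's note, not part of the original mail: Tony's point is that the
flush_icache_page() call visible in the quoted hunk compiles to nothing on
ia64, so the i-cache/d-cache sync for the swapped-in page is really carried
by the lazy_mmu_prot_update() call added just below it.  From memory of the
2.6-era ia64 headers -- treat the exact definition as an assumption, not a
quote:]

	#define flush_icache_page(vma,page)	do { } while (0)	/* ia64: no-op */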

Thread overview:
2006-07-13 17:00 [PATCH] ia64: race flushing icache in COW path Luck, Tony
2006-07-13 19:16 ` Jason Baron
2006-07-13 20:37 ` Luck, Tony
2006-07-14  3:11   ` Peter Zijlstra
2006-07-14 17:11 ` Luck, Tony
