* [PATCH 0/4] MIPS cache & highmem fixes
@ 2016-03-01  2:37 ` Paul Burton
  0 siblings, 0 replies; 45+ messages in thread
From: Paul Burton @ 2016-03-01  2:37 UTC (permalink / raw)
  To: linux-mips
  Cc: Paul Burton, Lars Persson, Steven J. Hill, Huacai Chen,
	David Daney, Aneesh Kumar K.V, linux-kernel, Jerome Marchand,
	Ralf Baechle, Kirill A. Shutemov, Andrew Morton

This series fixes up a few issues with our current cache maintenance
code, some specific to highmem & some not. It fixes an issue with icache
corruption seen on the pistachio SoC & Ci40, and icache corruption seen
using highmem on MIPS Malta boards with a P5600 CPU.

Applies atop v4.5-rc6. It would be great to squeeze these in for v4.5,
but I recognise it's quite late in the cycle & this brokenness has been
around for a while so won't object to v4.6.

Thanks,
    Paul

Paul Burton (4):
  MIPS: Flush dcache for flush_kernel_dcache_page
  MIPS: Flush highmem pages in __flush_dcache_page
  MIPS: Handle highmem pages in __update_cache
  MIPS: Sync icache & dcache in set_pte_at

 arch/mips/include/asm/cacheflush.h |  7 +------
 arch/mips/include/asm/pgtable.h    | 26 +++++++++++++++++++-----
 arch/mips/mm/cache.c               | 41 +++++++++++++++++++-------------------
 3 files changed, 43 insertions(+), 31 deletions(-)

-- 
2.7.1

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 1/4] MIPS: Flush dcache for flush_kernel_dcache_page
@ 2016-03-01  2:37   ` Paul Burton
  0 siblings, 0 replies; 45+ messages in thread
From: Paul Burton @ 2016-03-01  2:37 UTC (permalink / raw)
  To: linux-mips; +Cc: Paul Burton, Ralf Baechle, Lars Persson, linux-kernel

The flush_kernel_dcache_page function was previously essentially a nop.
This is incorrect for MIPS: if a page has been modified & either it
aliases or it's executable & the icache doesn't fill from the dcache,
then its content needs to be written back from the dcache to the next
level of the cache hierarchy (which is shared with the icache).

Implement this by simply calling flush_dcache_page, treating this
kmapped cache flush function (flush_kernel_dcache_page) exactly the same
as its non-kmapped counterpart (flush_dcache_page).

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
---

 arch/mips/include/asm/cacheflush.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
index 723229f..7e9f468 100644
--- a/arch/mips/include/asm/cacheflush.h
+++ b/arch/mips/include/asm/cacheflush.h
@@ -132,6 +132,7 @@ static inline void kunmap_noncoherent(void)
 static inline void flush_kernel_dcache_page(struct page *page)
 {
 	BUG_ON(cpu_has_dc_aliases && PageHighMem(page));
+	flush_dcache_page(page);
 }
 
 /*
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 2/4] MIPS: Flush highmem pages in __flush_dcache_page
@ 2016-03-01  2:37   ` Paul Burton
  0 siblings, 0 replies; 45+ messages in thread
From: Paul Burton @ 2016-03-01  2:37 UTC (permalink / raw)
  To: linux-mips
  Cc: Paul Burton, Lars Persson, linux-kernel, Andrew Morton,
	Kirill A. Shutemov, Ralf Baechle

When flush_dcache_page is called on an executable page, that page is
about to be provided to userland & we can presume that the icache
contains no valid entries for its address range. However, if the icache
does not fill from the dcache then we cannot presume that the page's
content has been written back as far as the memories that the dcache
will fill from (i.e. L2 or further out).

This was being done for lowmem pages, but not for highmem pages, which
can lead to icache corruption. Fix this by mapping highmem pages &
flushing their content from the dcache in __flush_dcache_page before
providing the page to userland, just as is done for lowmem pages.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Lars Persson <lars.persson@axis.com>
---

 arch/mips/mm/cache.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
index 3f159ca..5a67d8c 100644
--- a/arch/mips/mm/cache.c
+++ b/arch/mips/mm/cache.c
@@ -16,6 +16,7 @@
 #include <linux/mm.h>
 
 #include <asm/cacheflush.h>
+#include <asm/highmem.h>
 #include <asm/processor.h>
 #include <asm/cpu.h>
 #include <asm/cpu-features.h>
@@ -83,8 +84,6 @@ void __flush_dcache_page(struct page *page)
 	struct address_space *mapping = page_mapping(page);
 	unsigned long addr;
 
-	if (PageHighMem(page))
-		return;
 	if (mapping && !mapping_mapped(mapping)) {
 		SetPageDcacheDirty(page);
 		return;
@@ -95,8 +94,15 @@ void __flush_dcache_page(struct page *page)
 	 * case is for exec env/arg pages and those are %99 certainly going to
 	 * get faulted into the tlb (and thus flushed) anyways.
 	 */
-	addr = (unsigned long) page_address(page);
+	if (PageHighMem(page))
+		addr = (unsigned long)kmap_atomic(page);
+	else
+		addr = (unsigned long)page_address(page);
+
 	flush_data_cache_page(addr);
+
+	if (PageHighMem(page))
+		__kunmap_atomic((void *)addr);
 }
 
 EXPORT_SYMBOL(__flush_dcache_page);
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 3/4] MIPS: Handle highmem pages in __update_cache
@ 2016-03-01  2:37   ` Paul Burton
  0 siblings, 0 replies; 45+ messages in thread
From: Paul Burton @ 2016-03-01  2:37 UTC (permalink / raw)
  To: linux-mips
  Cc: Paul Burton, Lars Persson, stable # v4 . 1+,
	linux-kernel, Andrew Morton, Jerome Marchand, Kirill A. Shutemov,
	Ralf Baechle

The following patch will expose __update_cache to highmem pages. Handle
them by mapping them in for the duration of the cache maintenance, just
like in __flush_dcache_page. The code for that isn't shared because we
need the page address in __update_cache so sharing became messy. Given
that the entirety is an extra 5 lines, just duplicate it.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Lars Persson <lars.persson@axis.com>
Cc: stable <stable@vger.kernel.org> # v4.1+
---

 arch/mips/mm/cache.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
index 5a67d8c..8befa55 100644
--- a/arch/mips/mm/cache.c
+++ b/arch/mips/mm/cache.c
@@ -149,9 +149,17 @@ void __update_cache(struct vm_area_struct *vma, unsigned long address,
 		return;
 	page = pfn_to_page(pfn);
 	if (page_mapping(page) && Page_dcache_dirty(page)) {
-		addr = (unsigned long) page_address(page);
+		if (PageHighMem(page))
+			addr = (unsigned long)kmap_atomic(page);
+		else
+			addr = (unsigned long)page_address(page);
+
 		if (exec || pages_do_alias(addr, address & PAGE_MASK))
 			flush_data_cache_page(addr);
+
+		if (PageHighMem(page))
+			__kunmap_atomic((void *)addr);
+
 		ClearPageDcacheDirty(page);
 	}
 }
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 4/4] MIPS: Sync icache & dcache in set_pte_at
@ 2016-03-01  2:37   ` Paul Burton
  0 siblings, 0 replies; 45+ messages in thread
From: Paul Burton @ 2016-03-01  2:37 UTC (permalink / raw)
  To: linux-mips
  Cc: Paul Burton, Lars Persson, stable # v4 . 1+,
	Steven J. Hill, David Daney, Huacai Chen, Aneesh Kumar K.V,
	linux-kernel, Andrew Morton, Jerome Marchand, Kirill A. Shutemov,
	Ralf Baechle

It's possible for pages to become visible prior to update_mmu_cache
running if a thread within the same address space preempts the current
thread or runs simultaneously on another CPU. That is, the following
scenario is possible:

    CPU0                            CPU1

    write to page
    flush_dcache_page
    flush_icache_page
    set_pte_at
                                    map page
    update_mmu_cache

If CPU1 maps the page in between CPU0's set_pte_at, which marks it valid
& visible, and update_mmu_cache where the dcache flush occurs, then
CPU1's icache will fill from stale data (unless it fills from the
dcache, in which case all is good, but most MIPS CPUs don't have this
property). Commit 4d46a67a3eb8 ("MIPS: Fix race condition in lazy cache
flushing.") attempted to fix that by performing the dcache flush in
flush_icache_page such that it occurs before the set_pte_at call makes
the page visible. However it has the problem that not all code that
writes to pages exposed to userland calls flush_icache_page. There are
many callers of set_pte_at under mm/ and only 2 of them do call
flush_icache_page. Thus the race window between a page becoming visible
& being coherent between the icache & dcache remains open in some cases.

To illustrate some of the cases, a WARN was added to __update_cache with
this patch applied that triggered in cases where a page about to be
flushed from the dcache was not the last page provided to
flush_icache_page. That is, backtraces were obtained for cases in which
the race window is left open without this patch. The 2 standout examples
follow.

When forking a process:

[   15.271842] [<80417630>] __update_cache+0xcc/0x188
[   15.277274] [<80530394>] copy_page_range+0x56c/0x6ac
[   15.282861] [<8042936c>] copy_process.part.54+0xd40/0x17ac
[   15.289028] [<80429f80>] do_fork+0xe4/0x420
[   15.293747] [<80413808>] handle_sys+0x128/0x14c

When exec'ing an ELF binary:

[   14.445964] [<80417630>] __update_cache+0xcc/0x188
[   14.451369] [<80538d88>] move_page_tables+0x414/0x498
[   14.457075] [<8055d848>] setup_arg_pages+0x220/0x318
[   14.462685] [<805b0f38>] load_elf_binary+0x530/0x12a0
[   14.468374] [<8055ec3c>] search_binary_handler+0xbc/0x214
[   14.474444] [<8055f6c0>] do_execveat_common+0x43c/0x67c
[   14.480324] [<8055f938>] do_execve+0x38/0x44
[   14.485137] [<80413808>] handle_sys+0x128/0x14c

These code paths write into a page, call flush_dcache_page, then call
set_pte_at without flush_icache_page in between. The end result is that
the icache can become corrupted & userland processes may execute
unexpected or invalid code, typically resulting in a reserved
instruction exception, a trap or a segfault.

Fix this race condition fully by performing any cache maintenance
required to keep the icache & dcache in sync in set_pte_at, before the
page is made valid. This has the added bonus of ensuring the cache
maintenance always happens in one location, rather than being duplicated
in flush_icache_page & update_mmu_cache. It also matches the way other
architectures solve the same problem (see arm, ia64 & powerpc).

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Reported-by: Ionela Voinescu <ionela.voinescu@imgtec.com>
Cc: Lars Persson <lars.persson@axis.com>
Cc: stable <stable@vger.kernel.org> # v4.1+
Fixes: 4d46a67a3eb8 ("MIPS: Fix race condition in lazy cache flushing.")

---

 arch/mips/include/asm/cacheflush.h |  6 ------
 arch/mips/include/asm/pgtable.h    | 26 +++++++++++++++++++++-----
 arch/mips/mm/cache.c               | 19 +++----------------
 3 files changed, 24 insertions(+), 27 deletions(-)

diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
index 7e9f468..34ed22e 100644
--- a/arch/mips/include/asm/cacheflush.h
+++ b/arch/mips/include/asm/cacheflush.h
@@ -51,7 +51,6 @@ extern void (*flush_cache_range)(struct vm_area_struct *vma,
 	unsigned long start, unsigned long end);
 extern void (*flush_cache_page)(struct vm_area_struct *vma, unsigned long page, unsigned long pfn);
 extern void __flush_dcache_page(struct page *page);
-extern void __flush_icache_page(struct vm_area_struct *vma, struct page *page);
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 static inline void flush_dcache_page(struct page *page)
@@ -77,11 +76,6 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
 static inline void flush_icache_page(struct vm_area_struct *vma,
 	struct page *page)
 {
-	if (!cpu_has_ic_fills_f_dc && (vma->vm_flags & VM_EXEC) &&
-	    Page_dcache_dirty(page)) {
-		__flush_icache_page(vma, page);
-		ClearPageDcacheDirty(page);
-	}
 }
 
 extern void (*flush_icache_range)(unsigned long start, unsigned long end);
diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index 9a4fe01..65bf2c0 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -127,10 +127,14 @@ do {									\
 	}								\
 } while(0)
 
+static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
+			      pte_t *ptep, pte_t pteval);
+
 #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
 
 #define pte_none(pte)		(!(((pte).pte_high) & ~_PAGE_GLOBAL))
 #define pte_present(pte)	((pte).pte_low & _PAGE_PRESENT)
+#define pte_no_exec(pte)	((pte).pte_low & _PAGE_NO_EXEC)
 
 static inline void set_pte(pte_t *ptep, pte_t pte)
 {
@@ -148,7 +152,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
 			buddy->pte_high |= _PAGE_GLOBAL;
 	}
 }
-#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
 
 static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
@@ -166,6 +169,7 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
 
 #define pte_none(pte)		(!(pte_val(pte) & ~_PAGE_GLOBAL))
 #define pte_present(pte)	(pte_val(pte) & _PAGE_PRESENT)
+#define pte_no_exec(pte)	(pte_val(pte) & _PAGE_NO_EXEC)
 
 /*
  * Certain architectures need to do special things when pte's
@@ -218,7 +222,6 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
 	}
 #endif
 }
-#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
 
 static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
@@ -234,6 +237,22 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
 }
 #endif
 
+static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
+			      pte_t *ptep, pte_t pteval)
+{
+	extern void __update_cache(unsigned long address, pte_t pte);
+
+	if (!pte_present(pteval))
+		goto cache_sync_done;
+
+	if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
+		goto cache_sync_done;
+
+	__update_cache(addr, pteval);
+cache_sync_done:
+	set_pte(ptep, pteval);
+}
+
 /*
  * (pmds are folded into puds so this doesn't get actually called,
  * but the define is needed for a generic inline function.)
@@ -430,15 +449,12 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 
 extern void __update_tlb(struct vm_area_struct *vma, unsigned long address,
 	pte_t pte);
-extern void __update_cache(struct vm_area_struct *vma, unsigned long address,
-	pte_t pte);
 
 static inline void update_mmu_cache(struct vm_area_struct *vma,
 	unsigned long address, pte_t *ptep)
 {
 	pte_t pte = *ptep;
 	__update_tlb(vma, address, pte);
-	__update_cache(vma, address, pte);
 }
 
 static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
index 8befa55..bf04c6c 100644
--- a/arch/mips/mm/cache.c
+++ b/arch/mips/mm/cache.c
@@ -125,30 +125,17 @@ void __flush_anon_page(struct page *page, unsigned long vmaddr)
 
 EXPORT_SYMBOL(__flush_anon_page);
 
-void __flush_icache_page(struct vm_area_struct *vma, struct page *page)
-{
-	unsigned long addr;
-
-	if (PageHighMem(page))
-		return;
-
-	addr = (unsigned long) page_address(page);
-	flush_data_cache_page(addr);
-}
-EXPORT_SYMBOL_GPL(__flush_icache_page);
-
-void __update_cache(struct vm_area_struct *vma, unsigned long address,
-	pte_t pte)
+void __update_cache(unsigned long address, pte_t pte)
 {
 	struct page *page;
 	unsigned long pfn, addr;
-	int exec = (vma->vm_flags & VM_EXEC) && !cpu_has_ic_fills_f_dc;
+	int exec = !pte_no_exec(pte) && !cpu_has_ic_fills_f_dc;
 
 	pfn = pte_pfn(pte);
 	if (unlikely(!pfn_valid(pfn)))
 		return;
 	page = pfn_to_page(pfn);
-	if (page_mapping(page) && Page_dcache_dirty(page)) {
+	if (Page_dcache_dirty(page)) {
 		if (PageHighMem(page))
 			addr = (unsigned long)kmap_atomic(page);
 		else
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/4] MIPS: Sync icache & dcache in set_pte_at
@ 2016-03-01  9:44     ` Lars Persson
  0 siblings, 0 replies; 45+ messages in thread
From: Lars Persson @ 2016-03-01  9:44 UTC (permalink / raw)
  To: Paul Burton, linux-mips
  Cc: Lars Persson, stable # v4 . 1+,
	Steven J. Hill, David Daney, Huacai Chen, Aneesh Kumar K.V,
	linux-kernel, Andrew Morton, Jerome Marchand, Kirill A. Shutemov,
	Ralf Baechle

Hi Paul

I think this patch will introduce a performance regression on MIPSes 
without RI/XI hardware support.

There will be redundant cache flushes because pte_no_exec() always 
evaluates to false without RI/XI. Is it possible to find an unused bit 
in the pte_t to store the X-bit for non-RIXI CPUs too?

BR,
  Lars

On 03/01/2016 03:37 AM, Paul Burton wrote:
> It's possible for pages to become visible prior to update_mmu_cache
> running if a thread within the same address space preempts the current
> thread or runs simultaneously on another CPU. That is, the following
> scenario is possible:
>
>      CPU0                            CPU1
>
>      write to page
>      flush_dcache_page
>      flush_icache_page
>      set_pte_at
>                                      map page
>      update_mmu_cache
>
> If CPU1 maps the page in between CPU0's set_pte_at, which marks it valid
> & visible, and update_mmu_cache where the dcache flush occurs, then CPU1's
> icache will fill from stale data (unless it fills from the dcache, in
> which case all is good, but most MIPS CPUs don't have this property).
> Commit 4d46a67a3eb8 ("MIPS: Fix race condition in lazy cache flushing.")
> attempted to fix that by performing the dcache flush in
> flush_icache_page such that it occurs before the set_pte_at call makes
> the page visible. However it has the problem that not all code that
> writes to pages exposed to userland call flush_icache_page. There are
> many callers of set_pte_at under mm/ and only 2 of them do call
> flush_icache_page. Thus the race window between a page becoming visible
> & being coherent between the icache & dcache remains open in some cases.
>
> To illustrate some of the cases, a WARN was added to __update_cache with
> this patch applied that triggered in cases where a page about to be
> flushed from the dcache was not the last page provided to
> flush_icache_page. That is, backtraces were obtained for cases in which
> the race window is left open without this patch. The 2 standout examples
> follow.
>
> When forking a process:
>
> [   15.271842] [<80417630>] __update_cache+0xcc/0x188
> [   15.277274] [<80530394>] copy_page_range+0x56c/0x6ac
> [   15.282861] [<8042936c>] copy_process.part.54+0xd40/0x17ac
> [   15.289028] [<80429f80>] do_fork+0xe4/0x420
> [   15.293747] [<80413808>] handle_sys+0x128/0x14c
>
> When exec'ing an ELF binary:
>
> [   14.445964] [<80417630>] __update_cache+0xcc/0x188
> [   14.451369] [<80538d88>] move_page_tables+0x414/0x498
> [   14.457075] [<8055d848>] setup_arg_pages+0x220/0x318
> [   14.462685] [<805b0f38>] load_elf_binary+0x530/0x12a0
> [   14.468374] [<8055ec3c>] search_binary_handler+0xbc/0x214
> [   14.474444] [<8055f6c0>] do_execveat_common+0x43c/0x67c
> [   14.480324] [<8055f938>] do_execve+0x38/0x44
> [   14.485137] [<80413808>] handle_sys+0x128/0x14c
>
> These code paths write into a page, call flush_dcache_page then call
> set_pte_at without flush_icache_page in between. The end result is that
> the icache can become corrupted & userland processes may execute
> unexpected or invalid code, typically resulting in a reserved
> instruction exception, a trap or a segfault.
>
> Fix this race condition fully by performing any cache maintenance
> required to keep the icache & dcache in sync in set_pte_at, before the
> page is made valid. This has the added bonus of ensuring the cache
> maintenance always happens in one location, rather than being duplicated
> in flush_icache_page & update_mmu_cache. It also matches the way other
> architectures solve the same problem (see arm, ia64 & powerpc).
>
> Signed-off-by: Paul Burton <paul.burton@imgtec.com>
> Reported-by: Ionela Voinescu <ionela.voinescu@imgtec.com>
> Cc: Lars Persson <lars.persson@axis.com>
> Cc: stable <stable@vger.kernel.org> # v4.1+
> Fixes: 4d46a67a3eb8 ("MIPS: Fix race condition in lazy cache flushing.")
>
> ---
>
>   arch/mips/include/asm/cacheflush.h |  6 ------
>   arch/mips/include/asm/pgtable.h    | 26 +++++++++++++++++++++-----
>   arch/mips/mm/cache.c               | 19 +++----------------
>   3 files changed, 24 insertions(+), 27 deletions(-)
>
> diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
> index 7e9f468..34ed22e 100644
> --- a/arch/mips/include/asm/cacheflush.h
> +++ b/arch/mips/include/asm/cacheflush.h
> @@ -51,7 +51,6 @@ extern void (*flush_cache_range)(struct vm_area_struct *vma,
>   	unsigned long start, unsigned long end);
>   extern void (*flush_cache_page)(struct vm_area_struct *vma, unsigned long page, unsigned long pfn);
>   extern void __flush_dcache_page(struct page *page);
> -extern void __flush_icache_page(struct vm_area_struct *vma, struct page *page);
>
>   #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
>   static inline void flush_dcache_page(struct page *page)
> @@ -77,11 +76,6 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
>   static inline void flush_icache_page(struct vm_area_struct *vma,
>   	struct page *page)
>   {
> -	if (!cpu_has_ic_fills_f_dc && (vma->vm_flags & VM_EXEC) &&
> -	    Page_dcache_dirty(page)) {
> -		__flush_icache_page(vma, page);
> -		ClearPageDcacheDirty(page);
> -	}
>   }
>
>   extern void (*flush_icache_range)(unsigned long start, unsigned long end);
> diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
> index 9a4fe01..65bf2c0 100644
> --- a/arch/mips/include/asm/pgtable.h
> +++ b/arch/mips/include/asm/pgtable.h
> @@ -127,10 +127,14 @@ do {									\
>   	}								\
>   } while(0)
>
> +static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> +			      pte_t *ptep, pte_t pteval);
> +
>   #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
>
>   #define pte_none(pte)		(!(((pte).pte_high) & ~_PAGE_GLOBAL))
>   #define pte_present(pte)	((pte).pte_low & _PAGE_PRESENT)
> +#define pte_no_exec(pte)	((pte).pte_low & _PAGE_NO_EXEC)
>
>   static inline void set_pte(pte_t *ptep, pte_t pte)
>   {
> @@ -148,7 +152,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
>   			buddy->pte_high |= _PAGE_GLOBAL;
>   	}
>   }
> -#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
>
>   static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
>   {
> @@ -166,6 +169,7 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
>
>   #define pte_none(pte)		(!(pte_val(pte) & ~_PAGE_GLOBAL))
>   #define pte_present(pte)	(pte_val(pte) & _PAGE_PRESENT)
> +#define pte_no_exec(pte)	(pte_val(pte) & _PAGE_NO_EXEC)
>
>   /*
>    * Certain architectures need to do special things when pte's
> @@ -218,7 +222,6 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
>   	}
>   #endif
>   }
> -#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
>
>   static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
>   {
> @@ -234,6 +237,22 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
>   }
>   #endif
>
> +static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> +			      pte_t *ptep, pte_t pteval)
> +{
> +	extern void __update_cache(unsigned long address, pte_t pte);
> +
> +	if (!pte_present(pteval))
> +		goto cache_sync_done;
> +
> +	if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
> +		goto cache_sync_done;
> +
> +	__update_cache(addr, pteval);
> +cache_sync_done:
> +	set_pte(ptep, pteval);
> +}
> +
>   /*
>    * (pmds are folded into puds so this doesn't get actually called,
>    * but the define is needed for a generic inline function.)
> @@ -430,15 +449,12 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
>
>   extern void __update_tlb(struct vm_area_struct *vma, unsigned long address,
>   	pte_t pte);
> -extern void __update_cache(struct vm_area_struct *vma, unsigned long address,
> -	pte_t pte);
>
>   static inline void update_mmu_cache(struct vm_area_struct *vma,
>   	unsigned long address, pte_t *ptep)
>   {
>   	pte_t pte = *ptep;
>   	__update_tlb(vma, address, pte);
> -	__update_cache(vma, address, pte);
>   }
>
>   static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
> diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
> index 8befa55..bf04c6c 100644
> --- a/arch/mips/mm/cache.c
> +++ b/arch/mips/mm/cache.c
> @@ -125,30 +125,17 @@ void __flush_anon_page(struct page *page, unsigned long vmaddr)
>
>   EXPORT_SYMBOL(__flush_anon_page);
>
> -void __flush_icache_page(struct vm_area_struct *vma, struct page *page)
> -{
> -	unsigned long addr;
> -
> -	if (PageHighMem(page))
> -		return;
> -
> -	addr = (unsigned long) page_address(page);
> -	flush_data_cache_page(addr);
> -}
> -EXPORT_SYMBOL_GPL(__flush_icache_page);
> -
> -void __update_cache(struct vm_area_struct *vma, unsigned long address,
> -	pte_t pte)
> +void __update_cache(unsigned long address, pte_t pte)
>   {
>   	struct page *page;
>   	unsigned long pfn, addr;
> -	int exec = (vma->vm_flags & VM_EXEC) && !cpu_has_ic_fills_f_dc;
> +	int exec = !pte_no_exec(pte) && !cpu_has_ic_fills_f_dc;
>
>   	pfn = pte_pfn(pte);
>   	if (unlikely(!pfn_valid(pfn)))
>   		return;
>   	page = pfn_to_page(pfn);
> -	if (page_mapping(page) && Page_dcache_dirty(page)) {
> +	if (Page_dcache_dirty(page)) {
>   		if (PageHighMem(page))
>   			addr = (unsigned long)kmap_atomic(page);
>   		else
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/4] MIPS: Sync icache & dcache in set_pte_at
@ 2016-03-01 17:13     ` David Daney
  0 siblings, 0 replies; 45+ messages in thread
From: David Daney @ 2016-03-01 17:13 UTC (permalink / raw)
  To: Paul Burton, Ralf Baechle
  Cc: linux-mips, Lars Persson, stable # v4.1+,
	Steven J. Hill, David Daney, Huacai Chen, Aneesh Kumar K.V,
	linux-kernel, Andrew Morton, Jerome Marchand, Kirill A. Shutemov

On 02/29/2016 06:37 PM, Paul Burton wrote:
[...]
> @@ -234,6 +237,22 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
>   }
>   #endif
>
> +static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> +			      pte_t *ptep, pte_t pteval)
> +{
> +	extern void __update_cache(unsigned long address, pte_t pte);
> +
> +	if (!pte_present(pteval))
> +		goto cache_sync_done;
> +
> +	if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
> +		goto cache_sync_done;
> +
> +	__update_cache(addr, pteval);
> +cache_sync_done:
> +	set_pte(ptep, pteval);
> +}
> +

This seems crazy.  I don't think any other architecture does this type 
of work in set_pte_at().

Can you look into finding a better way?

What if you ...


>   /*
>    * (pmds are folded into puds so this doesn't get actually called,
>    * but the define is needed for a generic inline function.)
> @@ -430,15 +449,12 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
>
>   extern void __update_tlb(struct vm_area_struct *vma, unsigned long address,
>   	pte_t pte);
> -extern void __update_cache(struct vm_area_struct *vma, unsigned long address,
> -	pte_t pte);
>
>   static inline void update_mmu_cache(struct vm_area_struct *vma,
>   	unsigned long address, pte_t *ptep)
>   {
>   	pte_t pte = *ptep;
>   	__update_tlb(vma, address, pte);
> -	__update_cache(vma, address, pte);

... Reversed the order of these two operations?

>   }
>
>   static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
> diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
> index 8befa55..bf04c6c 100644
> --- a/arch/mips/mm/cache.c
> +++ b/arch/mips/mm/cache.c
> @@ -125,30 +125,17 @@ void __flush_anon_page(struct page *page, unsigned long vmaddr)
>
>   EXPORT_SYMBOL(__flush_anon_page);
>
> -void __flush_icache_page(struct vm_area_struct *vma, struct page *page)
> -{
> -	unsigned long addr;
> -
> -	if (PageHighMem(page))
> -		return;
> -
> -	addr = (unsigned long) page_address(page);
> -	flush_data_cache_page(addr);
> -}
> -EXPORT_SYMBOL_GPL(__flush_icache_page);
> -
> -void __update_cache(struct vm_area_struct *vma, unsigned long address,
> -	pte_t pte)
> +void __update_cache(unsigned long address, pte_t pte)
>   {
>   	struct page *page;
>   	unsigned long pfn, addr;
> -	int exec = (vma->vm_flags & VM_EXEC) && !cpu_has_ic_fills_f_dc;
> +	int exec = !pte_no_exec(pte) && !cpu_has_ic_fills_f_dc;
>
>   	pfn = pte_pfn(pte);
>   	if (unlikely(!pfn_valid(pfn)))
>   		return;
>   	page = pfn_to_page(pfn);
> -	if (page_mapping(page) && Page_dcache_dirty(page)) {
> +	if (Page_dcache_dirty(page)) {
>   		if (PageHighMem(page))
>   			addr = (unsigned long)kmap_atomic(page);
>   		else
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/4] MIPS: Sync icache & dcache in set_pte_at
@ 2016-03-01 17:19       ` Paul Burton
  0 siblings, 0 replies; 45+ messages in thread
From: Paul Burton @ 2016-03-01 17:19 UTC (permalink / raw)
  To: David Daney
  Cc: Ralf Baechle, linux-mips, Lars Persson, stable # v4.1+,
	Steven J. Hill, David Daney, Huacai Chen, Aneesh Kumar K.V,
	linux-kernel, Andrew Morton, Jerome Marchand, Kirill A. Shutemov

On Tue, Mar 01, 2016 at 09:13:23AM -0800, David Daney wrote:
> On 02/29/2016 06:37 PM, Paul Burton wrote:
> [...]
> >@@ -234,6 +237,22 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
> >  }
> >  #endif
> >
> >+static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> >+			      pte_t *ptep, pte_t pteval)
> >+{
> >+	extern void __update_cache(unsigned long address, pte_t pte);
> >+
> >+	if (!pte_present(pteval))
> >+		goto cache_sync_done;
> >+
> >+	if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
> >+		goto cache_sync_done;
> >+
> >+	__update_cache(addr, pteval);
> >+cache_sync_done:
> >+	set_pte(ptep, pteval);
> >+}
> >+
> 
> This seems crazy.

Perhaps, but also correct...

> I don't think any other architecture does this type of work in set_pte_at().

Yes they do. As mentioned in the commit message see arm, ia64 or powerpc
for architectures that all do the same sort of thing in set_pte_at.

> Can you look into finding a better way?

Not that I can see.

> What if you ...
> 
> 
> >  /*
> >   * (pmds are folded into puds so this doesn't get actually called,
> >   * but the define is needed for a generic inline function.)
> >@@ -430,15 +449,12 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
> >
> >  extern void __update_tlb(struct vm_area_struct *vma, unsigned long address,
> >  	pte_t pte);
> >-extern void __update_cache(struct vm_area_struct *vma, unsigned long address,
> >-	pte_t pte);
> >
> >  static inline void update_mmu_cache(struct vm_area_struct *vma,
> >  	unsigned long address, pte_t *ptep)
> >  {
> >  	pte_t pte = *ptep;
> >  	__update_tlb(vma, address, pte);
> >-	__update_cache(vma, address, pte);
> 
> ... Reversed the order of these two operations?

It would make no difference. The window for the race exists between
flush_dcache_page & set_pte_at. update_mmu_cache isn't called until
after set_pte_at, so it cannot possibly avoid the race. The commit
message walks through where the race exists - I don't think you've read
it.

Thanks,
    Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/4] MIPS cache & highmem fixes
@ 2016-03-02 11:39   ` Harvey Hunt
  0 siblings, 0 replies; 45+ messages in thread
From: Harvey Hunt @ 2016-03-02 11:39 UTC (permalink / raw)
  To: Paul Burton, linux-mips
  Cc: Lars Persson, Steven J. Hill, Huacai Chen, David Daney,
	Aneesh Kumar K.V, linux-kernel, Jerome Marchand, Ralf Baechle,
	Kirill A. Shutemov, Andrew Morton

Hi Paul,

On 01/03/16 02:37, Paul Burton wrote:
> This series fixes up a few issues with our current cache maintenance
> code, some specific to highmem & some not. It fixes an issue with icache
> corruption seen on the pistachio SoC & Ci40, and icache corruption seen
> using highmem on MIPS Malta boards with a P5600 CPU.
>
> Applies atop v4.5-rc6. It would be great to squeeze these in for v4.5,
> but I recognise it's quite late in the cycle & this brokenness has been
> around for a while so won't object to v4.6.
>
> Thanks,
>      Paul
>
> Paul Burton (4):
>    MIPS: Flush dcache for flush_kernel_dcache_page
>    MIPS: Flush highmem pages in __flush_dcache_page
>    MIPS: Handle highmem pages in __update_cache
>    MIPS: Sync icache & dcache in set_pte_at
>
>   arch/mips/include/asm/cacheflush.h |  7 +------
>   arch/mips/include/asm/pgtable.h    | 26 +++++++++++++++++++-----
>   arch/mips/mm/cache.c               | 41 +++++++++++++++++++-------------------
>   3 files changed, 43 insertions(+), 31 deletions(-)
>

I tested this patchset on a Ci20 (using highmem) and I can successfully 
boot to userland from an rfs on NAND, which I couldn't do before.

Tested-by: Harvey Hunt <harvey.hunt@imgtec.com>

Thanks,

Harvey

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/4] MIPS: Sync icache & dcache in set_pte_at
  2016-03-01 17:19       ` Paul Burton
  (?)
@ 2016-03-02 14:12       ` Ralf Baechle
  2016-03-02 14:24           ` Paul Burton
  -1 siblings, 1 reply; 45+ messages in thread
From: Ralf Baechle @ 2016-03-02 14:12 UTC (permalink / raw)
  To: Paul Burton
  Cc: David Daney, linux-mips, Lars Persson, stable # v4 . 1+,
	Steven J. Hill, David Daney, Huacai Chen, Aneesh Kumar K.V,
	linux-kernel, Andrew Morton, Jerome Marchand, Kirill A. Shutemov

On Tue, Mar 01, 2016 at 05:19:40PM +0000, Paul Burton wrote:

> > >+static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> > >+			      pte_t *ptep, pte_t pteval)
> > >+{
> > >+	extern void __update_cache(unsigned long address, pte_t pte);
> > >+
> > >+	if (!pte_present(pteval))
> > >+		goto cache_sync_done;
> > >+
> > >+	if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
> > >+		goto cache_sync_done;
> > >+
> > >+	__update_cache(addr, pteval);
> > >+cache_sync_done:
> > >+	set_pte(ptep, pteval);
> > >+}
> > >+
> > 
> > This seems crazy.
> 
> Perhaps, but also correct...
> 
> > I don't think any other architecture does this type of work in set_pte_at().
> 
> Yes they do. As mentioned in the commit message see arm, ia64 or powerpc
> for architectures that all do the same sort of thing in set_pte_at.
> 
> > Can you look into finding a better way?
> 
> Not that I can see.

FYI, this is the original commit message adding set_pte_at().  The commit
predates Linus' git history but is in the various history trees, for example
https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=ae3d0a847f4b38812241e4a5dc3371965c752a8c

Or for your convenience:

commit ae3d0a847f4b38812241e4a5dc3371965c752a8c
Author: David S. Miller <davem@nuts.davemloft.net>
Date:   Tue Feb 22 23:42:56 2005 -0800

    [MM]: Add set_pte_at() which takes 'mm' and 'addr' args.
    
    I'm taking a slightly different approach this time around so things
    are easier to integrate.  Here is the first patch which builds the
    infrastructure.  Basically:
    
    1) Add set_pte_at() which is set_pte() with 'mm' and 'addr' arguments
       added.  All generic code uses set_pte_at().
    
       Most platforms simply get this define:
        #define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
    
       I chose this method over simply changing all set_pte() call sites
       because many platforms implement this in assembler and it would
       take forever to preserve the build and stabilize things if modifying
       that was necessary.
    
       Soon, with platform maintainer's help, we can kill of set_pte() entirely.
       To be honest, there are only a handful of set_pte() call sites in the
       arch specific code.
    
       Actually, in this patch ppc64 is completely set_pte() free and does not
       define it.
    
    2) pte_clear() gets 'mm' and 'addr' arguments now.
       This had a cascading effect on many ptep_test_and_*() routines.  Specifically:
       a) ptep_test_and_clear_{young,dirty}() now take 'vma' and 'address' args.
       b) ptep_get_and_clear now take 'mm' and 'address' args.
       c) ptep_mkdirty was deleted, unused by any code.
       d) ptep_set_wrprotect now takes 'mm' and 'address' args.
    
    I've tested this patch as follows:
    
    1) compile and run tested on sparc64/SMP
    2) compile tested on:
       a) ppc64/SMP
       b) i386 both with and without PAE enabled
    
    Signed-off-by: David S. Miller <davem@davemloft.net>

Which doesn't specify if the function is meant for cache management nor
is it documented in Documentation/cachetlb.txt.

  Ralf

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/4] MIPS: Sync icache & dcache in set_pte_at
@ 2016-03-02 14:24           ` Paul Burton
  0 siblings, 0 replies; 45+ messages in thread
From: Paul Burton @ 2016-03-02 14:24 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: David Daney, linux-mips, Lars Persson, stable # v4 . 1+,
	Steven J. Hill, David Daney, Huacai Chen, Aneesh Kumar K.V,
	linux-kernel, Andrew Morton, Jerome Marchand, Kirill A. Shutemov

On Wed, Mar 02, 2016 at 03:12:03PM +0100, Ralf Baechle wrote:
> On Tue, Mar 01, 2016 at 05:19:40PM +0000, Paul Burton wrote:
> 
> > > >+static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> > > >+			      pte_t *ptep, pte_t pteval)
> > > >+{
> > > >+	extern void __update_cache(unsigned long address, pte_t pte);
> > > >+
> > > >+	if (!pte_present(pteval))
> > > >+		goto cache_sync_done;
> > > >+
> > > >+	if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
> > > >+		goto cache_sync_done;
> > > >+
> > > >+	__update_cache(addr, pteval);
> > > >+cache_sync_done:
> > > >+	set_pte(ptep, pteval);
> > > >+}
> > > >+
> > > 
> > > This seems crazy.
> > 
> > Perhaps, but also correct...
> > 
> > > I don't think any other architecture does this type of work in set_pte_at().
> > 
> > Yes they do. As mentioned in the commit message see arm, ia64 or powerpc
> > for architectures that all do the same sort of thing in set_pte_at.
> > 
> > > Can you look into finding a better way?
> > 
> > Not that I can see.
> 
> FYI, this is the original commit message adding set_pte_at().  The commit
> predates Linus' git history but is in the various history trees, for example
> https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=ae3d0a847f4b38812241e4a5dc3371965c752a8c
> 
> Or for your convenience:
> 
> commit ae3d0a847f4b38812241e4a5dc3371965c752a8c
> Author: David S. Miller <davem@nuts.davemloft.net>
> Date:   Tue Feb 22 23:42:56 2005 -0800
> 
>     [MM]: Add set_pte_at() which takes 'mm' and 'addr' args.
>     
>     I'm taking a slightly different approach this time around so things
>     are easier to integrate.  Here is the first patch which builds the
>     infrastructure.  Basically:
>     
>     1) Add set_pte_at() which is set_pte() with 'mm' and 'addr' arguments
>        added.  All generic code uses set_pte_at().
>     
>        Most platforms simply get this define:
>         #define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
>     
>        I chose this method over simply changing all set_pte() call sites
>        because many platforms implement this in assembler and it would
>        take forever to preserve the build and stabilize things if modifying
>        that was necessary.
>     
>        Soon, with platform maintainer's help, we can kill of set_pte() entirely.
>        To be honest, there are only a handful of set_pte() call sites in the
>        arch specific code.
>     
>        Actually, in this patch ppc64 is completely set_pte() free and does not
>        define it.
>     
>     2) pte_clear() gets 'mm' and 'addr' arguments now.
>        This had a cascading effect on many ptep_test_and_*() routines.  Specifically:
>        a) ptep_test_and_clear_{young,dirty}() now take 'vma' and 'address' args.
>        b) ptep_get_and_clear now take 'mm' and 'address' args.
>        c) ptep_mkdirty was deleted, unused by any code.
>        d) ptep_set_wrprotect now takes 'mm' and 'address' args.
>     
>     I've tested this patch as follows:
>     
>     1) compile and run tested on sparc64/SMP
>     2) compile tested on:
>        a) ppc64/SMP
>        b) i386 both with and without PAE enabled
>     
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> Which doesn't specify if the function is meant for cache management nor
> is it documented in Documentation/cachetlb.txt.
> 
>   Ralf

Hi Ralf,

It is, however, used in such a way by others & seems to me like the only
correct way to implement the lazy cache flushing. The alternative would
be to adjust all generic code to ensure flush_icache_page gets called
before set_pte_at, which is far more invasive.

Since you mention Documentation/cachetlb.txt: it does indicate that
flush_icache_page should be removed at some point, yet prior to this
patch we were making use of it, so I hardly think the content of that
document can be taken as defining what's acceptable.

Thanks,
    Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [4/4] MIPS: Sync icache & dcache in set_pte_at
@ 2016-03-03  3:03     ` Leonid Yegoshin
  0 siblings, 0 replies; 45+ messages in thread
From: Leonid Yegoshin @ 2016-03-03  3:03 UTC (permalink / raw)
  To: Paul Burton, linux-mips
  Cc: Lars Persson, stable # v4 . 1+,
	Steven J. Hill, David Daney, Huacai Chen, Aneesh Kumar K.V,
	linux-kernel, Andrew Morton, Jerome Marchand, Kirill A. Shutemov,
	Ralf Baechle

Paul Burton wrote:

> It is, however, used in such a way by others & seems to me like the only
> correct way to implement the lazy cache flushing. The alternative would
> be to adjust all generic code to ensure flush_icache_page gets called
> before set_pte_at

... which is exactly the case right now. The only two callers of flush_icache_page() are:

1) do_swap_page()

         flush_icache_page(vma, page);
         if (pte_swp_soft_dirty(orig_pte))
                 pte = pte_mksoft_dirty(pte);
         set_pte_at(mm, address, page_table, pte);
...

2) do_set_pte()

         flush_icache_page(vma, page);
         entry = mk_pte(page, vma->vm_page_prot);
         if (write)
                 entry = maybe_mkwrite(pte_mkdirty(entry), vma);
         if (anon) {
                 inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
                 page_add_new_anon_rmap(page, vma, address);
         } else {
                 inc_mm_counter_fast(vma->vm_mm, MM_FILEPAGES);
                 page_add_file_rmap(page);
         }
         set_pte_at(vma->vm_mm, address, pte, entry);

         /* no need to invalidate: a not-present page won't be cached */
         update_mmu_cache(vma, address, pte);
....

You should only be sure that flush_icache_page() completes a lazy 
D-cache flush.

- Leonid.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [4/4] MIPS: Sync icache & dcache in set_pte_at
@ 2016-03-04 10:37       ` Paul Burton
  0 siblings, 0 replies; 45+ messages in thread
From: Paul Burton @ 2016-03-04 10:37 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: linux-mips, Lars Persson, stable # v4 . 1+,
	Steven J. Hill, David Daney, Huacai Chen, Aneesh Kumar K.V,
	linux-kernel, Andrew Morton, Jerome Marchand, Kirill A. Shutemov,
	Ralf Baechle

Hi Leonid,

On Wed, Mar 02, 2016 at 07:03:35PM -0800, Leonid Yegoshin wrote:
> Paul Burton wrote:
> >It is, however, used in such a way by others & seems to me like the only
> >correct way to implement the lazy cache flushing. The alternative would
> >be to adjust all generic code to ensure flush_icache_page gets called
> >before set_pte_at
> 
> ... which is an exact case right now.

No, it isn't. I included call traces for 2 cases where this is not the
case right in the commit message.

> Both calls of flush_icache_page() are:
> 
> 1) do_swap_page()
> 
>         flush_icache_page(vma, page);
>         if (pte_swp_soft_dirty(orig_pte))
>                 pte = pte_mksoft_dirty(pte);
>         set_pte_at(mm, address, page_table, pte);
> ...
> 
> 2) do_set_pte()
> 
>         flush_icache_page(vma, page);
>         entry = mk_pte(page, vma->vm_page_prot);
>         if (write)
>                 entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>         if (anon) {
>                 inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
>                 page_add_new_anon_rmap(page, vma, address);
>         } else {
>                 inc_mm_counter_fast(vma->vm_mm, MM_FILEPAGES);
>                 page_add_file_rmap(page);
>         }
>         set_pte_at(vma->vm_mm, address, pte, entry);
> 
>         /* no need to invalidate: a not-present page won't be cached */
>         update_mmu_cache(vma, address, pte);
> ....
> 
> You should only be sure that flush_icache_page() completes a lazy D-cache
> flush.
> 
> - Leonid.

Well of course that's fine in those 2 cases. Of course both callers of
flush_icache_page call flush_icache_page - that's a meaningless
tautology. You're grepping for the wrong thing.

The problem is that there are other code paths which dirty pages (ie.
call flush_dcache_page), then get as far as set_pte_at *without* calling
flush_icache_page. Again - I included call traces from 2 paths that hit
this right in the commit message.

So:

  - flush_icache_page isn't always called before we *need* to writeback
    the page from the dcache. This is demonstrably the case (again, see
    the commit message), and causes bugs when using UBIFS on boards
    using the pistachio SoC at least.

  - flush_icache_page is indicated as something that should go away in
    Documentation/cachetlb.txt. Why do you feel we should make use of
    it?

  - Other architectures (at least arm, ia64 & powerpc which may not be
    an exhaustive list) handle this in set_pte_at too. Doing it in the
    same way can only be good for consistency.

So flush_icache_page is insufficient, apparently disliked & not the way
any other architectures solve the exact same problem (again, because it
does *not* cover all cases).

Thanks,
    Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/4] MIPS: Flush dcache for flush_kernel_dcache_page
  2016-03-01  2:37   ` Paul Burton
  (?)
@ 2016-03-04 15:09   ` Lars Persson
  2016-03-29  8:29     ` Ralf Baechle
  -1 siblings, 1 reply; 45+ messages in thread
From: Lars Persson @ 2016-03-04 15:09 UTC (permalink / raw)
  To: Paul Burton; +Cc: linux-mips, Ralf Baechle, linux-kernel



> 1 mars 2016 kl. 03:38 Paul Burton <paul.burton@imgtec.com>:
> 
> The flush_kernel_dcache_page function was previously essentially a nop.
> This is incorrect for MIPS, where if a page has been modified & either
> it aliases or it's executable & the icache doesn't fill from dcache then
> the content needs to be written back from dcache to the next level of
> the cache hierarchy (which is shared with the icache).
> 
> Implement this by simply calling flush_dcache_page, treating this
> kmapped cache flush function (flush_kernel_dcache_page) exactly the same
> as its non-kmapped counterpart (flush_dcache_page).
> 
> Signed-off-by: Paul Burton <paul.burton@imgtec.com>
> ---
> 
> arch/mips/include/asm/cacheflush.h | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
> index 723229f..7e9f468 100644
> --- a/arch/mips/include/asm/cacheflush.h
> +++ b/arch/mips/include/asm/cacheflush.h
> @@ -132,6 +132,7 @@ static inline void kunmap_noncoherent(void)
> static inline void flush_kernel_dcache_page(struct page *page)
> {
>    BUG_ON(cpu_has_dc_aliases && PageHighMem(page));
> +    flush_dcache_page(page);

Should we instead use __flush_dcache_page(), which flushes immediately for mapped pages? Steven J. Hill's old patch set for highmem did it like this.

> }
> 
> /*
> -- 
> 2.7.1
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [4/4] MIPS: Sync icache & dcache in set_pte_at
  2016-03-04 10:37       ` Paul Burton
  (?)
@ 2016-03-04 15:20       ` Lars Persson
  -1 siblings, 0 replies; 45+ messages in thread
From: Lars Persson @ 2016-03-04 15:20 UTC (permalink / raw)
  To: Paul Burton
  Cc: Leonid Yegoshin, linux-mips, stable # v4 . 1+,
	Steven J. Hill, David Daney, Huacai Chen, Aneesh Kumar K.V,
	linux-kernel, Andrew Morton, Jerome Marchand, Kirill A. Shutemov,
	Ralf Baechle

Hi Paul

I tested your patch set and compared it with our in-house tree:

Indeed, the vanilla kernel cache handling plus the first three patches from your set fails to boot from UBIFS. Applying the fourth patch resolves the problem.

Now consider this. Our tree runs with Steven J. Hill's original patch set that fixed HIGHMEM for non-DMA block devices, and we can boot OK with UBIFS. The cache flushing happens inside flush_icache_page, as on the vanilla kernel.

I would suggest that you take a close look at Steven's old patch set and figure out what he did differently. You will hopefully find that we can avoid flushing inside set_pte_at.

BR
 Lars

> 4 mars 2016 kl. 11:37 skrev Paul Burton <paul.burton@imgtec.com>:
> 
> Hi Leonid,
> 
>> On Wed, Mar 02, 2016 at 07:03:35PM -0800, Leonid Yegoshin wrote:
>> Paul Burton wrote:
>>> It is, however, used in such a way by others & seems to me like the only
>>> correct way to implement the lazy cache flushing. The alternative would
>>> be to adjust all generic code to ensure flush_icache_page gets called
>>> before set_pte_at
>> 
>> ... which is an exact case right now.
> 
> No, it isn't. I included call traces for 2 cases where this is not the
> case right in the commit message.
> 
>> Both calls of flush_icache_page() are:
>> 
>> 1) do_swap_page()
>> 
>>        flush_icache_page(vma, page);
>>        if (pte_swp_soft_dirty(orig_pte))
>>                pte = pte_mksoft_dirty(pte);
>>        set_pte_at(mm, address, page_table, pte);
>> ...
>> 
>> 2) do_set_pte()
>> 
>>        flush_icache_page(vma, page);
>>        entry = mk_pte(page, vma->vm_page_prot);
>>        if (write)
>>                entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>>        if (anon) {
>>                inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
>>                page_add_new_anon_rmap(page, vma, address);
>>        } else {
>>                inc_mm_counter_fast(vma->vm_mm, MM_FILEPAGES);
>>                page_add_file_rmap(page);
>>        }
>>        set_pte_at(vma->vm_mm, address, pte, entry);
>> 
>>        /* no need to invalidate: a not-present page won't be cached */
>>        update_mmu_cache(vma, address, pte);
>> ....
>> 
>> You should only be sure that flush_icache_page() completes a lazy D-cache
>> flush.
>> 
>> - Leonid.
> 
> Well of course that's fine in those 2 cases. Of course both callers of
> flush_icache_page call flush_icache_page - that's a meaningless
> tautology. You're grepping for the wrong thing.
> 
> The problem is that there are other code paths which dirty pages (ie.
> call flush_dcache_page), then get as far as set_pte_at *without* calling
> flush_icache_page. Again - I included call traces from 2 paths that hit
> this right in the commit message.
> 
> So:
> 
>  - flush_icache_page isn't always called before we *need* to writeback
>    the page from the dcache. This is demonstrably the case (again, see
>    the commit message), and causes bugs when using UBIFS on boards
>    using the pistachio SoC at least.
> 
>  - flush_icache_page is indicated as something that should go away in
>    Documentation/cachetlb.txt. Why do you feel we should make use of
>    it?
> 
>  - Other architectures (at least arm, ia64 & powerpc which may not be
>    an exhaustive list) handle this in set_pte_at too. Doing it in the
>    same way can only be good for consistency.
> 
> So flush_icache_page is insufficient, apparently disliked & not the way
> any other architectures solve the exact same problem (again, because it
> does *not* cover all cases).
> 
> Thanks,
>    Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/4] MIPS: Sync icache & dcache in set_pte_at
@ 2016-03-04 17:43         ` David Daney
  0 siblings, 0 replies; 45+ messages in thread
From: David Daney @ 2016-03-04 17:43 UTC (permalink / raw)
  To: Paul Burton
  Cc: Ralf Baechle, linux-mips, Lars Persson, stable # v4 . 1+,
	Steven J. Hill, David Daney, Huacai Chen, Aneesh Kumar K.V,
	linux-kernel, Andrew Morton, Jerome Marchand, Kirill A. Shutemov

On 03/01/2016 09:19 AM, Paul Burton wrote:
> On Tue, Mar 01, 2016 at 09:13:23AM -0800, David Daney wrote:
>> On 02/29/2016 06:37 PM, Paul Burton wrote:
>> [...]
>>> @@ -234,6 +237,22 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
>>>   }
>>>   #endif
>>>
>>> +static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>>> +			      pte_t *ptep, pte_t pteval)
>>> +{
>>> +	extern void __update_cache(unsigned long address, pte_t pte);
>>> +
>>> +	if (!pte_present(pteval))
>>> +		goto cache_sync_done;
>>> +
>>> +	if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
>>> +		goto cache_sync_done;
>>> +
>>> +	__update_cache(addr, pteval);
>>> +cache_sync_done:
>>> +	set_pte(ptep, pteval);
>>> +}
>>> +
>>
>> This seems crazy.
>
> Perhaps, but also correct...
>
>> I don't think any other architecture does this type of work in set_pte_at().
>
> Yes they do. As mentioned in the commit message see arm, ia64 or powerpc
> for architectures that all do the same sort of thing in set_pte_at.
>
>> Can you look into finding a better way?
>
> Not that I can see.
>
>> What if you ...
>>
>>
>>>   /*
>>>    * (pmds are folded into puds so this doesn't get actually called,
>>>    * but the define is needed for a generic inline function.)
>>> @@ -430,15 +449,12 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
>>>
>>>   extern void __update_tlb(struct vm_area_struct *vma, unsigned long address,
>>>   	pte_t pte);
>>> -extern void __update_cache(struct vm_area_struct *vma, unsigned long address,
>>> -	pte_t pte);
>>>
>>>   static inline void update_mmu_cache(struct vm_area_struct *vma,
>>>   	unsigned long address, pte_t *ptep)
>>>   {
>>>   	pte_t pte = *ptep;
>>>   	__update_tlb(vma, address, pte);
>>> -	__update_cache(vma, address, pte);
>>
>> ... Reversed the order of these two operations?
>
> It would make no difference. The window for the race exists between
> flush_dcache_page & set_pte_at. update_mmu_cache isn't called until
> later than set_pte_at, so cannot possibly avoid the race. The commit
> message walks through where the race exists - I don't think you've read
> it.


I think the code that calls set_pte_at() should be fixed.

If cache maintenance is needed before modifying the page tables, that is 
explicitly done in the calling code.

In migrate.c (remove_migration_pte, similar in do_swap_page) we have:
    .
    .
    .
	flush_dcache_page(new);
	set_pte_at(mm, addr, ptep, pte);
    .
    .
    .

Similar in huge_memory.c (unfreeze_page_vma, freeze_page_vma, etc.)

The point being, the callers have the knowledge about what is changing 
and should make sure they do the right thing to keep the caches 
consistent.  The job of set_pte_at() is to manipulate the page tables, 
nothing else.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/4] MIPS: Sync icache & dcache in set_pte_at
@ 2016-03-04 17:47           ` Paul Burton
  0 siblings, 0 replies; 45+ messages in thread
From: Paul Burton @ 2016-03-04 17:47 UTC (permalink / raw)
  To: David Daney
  Cc: Ralf Baechle, linux-mips, Lars Persson, stable # v4.1+,
	Steven J. Hill, David Daney, Huacai Chen, Aneesh Kumar K.V,
	linux-kernel, Andrew Morton, Jerome Marchand, Kirill A. Shutemov

On Fri, Mar 04, 2016 at 09:43:54AM -0800, David Daney wrote:
> On 03/01/2016 09:19 AM, Paul Burton wrote:
> >On Tue, Mar 01, 2016 at 09:13:23AM -0800, David Daney wrote:
> >>On 02/29/2016 06:37 PM, Paul Burton wrote:
> >>[...]
> >>>@@ -234,6 +237,22 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
> >>>  }
> >>>  #endif
> >>>
> >>>+static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> >>>+			      pte_t *ptep, pte_t pteval)
> >>>+{
> >>>+	extern void __update_cache(unsigned long address, pte_t pte);
> >>>+
> >>>+	if (!pte_present(pteval))
> >>>+		goto cache_sync_done;
> >>>+
> >>>+	if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
> >>>+		goto cache_sync_done;
> >>>+
> >>>+	__update_cache(addr, pteval);
> >>>+cache_sync_done:
> >>>+	set_pte(ptep, pteval);
> >>>+}
> >>>+
> >>
> >>This seems crazy.
> >
> >Perhaps, but also correct...
> >
> >>I don't think any other architecture does this type of work in set_pte_at().
> >
> >Yes they do. As mentioned in the commit message see arm, ia64 or powerpc
> >for architectures that all do the same sort of thing in set_pte_at.
> >
> >>Can you look into finding a better way?
> >
> >Not that I can see.
> >
> >>What if you ...
> >>
> >>
> >>>  /*
> >>>   * (pmds are folded into puds so this doesn't get actually called,
> >>>   * but the define is needed for a generic inline function.)
> >>>@@ -430,15 +449,12 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
> >>>
> >>>  extern void __update_tlb(struct vm_area_struct *vma, unsigned long address,
> >>>  	pte_t pte);
> >>>-extern void __update_cache(struct vm_area_struct *vma, unsigned long address,
> >>>-	pte_t pte);
> >>>
> >>>  static inline void update_mmu_cache(struct vm_area_struct *vma,
> >>>  	unsigned long address, pte_t *ptep)
> >>>  {
> >>>  	pte_t pte = *ptep;
> >>>  	__update_tlb(vma, address, pte);
> >>>-	__update_cache(vma, address, pte);
> >>
> >>... Reversed the order of these two operations?
> >
> >It would make no difference. The window for the race exists between
> >flush_dcache_page & set_pte_at. update_mmu_cache isn't called until
> >later than set_pte_at, so cannot possibly avoid the race. The commit
> >message walks through where the race exists - I don't think you've read
> >it.
> 
> 
> I think the code that calls set_pte_at() should be fixed.
> 
> If cache maintenance is needed before modifying the page tables, that is
> explicitly done in the calling code.
> 
> In migrate.c (remove_migration_pte, similar in do_swap_page) we have:
>    .
>    .
>    .
> 	flush_dcache_page(new);
> 	set_pte_at(mm, addr, ptep, pte);
>    .
>    .
>    .
> 
> Similar in huge_memory.c (unfreeze_page_vma, freeze_page_vma, etc.)
> 
> The point being, the callers have the knowledge about what is changing and
> should make sure they do the right thing to keep the caches consistent.  The
> job of set_pte_at() is to manipulate the page tables, nothing else.

...but if we do the flush in flush_dcache_page then we abandon the lazy
flushing.

Why do you want MIPS to be different from every other widely used
architecture that has this problem? set_pte_at clearly is not used only
to manipulate page tables, no matter what you might like.

Thanks,
    Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread
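
[Editorial aside: the condition under which the proposed set_pte_at
performs cache maintenance can be sketched as a minimal model. All names
below are hypothetical; the (present, pfn) tuples merely stand in for
pte_t and are not kernel types.]

```python
# Model of the guards in the proposed set_pte_at: cache maintenance is
# needed only when a PTE becomes present for a new pfn. The (present,
# pfn) tuples are illustrative stand-ins for pte_t, not kernel types.

def needs_cache_sync(old_pte, new_pte):
    old_present, old_pfn = old_pte
    new_present, new_pfn = new_pte
    if not new_present:
        return False  # !pte_present(pteval): tearing down, nothing to sync
    if old_present and old_pfn == new_pfn:
        return False  # same pfn: e.g. a permission change, already coherent
    return True       # new mapping: sync icache & dcache before set_pte

print(needs_cache_sync((False, 0), (True, 42)))   # True: fresh mapping
print(needs_cache_sync((True, 42), (True, 42)))   # False: same page
print(needs_cache_sync((True, 42), (False, 0)))   # False: unmapping
```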

* Re: [PATCH 4/4] MIPS: Sync icache & dcache in set_pte_at
  2016-03-01  2:37   ` Paul Burton
                     ` (3 preceding siblings ...)
  (?)
@ 2016-03-04 19:02   ` Lars Persson
  2016-03-05  0:21     ` Paul Burton
  -1 siblings, 1 reply; 45+ messages in thread
From: Lars Persson @ 2016-03-04 19:02 UTC (permalink / raw)
  To: Paul Burton
  Cc: linux-mips, stable # v4.1+,
	Steven J. Hill, David Daney, Huacai Chen, Aneesh Kumar K.V,
	linux-kernel, Andrew Morton, Jerome Marchand, Kirill A. Shutemov,
	Ralf Baechle

Hi

Some further thoughts on the matter. You have so far not shown a valid example of a race condition. The two examples you give in the commit message are for a _single_ thread existing in the address space (fork and execve).

BR,
 Lars

> On 1 March 2016 at 03:39, Paul Burton <paul.burton@imgtec.com> wrote:
> 
> It's possible for pages to become visible prior to update_mmu_cache
> running if a thread within the same address space preempts the current
> thread or runs simultaneously on another CPU. That is, the following
> scenario is possible:
> 
>    CPU0                            CPU1
> 
>    write to page
>    flush_dcache_page
>    flush_icache_page
>    set_pte_at
>                                    map page
>    update_mmu_cache
> 
> If CPU1 maps the page in between CPU0's set_pte_at, which marks it valid
> & visible, and update_mmu_cache where the dcache flush occurs then CPU1s
> icache will fill from stale data (unless it fills from the dcache, in
> which case all is good, but most MIPS CPUs don't have this property).
> Commit 4d46a67a3eb8 ("MIPS: Fix race condition in lazy cache flushing.")
> attempted to fix that by performing the dcache flush in
> flush_icache_page such that it occurs before the set_pte_at call makes
> the page visible. However it has the problem that not all code that
> writes to pages exposed to userland call flush_icache_page. There are
> many callers of set_pte_at under mm/ and only 2 of them do call
> flush_icache_page. Thus the race window between a page becoming visible
> & being coherent between the icache & dcache remains open in some cases.
> 
> To illustrate some of the cases, a WARN was added to __update_cache with
> this patch applied that triggered in cases where a page about to be
> flushed from the dcache was not the last page provided to
> flush_icache_page. That is, backtraces were obtained for cases in which
> the race window is left open without this patch. The 2 standout examples
> follow.
> 
> When forking a process:
> 
> [   15.271842] [<80417630>] __update_cache+0xcc/0x188
> [   15.277274] [<80530394>] copy_page_range+0x56c/0x6ac
> [   15.282861] [<8042936c>] copy_process.part.54+0xd40/0x17ac
> [   15.289028] [<80429f80>] do_fork+0xe4/0x420
> [   15.293747] [<80413808>] handle_sys+0x128/0x14c
> 
> When exec'ing an ELF binary:
> 
> [   14.445964] [<80417630>] __update_cache+0xcc/0x188
> [   14.451369] [<80538d88>] move_page_tables+0x414/0x498
> [   14.457075] [<8055d848>] setup_arg_pages+0x220/0x318
> [   14.462685] [<805b0f38>] load_elf_binary+0x530/0x12a0
> [   14.468374] [<8055ec3c>] search_binary_handler+0xbc/0x214
> [   14.474444] [<8055f6c0>] do_execveat_common+0x43c/0x67c
> [   14.480324] [<8055f938>] do_execve+0x38/0x44
> [   14.485137] [<80413808>] handle_sys+0x128/0x14c
> 
> These code paths write into a page, call flush_dcache_page then call
> set_pte_at without flush_icache_page inbetween. The end result is that
> the icache can become corrupted & userland processes may execute
> unexpected or invalid code, typically resulting in a reserved
> instruction exception, a trap or a segfault.
> 
> Fix this race condition fully by performing any cache maintenance
> required to keep the icache & dcache in sync in set_pte_at, before the
> page is made valid. This has the added bonus of ensuring the cache
> maintenance always happens in one location, rather than being duplicated
> in flush_icache_page & update_mmu_cache. It also matches the way other
> architectures solve the same problem (see arm, ia64 & powerpc).
> 
> Signed-off-by: Paul Burton <paul.burton@imgtec.com>
> Reported-by: Ionela Voinescu <ionela.voinescu@imgtec.com>
> Cc: Lars Persson <lars.persson@axis.com>
> Cc: stable <stable@vger.kernel.org> # v4.1+
> Fixes: 4d46a67a3eb8 ("MIPS: Fix race condition in lazy cache flushing.")
> 
> ---
> 
> arch/mips/include/asm/cacheflush.h |  6 ------
> arch/mips/include/asm/pgtable.h    | 26 +++++++++++++++++++++-----
> arch/mips/mm/cache.c               | 19 +++----------------
> 3 files changed, 24 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
> index 7e9f468..34ed22e 100644
> --- a/arch/mips/include/asm/cacheflush.h
> +++ b/arch/mips/include/asm/cacheflush.h
> @@ -51,7 +51,6 @@ extern void (*flush_cache_range)(struct vm_area_struct *vma,
>    unsigned long start, unsigned long end);
> extern void (*flush_cache_page)(struct vm_area_struct *vma, unsigned long page, unsigned long pfn);
> extern void __flush_dcache_page(struct page *page);
> -extern void __flush_icache_page(struct vm_area_struct *vma, struct page *page);
> 
> #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> static inline void flush_dcache_page(struct page *page)
> @@ -77,11 +76,6 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
> static inline void flush_icache_page(struct vm_area_struct *vma,
>    struct page *page)
> {
> -    if (!cpu_has_ic_fills_f_dc && (vma->vm_flags & VM_EXEC) &&
> -        Page_dcache_dirty(page)) {
> -        __flush_icache_page(vma, page);
> -        ClearPageDcacheDirty(page);
> -    }
> }
> 
> extern void (*flush_icache_range)(unsigned long start, unsigned long end);
> diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
> index 9a4fe01..65bf2c0 100644
> --- a/arch/mips/include/asm/pgtable.h
> +++ b/arch/mips/include/asm/pgtable.h
> @@ -127,10 +127,14 @@ do {                                    \
>    }                                \
> } while(0)
> 
> +static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> +                  pte_t *ptep, pte_t pteval);
> +
> #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
> 
> #define pte_none(pte)        (!(((pte).pte_high) & ~_PAGE_GLOBAL))
> #define pte_present(pte)    ((pte).pte_low & _PAGE_PRESENT)
> +#define pte_no_exec(pte)    ((pte).pte_low & _PAGE_NO_EXEC)
> 
> static inline void set_pte(pte_t *ptep, pte_t pte)
> {
> @@ -148,7 +152,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
>            buddy->pte_high |= _PAGE_GLOBAL;
>    }
> }
> -#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
> 
> static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> {
> @@ -166,6 +169,7 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
> 
> #define pte_none(pte)        (!(pte_val(pte) & ~_PAGE_GLOBAL))
> #define pte_present(pte)    (pte_val(pte) & _PAGE_PRESENT)
> +#define pte_no_exec(pte)    (pte_val(pte) & _PAGE_NO_EXEC)
> 
> /*
>  * Certain architectures need to do special things when pte's
> @@ -218,7 +222,6 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
>    }
> #endif
> }
> -#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
> 
> static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> {
> @@ -234,6 +237,22 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
> }
> #endif
> 
> +static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> +                  pte_t *ptep, pte_t pteval)
> +{
> +    extern void __update_cache(unsigned long address, pte_t pte);
> +
> +    if (!pte_present(pteval))
> +        goto cache_sync_done;
> +
> +    if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
> +        goto cache_sync_done;
> +
> +    __update_cache(addr, pteval);
> +cache_sync_done:
> +    set_pte(ptep, pteval);
> +}
> +
> /*
>  * (pmds are folded into puds so this doesn't get actually called,
>  * but the define is needed for a generic inline function.)
> @@ -430,15 +449,12 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
> 
> extern void __update_tlb(struct vm_area_struct *vma, unsigned long address,
>    pte_t pte);
> -extern void __update_cache(struct vm_area_struct *vma, unsigned long address,
> -    pte_t pte);
> 
> static inline void update_mmu_cache(struct vm_area_struct *vma,
>    unsigned long address, pte_t *ptep)
> {
>    pte_t pte = *ptep;
>    __update_tlb(vma, address, pte);
> -    __update_cache(vma, address, pte);
> }
> 
> static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
> diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
> index 8befa55..bf04c6c 100644
> --- a/arch/mips/mm/cache.c
> +++ b/arch/mips/mm/cache.c
> @@ -125,30 +125,17 @@ void __flush_anon_page(struct page *page, unsigned long vmaddr)
> 
> EXPORT_SYMBOL(__flush_anon_page);
> 
> -void __flush_icache_page(struct vm_area_struct *vma, struct page *page)
> -{
> -    unsigned long addr;
> -
> -    if (PageHighMem(page))
> -        return;
> -
> -    addr = (unsigned long) page_address(page);
> -    flush_data_cache_page(addr);
> -}
> -EXPORT_SYMBOL_GPL(__flush_icache_page);
> -
> -void __update_cache(struct vm_area_struct *vma, unsigned long address,
> -    pte_t pte)
> +void __update_cache(unsigned long address, pte_t pte)
> {
>    struct page *page;
>    unsigned long pfn, addr;
> -    int exec = (vma->vm_flags & VM_EXEC) && !cpu_has_ic_fills_f_dc;
> +    int exec = !pte_no_exec(pte) && !cpu_has_ic_fills_f_dc;
> 
>    pfn = pte_pfn(pte);
>    if (unlikely(!pfn_valid(pfn)))
>        return;
>    page = pfn_to_page(pfn);
> -    if (page_mapping(page) && Page_dcache_dirty(page)) {
> +    if (Page_dcache_dirty(page)) {
>        if (PageHighMem(page))
>            addr = (unsigned long)kmap_atomic(page);
>        else
> -- 
> 2.7.1
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread
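
[Editorial aside: the race walked through in the commit message above
can be illustrated with a single-stepped model. The dict-based caches
and function names here are purely illustrative stand-ins, not kernel
code.]

```python
# Single-stepped model of the race from the commit message: a write-back
# dcache and an icache that fills from memory rather than from the
# dcache (as on most MIPS CPUs). All names here are illustrative only.

memory = {"page": "old insns"}   # backing memory
dcache = {}                      # CPU0's dirty write-back dcache
icache = {}                      # CPU1's icache

def write_page(data):
    """CPU0 writes into the page; the data sits dirty in its dcache."""
    dcache["page"] = data

def flush_dcache():
    """Write dirty dcache lines back to memory."""
    memory.update(dcache)
    dcache.clear()

def icache_fetch():
    """CPU1 maps the page and its icache fills from memory."""
    icache["page"] = memory["page"]
    return icache["page"]

# Old ordering: set_pte_at makes the page visible, CPU1 fetches, and
# only then does update_mmu_cache flush the dcache - too late.
write_page("new insns")
fetched_racy = icache_fetch()    # CPU1 fills its icache with stale data
flush_dcache()

# Patched ordering: __update_cache runs from set_pte_at, so the flush
# completes before the page can be mapped by CPU1.
memory["page"] = "old insns"; dcache.clear(); icache.clear()
write_page("new insns")
flush_dcache()
fetched_fixed = icache_fetch()

print(fetched_racy)   # old insns
print(fetched_fixed)  # new insns
```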

* Re: [4/4] MIPS: Sync icache & dcache in set_pte_at
@ 2016-03-04 21:01         ` Leonid Yegoshin
  0 siblings, 0 replies; 45+ messages in thread
From: Leonid Yegoshin @ 2016-03-04 21:01 UTC (permalink / raw)
  To: Paul Burton
  Cc: linux-mips, Lars Persson, stable # v4.1+,
	Steven J. Hill, David Daney, Huacai Chen, Aneesh Kumar K.V,
	linux-kernel, Andrew Morton, Jerome Marchand, Kirill A. Shutemov,
	Ralf Baechle

Paul,

On 03/04/2016 02:37 AM, Paul Burton wrote:
> The problem is that there are other code paths which dirty pages (ie.
> call flush_dcache_page), then get as far as set_pte_at *without* calling
> flush_icache_page.

There is no problem with lazy flushing of the D-cache and PTE updates.
The D-cache works on PHYSICAL ADDRESSES and is independent of what is
set in the PTE and TLB. You can even skip flushing the D-cache during a
page fault, keep the data in the D-cache and switch any PTE/TLB on any
CPU - everything remains coherent... if we set aside cache aliasing,
the non-coherent I-cache and DMA for a moment.

Cache aliasing is not compatible with SMP (until some public inter-L1 
protocol is changed to accommodate virtual address).

I-cache incoherency is taken into account in flush_icache_page(), 
including a completion of lazy D-cache flush at that point.

DMA... well, it is a complicated issue now, but the simplest rule is to
complete all flushes before/after DMA, and that is done in
dma_cache_wback_inv/dma_cache_inv (I am intentionally leaving aside
speculative accesses, which complicate the issue).

So, you are fighting the wrong enemy.


>   Again - I included call traces from 2 paths that hit
> this right in the commit message.
>
> So:
>
>    - flush_icache_page isn't always called before we *need* to writeback
>      the page from the dcache. This is demonstrably the case (again, see
>      the commit message), and causes bugs when using UBIFS on boards
>      using the pistachio SoC at least.

You haven't proven that it is caused by your model. You referenced the
absence of flush_icache_page in various places, but those places don't
prepare a code page:

     - a forking process doesn't actually copy any page; it marks pages
temporarily non-writable and the actual copy is done in the page fault
handler
     - load_elf_binary doesn't load any code pages either, again
leaving that to the page fault handler.

>
>    - flush_icache_page is indicated as something that should go away in
>      Documentation/cachetlb.txt. Why do you feel we should make use of
>      it?

I think that is a mistaken note in the documentation.

I suspect that your problem with UBIFS is because any DMA-ed code page
should be flushed from the D-cache during the page fault, in
flush_icache_page. The current master branch code does that only if the
page is "dirty", but I suspect that UBIFS may not mark it as
"dirty-by-kernel" somewhere along the way. Besides that, non-DMA-ed
pages can be written by a device driver, which has no access to that
bit. For this reason my 3.10 branch has a kernel boot parameter,
"mips_non_DMA_FS", which enforces the D-cache flush regardless of
whether the page is dirty.

Again, I encourage you to look at
https://git.linux-mips.org/cgit/yegoshin/mips.git, branch
android-linux-mti-3.10, to understand this.

- Leonid.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [4/4] MIPS: Sync icache & dcache in set_pte_at
@ 2016-03-04 21:21           ` Leonid Yegoshin
  0 siblings, 0 replies; 45+ messages in thread
From: Leonid Yegoshin @ 2016-03-04 21:21 UTC (permalink / raw)
  To: Paul Burton
  Cc: linux-mips, Lars Persson, stable # v4.1+,
	Steven J. Hill, David Daney, Huacai Chen, Aneesh Kumar K.V,
	linux-kernel, Andrew Morton, Jerome Marchand, Kirill A. Shutemov,
	Ralf Baechle

Correction:

On 03/04/2016 01:01 PM, Leonid Yegoshin wrote:
>
> I suspect that your problem with UBIFS is because any DMA-ed code page 
> should be flushed from DCache during page fault in flush_icache_page.
I suspect that your problem with UBIFS is because any non-DMA-ed code...

Sorry for the typo.

- Leonid.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/4] MIPS: Sync icache & dcache in set_pte_at
  2016-03-04 19:02   ` [PATCH 4/4] " Lars Persson
@ 2016-03-05  0:21     ` Paul Burton
  2016-03-05  0:27       ` Paul Burton
  0 siblings, 1 reply; 45+ messages in thread
From: Paul Burton @ 2016-03-05  0:21 UTC (permalink / raw)
  To: Lars Persson
  Cc: linux-mips, stable # v4.1+,
	Steven J. Hill, David Daney, Huacai Chen, Aneesh Kumar K.V,
	linux-kernel, Andrew Morton, Jerome Marchand, Kirill A. Shutemov,
	Ralf Baechle

On Fri, Mar 04, 2016 at 07:02:24PM +0000, Lars Persson wrote:
> Hi
> 
> Some further thoughts on the matter. You have so far not showed a
> valid example of a race condition. The two examples you give in the
> commit message are for a _single_ thread existing in the address space
> (fork and execve).

Hi Lars,

Neither fork nor exec is limited to a single thread existing in the
address space - I'm not sure what you're saying? fork by its very
definition results in two.

Thanks,
    Paul

> BR,
>  Lars
> 
> > On 1 March 2016 at 03:39, Paul Burton <paul.burton@imgtec.com> wrote:
> > 
> > It's possible for pages to become visible prior to update_mmu_cache
> > running if a thread within the same address space preempts the current
> > thread or runs simultaneously on another CPU. That is, the following
> > scenario is possible:
> > 
> >    CPU0                            CPU1
> > 
> >    write to page
> >    flush_dcache_page
> >    flush_icache_page
> >    set_pte_at
> >                                    map page
> >    update_mmu_cache
> > 
> > If CPU1 maps the page in between CPU0's set_pte_at, which marks it valid
> > & visible, and update_mmu_cache where the dcache flush occurs then CPU1's
> > icache will fill from stale data (unless it fills from the dcache, in
> > which case all is good, but most MIPS CPUs don't have this property).
> > Commit 4d46a67a3eb8 ("MIPS: Fix race condition in lazy cache flushing.")
> > attempted to fix that by performing the dcache flush in
> > flush_icache_page such that it occurs before the set_pte_at call makes
> > the page visible. However it has the problem that not all code that
> > writes to pages exposed to userland calls flush_icache_page. There are
> > many callers of set_pte_at under mm/ and only 2 of them do call
> > flush_icache_page. Thus the race window between a page becoming visible
> > & being coherent between the icache & dcache remains open in some cases.
> > 
> > To illustrate some of the cases, a WARN was added to __update_cache with
> > this patch applied that triggered in cases where a page about to be
> > flushed from the dcache was not the last page provided to
> > flush_icache_page. That is, backtraces were obtained for cases in which
> > the race window is left open without this patch. The 2 standout examples
> > follow.
> > 
> > When forking a process:
> > 
> > [   15.271842] [<80417630>] __update_cache+0xcc/0x188
> > [   15.277274] [<80530394>] copy_page_range+0x56c/0x6ac
> > [   15.282861] [<8042936c>] copy_process.part.54+0xd40/0x17ac
> > [   15.289028] [<80429f80>] do_fork+0xe4/0x420
> > [   15.293747] [<80413808>] handle_sys+0x128/0x14c
> > 
> > When exec'ing an ELF binary:
> > 
> > [   14.445964] [<80417630>] __update_cache+0xcc/0x188
> > [   14.451369] [<80538d88>] move_page_tables+0x414/0x498
> > [   14.457075] [<8055d848>] setup_arg_pages+0x220/0x318
> > [   14.462685] [<805b0f38>] load_elf_binary+0x530/0x12a0
> > [   14.468374] [<8055ec3c>] search_binary_handler+0xbc/0x214
> > [   14.474444] [<8055f6c0>] do_execveat_common+0x43c/0x67c
> > [   14.480324] [<8055f938>] do_execve+0x38/0x44
> > [   14.485137] [<80413808>] handle_sys+0x128/0x14c
> > 
> > These code paths write into a page, call flush_dcache_page then call
> > set_pte_at without flush_icache_page in between. The end result is that
> > the icache can become corrupted & userland processes may execute
> > unexpected or invalid code, typically resulting in a reserved
> > instruction exception, a trap or a segfault.
> > 
> > Fix this race condition fully by performing any cache maintenance
> > required to keep the icache & dcache in sync in set_pte_at, before the
> > page is made valid. This has the added bonus of ensuring the cache
> > maintenance always happens in one location, rather than being duplicated
> > in flush_icache_page & update_mmu_cache. It also matches the way other
> > architectures solve the same problem (see arm, ia64 & powerpc).
> > 
> > Signed-off-by: Paul Burton <paul.burton@imgtec.com>
> > Reported-by: Ionela Voinescu <ionela.voinescu@imgtec.com>
> > Cc: Lars Persson <lars.persson@axis.com>
> > Cc: stable <stable@vger.kernel.org> # v4.1+
> > Fixes: 4d46a67a3eb8 ("MIPS: Fix race condition in lazy cache flushing.")
> > 
> > ---
> > 
> > arch/mips/include/asm/cacheflush.h |  6 ------
> > arch/mips/include/asm/pgtable.h    | 26 +++++++++++++++++++++-----
> > arch/mips/mm/cache.c               | 19 +++----------------
> > 3 files changed, 24 insertions(+), 27 deletions(-)
> > 
> > diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
> > index 7e9f468..34ed22e 100644
> > --- a/arch/mips/include/asm/cacheflush.h
> > +++ b/arch/mips/include/asm/cacheflush.h
> > @@ -51,7 +51,6 @@ extern void (*flush_cache_range)(struct vm_area_struct *vma,
> >    unsigned long start, unsigned long end);
> > extern void (*flush_cache_page)(struct vm_area_struct *vma, unsigned long page, unsigned long pfn);
> > extern void __flush_dcache_page(struct page *page);
> > -extern void __flush_icache_page(struct vm_area_struct *vma, struct page *page);
> > 
> > #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> > static inline void flush_dcache_page(struct page *page)
> > @@ -77,11 +76,6 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
> > static inline void flush_icache_page(struct vm_area_struct *vma,
> >    struct page *page)
> > {
> > -    if (!cpu_has_ic_fills_f_dc && (vma->vm_flags & VM_EXEC) &&
> > -        Page_dcache_dirty(page)) {
> > -        __flush_icache_page(vma, page);
> > -        ClearPageDcacheDirty(page);
> > -    }
> > }
> > 
> > extern void (*flush_icache_range)(unsigned long start, unsigned long end);
> > diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
> > index 9a4fe01..65bf2c0 100644
> > --- a/arch/mips/include/asm/pgtable.h
> > +++ b/arch/mips/include/asm/pgtable.h
> > @@ -127,10 +127,14 @@ do {                                    \
> >    }                                \
> > } while(0)
> > 
> > +static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> > +                  pte_t *ptep, pte_t pteval);
> > +
> > #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
> > 
> > #define pte_none(pte)        (!(((pte).pte_high) & ~_PAGE_GLOBAL))
> > #define pte_present(pte)    ((pte).pte_low & _PAGE_PRESENT)
> > +#define pte_no_exec(pte)    ((pte).pte_low & _PAGE_NO_EXEC)
> > 
> > static inline void set_pte(pte_t *ptep, pte_t pte)
> > {
> > @@ -148,7 +152,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
> >            buddy->pte_high |= _PAGE_GLOBAL;
> >    }
> > }
> > -#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
> > 
> > static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> > {
> > @@ -166,6 +169,7 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
> > 
> > #define pte_none(pte)        (!(pte_val(pte) & ~_PAGE_GLOBAL))
> > #define pte_present(pte)    (pte_val(pte) & _PAGE_PRESENT)
> > +#define pte_no_exec(pte)    (pte_val(pte) & _PAGE_NO_EXEC)
> > 
> > /*
> >  * Certain architectures need to do special things when pte's
> > @@ -218,7 +222,6 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
> >    }
> > #endif
> > }
> > -#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
> > 
> > static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> > {
> > @@ -234,6 +237,22 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
> > }
> > #endif
> > 
> > +static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> > +                  pte_t *ptep, pte_t pteval)
> > +{
> > +    extern void __update_cache(unsigned long address, pte_t pte);
> > +
> > +    if (!pte_present(pteval))
> > +        goto cache_sync_done;
> > +
> > +    if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
> > +        goto cache_sync_done;
> > +
> > +    __update_cache(addr, pteval);
> > +cache_sync_done:
> > +    set_pte(ptep, pteval);
> > +}
> > +
> > /*
> >  * (pmds are folded into puds so this doesn't get actually called,
> >  * but the define is needed for a generic inline function.)
> > @@ -430,15 +449,12 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
> > 
> > extern void __update_tlb(struct vm_area_struct *vma, unsigned long address,
> >    pte_t pte);
> > -extern void __update_cache(struct vm_area_struct *vma, unsigned long address,
> > -    pte_t pte);
> > 
> > static inline void update_mmu_cache(struct vm_area_struct *vma,
> >    unsigned long address, pte_t *ptep)
> > {
> >    pte_t pte = *ptep;
> >    __update_tlb(vma, address, pte);
> > -    __update_cache(vma, address, pte);
> > }
> > 
> > static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
> > diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
> > index 8befa55..bf04c6c 100644
> > --- a/arch/mips/mm/cache.c
> > +++ b/arch/mips/mm/cache.c
> > @@ -125,30 +125,17 @@ void __flush_anon_page(struct page *page, unsigned long vmaddr)
> > 
> > EXPORT_SYMBOL(__flush_anon_page);
> > 
> > -void __flush_icache_page(struct vm_area_struct *vma, struct page *page)
> > -{
> > -    unsigned long addr;
> > -
> > -    if (PageHighMem(page))
> > -        return;
> > -
> > -    addr = (unsigned long) page_address(page);
> > -    flush_data_cache_page(addr);
> > -}
> > -EXPORT_SYMBOL_GPL(__flush_icache_page);
> > -
> > -void __update_cache(struct vm_area_struct *vma, unsigned long address,
> > -    pte_t pte)
> > +void __update_cache(unsigned long address, pte_t pte)
> > {
> >    struct page *page;
> >    unsigned long pfn, addr;
> > -    int exec = (vma->vm_flags & VM_EXEC) && !cpu_has_ic_fills_f_dc;
> > +    int exec = !pte_no_exec(pte) && !cpu_has_ic_fills_f_dc;
> > 
> >    pfn = pte_pfn(pte);
> >    if (unlikely(!pfn_valid(pfn)))
> >        return;
> >    page = pfn_to_page(pfn);
> > -    if (page_mapping(page) && Page_dcache_dirty(page)) {
> > +    if (Page_dcache_dirty(page)) {
> >        if (PageHighMem(page))
> >            addr = (unsigned long)kmap_atomic(page);
> >        else
> > -- 
> > 2.7.1
> > 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/4] MIPS: Sync icache & dcache in set_pte_at
  2016-03-05  0:21     ` Paul Burton
@ 2016-03-05  0:27       ` Paul Burton
  0 siblings, 0 replies; 45+ messages in thread
From: Paul Burton @ 2016-03-05  0:27 UTC (permalink / raw)
  To: Lars Persson
  Cc: linux-mips, stable # v4.1+,
	Steven J. Hill, David Daney, Huacai Chen, Aneesh Kumar K.V,
	linux-kernel, Andrew Morton, Jerome Marchand, Kirill A. Shutemov,
	Ralf Baechle

On Sat, Mar 05, 2016 at 12:21:54AM +0000, Paul Burton wrote:
> On Fri, Mar 04, 2016 at 07:02:24PM +0000, Lars Persson wrote:
> > Hi
> > 
> > Some further thoughts on the matter. You have so far not shown a
> > valid example of a race condition. The two examples you give in the
> > commit message are for a _single_ thread existing in the address space
> > (fork and execve).
> 
> Hi Lars,
> 
> Neither fork nor exec is limited to a single thread existing in the
> address space - I'm not sure what you're saying? fork by its very
> definition results in two.

Ok, exec kinda is (it's late...). Still, fork clearly isn't.

Thanks,
    Paul

> Thanks,
>     Paul
> 
> > BR,
> >  Lars
> > 
> > > On 1 Mar 2016, at 03:39, Paul Burton <paul.burton@imgtec.com> wrote:
> > > 
> > > It's possible for pages to become visible prior to update_mmu_cache
> > > running if a thread within the same address space preempts the current
> > > thread or runs simultaneously on another CPU. That is, the following
> > > scenario is possible:
> > > 
> > >    CPU0                            CPU1
> > > 
> > >    write to page
> > >    flush_dcache_page
> > >    flush_icache_page
> > >    set_pte_at
> > >                                    map page
> > >    update_mmu_cache
> > > 
> > > If CPU1 maps the page in between CPU0's set_pte_at, which marks it valid
> > > & visible, and update_mmu_cache where the dcache flush occurs then CPU1's
> > > icache will fill from stale data (unless it fills from the dcache, in
> > > which case all is good, but most MIPS CPUs don't have this property).
> > > Commit 4d46a67a3eb8 ("MIPS: Fix race condition in lazy cache flushing.")
> > > attempted to fix that by performing the dcache flush in
> > > flush_icache_page such that it occurs before the set_pte_at call makes
> > > the page visible. However it has the problem that not all code that
> > > writes to pages exposed to userland calls flush_icache_page. There are
> > > many callers of set_pte_at under mm/ and only 2 of them do call
> > > flush_icache_page. Thus the race window between a page becoming visible
> > > & being coherent between the icache & dcache remains open in some cases.
> > > 
> > > To illustrate some of the cases, a WARN was added to __update_cache with
> > > this patch applied that triggered in cases where a page about to be
> > > flushed from the dcache was not the last page provided to
> > > flush_icache_page. That is, backtraces were obtained for cases in which
> > > the race window is left open without this patch. The 2 standout examples
> > > follow.
> > > 
> > > When forking a process:
> > > 
> > > [   15.271842] [<80417630>] __update_cache+0xcc/0x188
> > > [   15.277274] [<80530394>] copy_page_range+0x56c/0x6ac
> > > [   15.282861] [<8042936c>] copy_process.part.54+0xd40/0x17ac
> > > [   15.289028] [<80429f80>] do_fork+0xe4/0x420
> > > [   15.293747] [<80413808>] handle_sys+0x128/0x14c
> > > 
> > > When exec'ing an ELF binary:
> > > 
> > > [   14.445964] [<80417630>] __update_cache+0xcc/0x188
> > > [   14.451369] [<80538d88>] move_page_tables+0x414/0x498
> > > [   14.457075] [<8055d848>] setup_arg_pages+0x220/0x318
> > > [   14.462685] [<805b0f38>] load_elf_binary+0x530/0x12a0
> > > [   14.468374] [<8055ec3c>] search_binary_handler+0xbc/0x214
> > > [   14.474444] [<8055f6c0>] do_execveat_common+0x43c/0x67c
> > > [   14.480324] [<8055f938>] do_execve+0x38/0x44
> > > [   14.485137] [<80413808>] handle_sys+0x128/0x14c
> > > 
> > > These code paths write into a page, call flush_dcache_page then call
> > > set_pte_at without flush_icache_page in between. The end result is that
> > > the icache can become corrupted & userland processes may execute
> > > unexpected or invalid code, typically resulting in a reserved
> > > instruction exception, a trap or a segfault.
> > > 
> > > Fix this race condition fully by performing any cache maintenance
> > > required to keep the icache & dcache in sync in set_pte_at, before the
> > > page is made valid. This has the added bonus of ensuring the cache
> > > maintenance always happens in one location, rather than being duplicated
> > > in flush_icache_page & update_mmu_cache. It also matches the way other
> > > architectures solve the same problem (see arm, ia64 & powerpc).
> > > 
> > > Signed-off-by: Paul Burton <paul.burton@imgtec.com>
> > > Reported-by: Ionela Voinescu <ionela.voinescu@imgtec.com>
> > > Cc: Lars Persson <lars.persson@axis.com>
> > > Cc: stable <stable@vger.kernel.org> # v4.1+
> > > Fixes: 4d46a67a3eb8 ("MIPS: Fix race condition in lazy cache flushing.")
> > > 
> > > ---
> > > 
> > > arch/mips/include/asm/cacheflush.h |  6 ------
> > > arch/mips/include/asm/pgtable.h    | 26 +++++++++++++++++++++-----
> > > arch/mips/mm/cache.c               | 19 +++----------------
> > > 3 files changed, 24 insertions(+), 27 deletions(-)
> > > 
> > > diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
> > > index 7e9f468..34ed22e 100644
> > > --- a/arch/mips/include/asm/cacheflush.h
> > > +++ b/arch/mips/include/asm/cacheflush.h
> > > @@ -51,7 +51,6 @@ extern void (*flush_cache_range)(struct vm_area_struct *vma,
> > >    unsigned long start, unsigned long end);
> > > extern void (*flush_cache_page)(struct vm_area_struct *vma, unsigned long page, unsigned long pfn);
> > > extern void __flush_dcache_page(struct page *page);
> > > -extern void __flush_icache_page(struct vm_area_struct *vma, struct page *page);
> > > 
> > > #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> > > static inline void flush_dcache_page(struct page *page)
> > > @@ -77,11 +76,6 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
> > > static inline void flush_icache_page(struct vm_area_struct *vma,
> > >    struct page *page)
> > > {
> > > -    if (!cpu_has_ic_fills_f_dc && (vma->vm_flags & VM_EXEC) &&
> > > -        Page_dcache_dirty(page)) {
> > > -        __flush_icache_page(vma, page);
> > > -        ClearPageDcacheDirty(page);
> > > -    }
> > > }
> > > 
> > > extern void (*flush_icache_range)(unsigned long start, unsigned long end);
> > > diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
> > > index 9a4fe01..65bf2c0 100644
> > > --- a/arch/mips/include/asm/pgtable.h
> > > +++ b/arch/mips/include/asm/pgtable.h
> > > @@ -127,10 +127,14 @@ do {                                    \
> > >    }                                \
> > > } while(0)
> > > 
> > > +static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> > > +                  pte_t *ptep, pte_t pteval);
> > > +
> > > #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
> > > 
> > > #define pte_none(pte)        (!(((pte).pte_high) & ~_PAGE_GLOBAL))
> > > #define pte_present(pte)    ((pte).pte_low & _PAGE_PRESENT)
> > > +#define pte_no_exec(pte)    ((pte).pte_low & _PAGE_NO_EXEC)
> > > 
> > > static inline void set_pte(pte_t *ptep, pte_t pte)
> > > {
> > > @@ -148,7 +152,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
> > >            buddy->pte_high |= _PAGE_GLOBAL;
> > >    }
> > > }
> > > -#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
> > > 
> > > static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> > > {
> > > @@ -166,6 +169,7 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
> > > 
> > > #define pte_none(pte)        (!(pte_val(pte) & ~_PAGE_GLOBAL))
> > > #define pte_present(pte)    (pte_val(pte) & _PAGE_PRESENT)
> > > +#define pte_no_exec(pte)    (pte_val(pte) & _PAGE_NO_EXEC)
> > > 
> > > /*
> > >  * Certain architectures need to do special things when pte's
> > > @@ -218,7 +222,6 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
> > >    }
> > > #endif
> > > }
> > > -#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
> > > 
> > > static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> > > {
> > > @@ -234,6 +237,22 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
> > > }
> > > #endif
> > > 
> > > +static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> > > +                  pte_t *ptep, pte_t pteval)
> > > +{
> > > +    extern void __update_cache(unsigned long address, pte_t pte);
> > > +
> > > +    if (!pte_present(pteval))
> > > +        goto cache_sync_done;
> > > +
> > > +    if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
> > > +        goto cache_sync_done;
> > > +
> > > +    __update_cache(addr, pteval);
> > > +cache_sync_done:
> > > +    set_pte(ptep, pteval);
> > > +}
> > > +
> > > /*
> > >  * (pmds are folded into puds so this doesn't get actually called,
> > >  * but the define is needed for a generic inline function.)
> > > @@ -430,15 +449,12 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
> > > 
> > > extern void __update_tlb(struct vm_area_struct *vma, unsigned long address,
> > >    pte_t pte);
> > > -extern void __update_cache(struct vm_area_struct *vma, unsigned long address,
> > > -    pte_t pte);
> > > 
> > > static inline void update_mmu_cache(struct vm_area_struct *vma,
> > >    unsigned long address, pte_t *ptep)
> > > {
> > >    pte_t pte = *ptep;
> > >    __update_tlb(vma, address, pte);
> > > -    __update_cache(vma, address, pte);
> > > }
> > > 
> > > static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
> > > diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
> > > index 8befa55..bf04c6c 100644
> > > --- a/arch/mips/mm/cache.c
> > > +++ b/arch/mips/mm/cache.c
> > > @@ -125,30 +125,17 @@ void __flush_anon_page(struct page *page, unsigned long vmaddr)
> > > 
> > > EXPORT_SYMBOL(__flush_anon_page);
> > > 
> > > -void __flush_icache_page(struct vm_area_struct *vma, struct page *page)
> > > -{
> > > -    unsigned long addr;
> > > -
> > > -    if (PageHighMem(page))
> > > -        return;
> > > -
> > > -    addr = (unsigned long) page_address(page);
> > > -    flush_data_cache_page(addr);
> > > -}
> > > -EXPORT_SYMBOL_GPL(__flush_icache_page);
> > > -
> > > -void __update_cache(struct vm_area_struct *vma, unsigned long address,
> > > -    pte_t pte)
> > > +void __update_cache(unsigned long address, pte_t pte)
> > > {
> > >    struct page *page;
> > >    unsigned long pfn, addr;
> > > -    int exec = (vma->vm_flags & VM_EXEC) && !cpu_has_ic_fills_f_dc;
> > > +    int exec = !pte_no_exec(pte) && !cpu_has_ic_fills_f_dc;
> > > 
> > >    pfn = pte_pfn(pte);
> > >    if (unlikely(!pfn_valid(pfn)))
> > >        return;
> > >    page = pfn_to_page(pfn);
> > > -    if (page_mapping(page) && Page_dcache_dirty(page)) {
> > > +    if (Page_dcache_dirty(page)) {
> > >        if (PageHighMem(page))
> > >            addr = (unsigned long)kmap_atomic(page);
> > >        else
> > > -- 
> > > 2.7.1
> > > 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [4/4] MIPS: Sync icache & dcache in set_pte_at
@ 2016-03-05  1:06           ` Leonid Yegoshin
  0 siblings, 0 replies; 45+ messages in thread
From: Leonid Yegoshin @ 2016-03-05  1:06 UTC (permalink / raw)
  To: Paul Burton
  Cc: linux-mips, Lars Persson, stable # v4.1+,
	Steven J. Hill, David Daney, Huacai Chen, Aneesh Kumar K.V,
	linux-kernel, Andrew Morton, Jerome Marchand, Kirill A. Shutemov,
	Ralf Baechle

On Sat, Mar 05, 2016 at 12:21:54AM +0000, Paul Burton wrote:

> > On Fri, Mar 04, 2016 at 07:02:24PM +0000, Lars Persson wrote:
> > > Hi
> > >
> > > Some further thoughts on the matter. You have so far not shown a
> > > valid example of a race condition. The two examples you give in the
> > > commit message are for a _single_ thread existing in the address space
> > > (fork and execve).
> >
> > Hi Lars,
> >
> > Neither fork nor exec is limited to a single thread existing in the
> > address space - I'm not sure what you're saying? fork by its very
> > definition results in two.
>
> Ok, exec kinda is (it's late...). Still, fork clearly isn't.
Again - fork doesn't copy any user page. copy_page_range just
manipulates the PTE tree.

So, no user page cache flushes are needed on MIPS during fork, at least now.

- Leonid.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/4] MIPS: Flush dcache for flush_kernel_dcache_page
  2016-03-04 15:09   ` Lars Persson
@ 2016-03-29  8:29     ` Ralf Baechle
  0 siblings, 0 replies; 45+ messages in thread
From: Ralf Baechle @ 2016-03-29  8:29 UTC (permalink / raw)
  To: Lars Persson; +Cc: Paul Burton, Steven J. Hill, linux-mips, linux-kernel

On Fri, Mar 04, 2016 at 03:09:00PM +0000, Lars Persson wrote:

> > On 1 Mar 2016, at 03:38, Paul Burton <paul.burton@imgtec.com> wrote:
> > 
> > The flush_kernel_dcache_page function was previously essentially a nop.
> > This is incorrect for MIPS, where if a page has been modified & either
> > it aliases or it's executable & the icache doesn't fill from dcache then
> > the content needs to be written back from dcache to the next level of
> > the cache hierarchy (which is shared with the icache).
> > 
> > Implement this by simply calling flush_dcache_page, treating this
> > kmapped cache flush function (flush_kernel_dcache_page) exactly the same
> > as its non-kmapped counterpart (flush_dcache_page).
> > 
> > Signed-off-by: Paul Burton <paul.burton@imgtec.com>
> > ---
> > 
> > arch/mips/include/asm/cacheflush.h | 1 +
> > 1 file changed, 1 insertion(+)
> > 
> > diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
> > index 723229f..7e9f468 100644
> > --- a/arch/mips/include/asm/cacheflush.h
> > +++ b/arch/mips/include/asm/cacheflush.h
> > @@ -132,6 +132,7 @@ static inline void kunmap_noncoherent(void)
> > static inline void flush_kernel_dcache_page(struct page *page)
> > {
> >    BUG_ON(cpu_has_dc_aliases && PageHighMem(page));
> > +    flush_dcache_page(page);
> 
> Should we instead use __flush_dcache_page(), which flushes immediately for mapped pages? Steven J. Hill's old patch set for highmem did it like this.

Delayed flushing should be ok for lowmem where each page has a permanent
virtual address.  With highmem the temporary address assigned by the kmap_*
function may change so the flush needs to be performed immediately.
Special case highmem without cache aliases - the exact virtual address
doesn't matter, so this should be fine.

Cache flushes are expensive so delaying if possible is always a good thing.

Steven's patches AFAIR were trying to tackle the highmem-with-aliases case,
so an immediate flush was required.

  Ralf

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/4] MIPS: Flush highmem pages in __flush_dcache_page
  2016-03-01  2:37   ` Paul Burton
  (?)
@ 2016-03-29  8:35   ` Ralf Baechle
  2016-03-29  8:55       ` Paul Burton
  -1 siblings, 1 reply; 45+ messages in thread
From: Ralf Baechle @ 2016-03-29  8:35 UTC (permalink / raw)
  To: Paul Burton
  Cc: linux-mips, Lars Persson, linux-kernel, Andrew Morton,
	Kirill A. Shutemov

On Tue, Mar 01, 2016 at 02:37:57AM +0000, Paul Burton wrote:

> When flush_dcache_page is called on an executable page, that page is
> about to be provided to userland & we can presume that the icache
> contains no valid entries for its address range. However if the icache
> does not fill from the dcache then we cannot presume that the pages
> content has been written back as far as the memories that the dcache
> will fill from (ie. L2 or further out).
> 
> This was being done for lowmem pages, but not for highmem which can lead
> to icache corruption. Fix this by mapping highmem pages & flushing their
> content from the dcache in __flush_dcache_page before providing the page
> to userland, just as is done for lowmem pages.
> 
> Signed-off-by: Paul Burton <paul.burton@imgtec.com>
> Cc: Lars Persson <lars.persson@axis.com>
> ---
> 
>  arch/mips/mm/cache.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
> index 3f159ca..5a67d8c 100644
> --- a/arch/mips/mm/cache.c
> +++ b/arch/mips/mm/cache.c
> @@ -16,6 +16,7 @@
>  #include <linux/mm.h>
>  
>  #include <asm/cacheflush.h>
> +#include <asm/highmem.h>
>  #include <asm/processor.h>
>  #include <asm/cpu.h>
>  #include <asm/cpu-features.h>
> @@ -83,8 +84,6 @@ void __flush_dcache_page(struct page *page)
>  	struct address_space *mapping = page_mapping(page);
>  	unsigned long addr;
>  
> -	if (PageHighMem(page))
> -		return;
>  	if (mapping && !mapping_mapped(mapping)) {
>  		SetPageDcacheDirty(page);
>  		return;
> @@ -95,8 +94,15 @@ void __flush_dcache_page(struct page *page)
>  	 * case is for exec env/arg pages and those are %99 certainly going to
>  	 * get faulted into the tlb (and thus flushed) anyways.
>  	 */
> -	addr = (unsigned long) page_address(page);
> +	if (PageHighMem(page))
> +		addr = (unsigned long)kmap_atomic(page);
> +	else
> +		addr = (unsigned long)page_address(page);
> +
>  	flush_data_cache_page(addr);
> +
> +	if (PageHighMem(page))
> +		__kunmap_atomic((void *)addr);
>  }
>  
>  EXPORT_SYMBOL(__flush_dcache_page);

I don't see how this should work with cache aliases.  If the page is unmapped
kmap_atomic will pick a deterministic address only under some circumstances,
kmap won't.  As a result the wrong cache way will be flushed out, I think.

  Ralf

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 3/4] MIPS: Handle highmem pages in __update_cache
  2016-03-01  2:37   ` Paul Burton
  (?)
@ 2016-03-29  8:39   ` Ralf Baechle
  -1 siblings, 0 replies; 45+ messages in thread
From: Ralf Baechle @ 2016-03-29  8:39 UTC (permalink / raw)
  To: Paul Burton
  Cc: linux-mips, Lars Persson, stable # v4.1+,
	linux-kernel, Andrew Morton, Jerome Marchand, Kirill A. Shutemov

On Tue, Mar 01, 2016 at 02:37:58AM +0000, Paul Burton wrote:

> The following patch will expose __update_cache to highmem pages. Handle
> them by mapping them in for the duration of the cache maintenance, just
> like in __flush_dcache_page. The code for that isn't shared because we
> need the page address in __update_cache so sharing became messy. Given
> that the entirety is an extra 5 lines, just duplicate it.
> 
> Signed-off-by: Paul Burton <paul.burton@imgtec.com>
> Cc: Lars Persson <lars.persson@axis.com>
> Cc: stable <stable@vger.kernel.org> # v4.1+
> ---
> 
>  arch/mips/mm/cache.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
> index 5a67d8c..8befa55 100644
> --- a/arch/mips/mm/cache.c
> +++ b/arch/mips/mm/cache.c
> @@ -149,9 +149,17 @@ void __update_cache(struct vm_area_struct *vma, unsigned long address,
>  		return;
>  	page = pfn_to_page(pfn);
>  	if (page_mapping(page) && Page_dcache_dirty(page)) {
> -		addr = (unsigned long) page_address(page);
> +		if (PageHighMem(page))
> +			addr = (unsigned long)kmap_atomic(page);
> +		else
> +			addr = (unsigned long)page_address(page);
> +
>  		if (exec || pages_do_alias(addr, address & PAGE_MASK))
>  			flush_data_cache_page(addr);
> +
> +		if (PageHighMem(page))
> +			__kunmap_atomic((void *)addr);
> +
>  		ClearPageDcacheDirty(page);
>  	}
>  }

Yet again this is betting the house on getting the right virtual address
from kmap_atomic.

  Ralf

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/4] MIPS: Flush highmem pages in __flush_dcache_page
@ 2016-03-29  8:55       ` Paul Burton
  0 siblings, 0 replies; 45+ messages in thread
From: Paul Burton @ 2016-03-29  8:55 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: linux-mips, Lars Persson, linux-kernel, Andrew Morton,
	Kirill A. Shutemov

On Tue, Mar 29, 2016 at 10:35:43AM +0200, Ralf Baechle wrote:
> On Tue, Mar 01, 2016 at 02:37:57AM +0000, Paul Burton wrote:
> 
> > When flush_dcache_page is called on an executable page, that page is
> > about to be provided to userland & we can presume that the icache
> > contains no valid entries for its address range. However if the icache
> > does not fill from the dcache then we cannot presume that the pages
> > content has been written back as far as the memories that the dcache
> > will fill from (ie. L2 or further out).
> > 
> > This was being done for lowmem pages, but not for highmem which can lead
> > to icache corruption. Fix this by mapping highmem pages & flushing their
> > content from the dcache in __flush_dcache_page before providing the page
> > to userland, just as is done for lowmem pages.
> > 
> > Signed-off-by: Paul Burton <paul.burton@imgtec.com>
> > Cc: Lars Persson <lars.persson@axis.com>
> > ---
> > 
> >  arch/mips/mm/cache.c | 12 +++++++++---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
> > index 3f159ca..5a67d8c 100644
> > --- a/arch/mips/mm/cache.c
> > +++ b/arch/mips/mm/cache.c
> > @@ -16,6 +16,7 @@
> >  #include <linux/mm.h>
> >  
> >  #include <asm/cacheflush.h>
> > +#include <asm/highmem.h>
> >  #include <asm/processor.h>
> >  #include <asm/cpu.h>
> >  #include <asm/cpu-features.h>
> > @@ -83,8 +84,6 @@ void __flush_dcache_page(struct page *page)
> >  	struct address_space *mapping = page_mapping(page);
> >  	unsigned long addr;
> >  
> > -	if (PageHighMem(page))
> > -		return;
> >  	if (mapping && !mapping_mapped(mapping)) {
> >  		SetPageDcacheDirty(page);
> >  		return;
> > @@ -95,8 +94,15 @@ void __flush_dcache_page(struct page *page)
> >  	 * case is for exec env/arg pages and those are %99 certainly going to
> >  	 * get faulted into the tlb (and thus flushed) anyways.
> >  	 */
> > -	addr = (unsigned long) page_address(page);
> > +	if (PageHighMem(page))
> > +		addr = (unsigned long)kmap_atomic(page);
> > +	else
> > +		addr = (unsigned long)page_address(page);
> > +
> >  	flush_data_cache_page(addr);
> > +
> > +	if (PageHighMem(page))
> > +		__kunmap_atomic((void *)addr);
> >  }
> >  
> >  EXPORT_SYMBOL(__flush_dcache_page);
> 
> I don't see how this should work with cache aliases.  If the page is unmapped
> kmap_atomic will pick a deterministic address only under some circumstances,
> kmap won't.  As a result the wrong cache way will be flushed, I think.
> 
>   Ralf

Hi Ralf,

None of the systems I tested this on have cache aliases, and highmem on
systems with cache aliases is currently unsupported to the extent that
we BUG_ON such cases in flush_kernel_dcache_page.

So hopefully you won't require that this patch, which fixes highmem
without cache aliases, also address the currently-unsupported
highmem-with-aliases case?

Thanks,
    Paul



end of thread, other threads:[~2016-03-29  8:55 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
2016-03-01  2:37 [PATCH 0/4] MIPS cache & highmem fixes Paul Burton
2016-03-01  2:37 ` Paul Burton
2016-03-01  2:37 ` [PATCH 1/4] MIPS: Flush dcache for flush_kernel_dcache_page Paul Burton
2016-03-01  2:37   ` Paul Burton
2016-03-04 15:09   ` Lars Persson
2016-03-29  8:29     ` Ralf Baechle
2016-03-01  2:37 ` [PATCH 2/4] MIPS: Flush highmem pages in __flush_dcache_page Paul Burton
2016-03-01  2:37   ` Paul Burton
2016-03-29  8:35   ` Ralf Baechle
2016-03-29  8:55     ` Paul Burton
2016-03-29  8:55       ` Paul Burton
2016-03-01  2:37 ` [PATCH 3/4] MIPS: Handle highmem pages in __update_cache Paul Burton
2016-03-01  2:37   ` Paul Burton
2016-03-29  8:39   ` Ralf Baechle
2016-03-01  2:37 ` [PATCH 4/4] MIPS: Sync icache & dcache in set_pte_at Paul Burton
2016-03-01  2:37   ` Paul Burton
2016-03-01  9:44   ` Lars Persson
2016-03-01  9:44     ` Lars Persson
2016-03-01 17:13   ` David Daney
2016-03-01 17:13     ` David Daney
2016-03-01 17:19     ` Paul Burton
2016-03-01 17:19       ` Paul Burton
2016-03-02 14:12       ` Ralf Baechle
2016-03-02 14:24         ` Paul Burton
2016-03-02 14:24           ` Paul Burton
2016-03-04 17:43       ` David Daney
2016-03-04 17:43         ` David Daney
2016-03-04 17:47         ` Paul Burton
2016-03-04 17:47           ` Paul Burton
2016-03-03  3:03   ` [4/4] " Leonid Yegoshin
2016-03-03  3:03     ` Leonid Yegoshin
2016-03-04 10:37     ` Paul Burton
2016-03-04 10:37       ` Paul Burton
2016-03-04 15:20       ` Lars Persson
2016-03-04 21:01       ` Leonid Yegoshin
2016-03-04 21:01         ` Leonid Yegoshin
2016-03-04 21:21         ` Leonid Yegoshin
2016-03-04 21:21           ` Leonid Yegoshin
2016-03-05  1:06         ` Leonid Yegoshin
2016-03-05  1:06           ` Leonid Yegoshin
2016-03-04 19:02   ` [PATCH 4/4] " Lars Persson
2016-03-05  0:21     ` Paul Burton
2016-03-05  0:27       ` Paul Burton
2016-03-02 11:39 ` [PATCH 0/4] MIPS cache & highmem fixes Harvey Hunt
2016-03-02 11:39   ` Harvey Hunt
