* [RFC PATCH 0/2] arm64: change PoC D-cache flush to PoU
@ 2015-12-14 13:27 Ashok Kumar
2015-12-14 13:27 ` [RFC PATCH 1/2] arm64: Defer dcache flush in __cpu_copy_user_page Ashok Kumar
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Ashok Kumar @ 2015-12-14 13:27 UTC (permalink / raw)
To: linux-arm-kernel
For keeping the I and D caches coherent, a dcache flush till the PoU (Point of
Unification) should be sufficient, instead of flushing till the PoC (Point of
Coherence). In SoCs with more levels of cache, there could be a performance hit
in flushing till the PoC, as __flush_dcache_area does both flush and invalidate.
Introduced a new API, __flush_dcache_area_pou, which does only a clean till the PoU.
Also deferred dcache flush in __cpu_copy_user_page to __sync_icache_dcache.
May I know why I/D sync is needed in __cpu_copy_user_page? My understanding
is that any self modifying code in userspace is supposed to take care of the
coherency using the respective cache flush system call.
Ashok Kumar (2):
arm64: Defer dcache flush in __cpu_copy_user_page
arm64: Use PoU cache instr for I/D coherency
arch/arm64/include/asm/cacheflush.h | 1 +
arch/arm64/mm/cache.S | 22 ++++++++++++++++++++++
arch/arm64/mm/copypage.c | 3 ++-
arch/arm64/mm/flush.c | 13 +++++++++----
4 files changed, 34 insertions(+), 5 deletions(-)
--
2.1.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* [RFC PATCH 1/2] arm64: Defer dcache flush in __cpu_copy_user_page
2015-12-14 13:27 [RFC PATCH 0/2] arm64: change PoC D-cache flush to PoU Ashok Kumar
@ 2015-12-14 13:27 ` Ashok Kumar
2015-12-14 15:18 ` Catalin Marinas
2015-12-14 13:27 ` [RFC PATCH 2/2] arm64: Use PoU cache instr for I/D coherency Ashok Kumar
2015-12-14 15:11 ` [RFC PATCH 0/2] arm64: change PoC D-cache flush to PoU Catalin Marinas
2 siblings, 1 reply; 9+ messages in thread
From: Ashok Kumar @ 2015-12-14 13:27 UTC (permalink / raw)
To: linux-arm-kernel
Defer dcache flushing to __sync_icache_dcache by calling
flush_dcache_page, which clears the PG_dcache_clean flag.
Signed-off-by: Ashok Kumar <ashoks@broadcom.com>
---
arch/arm64/mm/copypage.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
index 13bbc3be..22e4cb4 100644
--- a/arch/arm64/mm/copypage.c
+++ b/arch/arm64/mm/copypage.c
@@ -24,8 +24,9 @@
void __cpu_copy_user_page(void *kto, const void *kfrom, unsigned long vaddr)
{
+ struct page *page = virt_to_page(kto);
copy_page(kto, kfrom);
- __flush_dcache_area(kto, PAGE_SIZE);
+ flush_dcache_page(page);
}
EXPORT_SYMBOL_GPL(__cpu_copy_user_page);
--
2.1.0
* [RFC PATCH 2/2] arm64: Use PoU cache instr for I/D coherency
2015-12-14 13:27 [RFC PATCH 0/2] arm64: change PoC D-cache flush to PoU Ashok Kumar
2015-12-14 13:27 ` [RFC PATCH 1/2] arm64: Defer dcache flush in __cpu_copy_user_page Ashok Kumar
@ 2015-12-14 13:27 ` Ashok Kumar
2015-12-14 14:04 ` Mark Rutland
2015-12-14 15:11 ` [RFC PATCH 0/2] arm64: change PoC D-cache flush to PoU Catalin Marinas
2 siblings, 1 reply; 9+ messages in thread
From: Ashok Kumar @ 2015-12-14 13:27 UTC (permalink / raw)
To: linux-arm-kernel
In systems with three levels of cache (PoU at L1 and PoC at L3),
PoC cache flush instructions flush the L2 and L3 caches as well, which could
affect performance.
For cache flushes for I and D coherency, flushing till the PoU should suffice.
So change all I and D coherency related cache flushes to the PoU.
Introduced a new __flush_dcache_area_pou API for dcache flushes till the PoU.
Signed-off-by: Ashok Kumar <ashoks@broadcom.com>
---
arch/arm64/include/asm/cacheflush.h | 1 +
arch/arm64/mm/cache.S | 22 ++++++++++++++++++++++
arch/arm64/mm/flush.c | 13 +++++++++----
3 files changed, 32 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index c75b8d0..e4b13f7 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -68,6 +68,7 @@
extern void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
extern void flush_icache_range(unsigned long start, unsigned long end);
extern void __flush_dcache_area(void *addr, size_t len);
+extern void __flush_dcache_area_pou(void *addr, size_t len);
extern long __flush_cache_user_range(unsigned long start, unsigned long end);
static inline void flush_cache_mm(struct mm_struct *mm)
diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
index eb48d5d..037293c 100644
--- a/arch/arm64/mm/cache.S
+++ b/arch/arm64/mm/cache.S
@@ -101,6 +101,28 @@ ENTRY(__flush_dcache_area)
ENDPROC(__flush_dcache_area)
/*
+ * __flush_dcache_area_pou(kaddr, size)
+ *
+ * Ensure that the data held in the page kaddr is written back to the
+ * page in question till Point of Unification.
+ *
+ * - kaddr - kernel address
+ * - size - size in question
+ */
+ENTRY(__flush_dcache_area_pou)
+ dcache_line_size x2, x3
+ add x1, x0, x1
+ sub x3, x2, #1
+ bic x0, x0, x3
+1: dc cvau, x0 // clean D line till PoU
+ add x0, x0, x2
+ cmp x0, x1
+ b.lo 1b
+ dsb sy
+ ret
+ENDPROC(__flush_dcache_area_pou)
+
+/*
* __inval_cache_range(start, end)
* - start - start address of region
* - end - end address of region
diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index c26b804..6235af6 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -41,7 +41,7 @@ static void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
if (vma->vm_flags & VM_EXEC) {
unsigned long addr = (unsigned long)kaddr;
if (icache_is_aliasing()) {
- __flush_dcache_area(kaddr, len);
+ __flush_dcache_area_pou(kaddr, len);
__flush_icache_all();
} else {
flush_icache_range(addr, addr + len);
@@ -75,9 +75,14 @@ void __sync_icache_dcache(pte_t pte, unsigned long addr)
return;
if (!test_and_set_bit(PG_dcache_clean, &page->flags)) {
- __flush_dcache_area(page_address(page),
- PAGE_SIZE << compound_order(page));
- __flush_icache_all();
+ if (icache_is_aliasing()) {
+ __flush_dcache_area_pou(page_address(page),
+ PAGE_SIZE << compound_order(page));
+ __flush_icache_all();
+ } else
+ flush_icache_range(page_address(page),
+ page_address(page) +
+ (PAGE_SIZE << compound_order(page)));
} else if (icache_is_aivivt()) {
__flush_icache_all();
}
--
2.1.0
* [RFC PATCH 2/2] arm64: Use PoU cache instr for I/D coherency
2015-12-14 13:27 ` [RFC PATCH 2/2] arm64: Use PoU cache instr for I/D coherency Ashok Kumar
@ 2015-12-14 14:04 ` Mark Rutland
2015-12-14 16:48 ` Ashok Kumar
0 siblings, 1 reply; 9+ messages in thread
From: Mark Rutland @ 2015-12-14 14:04 UTC (permalink / raw)
To: linux-arm-kernel
Hi,
On Mon, Dec 14, 2015 at 05:27:44AM -0800, Ashok Kumar wrote:
> In systems with three levels of cache (PoU at L1 and PoC at L3),
> PoC cache flush instructions flush the L2 and L3 caches as well, which could
> affect performance.
> For cache flushes for I and D coherency, flushing till the PoU should suffice.
> So change all I and D coherency related cache flushes to the PoU.
>
> Introduced a new __flush_dcache_area_pou API for dcache flushes till the PoU.
>
> Signed-off-by: Ashok Kumar <ashoks@broadcom.com>
> ---
> arch/arm64/include/asm/cacheflush.h | 1 +
> arch/arm64/mm/cache.S | 22 ++++++++++++++++++++++
> arch/arm64/mm/flush.c | 13 +++++++++----
> 3 files changed, 32 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
> index c75b8d0..e4b13f7 100644
> --- a/arch/arm64/include/asm/cacheflush.h
> +++ b/arch/arm64/include/asm/cacheflush.h
> @@ -68,6 +68,7 @@
> extern void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
> extern void flush_icache_range(unsigned long start, unsigned long end);
> extern void __flush_dcache_area(void *addr, size_t len);
> +extern void __flush_dcache_area_pou(void *addr, size_t len);
> extern long __flush_cache_user_range(unsigned long start, unsigned long end);
>
> static inline void flush_cache_mm(struct mm_struct *mm)
> diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
> index eb48d5d..037293c 100644
> --- a/arch/arm64/mm/cache.S
> +++ b/arch/arm64/mm/cache.S
> @@ -101,6 +101,28 @@ ENTRY(__flush_dcache_area)
> ENDPROC(__flush_dcache_area)
>
> /*
> + * __flush_dcache_area_pou(kaddr, size)
> + *
> + * Ensure that the data held in the page kaddr is written back to the
> + * page in question till Point of Unification.
> + *
> + * - kaddr - kernel address
> + * - size - size in question
> + */
I think it would be better to call this __clean_dcache_area_pou, to make
it clear that there's no invalidate (i.e. it can only be used to push
data out to the PoU).
> +ENTRY(__flush_dcache_area_pou)
> + dcache_line_size x2, x3
> + add x1, x0, x1
> + sub x3, x2, #1
> + bic x0, x0, x3
> +1: dc cvau, x0 // clean D line till PoU
> + add x0, x0, x2
> + cmp x0, x1
> + b.lo 1b
> + dsb sy
> + ret
> +ENDPROC(__flush_dcache_area_pou)
At the same time we can reduce the domain of that dsb to ish, given all
CPUs will be in the same Inner-Shareable domain.
We could also factor the common logic into a macro, e.g.
/*
* x0 - kaddr
* x1 - size
*/
.macro dcache_by_line_op op, domain
dcache_line_size x2, x3
add x1, x0, x1
sub x3, x2, #1
bic x0, x0, x3
1: dc \op, x0
add x0, x0, x2
cmp x0, x1
b.lo 1b
dsb \domain
.endm
ENTRY(__flush_dcache_area)
dcache_by_line_op civac, sy
ret
ENDPIPROC(__flush_dcache_area)
ENTRY(__clean_dcache_area_pou)
dcache_by_line_op cvau, ish
ret
ENDPIPROC(__clean_dcache_area_pou)
> +
> +/*
> * __inval_cache_range(start, end)
> * - start - start address of region
> * - end - end address of region
> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> index c26b804..6235af6 100644
> --- a/arch/arm64/mm/flush.c
> +++ b/arch/arm64/mm/flush.c
> @@ -41,7 +41,7 @@ static void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
> if (vma->vm_flags & VM_EXEC) {
> unsigned long addr = (unsigned long)kaddr;
> if (icache_is_aliasing()) {
> - __flush_dcache_area(kaddr, len);
> + __flush_dcache_area_pou(kaddr, len);
> __flush_icache_all();
> } else {
> flush_icache_range(addr, addr + len);
> @@ -75,9 +75,14 @@ void __sync_icache_dcache(pte_t pte, unsigned long addr)
> return;
>
> if (!test_and_set_bit(PG_dcache_clean, &page->flags)) {
> - __flush_dcache_area(page_address(page),
> - PAGE_SIZE << compound_order(page));
> - __flush_icache_all();
> + if (icache_is_aliasing()) {
> + __flush_dcache_area_pou(page_address(page),
> + PAGE_SIZE << compound_order(page));
> + __flush_icache_all();
> + } else
> + flush_icache_range(page_address(page),
> + page_address(page) +
> + (PAGE_SIZE << compound_order(page)));
Nit: If one side of an if/else has braces, the other side should too.
Other than those points, this looks good to me.
Thanks,
Mark.
* [RFC PATCH 0/2] arm64: change PoC D-cache flush to PoU
2015-12-14 13:27 [RFC PATCH 0/2] arm64: change PoC D-cache flush to PoU Ashok Kumar
2015-12-14 13:27 ` [RFC PATCH 1/2] arm64: Defer dcache flush in __cpu_copy_user_page Ashok Kumar
2015-12-14 13:27 ` [RFC PATCH 2/2] arm64: Use PoU cache instr for I/D coherency Ashok Kumar
@ 2015-12-14 15:11 ` Catalin Marinas
2015-12-14 16:46 ` Ashok Kumar
2 siblings, 1 reply; 9+ messages in thread
From: Catalin Marinas @ 2015-12-14 15:11 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Dec 14, 2015 at 05:27:42AM -0800, Ashok Kumar wrote:
> Also deferred dcache flush in __cpu_copy_user_page to __sync_icache_dcache.
> May I know why I/D sync is needed in __cpu_copy_user_page? My understanding
> is that any self modifying code in userspace is supposed to take care of the
> coherency using the respective cache flush system call.
I think it may have been there for historical reasons on arch/arm
(115b22474eb1 - "ARM: 5794/1: Flush the D-cache during
copy_user_highpage()") and imported into arch/arm64. But looking back at
this, I don't think we need it, for two reasons: the dynamic linker no
longer relocates symbols in a text page (causing CoW), and set_pte_at()
already calls __sync_icache_dcache().
--
Catalin
* [RFC PATCH 1/2] arm64: Defer dcache flush in __cpu_copy_user_page
2015-12-14 13:27 ` [RFC PATCH 1/2] arm64: Defer dcache flush in __cpu_copy_user_page Ashok Kumar
@ 2015-12-14 15:18 ` Catalin Marinas
0 siblings, 0 replies; 9+ messages in thread
From: Catalin Marinas @ 2015-12-14 15:18 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Dec 14, 2015 at 05:27:43AM -0800, Ashok Kumar wrote:
> Defer dcache flushing to __sync_icache_dcache by calling
> flush_dcache_page, which clears the PG_dcache_clean flag.
>
> Signed-off-by: Ashok Kumar <ashoks@broadcom.com>
> ---
> arch/arm64/mm/copypage.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
> index 13bbc3be..22e4cb4 100644
> --- a/arch/arm64/mm/copypage.c
> +++ b/arch/arm64/mm/copypage.c
> @@ -24,8 +24,9 @@
>
> void __cpu_copy_user_page(void *kto, const void *kfrom, unsigned long vaddr)
> {
> + struct page *page = virt_to_page(kto);
> copy_page(kto, kfrom);
> - __flush_dcache_area(kto, PAGE_SIZE);
> + flush_dcache_page(page);
> }
> EXPORT_SYMBOL_GPL(__cpu_copy_user_page);
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
--
Catalin
* [RFC PATCH 0/2] arm64: change PoC D-cache flush to PoU
2015-12-14 15:11 ` [RFC PATCH 0/2] arm64: change PoC D-cache flush to PoU Catalin Marinas
@ 2015-12-14 16:46 ` Ashok Kumar
2015-12-14 17:52 ` Catalin Marinas
0 siblings, 1 reply; 9+ messages in thread
From: Ashok Kumar @ 2015-12-14 16:46 UTC (permalink / raw)
To: linux-arm-kernel
Hi,
Thanks for the review.
On Mon, Dec 14, 2015 at 03:11:18PM +0000, Catalin Marinas wrote:
> On Mon, Dec 14, 2015 at 05:27:42AM -0800, Ashok Kumar wrote:
> > Also deferred dcache flush in __cpu_copy_user_page to __sync_icache_dcache.
> > May I know why I/D sync is needed in __cpu_copy_user_page? My understanding
> > is that any self modifying code in userspace is supposed to take care of the
> > coherency using the respective cache flush system call.
>
> I think it may have been there for historical reasons on arch/arm
> (115b22474eb1 - "ARM: 5794/1: Flush the D-cache during
> copy_user_highpage()") and imported into arch/arm64. But looking back at
> this, I don't think we need it, for two reasons: the dynamic linker no
> longer relocates symbols in a text page (causing CoW), and set_pte_at()
> already calls __sync_icache_dcache().
In that case, shall we get rid of the flush in __sync_icache_dcache also
by not calling flush_dcache_page in __cpu_copy_user_page?
>
> --
> Catalin
>
* [RFC PATCH 2/2] arm64: Use PoU cache instr for I/D coherency
2015-12-14 14:04 ` Mark Rutland
@ 2015-12-14 16:48 ` Ashok Kumar
0 siblings, 0 replies; 9+ messages in thread
From: Ashok Kumar @ 2015-12-14 16:48 UTC (permalink / raw)
To: linux-arm-kernel
Hi,
Thanks for the review. I will incorporate all the comments and post v2
soon.
On Mon, Dec 14, 2015 at 02:04:46PM +0000, Mark Rutland wrote:
> Hi,
>
> On Mon, Dec 14, 2015 at 05:27:44AM -0800, Ashok Kumar wrote:
> > In systems with three levels of cache (PoU at L1 and PoC at L3),
> > PoC cache flush instructions flush the L2 and L3 caches as well, which could
> > affect performance.
> > For cache flushes for I and D coherency, flushing till the PoU should suffice.
> > So change all I and D coherency related cache flushes to the PoU.
> >
> > Introduced a new __flush_dcache_area_pou API for dcache flushes till the PoU.
> >
> > Signed-off-by: Ashok Kumar <ashoks@broadcom.com>
> > ---
> > arch/arm64/include/asm/cacheflush.h | 1 +
> > arch/arm64/mm/cache.S | 22 ++++++++++++++++++++++
> > arch/arm64/mm/flush.c | 13 +++++++++----
> > 3 files changed, 32 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
> > index c75b8d0..e4b13f7 100644
> > --- a/arch/arm64/include/asm/cacheflush.h
> > +++ b/arch/arm64/include/asm/cacheflush.h
> > @@ -68,6 +68,7 @@
> > extern void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
> > extern void flush_icache_range(unsigned long start, unsigned long end);
> > extern void __flush_dcache_area(void *addr, size_t len);
> > +extern void __flush_dcache_area_pou(void *addr, size_t len);
> > extern long __flush_cache_user_range(unsigned long start, unsigned long end);
> >
> > static inline void flush_cache_mm(struct mm_struct *mm)
> > diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
> > index eb48d5d..037293c 100644
> > --- a/arch/arm64/mm/cache.S
> > +++ b/arch/arm64/mm/cache.S
> > @@ -101,6 +101,28 @@ ENTRY(__flush_dcache_area)
> > ENDPROC(__flush_dcache_area)
> >
> > /*
> > + * __flush_dcache_area_pou(kaddr, size)
> > + *
> > + * Ensure that the data held in the page kaddr is written back to the
> > + * page in question till Point of Unification.
> > + *
> > + * - kaddr - kernel address
> > + * - size - size in question
> > + */
>
> I think it would be better to call this __clean_dcache_area_pou, to make
> it clear that there's no invalidate (i.e. it can only be used to push
> data out to the PoU).
>
> > +ENTRY(__flush_dcache_area_pou)
> > + dcache_line_size x2, x3
> > + add x1, x0, x1
> > + sub x3, x2, #1
> > + bic x0, x0, x3
> > +1: dc cvau, x0 // clean D line till PoU
> > + add x0, x0, x2
> > + cmp x0, x1
> > + b.lo 1b
> > + dsb sy
> > + ret
> > +ENDPROC(__flush_dcache_area_pou)
>
> At the same time we can reduce the domain of that dsb to ish, given all
> CPUs will be in the same Inner-Shareable domain.
>
> We could also factor the common logic into a macro, e.g.
>
> /*
> * x0 - kaddr
> * x1 - size
> */
> .macro dcache_by_line_op op, domain
> dcache_line_size x2, x3
> add x1, x0, x1
> sub x3, x2, #1
> bic x0, x0, x3
> 1: dc \op, x0
> add x0, x0, x2
> cmp x0, x1
> b.lo 1b
> dsb \domain
> .endm
>
> ENTRY(__flush_dcache_area)
> dcache_by_line_op civac, sy
> ret
> ENDPIPROC(__flush_dcache_area)
>
> ENTRY(__clean_dcache_area_pou)
> dcache_by_line_op cvau, ish
> ret
> ENDPIPROC(__clean_dcache_area_pou)
>
> > +
> > +/*
> > * __inval_cache_range(start, end)
> > * - start - start address of region
> > * - end - end address of region
> > diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> > index c26b804..6235af6 100644
> > --- a/arch/arm64/mm/flush.c
> > +++ b/arch/arm64/mm/flush.c
> > @@ -41,7 +41,7 @@ static void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
> > if (vma->vm_flags & VM_EXEC) {
> > unsigned long addr = (unsigned long)kaddr;
> > if (icache_is_aliasing()) {
> > - __flush_dcache_area(kaddr, len);
> > + __flush_dcache_area_pou(kaddr, len);
> > __flush_icache_all();
> > } else {
> > flush_icache_range(addr, addr + len);
> > @@ -75,9 +75,14 @@ void __sync_icache_dcache(pte_t pte, unsigned long addr)
> > return;
> >
> > if (!test_and_set_bit(PG_dcache_clean, &page->flags)) {
> > - __flush_dcache_area(page_address(page),
> > - PAGE_SIZE << compound_order(page));
> > - __flush_icache_all();
> > + if (icache_is_aliasing()) {
> > + __flush_dcache_area_pou(page_address(page),
> > + PAGE_SIZE << compound_order(page));
> > + __flush_icache_all();
> > + } else
> > + flush_icache_range(page_address(page),
> > + page_address(page) +
> > + (PAGE_SIZE << compound_order(page)));
>
> Nit: If one side of an if/else has braces, the other side should too.
>
> Other than those points, this looks good to me.
>
> Thanks,
> Mark.
>
* [RFC PATCH 0/2] arm64: change PoC D-cache flush to PoU
2015-12-14 16:46 ` Ashok Kumar
@ 2015-12-14 17:52 ` Catalin Marinas
0 siblings, 0 replies; 9+ messages in thread
From: Catalin Marinas @ 2015-12-14 17:52 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Dec 14, 2015 at 08:46:03AM -0800, Ashok Kumar wrote:
> On Mon, Dec 14, 2015 at 03:11:18PM +0000, Catalin Marinas wrote:
> > On Mon, Dec 14, 2015 at 05:27:42AM -0800, Ashok Kumar wrote:
> > > Also deferred dcache flush in __cpu_copy_user_page to __sync_icache_dcache.
> > > May I know why I/D sync is needed in __cpu_copy_user_page? My understanding
> > > is that any self modifying code in userspace is supposed to take care of the
> > > coherency using the respective cache flush system call.
> >
> > I think it may have been there for historical reasons on arch/arm
> > (115b22474eb1 - "ARM: 5794/1: Flush the D-cache during
> > copy_user_highpage()") and imported into arch/arm64. But looking back at
> > this, I don't think we need it, for two reasons: the dynamic linker no
> > longer relocates symbols in a text page (causing CoW), and set_pte_at()
> > already calls __sync_icache_dcache().
>
> In that case, shall we get rid of the flush in __sync_icache_dcache also
> by not calling flush_dcache_page in __cpu_copy_user_page?
We should still keep it. Assuming that CoW happens mostly on non-exec
pages, the penalty should be minimal, since set_pte_at() only cares about
executable mappings. But I can't guarantee that user space never does
CoW on executable mappings. We could, for example, have some JIT'ed
code that forked, where the child changes some literal pool data (not
instructions), causing a CoW. It would not expect the I/D caches to
become incoherent.
--
Catalin