* [PATCH 00/10] Make use of v7 barrier variants in Linux
@ 2013-06-06 14:28 Will Deacon
2013-06-06 14:28 ` [PATCH 01/10] ARM: mm: remove redundant dsb() prior to range TLB invalidation Will Deacon
` (9 more replies)
0 siblings, 10 replies; 14+ messages in thread
From: Will Deacon @ 2013-06-06 14:28 UTC (permalink / raw)
To: linux-arm-kernel
Hello,
This patch series updates our barrier macros to make use of the
different variants introduced by the v7 architecture. This includes
both access type (store vs load/store) and shareability domain. There
is a dependency on my TLB patches, which I have included in the series
and which were most recently posted here:
http://lists.infradead.org/pipermail/linux-arm-kernel/2013-May/169124.html
With these patches applied, I see around 5% improvement on hackbench
scores running on my TC2 with both clusters enabled.
Since these changes have subtle memory-ordering implications, I've
avoided touching any cache-flushing operations or barrier code that is
used during things like CPU suspend/resume, where the CPU coming up/down
might have bits like actlr.smp clear. Maybe this is overkill, but it
reaches the point of diminishing returns if we start having
implementation-specific barrier options, so I've tried to keep it
general.
All feedback welcome,
Will
Will Deacon (10):
ARM: mm: remove redundant dsb() prior to range TLB invalidation
ARM: tlb: don't perform inner-shareable invalidation for local TLB ops
ARM: tlb: don't bother with barriers for branch predictor maintenance
ARM: tlb: don't perform inner-shareable invalidation for local BP ops
ARM: barrier: allow options to be passed to memory barrier
instructions
ARM: spinlock: use inner-shareable dsb variant prior to sev
instruction
ARM: mm: use inner-shareable barriers for TLB and user cache
operations
ARM: tlb: reduce scope of barrier domains for TLB invalidation
ARM: kvm: use inner-shareable barriers after TLB flushing
ARM: mcpm: use -st dsb option prior to sev instructions
arch/arm/common/mcpm_head.S | 2 +-
arch/arm/common/vlock.S | 4 +-
arch/arm/include/asm/assembler.h | 4 +-
arch/arm/include/asm/barrier.h | 32 ++++++------
arch/arm/include/asm/spinlock.h | 2 +-
arch/arm/include/asm/switch_to.h | 10 ++++
arch/arm/include/asm/tlbflush.h | 105 ++++++++++++++++++++++++++++++++-------
arch/arm/kernel/smp_tlb.c | 10 ++--
arch/arm/kvm/init.S | 2 +-
arch/arm/kvm/interrupts.S | 4 +-
arch/arm/mm/cache-v7.S | 4 +-
arch/arm/mm/context.c | 6 +--
arch/arm/mm/dma-mapping.c | 1 -
arch/arm/mm/proc-v7.S | 2 +-
arch/arm/mm/tlb-v7.S | 8 +--
15 files changed, 134 insertions(+), 62 deletions(-)
--
1.8.2.2
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 01/10] ARM: mm: remove redundant dsb() prior to range TLB invalidation
From: Will Deacon @ 2013-06-06 14:28 UTC (permalink / raw)
To: linux-arm-kernel
The kernel TLB range invalidation functions already contain dsb
instructions before and after the maintenance, so there is no need to
introduce additional barriers.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/mm/dma-mapping.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index ef3e0f3..5baabf7 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -455,7 +455,6 @@ static void __dma_remap(struct page *page, size_t size, pgprot_t prot)
unsigned end = start + size;
apply_to_page_range(&init_mm, start, size, __dma_update_pte, &prot);
- dsb();
flush_tlb_kernel_range(start, end);
}
--
1.8.2.2
* [PATCH 02/10] ARM: tlb: don't perform inner-shareable invalidation for local TLB ops
2013-06-13 17:50 ` Jonathan Austin
From: Will Deacon @ 2013-06-06 14:28 UTC (permalink / raw)
To: linux-arm-kernel
Inner-shareable TLB invalidation is typically more expensive than local
(non-shareable) invalidation, so performing the broadcasting for
local_flush_tlb_* operations is a waste of cycles and needlessly
clobbers entries in the TLBs of other CPUs.
This patch introduces __flush_tlb_* versions for many of the TLB
invalidation functions, which only respect inner-shareable variants of
the invalidation instructions. This allows us to modify the v7 SMP TLB
flags to include *both* inner-shareable and non-shareable operations and
then check the relevant flags depending on whether the operation is
local or not.
This gains us around 0.5% in hackbench scores for a dual-core A15, but I
would expect this to improve as more cores (and clusters) are added to
the equation.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Albin Tonnerre <Albin.Tonnerre@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/include/asm/tlbflush.h | 67 ++++++++++++++++++++++++++++++++++++++---
arch/arm/kernel/smp_tlb.c | 8 ++---
arch/arm/mm/context.c | 6 +---
3 files changed, 68 insertions(+), 13 deletions(-)
diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index a3625d1..55b5e18 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -167,6 +167,8 @@
#endif
#define v7wbi_tlb_flags_smp (TLB_WB | TLB_BARRIER | \
+ TLB_V6_U_FULL | TLB_V6_U_PAGE | \
+ TLB_V6_U_ASID | \
TLB_V7_UIS_FULL | TLB_V7_UIS_PAGE | \
TLB_V7_UIS_ASID | TLB_V7_UIS_BP)
#define v7wbi_tlb_flags_up (TLB_WB | TLB_DCLEAN | TLB_BARRIER | \
@@ -330,6 +332,21 @@ static inline void local_flush_tlb_all(void)
tlb_op(TLB_V4_U_FULL | TLB_V6_U_FULL, "c8, c7, 0", zero);
tlb_op(TLB_V4_D_FULL | TLB_V6_D_FULL, "c8, c6, 0", zero);
tlb_op(TLB_V4_I_FULL | TLB_V6_I_FULL, "c8, c5, 0", zero);
+
+ if (tlb_flag(TLB_BARRIER)) {
+ dsb();
+ isb();
+ }
+}
+
+static inline void __flush_tlb_all(void)
+{
+ const int zero = 0;
+ const unsigned int __tlb_flag = __cpu_tlb_flags;
+
+ if (tlb_flag(TLB_WB))
+ dsb();
+
tlb_op(TLB_V7_UIS_FULL, "c8, c3, 0", zero);
if (tlb_flag(TLB_BARRIER)) {
@@ -348,21 +365,32 @@ static inline void local_flush_tlb_mm(struct mm_struct *mm)
dsb();
if (possible_tlb_flags & (TLB_V4_U_FULL|TLB_V4_D_FULL|TLB_V4_I_FULL)) {
- if (cpumask_test_cpu(get_cpu(), mm_cpumask(mm))) {
+ if (cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm))) {
tlb_op(TLB_V4_U_FULL, "c8, c7, 0", zero);
tlb_op(TLB_V4_D_FULL, "c8, c6, 0", zero);
tlb_op(TLB_V4_I_FULL, "c8, c5, 0", zero);
}
- put_cpu();
}
tlb_op(TLB_V6_U_ASID, "c8, c7, 2", asid);
tlb_op(TLB_V6_D_ASID, "c8, c6, 2", asid);
tlb_op(TLB_V6_I_ASID, "c8, c5, 2", asid);
+
+ if (tlb_flag(TLB_BARRIER))
+ dsb();
+}
+
+static inline void __flush_tlb_mm(struct mm_struct *mm)
+{
+ const unsigned int __tlb_flag = __cpu_tlb_flags;
+
+ if (tlb_flag(TLB_WB))
+ dsb();
+
#ifdef CONFIG_ARM_ERRATA_720789
- tlb_op(TLB_V7_UIS_ASID, "c8, c3, 0", zero);
+ tlb_op(TLB_V7_UIS_ASID, "c8, c3, 0", 0);
#else
- tlb_op(TLB_V7_UIS_ASID, "c8, c3, 2", asid);
+ tlb_op(TLB_V7_UIS_ASID, "c8, c3, 2", ASID(mm));
#endif
if (tlb_flag(TLB_BARRIER))
@@ -392,6 +420,21 @@ local_flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
tlb_op(TLB_V6_U_PAGE, "c8, c7, 1", uaddr);
tlb_op(TLB_V6_D_PAGE, "c8, c6, 1", uaddr);
tlb_op(TLB_V6_I_PAGE, "c8, c5, 1", uaddr);
+
+ if (tlb_flag(TLB_BARRIER))
+ dsb();
+}
+
+static inline void
+__flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
+{
+ const unsigned int __tlb_flag = __cpu_tlb_flags;
+
+ uaddr = (uaddr & PAGE_MASK) | ASID(vma->vm_mm);
+
+ if (tlb_flag(TLB_WB))
+ dsb();
+
#ifdef CONFIG_ARM_ERRATA_720789
tlb_op(TLB_V7_UIS_PAGE, "c8, c3, 3", uaddr & PAGE_MASK);
#else
@@ -421,6 +464,22 @@ static inline void local_flush_tlb_kernel_page(unsigned long kaddr)
tlb_op(TLB_V6_U_PAGE, "c8, c7, 1", kaddr);
tlb_op(TLB_V6_D_PAGE, "c8, c6, 1", kaddr);
tlb_op(TLB_V6_I_PAGE, "c8, c5, 1", kaddr);
+
+ if (tlb_flag(TLB_BARRIER)) {
+ dsb();
+ isb();
+ }
+}
+
+static inline void __flush_tlb_kernel_page(unsigned long kaddr)
+{
+ const unsigned int __tlb_flag = __cpu_tlb_flags;
+
+ kaddr &= PAGE_MASK;
+
+ if (tlb_flag(TLB_WB))
+ dsb();
+
tlb_op(TLB_V7_UIS_PAGE, "c8, c3, 1", kaddr);
if (tlb_flag(TLB_BARRIER)) {
diff --git a/arch/arm/kernel/smp_tlb.c b/arch/arm/kernel/smp_tlb.c
index 9a52a07..cc299b5 100644
--- a/arch/arm/kernel/smp_tlb.c
+++ b/arch/arm/kernel/smp_tlb.c
@@ -135,7 +135,7 @@ void flush_tlb_all(void)
if (tlb_ops_need_broadcast())
on_each_cpu(ipi_flush_tlb_all, NULL, 1);
else
- local_flush_tlb_all();
+ __flush_tlb_all();
broadcast_tlb_a15_erratum();
}
@@ -144,7 +144,7 @@ void flush_tlb_mm(struct mm_struct *mm)
if (tlb_ops_need_broadcast())
on_each_cpu_mask(mm_cpumask(mm), ipi_flush_tlb_mm, mm, 1);
else
- local_flush_tlb_mm(mm);
+ __flush_tlb_mm(mm);
broadcast_tlb_mm_a15_erratum(mm);
}
@@ -157,7 +157,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page,
&ta, 1);
} else
- local_flush_tlb_page(vma, uaddr);
+ __flush_tlb_page(vma, uaddr);
broadcast_tlb_mm_a15_erratum(vma->vm_mm);
}
@@ -168,7 +168,7 @@ void flush_tlb_kernel_page(unsigned long kaddr)
ta.ta_start = kaddr;
on_each_cpu(ipi_flush_tlb_kernel_page, &ta, 1);
} else
- local_flush_tlb_kernel_page(kaddr);
+ __flush_tlb_kernel_page(kaddr);
broadcast_tlb_a15_erratum();
}
diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c
index 2ac3737..62c1ec5 100644
--- a/arch/arm/mm/context.c
+++ b/arch/arm/mm/context.c
@@ -134,10 +134,7 @@ static void flush_context(unsigned int cpu)
}
/* Queue a TLB invalidate and flush the I-cache if necessary. */
- if (!tlb_ops_need_broadcast())
- cpumask_set_cpu(cpu, &tlb_flush_pending);
- else
- cpumask_setall(&tlb_flush_pending);
+ cpumask_setall(&tlb_flush_pending);
if (icache_is_vivt_asid_tagged())
__flush_icache_all();
@@ -215,7 +212,6 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
if (cpumask_test_and_clear_cpu(cpu, &tlb_flush_pending)) {
local_flush_bp_all();
local_flush_tlb_all();
- dummy_flush_tlb_a15_erratum();
}
atomic64_set(&per_cpu(active_asids, cpu), asid);
--
1.8.2.2
* [PATCH 03/10] ARM: tlb: don't bother with barriers for branch predictor maintenance
From: Will Deacon @ 2013-06-06 14:28 UTC (permalink / raw)
To: linux-arm-kernel
Branch predictor maintenance is only required when we are either
changing the kernel's view of memory (switching tables completely) or
dealing with ASID rollover.
Both of these use-cases require subsequent TLB invalidation, which has
the relevant barrier instructions to ensure completion and visibility
of the maintenance, so this patch removes the instruction barrier from
[local_]flush_bp_all.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/include/asm/tlbflush.h | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index 55b5e18..e111027 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -488,6 +488,10 @@ static inline void __flush_tlb_kernel_page(unsigned long kaddr)
}
}
+/*
+ * Branch predictor maintenance is paired with full TLB invalidation, so
+ * there is no need for any barriers here.
+ */
static inline void local_flush_bp_all(void)
{
const int zero = 0;
@@ -497,9 +501,6 @@ static inline void local_flush_bp_all(void)
asm("mcr p15, 0, %0, c7, c1, 6" : : "r" (zero));
else if (tlb_flag(TLB_V6_BP))
asm("mcr p15, 0, %0, c7, c5, 6" : : "r" (zero));
-
- if (tlb_flag(TLB_BARRIER))
- isb();
}
#ifdef CONFIG_ARM_ERRATA_798181
--
1.8.2.2
* [PATCH 04/10] ARM: tlb: don't perform inner-shareable invalidation for local BP ops
From: Will Deacon @ 2013-06-06 14:28 UTC (permalink / raw)
To: linux-arm-kernel
Now that the ASID allocator doesn't require inner-shareable maintenance,
we can convert the local_flush_bp_all function to perform only
non-shareable flushing, in a similar manner to the TLB invalidation
routines.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/include/asm/tlbflush.h | 13 ++++++++++---
arch/arm/kernel/smp_tlb.c | 2 +-
2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index e111027..0bdd5d2d 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -168,7 +168,7 @@
#define v7wbi_tlb_flags_smp (TLB_WB | TLB_BARRIER | \
TLB_V6_U_FULL | TLB_V6_U_PAGE | \
- TLB_V6_U_ASID | \
+ TLB_V6_U_ASID | TLB_V6_BP | \
TLB_V7_UIS_FULL | TLB_V7_UIS_PAGE | \
TLB_V7_UIS_ASID | TLB_V7_UIS_BP)
#define v7wbi_tlb_flags_up (TLB_WB | TLB_DCLEAN | TLB_BARRIER | \
@@ -492,14 +492,21 @@ static inline void __flush_tlb_kernel_page(unsigned long kaddr)
* Branch predictor maintenance is paired with full TLB invalidation, so
* there is no need for any barriers here.
*/
-static inline void local_flush_bp_all(void)
+static inline void __flush_bp_all(void)
{
const int zero = 0;
const unsigned int __tlb_flag = __cpu_tlb_flags;
if (tlb_flag(TLB_V7_UIS_BP))
asm("mcr p15, 0, %0, c7, c1, 6" : : "r" (zero));
- else if (tlb_flag(TLB_V6_BP))
+}
+
+static inline void local_flush_bp_all(void)
+{
+ const int zero = 0;
+ const unsigned int __tlb_flag = __cpu_tlb_flags;
+
+ if (tlb_flag(TLB_V6_BP))
asm("mcr p15, 0, %0, c7, c5, 6" : : "r" (zero));
}
diff --git a/arch/arm/kernel/smp_tlb.c b/arch/arm/kernel/smp_tlb.c
index cc299b5..5cb5500 100644
--- a/arch/arm/kernel/smp_tlb.c
+++ b/arch/arm/kernel/smp_tlb.c
@@ -204,5 +204,5 @@ void flush_bp_all(void)
if (tlb_ops_need_broadcast())
on_each_cpu(ipi_flush_bp_all, NULL, 1);
else
- local_flush_bp_all();
+ __flush_bp_all();
}
--
1.8.2.2
* [PATCH 05/10] ARM: barrier: allow options to be passed to memory barrier instructions
From: Will Deacon @ 2013-06-06 14:28 UTC (permalink / raw)
To: linux-arm-kernel
On ARMv7, the memory barrier instructions take an optional `option'
field which can be used to constrain the effects of a memory barrier
based on shareability and access type.
This patch allows the caller to pass these options if required, and
updates the smp_*() barriers to request inner-shareable barriers,
affecting only stores for the _wmb variant. wmb() is also changed to
use the -st version of dsb.
Reported-by: Albin Tonnerre <albin.tonnerre@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/include/asm/assembler.h | 4 ++--
arch/arm/include/asm/barrier.h | 32 ++++++++++++++++----------------
2 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index 05ee9ee..863b280 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -212,9 +212,9 @@
#ifdef CONFIG_SMP
#if __LINUX_ARM_ARCH__ >= 7
.ifeqs "\mode","arm"
- ALT_SMP(dmb)
+ ALT_SMP(dmb ish)
.else
- ALT_SMP(W(dmb))
+ ALT_SMP(W(dmb) ish)
.endif
#elif __LINUX_ARM_ARCH__ == 6
ALT_SMP(mcr p15, 0, r0, c7, c10, 5) @ dmb
diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index 8dcd9c7..60f15e2 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -14,27 +14,27 @@
#endif
#if __LINUX_ARM_ARCH__ >= 7
-#define isb() __asm__ __volatile__ ("isb" : : : "memory")
-#define dsb() __asm__ __volatile__ ("dsb" : : : "memory")
-#define dmb() __asm__ __volatile__ ("dmb" : : : "memory")
+#define isb(option) __asm__ __volatile__ ("isb " #option : : : "memory")
+#define dsb(option) __asm__ __volatile__ ("dsb " #option : : : "memory")
+#define dmb(option) __asm__ __volatile__ ("dmb " #option : : : "memory")
#elif defined(CONFIG_CPU_XSC3) || __LINUX_ARM_ARCH__ == 6
-#define isb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c5, 4" \
+#define isb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c5, 4" \
: : "r" (0) : "memory")
-#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \
+#define dsb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \
: : "r" (0) : "memory")
-#define dmb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \
+#define dmb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \
: : "r" (0) : "memory")
#elif defined(CONFIG_CPU_FA526)
-#define isb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c5, 4" \
+#define isb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c5, 4" \
: : "r" (0) : "memory")
-#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \
+#define dsb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \
: : "r" (0) : "memory")
-#define dmb() __asm__ __volatile__ ("" : : : "memory")
+#define dmb(x) __asm__ __volatile__ ("" : : : "memory")
#else
-#define isb() __asm__ __volatile__ ("" : : : "memory")
-#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \
+#define isb(x) __asm__ __volatile__ ("" : : : "memory")
+#define dsb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \
: : "r" (0) : "memory")
-#define dmb() __asm__ __volatile__ ("" : : : "memory")
+#define dmb(x) __asm__ __volatile__ ("" : : : "memory")
#endif
#ifdef CONFIG_ARCH_HAS_BARRIERS
@@ -42,7 +42,7 @@
#elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
#define mb() do { dsb(); outer_sync(); } while (0)
#define rmb() dsb()
-#define wmb() mb()
+#define wmb() do { dsb(st); outer_sync(); } while (0)
#else
#define mb() barrier()
#define rmb() barrier()
@@ -54,9 +54,9 @@
#define smp_rmb() barrier()
#define smp_wmb() barrier()
#else
-#define smp_mb() dmb()
-#define smp_rmb() dmb()
-#define smp_wmb() dmb()
+#define smp_mb() dmb(ish)
+#define smp_rmb() smp_mb()
+#define smp_wmb() dmb(ishst)
#endif
#define read_barrier_depends() do { } while(0)
--
1.8.2.2
* [PATCH 06/10] ARM: spinlock: use inner-shareable dsb variant prior to sev instruction
From: Will Deacon @ 2013-06-06 14:28 UTC (permalink / raw)
To: linux-arm-kernel
When unlocking a spinlock, we use the sev instruction to signal other
CPUs waiting on the lock. Since sev is not a memory access instruction,
we require a dsb in order to ensure that the sev is not issued ahead
of the store placing the lock in an unlocked state.
However, as sev is only concerned with other processors in a
multiprocessor system, we can restrict the scope of the preceding dsb
to the inner-shareable domain. Furthermore, we can restrict the scope to
consider only stores, since there are no independent loads on the unlock
path.
A side-effect of this change is that a spin_unlock operation no longer
forces completion of pending TLB invalidation, something which we rely
on when unlocking runqueues to ensure that CPU migration during TLB
maintenance routines doesn't cause us to continue before the operation
has completed.
This patch adds the -ishst suffix to the ARMv7 definition of dsb_sev()
and adds an inner-shareable dsb to the context-switch path when running
a preemptible, SMP, v7 kernel.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/include/asm/spinlock.h | 2 +-
arch/arm/include/asm/switch_to.h | 10 ++++++++++
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
index 6220e9f..5a0261e 100644
--- a/arch/arm/include/asm/spinlock.h
+++ b/arch/arm/include/asm/spinlock.h
@@ -46,7 +46,7 @@ static inline void dsb_sev(void)
{
#if __LINUX_ARM_ARCH__ >= 7
__asm__ __volatile__ (
- "dsb\n"
+ "dsb ishst\n"
SEV
);
#else
diff --git a/arch/arm/include/asm/switch_to.h b/arch/arm/include/asm/switch_to.h
index fa09e6b..c99e259 100644
--- a/arch/arm/include/asm/switch_to.h
+++ b/arch/arm/include/asm/switch_to.h
@@ -4,6 +4,16 @@
#include <linux/thread_info.h>
/*
+ * For v7 SMP cores running a preemptible kernel we may be pre-empted
+ * during a TLB maintenance operation, so execute an inner-shareable dsb
+ * to ensure that the maintenance completes in case we migrate to another
+ * CPU.
+ */
+#if defined(CONFIG_PREEMPT) && defined(CONFIG_SMP) && defined(CONFIG_CPU_V7)
+#define finish_arch_switch(prev) dsb(ish)
+#endif
+
+/*
* switch_to(prev, next) should switch from task `prev' to `next'
* `prev' will never be the same as `next'. schedule() itself
* contains the memory barrier to tell GCC not to cache `current'.
--
1.8.2.2
* [PATCH 07/10] ARM: mm: use inner-shareable barriers for TLB and user cache operations
From: Will Deacon @ 2013-06-06 14:28 UTC (permalink / raw)
To: linux-arm-kernel
System-wide barriers aren't required for situations where we only need
to make visibility and ordering guarantees in the inner-shareable domain
(i.e. we are not dealing with devices or potentially incoherent CPUs).
This patch changes the v7 TLB operations, coherent_user_range and
dcache_clean_area functions to use inner-shareable barriers. For cache
maintenance, only the store access type is required to ensure completion.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/mm/cache-v7.S | 4 ++--
arch/arm/mm/proc-v7.S | 2 +-
arch/arm/mm/tlb-v7.S | 8 ++++----
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index 15451ee..44f5a6a 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -274,7 +274,7 @@ ENTRY(v7_coherent_user_range)
add r12, r12, r2
cmp r12, r1
blo 1b
- dsb
+ dsb ishst
icache_line_size r2, r3
sub r3, r2, #1
bic r12, r0, r3
@@ -286,7 +286,7 @@ ENTRY(v7_coherent_user_range)
mov r0, #0
ALT_SMP(mcr p15, 0, r0, c7, c1, 6) @ invalidate BTB Inner Shareable
ALT_UP(mcr p15, 0, r0, c7, c5, 6) @ invalidate BTB
- dsb
+ dsb ishst
isb
mov pc, lr
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 2c73a73..d19ddc0 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -82,7 +82,7 @@ ENTRY(cpu_v7_dcache_clean_area)
add r0, r0, r2
subs r1, r1, r2
bhi 1b
- dsb
+ dsb ishst
mov pc, lr
ENDPROC(cpu_v7_dcache_clean_area)
diff --git a/arch/arm/mm/tlb-v7.S b/arch/arm/mm/tlb-v7.S
index ea94765..3553087 100644
--- a/arch/arm/mm/tlb-v7.S
+++ b/arch/arm/mm/tlb-v7.S
@@ -35,7 +35,7 @@
ENTRY(v7wbi_flush_user_tlb_range)
vma_vm_mm r3, r2 @ get vma->vm_mm
mmid r3, r3 @ get vm_mm->context.id
- dsb
+ dsb ish
mov r0, r0, lsr #PAGE_SHIFT @ align address
mov r1, r1, lsr #PAGE_SHIFT
asid r3, r3 @ mask ASID
@@ -56,7 +56,7 @@ ENTRY(v7wbi_flush_user_tlb_range)
add r0, r0, #PAGE_SZ
cmp r0, r1
blo 1b
- dsb
+ dsb ish
mov pc, lr
ENDPROC(v7wbi_flush_user_tlb_range)
@@ -69,7 +69,7 @@ ENDPROC(v7wbi_flush_user_tlb_range)
* - end - end address (exclusive, may not be aligned)
*/
ENTRY(v7wbi_flush_kern_tlb_range)
- dsb
+ dsb ish
mov r0, r0, lsr #PAGE_SHIFT @ align address
mov r1, r1, lsr #PAGE_SHIFT
mov r0, r0, lsl #PAGE_SHIFT
@@ -84,7 +84,7 @@ ENTRY(v7wbi_flush_kern_tlb_range)
add r0, r0, #PAGE_SZ
cmp r0, r1
blo 1b
- dsb
+ dsb ish
isb
mov pc, lr
ENDPROC(v7wbi_flush_kern_tlb_range)
--
1.8.2.2
* [PATCH 08/10] ARM: tlb: reduce scope of barrier domains for TLB invalidation
From: Will Deacon @ 2013-06-06 14:28 UTC (permalink / raw)
To: linux-arm-kernel
Our TLB invalidation routines may require a barrier before the
maintenance (in order to ensure pending page table writes are visible to
the hardware walker) and barriers afterwards (in order to ensure
completion of the maintenance and visibility in the instruction stream).
Whilst this is expensive, the cost can be reduced somewhat by reducing
the scope of the barrier instructions:
- The barrier before only needs to apply to stores (pte writes)
- Local ops are required only to affect the non-shareable domain
- Global ops are required only to affect the inner-shareable domain
This patch makes these changes for the TLB flushing code.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/include/asm/tlbflush.h | 36 ++++++++++++++++++------------------
1 file changed, 18 insertions(+), 18 deletions(-)
diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index 0bdd5d2d..77350a2 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -327,14 +327,14 @@ static inline void local_flush_tlb_all(void)
const unsigned int __tlb_flag = __cpu_tlb_flags;
if (tlb_flag(TLB_WB))
- dsb();
+ dsb(nshst);
tlb_op(TLB_V4_U_FULL | TLB_V6_U_FULL, "c8, c7, 0", zero);
tlb_op(TLB_V4_D_FULL | TLB_V6_D_FULL, "c8, c6, 0", zero);
tlb_op(TLB_V4_I_FULL | TLB_V6_I_FULL, "c8, c5, 0", zero);
if (tlb_flag(TLB_BARRIER)) {
- dsb();
+ dsb(nsh);
isb();
}
}
@@ -345,12 +345,12 @@ static inline void __flush_tlb_all(void)
const unsigned int __tlb_flag = __cpu_tlb_flags;
if (tlb_flag(TLB_WB))
- dsb();
+ dsb(ishst);
tlb_op(TLB_V7_UIS_FULL, "c8, c3, 0", zero);
if (tlb_flag(TLB_BARRIER)) {
- dsb();
+ dsb(ish);
isb();
}
}
@@ -362,7 +362,7 @@ static inline void local_flush_tlb_mm(struct mm_struct *mm)
const unsigned int __tlb_flag = __cpu_tlb_flags;
if (tlb_flag(TLB_WB))
- dsb();
+ dsb(nshst);
if (possible_tlb_flags & (TLB_V4_U_FULL|TLB_V4_D_FULL|TLB_V4_I_FULL)) {
if (cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm))) {
@@ -377,7 +377,7 @@ static inline void local_flush_tlb_mm(struct mm_struct *mm)
tlb_op(TLB_V6_I_ASID, "c8, c5, 2", asid);
if (tlb_flag(TLB_BARRIER))
- dsb();
+ dsb(nsh);
}
static inline void __flush_tlb_mm(struct mm_struct *mm)
@@ -385,7 +385,7 @@ static inline void __flush_tlb_mm(struct mm_struct *mm)
const unsigned int __tlb_flag = __cpu_tlb_flags;
if (tlb_flag(TLB_WB))
- dsb();
+ dsb(ishst);
#ifdef CONFIG_ARM_ERRATA_720789
tlb_op(TLB_V7_UIS_ASID, "c8, c3, 0", 0);
@@ -394,7 +394,7 @@ static inline void __flush_tlb_mm(struct mm_struct *mm)
#endif
if (tlb_flag(TLB_BARRIER))
- dsb();
+ dsb(ish);
}
static inline void
@@ -406,7 +406,7 @@ local_flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
uaddr = (uaddr & PAGE_MASK) | ASID(vma->vm_mm);
if (tlb_flag(TLB_WB))
- dsb();
+ dsb(nshst);
if (possible_tlb_flags & (TLB_V4_U_PAGE|TLB_V4_D_PAGE|TLB_V4_I_PAGE|TLB_V4_I_FULL) &&
cpumask_test_cpu(smp_processor_id(), mm_cpumask(vma->vm_mm))) {
@@ -422,7 +422,7 @@ local_flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
tlb_op(TLB_V6_I_PAGE, "c8, c5, 1", uaddr);
if (tlb_flag(TLB_BARRIER))
- dsb();
+ dsb(nsh);
}
static inline void
@@ -433,7 +433,7 @@ __flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
uaddr = (uaddr & PAGE_MASK) | ASID(vma->vm_mm);
if (tlb_flag(TLB_WB))
- dsb();
+ dsb(ishst);
#ifdef CONFIG_ARM_ERRATA_720789
tlb_op(TLB_V7_UIS_PAGE, "c8, c3, 3", uaddr & PAGE_MASK);
@@ -442,7 +442,7 @@ __flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
#endif
if (tlb_flag(TLB_BARRIER))
- dsb();
+ dsb(ish);
}
static inline void local_flush_tlb_kernel_page(unsigned long kaddr)
@@ -453,7 +453,7 @@ static inline void local_flush_tlb_kernel_page(unsigned long kaddr)
kaddr &= PAGE_MASK;
if (tlb_flag(TLB_WB))
- dsb();
+ dsb(nshst);
tlb_op(TLB_V4_U_PAGE, "c8, c7, 1", kaddr);
tlb_op(TLB_V4_D_PAGE, "c8, c6, 1", kaddr);
@@ -466,7 +466,7 @@ static inline void local_flush_tlb_kernel_page(unsigned long kaddr)
tlb_op(TLB_V6_I_PAGE, "c8, c5, 1", kaddr);
if (tlb_flag(TLB_BARRIER)) {
- dsb();
+ dsb(nsh);
isb();
}
}
@@ -478,12 +478,12 @@ static inline void __flush_tlb_kernel_page(unsigned long kaddr)
kaddr &= PAGE_MASK;
if (tlb_flag(TLB_WB))
- dsb();
+ dsb(ishst);
tlb_op(TLB_V7_UIS_PAGE, "c8, c3, 1", kaddr);
if (tlb_flag(TLB_BARRIER)) {
- dsb();
+ dsb(ish);
isb();
}
}
@@ -517,7 +517,7 @@ static inline void dummy_flush_tlb_a15_erratum(void)
* Dummy TLBIMVAIS. Using the unmapped address 0 and ASID 0.
*/
asm("mcr p15, 0, %0, c8, c3, 1" : : "r" (0));
- dsb();
+ dsb(ish);
}
#else
static inline void dummy_flush_tlb_a15_erratum(void)
@@ -546,7 +546,7 @@ static inline void flush_pmd_entry(void *pmd)
tlb_l2_op(TLB_L2CLEAN_FR, "c15, c9, 1 @ L2 flush_pmd", pmd);
if (tlb_flag(TLB_WB))
- dsb();
+ dsb(ishst);
}
static inline void clean_pmd_entry(void *pmd)
--
1.8.2.2
* [PATCH 09/10] ARM: kvm: use inner-shareable barriers after TLB flushing
From: Will Deacon @ 2013-06-06 14:28 UTC (permalink / raw)
To: linux-arm-kernel
When flushing the TLB at PL2 in response to remapping at stage-2 or VMID
rollover, we have a dsb instruction to ensure completion of the command
before continuing.
Since we only care about other processors for TLB invalidation, use the
inner-shareable variant of the dsb instruction instead.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/kvm/init.S | 2 +-
arch/arm/kvm/interrupts.S | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index f048338..1b9844d 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -142,7 +142,7 @@ target: @ We're now in the trampoline code, switch page tables
@ Invalidate the old TLBs
mcr p15, 4, r0, c8, c7, 0 @ TLBIALLH
- dsb
+ dsb ish
eret
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index f7793df..dfb5dcc 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -54,7 +54,7 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
mcrr p15, 6, r2, r3, c2 @ Write VTTBR
isb
mcr p15, 0, r0, c8, c3, 0 @ TLBIALLIS (rt ignored)
- dsb
+ dsb ish
isb
mov r2, #0
mov r3, #0
@@ -78,7 +78,7 @@ ENTRY(__kvm_flush_vm_context)
mcr p15, 4, r0, c8, c3, 4
/* Invalidate instruction caches Inner Shareable (ICIALLUIS) */
mcr p15, 0, r0, c7, c1, 0
- dsb
+ dsb ish
isb @ Not necessary if followed by eret
bx lr
--
1.8.2.2
* [PATCH 10/10] ARM: mcpm: use -st dsb option prior to sev instructions
2013-06-06 14:28 [PATCH 00/10] Make use of v7 barrier variants in Linux Will Deacon
` (8 preceding siblings ...)
2013-06-06 14:28 ` [PATCH 09/10] ARM: kvm: use inner-shareable barriers after TLB flushing Will Deacon
@ 2013-06-06 14:28 ` Will Deacon
2013-06-07 4:15 ` Nicolas Pitre
9 siblings, 1 reply; 14+ messages in thread
From: Will Deacon @ 2013-06-06 14:28 UTC (permalink / raw)
To: linux-arm-kernel
In a similar manner to our spinlock implementation, mcpm uses sev to
wake up cores waiting on a lock when the lock is unlocked. In order to
ensure that the final write unlocking the lock is visible, a dsb
instruction is executed immediately prior to the sev.
This patch changes these dsbs to use the -st option, since we only
require that the store unlocking the lock is made visible.
Reviewed-by: Dave Martin <dave.martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/common/mcpm_head.S | 2 +-
arch/arm/common/vlock.S | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/arm/common/mcpm_head.S b/arch/arm/common/mcpm_head.S
index 8178705..5cdf619 100644
--- a/arch/arm/common/mcpm_head.S
+++ b/arch/arm/common/mcpm_head.S
@@ -151,7 +151,7 @@ mcpm_setup_leave:
mov r0, #INBOUND_NOT_COMING_UP
strb r0, [r8, #MCPM_SYNC_CLUSTER_INBOUND]
- dsb
+ dsb st
sev
mov r0, r11
diff --git a/arch/arm/common/vlock.S b/arch/arm/common/vlock.S
index ff19858..8b7df28 100644
--- a/arch/arm/common/vlock.S
+++ b/arch/arm/common/vlock.S
@@ -42,7 +42,7 @@
dmb
mov \rscratch, #0
strb \rscratch, [\rbase, \rcpu]
- dsb
+ dsb st
sev
.endm
@@ -102,7 +102,7 @@ ENTRY(vlock_unlock)
dmb
mov r1, #VLOCK_OWNER_NONE
strb r1, [r0, #VLOCK_OWNER_OFFSET]
- dsb
+ dsb st
sev
bx lr
ENDPROC(vlock_unlock)
--
1.8.2.2
* [PATCH 10/10] ARM: mcpm: use -st dsb option prior to sev instructions
2013-06-06 14:28 ` [PATCH 10/10] ARM: mcpm: use -st dsb option prior to sev instructions Will Deacon
@ 2013-06-07 4:15 ` Nicolas Pitre
0 siblings, 0 replies; 14+ messages in thread
From: Nicolas Pitre @ 2013-06-07 4:15 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 6 Jun 2013, Will Deacon wrote:
> In a similar manner to our spinlock implementation, mcpm uses sev to
> wake up cores waiting on a lock when the lock is unlocked. In order to
> ensure that the final write unlocking the lock is visible, a dsb
> instruction is executed immediately prior to the sev.
>
> This patch changes these dsbs to use the -st option, since we only
> require that the store unlocking the lock is made visible.
>
> Reviewed-by: Dave Martin <dave.martin@arm.com>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
> ---
> arch/arm/common/mcpm_head.S | 2 +-
> arch/arm/common/vlock.S | 4 ++--
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm/common/mcpm_head.S b/arch/arm/common/mcpm_head.S
> index 8178705..5cdf619 100644
> --- a/arch/arm/common/mcpm_head.S
> +++ b/arch/arm/common/mcpm_head.S
> @@ -151,7 +151,7 @@ mcpm_setup_leave:
>
> mov r0, #INBOUND_NOT_COMING_UP
> strb r0, [r8, #MCPM_SYNC_CLUSTER_INBOUND]
> - dsb
> + dsb st
> sev
>
> mov r0, r11
> diff --git a/arch/arm/common/vlock.S b/arch/arm/common/vlock.S
> index ff19858..8b7df28 100644
> --- a/arch/arm/common/vlock.S
> +++ b/arch/arm/common/vlock.S
> @@ -42,7 +42,7 @@
> dmb
> mov \rscratch, #0
> strb \rscratch, [\rbase, \rcpu]
> - dsb
> + dsb st
> sev
> .endm
>
> @@ -102,7 +102,7 @@ ENTRY(vlock_unlock)
> dmb
> mov r1, #VLOCK_OWNER_NONE
> strb r1, [r0, #VLOCK_OWNER_OFFSET]
> - dsb
> + dsb st
> sev
> bx lr
> ENDPROC(vlock_unlock)
> --
> 1.8.2.2
>
>
* [PATCH 02/10] ARM: tlb: don't perform inner-shareable invalidation for local TLB ops
2013-06-06 14:28 ` [PATCH 02/10] ARM: tlb: don't perform inner-shareable invalidation for local TLB ops Will Deacon
@ 2013-06-13 17:50 ` Jonathan Austin
2013-06-18 11:32 ` Will Deacon
0 siblings, 1 reply; 14+ messages in thread
From: Jonathan Austin @ 2013-06-13 17:50 UTC (permalink / raw)
To: linux-arm-kernel
Hi Will,
On 06/06/13 15:28, Will Deacon wrote:
> Inner-shareable TLB invalidation is typically more expensive than local
> (non-shareable) invalidation, so performing the broadcasting for
> local_flush_tlb_* operations is a waste of cycles and needlessly
> clobbers entries in the TLBs of other CPUs.
>
> This patch introduces __flush_tlb_* versions for many of the TLB
> invalidation functions, which only respect inner-shareable variants of
> the invalidation instructions. This allows us to modify the v7 SMP TLB
> flags to include *both* inner-shareable and non-shareable operations and
> then check the relevant flags depending on whether the operation is
> local or not.
I think this approach leaves us in trouble for some SMP_ON_UP cores as
the IS versions of the instructions don't exist for them.
Is there something that should be ensuring your new __flush_tlb*
functions don't get called for SMP_ON_UP? If not it looks like we might
need to do some runtime patching with the ALT_SMP/ALT_UP macros...
I've commented on one of the examples inline below...
>
> This gains us around 0.5% in hackbench scores for a dual-core A15, but I
> would expect this to improve as more cores (and clusters) are added to
> the equation.
>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Reported-by: Albin Tonnerre <Albin.Tonnerre@arm.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
> arch/arm/include/asm/tlbflush.h | 67 ++++++++++++++++++++++++++++++++++++++---
> arch/arm/kernel/smp_tlb.c | 8 ++---
> arch/arm/mm/context.c | 6 +---
> 3 files changed, 68 insertions(+), 13 deletions(-)
>
> diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
> index a3625d1..55b5e18 100644
> --- a/arch/arm/include/asm/tlbflush.h
> +++ b/arch/arm/include/asm/tlbflush.h
> @@ -167,6 +167,8 @@
> #endif
>
> #define v7wbi_tlb_flags_smp (TLB_WB | TLB_BARRIER | \
> + TLB_V6_U_FULL | TLB_V6_U_PAGE | \
> + TLB_V6_U_ASID | \
> TLB_V7_UIS_FULL | TLB_V7_UIS_PAGE | \
> TLB_V7_UIS_ASID | TLB_V7_UIS_BP)
> #define v7wbi_tlb_flags_up (TLB_WB | TLB_DCLEAN | TLB_BARRIER | \
> @@ -330,6 +332,21 @@ static inline void local_flush_tlb_all(void)
> tlb_op(TLB_V4_U_FULL | TLB_V6_U_FULL, "c8, c7, 0", zero);
> tlb_op(TLB_V4_D_FULL | TLB_V6_D_FULL, "c8, c6, 0", zero);
> tlb_op(TLB_V4_I_FULL | TLB_V6_I_FULL, "c8, c5, 0", zero);
> +
> + if (tlb_flag(TLB_BARRIER)) {
> + dsb();
> + isb();
> + }
> +}
> +
> +static inline void __flush_tlb_all(void)
> +{
> + const int zero = 0;
> + const unsigned int __tlb_flag = __cpu_tlb_flags;
> +
> + if (tlb_flag(TLB_WB))
> + dsb();
> +
> tlb_op(TLB_V7_UIS_FULL, "c8, c3, 0", zero);
I think we can get away with something similar to what we do in the
cache maintenance case here, using ALT_SMP and ALT_UP to do runtime code
patching, with TLB_V6_U_* for the UP case...
A follow-on question is whether we still need to keep the *non*-unified
TLB maintenance operations (eg DTLBIALL, ITLBIALL). As far as I can see
from looking into old TRMs, the last ARM CPU that didn't automatically
treat those I/D ops as unified ones was ARM10, so not relevant here...
But - do some of the non-ARM cores exploit the (now deprecated) option
to maintain these separately? Also, did I miss some more obscure ARM variant?
Jonny
>
> if (tlb_flag(TLB_BARRIER)) {
> @@ -348,21 +365,32 @@ static inline void local_flush_tlb_mm(struct mm_struct *mm)
> dsb();
>
> if (possible_tlb_flags & (TLB_V4_U_FULL|TLB_V4_D_FULL|TLB_V4_I_FULL)) {
> - if (cpumask_test_cpu(get_cpu(), mm_cpumask(mm))) {
> + if (cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm))) {
> tlb_op(TLB_V4_U_FULL, "c8, c7, 0", zero);
> tlb_op(TLB_V4_D_FULL, "c8, c6, 0", zero);
> tlb_op(TLB_V4_I_FULL, "c8, c5, 0", zero);
> }
> - put_cpu();
> }
>
> tlb_op(TLB_V6_U_ASID, "c8, c7, 2", asid);
> tlb_op(TLB_V6_D_ASID, "c8, c6, 2", asid);
> tlb_op(TLB_V6_I_ASID, "c8, c5, 2", asid);
> +
> + if (tlb_flag(TLB_BARRIER))
> + dsb();
> +}
> +
> +static inline void __flush_tlb_mm(struct mm_struct *mm)
> +{
> + const unsigned int __tlb_flag = __cpu_tlb_flags;
> +
> + if (tlb_flag(TLB_WB))
> + dsb();
> +
> #ifdef CONFIG_ARM_ERRATA_720789
> - tlb_op(TLB_V7_UIS_ASID, "c8, c3, 0", zero);
> + tlb_op(TLB_V7_UIS_ASID, "c8, c3, 0", 0);
> #else
> - tlb_op(TLB_V7_UIS_ASID, "c8, c3, 2", asid);
> + tlb_op(TLB_V7_UIS_ASID, "c8, c3, 2", ASID(mm));
> #endif
>
> if (tlb_flag(TLB_BARRIER))
> @@ -392,6 +420,21 @@ local_flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
> tlb_op(TLB_V6_U_PAGE, "c8, c7, 1", uaddr);
> tlb_op(TLB_V6_D_PAGE, "c8, c6, 1", uaddr);
> tlb_op(TLB_V6_I_PAGE, "c8, c5, 1", uaddr);
> +
> + if (tlb_flag(TLB_BARRIER))
> + dsb();
> +}
> +
> +static inline void
> +__flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
> +{
> + const unsigned int __tlb_flag = __cpu_tlb_flags;
> +
> + uaddr = (uaddr & PAGE_MASK) | ASID(vma->vm_mm);
> +
> + if (tlb_flag(TLB_WB))
> + dsb();
> +
> #ifdef CONFIG_ARM_ERRATA_720789
> tlb_op(TLB_V7_UIS_PAGE, "c8, c3, 3", uaddr & PAGE_MASK);
> #else
> @@ -421,6 +464,22 @@ static inline void local_flush_tlb_kernel_page(unsigned long kaddr)
> tlb_op(TLB_V6_U_PAGE, "c8, c7, 1", kaddr);
> tlb_op(TLB_V6_D_PAGE, "c8, c6, 1", kaddr);
> tlb_op(TLB_V6_I_PAGE, "c8, c5, 1", kaddr);
> +
> + if (tlb_flag(TLB_BARRIER)) {
> + dsb();
> + isb();
> + }
> +}
> +
> +static inline void __flush_tlb_kernel_page(unsigned long kaddr)
> +{
> + const unsigned int __tlb_flag = __cpu_tlb_flags;
> +
> + kaddr &= PAGE_MASK;
> +
> + if (tlb_flag(TLB_WB))
> + dsb();
> +
> tlb_op(TLB_V7_UIS_PAGE, "c8, c3, 1", kaddr);
>
> if (tlb_flag(TLB_BARRIER)) {
> diff --git a/arch/arm/kernel/smp_tlb.c b/arch/arm/kernel/smp_tlb.c
> index 9a52a07..cc299b5 100644
> --- a/arch/arm/kernel/smp_tlb.c
> +++ b/arch/arm/kernel/smp_tlb.c
> @@ -135,7 +135,7 @@ void flush_tlb_all(void)
> if (tlb_ops_need_broadcast())
> on_each_cpu(ipi_flush_tlb_all, NULL, 1);
> else
> - local_flush_tlb_all();
> + __flush_tlb_all();
> broadcast_tlb_a15_erratum();
> }
>
> @@ -144,7 +144,7 @@ void flush_tlb_mm(struct mm_struct *mm)
> if (tlb_ops_need_broadcast())
> on_each_cpu_mask(mm_cpumask(mm), ipi_flush_tlb_mm, mm, 1);
> else
> - local_flush_tlb_mm(mm);
> + __flush_tlb_mm(mm);
> broadcast_tlb_mm_a15_erratum(mm);
> }
>
> @@ -157,7 +157,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
> on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page,
> &ta, 1);
> } else
> - local_flush_tlb_page(vma, uaddr);
> + __flush_tlb_page(vma, uaddr);
> broadcast_tlb_mm_a15_erratum(vma->vm_mm);
> }
>
> @@ -168,7 +168,7 @@ void flush_tlb_kernel_page(unsigned long kaddr)
> ta.ta_start = kaddr;
> on_each_cpu(ipi_flush_tlb_kernel_page, &ta, 1);
> } else
> - local_flush_tlb_kernel_page(kaddr);
> + __flush_tlb_kernel_page(kaddr);
> broadcast_tlb_a15_erratum();
> }
>
> diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c
> index 2ac3737..62c1ec5 100644
> --- a/arch/arm/mm/context.c
> +++ b/arch/arm/mm/context.c
> @@ -134,10 +134,7 @@ static void flush_context(unsigned int cpu)
> }
>
> /* Queue a TLB invalidate and flush the I-cache if necessary. */
> - if (!tlb_ops_need_broadcast())
> - cpumask_set_cpu(cpu, &tlb_flush_pending);
> - else
> - cpumask_setall(&tlb_flush_pending);
> + cpumask_setall(&tlb_flush_pending);
>
> if (icache_is_vivt_asid_tagged())
> __flush_icache_all();
> @@ -215,7 +212,6 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
> if (cpumask_test_and_clear_cpu(cpu, &tlb_flush_pending)) {
> local_flush_bp_all();
> local_flush_tlb_all();
> - dummy_flush_tlb_a15_erratum();
> }
>
> atomic64_set(&per_cpu(active_asids, cpu), asid);
>
* [PATCH 02/10] ARM: tlb: don't perform inner-shareable invalidation for local TLB ops
2013-06-13 17:50 ` Jonathan Austin
@ 2013-06-18 11:32 ` Will Deacon
0 siblings, 0 replies; 14+ messages in thread
From: Will Deacon @ 2013-06-18 11:32 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Jun 13, 2013 at 06:50:03PM +0100, Jonathan Austin wrote:
> Hi Will,
Hi Jonny,
> On 06/06/13 15:28, Will Deacon wrote:
> > Inner-shareable TLB invalidation is typically more expensive than local
> > (non-shareable) invalidation, so performing the broadcasting for
> > local_flush_tlb_* operations is a waste of cycles and needlessly
> > clobbers entries in the TLBs of other CPUs.
> >
> > This patch introduces __flush_tlb_* versions for many of the TLB
> > invalidation functions, which only respect inner-shareable variants of
> > the invalidation instructions. This allows us to modify the v7 SMP TLB
> > flags to include *both* inner-shareable and non-shareable operations and
> > then check the relevant flags depending on whether the operation is
> > local or not.
>
> I think this approach leaves us in trouble for some SMP_ON_UP cores as
> the IS versions of the instructions don't exist for them.
>
> Is there something that should be ensuring your new __flush_tlb*
> functions don't get called for SMP_ON_UP? If not it looks like we might
> need to do some runtime patching with the ALT_SMP/ALT_UP macros...
Well spotted. Actually, the best fix here is to honour the TLB flags we end
up with, since they get patched by the SMP_ON_UP code (which forces
indirection via MULTI_TLB). We should extract the `meat' of the local_ ops
into __local_ops, which can be inlined directly into the non-local variants
without introducing additional barriers around the invalidation operations.
> A follow on question is whether we still need to keep the *non* unified
> TLB maintenance operations (eg DTLBIALL, ITLBIALL). As far as I can see
> looking in to old TRMs, the last ARM CPU that didn't automatically treat
> those I/D ops to unified ones was ARM10, so not relevant here...
My reading of the 1136 TRM is that there are separate micro-tlbs and a
unified main tlb. I don't see any implication that an operation on the
unified tlb automatically applies to both micro-tlbs, but I've not checked
the rtl (the implication holds the other way around).
Will
2013-06-06 14:28 [PATCH 00/10] Make use of v7 barrier variants in Linux Will Deacon
2013-06-06 14:28 ` [PATCH 01/10] ARM: mm: remove redundant dsb() prior to range TLB invalidation Will Deacon
2013-06-06 14:28 ` [PATCH 02/10] ARM: tlb: don't perform inner-shareable invalidation for local TLB ops Will Deacon
2013-06-13 17:50 ` Jonathan Austin
2013-06-18 11:32 ` Will Deacon
2013-06-06 14:28 ` [PATCH 03/10] ARM: tlb: don't bother with barriers for branch predictor maintenance Will Deacon
2013-06-06 14:28 ` [PATCH 04/10] ARM: tlb: don't perform inner-shareable invalidation for local BP ops Will Deacon
2013-06-06 14:28 ` [PATCH 05/10] ARM: barrier: allow options to be passed to memory barrier instructions Will Deacon
2013-06-06 14:28 ` [PATCH 06/10] ARM: spinlock: use inner-shareable dsb variant prior to sev instruction Will Deacon
2013-06-06 14:28 ` [PATCH 07/10] ARM: mm: use inner-shareable barriers for TLB and user cache operations Will Deacon
2013-06-06 14:28 ` [PATCH 08/10] ARM: tlb: reduce scope of barrier domains for TLB invalidation Will Deacon
2013-06-06 14:28 ` [PATCH 09/10] ARM: kvm: use inner-shareable barriers after TLB flushing Will Deacon
2013-06-06 14:28 ` [PATCH 10/10] ARM: mcpm: use -st dsb option prior to sev instructions Will Deacon
2013-06-07 4:15 ` Nicolas Pitre