* [PATCH v7 0/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller
From: Zhen Lei @ 2021-02-02 7:16 UTC
To: Russell King, Greg Kroah-Hartman, Will Deacon, Haojian Zhuang, Arnd Bergmann, Rob Herring, Wei Xu, devicetree, linux-arm-kernel, linux-kernel
Cc: Zhen Lei

v6 --> v7:
1. Change all readl()/writel() to the _relaxed() variants, and add the corresponding description to the code.
2. Delete the unnecessary spinlock protection in l3cache_init().

v5 --> v6:
1. Use raw_spin_lock_irqsave() instead of spin_lock_irqsave().
2. Move the macros defined in cache-kunpeng-l3.h into cache-kunpeng-l3.c, and delete that header file.
3. In some places, replace readl()/writel() with readl_relaxed()/writel_relaxed() to improve performance without affecting functionality.
4. Return 0 instead of an error code when Kunpeng L3 cache matching fails.

Thanks to Arnd for his review comments and to Russell for his help.

v4 --> v5:
1. Add the SoC macro ARCH_KUNPENG50X; the Kunpeng L3 cache controller is only enabled on that platform.
2. Require the compatible string of the Kunpeng L3 cache controller to include an SoC-specific name. For example:
	compatible = "hisilicon,kunpeng509-l3cache", "hisilicon,kunpeng-l3cache";

v3 --> v4:
Rename the compatible string from "hisilicon,l3cache" to "hisilicon,kunpeng-l3cache". Then adjust the file name, configuration option name, and description accordingly.

v2 --> v3:
Add the Hisilicon L3 cache controller driver and its documentation (patches 2-3).

v1 --> v2:
Discard the middle-tier functions and do a silent narrowing cast in the outercache hook functions. For example:
	-static void l2c220_inv_range(unsigned long start, unsigned long end)
	+static void l2c220_inv_range(phys_addr_t pa_start, phys_addr_t pa_end)
	 {
	+	unsigned long start = pa_start;
	+	unsigned long end = pa_end;

v1:
Cast phys_addr_t to unsigned long by adding a middle-tier function. For example:
	-static void l2c220_inv_range(unsigned long start, unsigned long end)
	+static void __l2c220_inv_range(unsigned long start, unsigned long end)
	 {
	 ...
	 }
	+static void l2c220_inv_range(phys_addr_t start, phys_addr_t end)
	+{
	+	__l2c220_inv_range(start, end);
	+}

Zhen Lei (4):
  ARM: LPAE: Use phys_addr_t instead of unsigned long in outercache hooks
  ARM: hisi: add support for Kunpeng50x SoC
  dt-bindings: arm: hisilicon: Add binding for Kunpeng L3 cache controller
  ARM: Add support for Hisilicon Kunpeng L3 cache controller

 .../arm/hisilicon/kunpeng-l3cache.yaml        |  40 ++++
 arch/arm/include/asm/outercache.h             |   6 +-
 arch/arm/mach-hisi/Kconfig                    |   6 +
 arch/arm/mm/Kconfig                           |  10 +
 arch/arm/mm/Makefile                          |   1 +
 arch/arm/mm/cache-feroceon-l2.c               |  15 +-
 arch/arm/mm/cache-kunpeng-l3.c                | 178 ++++++++++++++++++
 arch/arm/mm/cache-l2x0.c                      |  50 +++--
 arch/arm/mm/cache-tauros2.c                   |  15 +-
 arch/arm/mm/cache-uniphier.c                  |   6 +-
 arch/arm/mm/cache-xsc3l2.c                    |  12 +-
 11 files changed, 310 insertions(+), 29 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arm/hisilicon/kunpeng-l3cache.yaml
 create mode 100644 arch/arm/mm/cache-kunpeng-l3.c

-- 
2.26.0.106.g9fadedd
* [PATCH v7 1/4] ARM: LPAE: Use phys_addr_t instead of unsigned long in outercache hooks
From: Zhen Lei @ 2021-02-02 7:16 UTC
To: Russell King, Greg Kroah-Hartman, Will Deacon, Haojian Zhuang, Arnd Bergmann, Rob Herring, Wei Xu, devicetree, linux-arm-kernel, linux-kernel
Cc: Zhen Lei

The outercache of some Hisilicon SoCs supports physical addresses wider
than 32 bits. On 32-bit ARM, unsigned long is 32 bits wide, so it cannot
hold physical addresses >= 4GB. Commit ad6b9c9d78b9 ("ARM: 6671/1: LPAE:
use phys_addr_t instead of unsigned long in outercache functions") already
converted the outercache functions, but the parameters of the outercache
hooks were left unchanged. This patch uses phys_addr_t instead of unsigned
long in the outercache hooks: inv_range, clean_range, flush_range.

To keep outercache implementations that do not support LPAE working
properly, cast phys_addr_t to unsigned long through a pair of temporary
variables. For example:
	-static void l2c220_inv_range(unsigned long start, unsigned long end)
	+static void l2c220_inv_range(phys_addr_t pa_start, phys_addr_t pa_end)
	 {
	+	unsigned long start = pa_start;
	+	unsigned long end = pa_end;

Note that the outercache functions were already doing this cast before
this patch, so the cast is simply moved into the outercache hook
functions. No functional change.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/include/asm/outercache.h |  6 ++--
 arch/arm/mm/cache-feroceon-l2.c   | 15 ++++++++--
 arch/arm/mm/cache-l2x0.c          | 50 ++++++++++++++++++++++---------
 arch/arm/mm/cache-tauros2.c       | 15 ++++++++--
 arch/arm/mm/cache-uniphier.c      |  6 ++--
 arch/arm/mm/cache-xsc3l2.c        | 12 ++++++--
 6 files changed, 75 insertions(+), 29 deletions(-)

diff --git a/arch/arm/include/asm/outercache.h b/arch/arm/include/asm/outercache.h
index 3364637755e86aa..4cee1ea0c15449a 100644
--- a/arch/arm/include/asm/outercache.h
+++ b/arch/arm/include/asm/outercache.h
@@ -14,9 +14,9 @@
 struct l2x0_regs;
 
 struct outer_cache_fns {
-	void (*inv_range)(unsigned long, unsigned long);
-	void (*clean_range)(unsigned long, unsigned long);
-	void (*flush_range)(unsigned long, unsigned long);
+	void (*inv_range)(phys_addr_t, phys_addr_t);
+	void (*clean_range)(phys_addr_t, phys_addr_t);
+	void (*flush_range)(phys_addr_t, phys_addr_t);
 	void (*flush_all)(void);
 	void (*disable)(void);
 #ifdef CONFIG_OUTER_CACHE_SYNC
diff --git a/arch/arm/mm/cache-feroceon-l2.c b/arch/arm/mm/cache-feroceon-l2.c
index 5c1b7a7b9af6300..10f909744d5e963 100644
--- a/arch/arm/mm/cache-feroceon-l2.c
+++ b/arch/arm/mm/cache-feroceon-l2.c
@@ -168,8 +168,11 @@ static unsigned long calc_range_end(unsigned long start, unsigned long end)
 	return range_end;
 }
 
-static void feroceon_l2_inv_range(unsigned long start, unsigned long end)
+static void feroceon_l2_inv_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
+
 	/*
 	 * Clean and invalidate partial first cache line.
 	 */
@@ -198,8 +201,11 @@ static void feroceon_l2_inv_range(unsigned long start, unsigned long end)
 	dsb();
 }
 
-static void feroceon_l2_clean_range(unsigned long start, unsigned long end)
+static void feroceon_l2_clean_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
+
 	/*
 	 * If L2 is forced to WT, the L2 will always be clean and we
 	 * don't need to do anything here.
@@ -217,8 +223,11 @@ static void feroceon_l2_clean_range(unsigned long start, unsigned long end)
 	dsb();
 }
 
-static void feroceon_l2_flush_range(unsigned long start, unsigned long end)
+static void feroceon_l2_flush_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
+
 	start &= ~(CACHE_LINE_SIZE - 1);
 	end = (end + CACHE_LINE_SIZE - 1) & ~(CACHE_LINE_SIZE - 1);
 	while (start != end) {
diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 43d91bfd2360086..cdaddd772b09ede 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -184,8 +184,10 @@ static void __l2c210_op_pa_range(void __iomem *reg, unsigned long start,
 	}
 }
 
-static void l2c210_inv_range(unsigned long start, unsigned long end)
+static void l2c210_inv_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	void __iomem *base = l2x0_base;
 
 	if (start & (CACHE_LINE_SIZE - 1)) {
@@ -203,8 +205,10 @@ static void l2c210_inv_range(unsigned long start, unsigned long end)
 	__l2c210_cache_sync(base);
 }
 
-static void l2c210_clean_range(unsigned long start, unsigned long end)
+static void l2c210_clean_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	void __iomem *base = l2x0_base;
 
 	start &= ~(CACHE_LINE_SIZE - 1);
@@ -212,8 +216,10 @@ static void l2c210_clean_range(unsigned long start, unsigned long end)
 	__l2c210_cache_sync(base);
 }
 
-static void l2c210_flush_range(unsigned long start, unsigned long end)
+static void l2c210_flush_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	void __iomem *base = l2x0_base;
 
 	start &= ~(CACHE_LINE_SIZE - 1);
@@ -304,8 +310,10 @@ static unsigned long l2c220_op_pa_range(void __iomem *reg, unsigned long start,
 	return flags;
 }
 
-static void l2c220_inv_range(unsigned long start, unsigned long end)
+static void l2c220_inv_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	void __iomem *base = l2x0_base;
 	unsigned long flags;
 
@@ -331,8 +339,10 @@ static void l2c220_inv_range(unsigned long start, unsigned long end)
 	raw_spin_unlock_irqrestore(&l2x0_lock, flags);
 }
 
-static void l2c220_clean_range(unsigned long start, unsigned long end)
+static void l2c220_clean_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	void __iomem *base = l2x0_base;
 	unsigned long flags;
 
@@ -350,8 +360,10 @@ static void l2c220_clean_range(unsigned long start, unsigned long end)
 	raw_spin_unlock_irqrestore(&l2x0_lock, flags);
 }
 
-static void l2c220_flush_range(unsigned long start, unsigned long end)
+static void l2c220_flush_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	void __iomem *base = l2x0_base;
 	unsigned long flags;
 
@@ -464,8 +476,10 @@ static const struct l2c_init_data l2c220_data = {
  *	Affects: store buffer
  *	store buffer is not automatically drained.
  */
-static void l2c310_inv_range_erratum(unsigned long start, unsigned long end)
+static void l2c310_inv_range_erratum(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	void __iomem *base = l2x0_base;
 
 	if ((start | end) & (CACHE_LINE_SIZE - 1)) {
@@ -496,8 +510,10 @@ static void l2c310_inv_range_erratum(unsigned long start, unsigned long end)
 	__l2c210_cache_sync(base);
 }
 
-static void l2c310_flush_range_erratum(unsigned long start, unsigned long end)
+static void l2c310_flush_range_erratum(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	raw_spinlock_t *lock = &l2x0_lock;
 	unsigned long flags;
 	void __iomem *base = l2x0_base;
@@ -1400,12 +1416,12 @@ static void aurora_pa_range(unsigned long start, unsigned long end,
 		start = range_end;
 	}
 }
-static void aurora_inv_range(unsigned long start, unsigned long end)
+static void aurora_inv_range(phys_addr_t start, phys_addr_t end)
 {
 	aurora_pa_range(start, end, AURORA_INVAL_RANGE_REG);
 }
 
-static void aurora_clean_range(unsigned long start, unsigned long end)
+static void aurora_clean_range(phys_addr_t start, phys_addr_t end)
 {
 	/*
 	 * If L2 is forced to WT, the L2 will always be clean and we
@@ -1415,7 +1431,7 @@ static void aurora_clean_range(unsigned long start, unsigned long end)
 		aurora_pa_range(start, end, AURORA_CLEAN_RANGE_REG);
 }
 
-static void aurora_flush_range(unsigned long start, unsigned long end)
+static void aurora_flush_range(phys_addr_t start, phys_addr_t end)
 {
 	if (l2_wt_override)
 		aurora_pa_range(start, end, AURORA_INVAL_RANGE_REG);
@@ -1604,8 +1620,10 @@ static inline unsigned long bcm_l2_phys_addr(unsigned long addr)
 		return addr + BCM_VC_EMI_OFFSET;
 }
 
-static void bcm_inv_range(unsigned long start, unsigned long end)
+static void bcm_inv_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	unsigned long new_start, new_end;
 
 	BUG_ON(start < BCM_SYS_EMI_START_ADDR);
@@ -1631,8 +1649,10 @@ static void bcm_inv_range(unsigned long start, unsigned long end)
 		new_end);
 }
 
-static void bcm_clean_range(unsigned long start, unsigned long end)
+static void bcm_clean_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	unsigned long new_start, new_end;
 
 	BUG_ON(start < BCM_SYS_EMI_START_ADDR);
@@ -1658,8 +1678,10 @@ static void bcm_clean_range(unsigned long start, unsigned long end)
 		new_end);
 }
 
-static void bcm_flush_range(unsigned long start, unsigned long end)
+static void bcm_flush_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	unsigned long new_start, new_end;
 
 	BUG_ON(start < BCM_SYS_EMI_START_ADDR);
diff --git a/arch/arm/mm/cache-tauros2.c b/arch/arm/mm/cache-tauros2.c
index 88255bea65e41e6..d768bbb5e05c690 100644
--- a/arch/arm/mm/cache-tauros2.c
+++ b/arch/arm/mm/cache-tauros2.c
@@ -66,8 +66,11 @@ static inline void tauros2_inv_pa(unsigned long addr)
  */
 #define CACHE_LINE_SIZE 32
 
-static void tauros2_inv_range(unsigned long start, unsigned long end)
+static void tauros2_inv_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
+
 	/*
 	 * Clean and invalidate partial first cache line.
 	 */
@@ -95,8 +98,11 @@ static void tauros2_inv_range(unsigned long start, unsigned long end)
 	dsb();
 }
 
-static void tauros2_clean_range(unsigned long start, unsigned long end)
+static void tauros2_clean_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
+
 	start &= ~(CACHE_LINE_SIZE - 1);
 	while (start < end) {
 		tauros2_clean_pa(start);
@@ -106,8 +112,11 @@ static void tauros2_clean_range(unsigned long start, unsigned long end)
 	dsb();
 }
 
-static void tauros2_flush_range(unsigned long start, unsigned long end)
+static void tauros2_flush_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
+
 	start &= ~(CACHE_LINE_SIZE - 1);
 	while (start < end) {
 		tauros2_clean_inv_pa(start);
diff --git a/arch/arm/mm/cache-uniphier.c b/arch/arm/mm/cache-uniphier.c
index ff2881458504329..e2508358e9f4082 100644
--- a/arch/arm/mm/cache-uniphier.c
+++ b/arch/arm/mm/cache-uniphier.c
@@ -250,17 +250,17 @@ static void uniphier_cache_maint_all(u32 operation)
 		__uniphier_cache_maint_all(data, operation);
 }
 
-static void uniphier_cache_inv_range(unsigned long start, unsigned long end)
+static void uniphier_cache_inv_range(phys_addr_t start, phys_addr_t end)
 {
 	uniphier_cache_maint_range(start, end, UNIPHIER_SSCOQM_CM_INV);
 }
 
-static void uniphier_cache_clean_range(unsigned long start, unsigned long end)
+static void uniphier_cache_clean_range(phys_addr_t start, phys_addr_t end)
 {
 	uniphier_cache_maint_range(start, end, UNIPHIER_SSCOQM_CM_CLEAN);
 }
 
-static void uniphier_cache_flush_range(unsigned long start, unsigned long end)
+static void uniphier_cache_flush_range(phys_addr_t start, phys_addr_t end)
 {
 	uniphier_cache_maint_range(start, end, UNIPHIER_SSCOQM_CM_FLUSH);
 }
diff --git a/arch/arm/mm/cache-xsc3l2.c b/arch/arm/mm/cache-xsc3l2.c
index d20d7af02d10fc0..5814731653d9091 100644
--- a/arch/arm/mm/cache-xsc3l2.c
+++ b/arch/arm/mm/cache-xsc3l2.c
@@ -83,8 +83,10 @@ static inline unsigned long l2_map_va(unsigned long pa, unsigned long prev_va)
 #endif
 }
 
-static void xsc3_l2_inv_range(unsigned long start, unsigned long end)
+static void xsc3_l2_inv_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	unsigned long vaddr;
 
 	if (start == 0 && end == -1ul) {
@@ -127,8 +129,10 @@ static void xsc3_l2_inv_range(unsigned long start, unsigned long end)
 	dsb();
 }
 
-static void xsc3_l2_clean_range(unsigned long start, unsigned long end)
+static void xsc3_l2_clean_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	unsigned long vaddr;
 
 	vaddr = -1;  /* to force the first mapping */
@@ -165,8 +169,10 @@ static inline void xsc3_l2_flush_all(void)
 	dsb();
 }
 
-static void xsc3_l2_flush_range(unsigned long start, unsigned long end)
+static void xsc3_l2_flush_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	unsigned long vaddr;
 
 	if (start == 0 && end == -1ul) {
-- 
2.26.0.106.g9fadedd
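
The narrowing cast that patch 1/4 moves into the hooks can be illustrated outside the kernel. The following is only a user-space sketch, not code from the series: it assumes a 32-bit ARM kernel with LPAE, where phys_addr_t is 64 bits wide and unsigned long is 32 bits wide, and models those with fixed-width stand-in types. The names demo_phys_addr_t, demo_ulong_t, demo_narrow() and demo_fits_in_ulong() are hypothetical.

```c
#include <stdint.h>

/*
 * User-space stand-ins (assumptions for illustration only): on 32-bit ARM
 * with LPAE, phys_addr_t is 64 bits wide and unsigned long is 32 bits wide.
 */
typedef uint64_t demo_phys_addr_t;
typedef uint32_t demo_ulong_t;

/* Hypothetical helper: the narrowing a non-LPAE hook performs on entry. */
static demo_ulong_t demo_narrow(demo_phys_addr_t pa)
{
	return (demo_ulong_t)pa;	/* bits 32 and above are discarded */
}

/* Hypothetical helper: nonzero when narrowing preserves the address. */
static int demo_fits_in_ulong(demo_phys_addr_t pa)
{
	return (demo_phys_addr_t)demo_narrow(pa) == pa;
}
```

Under these assumptions, an address below 4GB such as 0x80001000 passes through unchanged, while 0x100001000 would be truncated to 0x1000. That appears to be why hooks for a controller that handles wider addresses need to receive the full phys_addr_t, while the existing controllers keep the same 32-bit behaviour they had before the patch.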
* [PATCH v7 2/4] ARM: hisi: add support for Kunpeng50x SoC
  2021-02-02  7:16 ` Zhen Lei
@ 2021-02-02  7:16 ` Zhen Lei
  -1 siblings, 0 replies; 18+ messages in thread
From: Zhen Lei @ 2021-02-02  7:16 UTC (permalink / raw)
  To: Russell King, Greg Kroah-Hartman, Will Deacon, Haojian Zhuang,
	Arnd Bergmann, Rob Herring, Wei Xu, devicetree, linux-arm-kernel,
	linux-kernel
  Cc: Zhen Lei

Enable support for the Hisilicon Kunpeng506 and Kunpeng509 SoCs.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/mach-hisi/Kconfig | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm/mach-hisi/Kconfig b/arch/arm/mach-hisi/Kconfig
index 2e980f834a6aa1b..a004eac24b243af 100644
--- a/arch/arm/mach-hisi/Kconfig
+++ b/arch/arm/mach-hisi/Kconfig
@@ -55,6 +55,12 @@ config ARCH_HIX5HD2
 	help
 	  Support for Hisilicon HIX5HD2 SoC family
 
+config ARCH_KUNPENG50X
+	bool "Hisilicon Kunpeng50x family"
+	depends on ARCH_MULTI_V7
+	help
+	  Support for Hisilicon Kunpeng506 and Kunpeng509 SoC family
+
 config ARCH_SD5203
 	bool "Hisilicon SD5203 family"
 	depends on ARCH_MULTI_V5
-- 
2.26.0.106.g9fadedd

^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v7 3/4] dt-bindings: arm: hisilicon: Add binding for Kunpeng L3 cache controller
  2021-02-02  7:16 ` Zhen Lei
@ 2021-02-02  7:16 ` Zhen Lei
  -1 siblings, 0 replies; 18+ messages in thread
From: Zhen Lei @ 2021-02-02  7:16 UTC (permalink / raw)
  To: Russell King, Greg Kroah-Hartman, Will Deacon, Haojian Zhuang,
	Arnd Bergmann, Rob Herring, Wei Xu, devicetree, linux-arm-kernel,
	linux-kernel
  Cc: Zhen Lei

Add devicetree binding for Hisilicon Kunpeng L3 cache controller.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
---
 .../arm/hisilicon/kunpeng-l3cache.yaml | 40 +++++++++++++++++++
 1 file changed, 40 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/hisilicon/kunpeng-l3cache.yaml

diff --git a/Documentation/devicetree/bindings/arm/hisilicon/kunpeng-l3cache.yaml b/Documentation/devicetree/bindings/arm/hisilicon/kunpeng-l3cache.yaml
new file mode 100644
index 000000000000000..5bf33c0e4d14b7f
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/hisilicon/kunpeng-l3cache.yaml
@@ -0,0 +1,40 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/arm/hisilicon/kunpeng-l3cache.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Hisilicon Kunpeng L3 cache controller
+
+maintainers:
+  - Wei Xu <xuwei5@hisilicon.com>
+
+description: |
+  The Hisilicon Kunpeng L3 outer cache controller supports a maximum of 36-bit
+  physical addresses. The data cached in the L3 outer cache can be operated
+  based on the physical address range or the entire cache.
+
+properties:
+  compatible:
+    items:
+      - enum:
+          - hisilicon,kunpeng506-l3cache
+          - hisilicon,kunpeng509-l3cache
+      - const: hisilicon,kunpeng-l3cache
+
+  reg:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+
+additionalProperties: false
+
+examples:
+  - |
+    l3cache@f302b000 {
+        compatible = "hisilicon,kunpeng509-l3cache", "hisilicon,kunpeng-l3cache";
+        reg = <0xf302b000 0x1000>;
+    };
+...
-- 
2.26.0.106.g9fadedd

^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v7 3/4] dt-bindings: arm: hisilicon: Add binding for Kunpeng L3 cache controller
  2021-02-02  7:16 ` Zhen Lei
@ 2021-02-05 21:12 ` Rob Herring
  -1 siblings, 0 replies; 18+ messages in thread
From: Rob Herring @ 2021-02-05 21:12 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Rob Herring, devicetree, linux-arm-kernel, linux-kernel,
	Will Deacon, Greg Kroah-Hartman, Wei Xu, Russell King,
	Arnd Bergmann, Haojian Zhuang

On Tue, 02 Feb 2021 15:16:47 +0800, Zhen Lei wrote:
> Add devicetree binding for Hisilicon Kunpeng L3 cache controller.
>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> Reviewed-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  .../arm/hisilicon/kunpeng-l3cache.yaml | 40 +++++++++++++++++++
>  1 file changed, 40 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/arm/hisilicon/kunpeng-l3cache.yaml
>

Reviewed-by: Rob Herring <robh@kernel.org>

^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v7 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller 2021-02-02 7:16 ` Zhen Lei @ 2021-02-02 7:16 ` Zhen Lei -1 siblings, 0 replies; 18+ messages in thread From: Zhen Lei @ 2021-02-02 7:16 UTC (permalink / raw) To: Russell King, Greg Kroah-Hartman, Will Deacon, Haojian Zhuang, Arnd Bergmann, Rob Herring, Wei Xu, devicetree, linux-arm-kernel, linux-kernel Cc: Zhen Lei Add support for the Hisilicon Kunpeng L3 cache controller as used with Kunpeng506 and Kunpeng509 SoCs. These Hisilicon SoCs support LPAE, so physical addresses are wider than 32 bits, but the actual width does not exceed 36 bits. When a cache operation is performed on an address range, the upper 30 bits of each physical address are written to the registers L3_MAINT_START and L3_MAINT_END, and the lower 6 bits (the cacheline offset) are ignored. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> --- arch/arm/mm/Kconfig | 10 ++ arch/arm/mm/Makefile | 1 + arch/arm/mm/cache-kunpeng-l3.c | 178 +++++++++++++++++++++++++++++++++ 3 files changed, 189 insertions(+) create mode 100644 arch/arm/mm/cache-kunpeng-l3.c diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig index 02692fbe2db5c59..d2082503de053d2 100644 --- a/arch/arm/mm/Kconfig +++ b/arch/arm/mm/Kconfig @@ -1070,6 +1070,16 @@ config CACHE_XSC3L2 help This option enables the L2 cache on XScale3. +config CACHE_KUNPENG_L3 + bool "Enable the Hisilicon Kunpeng L3 cache controller" + depends on ARCH_KUNPENG50X && OF + default y + select OUTER_CACHE + help + This option enables the Kunpeng L3 cache controller on Hisilicon + Kunpeng506 and Kunpeng509 SoCs. It supports a maximum of 36-bit + physical addresses.
+ config ARM_L1_CACHE_SHIFT_6 bool default y if CPU_V7 diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile index 3510503bc5e688b..ececc5489e353eb 100644 --- a/arch/arm/mm/Makefile +++ b/arch/arm/mm/Makefile @@ -112,6 +112,7 @@ obj-$(CONFIG_CACHE_L2X0_PMU) += cache-l2x0-pmu.o obj-$(CONFIG_CACHE_XSC3L2) += cache-xsc3l2.o obj-$(CONFIG_CACHE_TAUROS2) += cache-tauros2.o obj-$(CONFIG_CACHE_UNIPHIER) += cache-uniphier.o +obj-$(CONFIG_CACHE_KUNPENG_L3) += cache-kunpeng-l3.o KASAN_SANITIZE_kasan_init.o := n obj-$(CONFIG_KASAN) += kasan_init.o diff --git a/arch/arm/mm/cache-kunpeng-l3.c b/arch/arm/mm/cache-kunpeng-l3.c new file mode 100644 index 000000000000000..64f892de9d68058 --- /dev/null +++ b/arch/arm/mm/cache-kunpeng-l3.c @@ -0,0 +1,178 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2021 Hisilicon Limited. + */ + +#include <linux/init.h> +#include <linux/spinlock.h> +#include <linux/io.h> +#include <linux/of_address.h> + +#include <asm/cacheflush.h> + +#define L3_CACHE_LINE_SHITF 6 + +#define L3_CTRL 0x0 +#define L3_CTRL_ENABLE (1U << 0) +#define L3_CTRL_DISABLE (0U << 0) + +#define L3_AUCTRL 0x4 +#define L3_AUCTRL_EVENT_EN BIT(23) +#define L3_AUCTRL_ECC_EN BIT(8) + +#define L3_MAINT_CTRL 0x20 +#define L3_MAINT_RANGE_MASK GENMASK(3, 3) +#define L3_MAINT_RANGE_ALL (0U << 3) +#define L3_MAINT_RANGE_ADDR (1U << 3) +#define L3_MAINT_TYPE_MASK GENMASK(2, 1) +#define L3_MAINT_TYPE_CLEAN (1U << 1) +#define L3_MAINT_TYPE_INV (2U << 1) +#define L3_MAINT_TYPE_FLUSH (3U << 1) +#define L3_MAINT_STATUS_MASK GENMASK(0, 0) +#define L3_MAINT_STATUS_START (1U << 0) +#define L3_MAINT_STATUS_END (0U << 0) + +#define L3_MAINT_START 0x24 +#define L3_MAINT_END 0x28 + +static DEFINE_RAW_SPINLOCK(l3cache_lock); +static void __iomem *l3_ctrl_base; + +/* + * All read and write operations on L3 cache registers are protected by the + * spinlock, except for l3cache_init(). 
Each time the L3 cache operation is + * performed, all related information is filled into its registers. Therefore, + * there is no memory order problem when only _relaxed() functions are used. + * This can help us achieve some performance improvement: + * 1) The readl_relaxed() is about 20ns faster than readl(). + * 2) The writel_relaxed() is about 123ns faster than writel(). + */ +static void l3cache_maint_common(u32 range, u32 op_type) +{ + u32 reg; + + reg = readl_relaxed(l3_ctrl_base + L3_MAINT_CTRL); + reg &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK); + reg |= range | op_type; + reg |= L3_MAINT_STATUS_START; + writel_relaxed(reg, l3_ctrl_base + L3_MAINT_CTRL); + + /* Wait until the hardware maintenance operation is complete. */ + do { + cpu_relax(); + reg = readl_relaxed(l3_ctrl_base + L3_MAINT_CTRL); + } while ((reg & L3_MAINT_STATUS_MASK) != L3_MAINT_STATUS_END); +} + +static void l3cache_maint_range(phys_addr_t start, phys_addr_t end, u32 op_type) +{ + start = start >> L3_CACHE_LINE_SHITF; + end = ((end - 1) >> L3_CACHE_LINE_SHITF) + 1; + + writel_relaxed(start, l3_ctrl_base + L3_MAINT_START); + writel_relaxed(end, l3_ctrl_base + L3_MAINT_END); + + l3cache_maint_common(L3_MAINT_RANGE_ADDR, op_type); +} + +static inline void l3cache_flush_all_nolock(void) +{ + l3cache_maint_common(L3_MAINT_RANGE_ALL, L3_MAINT_TYPE_FLUSH); +} + +static void l3cache_flush_all(void) +{ + unsigned long flags; + + raw_spin_lock_irqsave(&l3cache_lock, flags); + l3cache_flush_all_nolock(); + raw_spin_unlock_irqrestore(&l3cache_lock, flags); +} + +static void l3cache_inv_range(phys_addr_t start, phys_addr_t end) +{ + unsigned long flags; + + raw_spin_lock_irqsave(&l3cache_lock, flags); + l3cache_maint_range(start, end, L3_MAINT_TYPE_INV); + raw_spin_unlock_irqrestore(&l3cache_lock, flags); +} + +static void l3cache_clean_range(phys_addr_t start, phys_addr_t end) +{ + unsigned long flags; + + raw_spin_lock_irqsave(&l3cache_lock, flags); + l3cache_maint_range(start, end, 
L3_MAINT_TYPE_CLEAN); + raw_spin_unlock_irqrestore(&l3cache_lock, flags); +} + +static void l3cache_flush_range(phys_addr_t start, phys_addr_t end) +{ + unsigned long flags; + + raw_spin_lock_irqsave(&l3cache_lock, flags); + l3cache_maint_range(start, end, L3_MAINT_TYPE_FLUSH); + raw_spin_unlock_irqrestore(&l3cache_lock, flags); +} + +static void l3cache_disable(void) +{ + unsigned long flags; + + raw_spin_lock_irqsave(&l3cache_lock, flags); + l3cache_flush_all_nolock(); + writel_relaxed(L3_CTRL_DISABLE, l3_ctrl_base + L3_CTRL); + raw_spin_unlock_irqrestore(&l3cache_lock, flags); +} + +static const struct of_device_id l3cache_ids[] __initconst = { + {.compatible = "hisilicon,kunpeng-l3cache", .data = NULL}, + {} +}; + +static int __init l3cache_init(void) +{ + u32 reg; + struct device_node *node; + + node = of_find_matching_node(NULL, l3cache_ids); + if (!node) + return -ENODEV; + + l3_ctrl_base = of_iomap(node, 0); + if (!l3_ctrl_base) { + pr_err("failed to map Kunpeng L3 cache controller registers\n"); + return -ENOMEM; + } + + reg = readl_relaxed(l3_ctrl_base + L3_CTRL); + if (!(reg & L3_CTRL_ENABLE)) { + /* + * Ensure that no L3 cache hardware maintenance operations are + * being performed before enabling the L3 cache. Wait for it to + * finish. 
+ */ + do { + cpu_relax(); + reg = readl_relaxed(l3_ctrl_base + L3_MAINT_CTRL); + } while ((reg & L3_MAINT_STATUS_MASK) != L3_MAINT_STATUS_END); + + reg = readl_relaxed(l3_ctrl_base + L3_AUCTRL); + reg |= L3_AUCTRL_EVENT_EN | L3_AUCTRL_ECC_EN; + writel_relaxed(reg, l3_ctrl_base + L3_AUCTRL); + + writel_relaxed(L3_CTRL_ENABLE, l3_ctrl_base + L3_CTRL); + } + + outer_cache.inv_range = l3cache_inv_range; + outer_cache.clean_range = l3cache_clean_range; + outer_cache.flush_range = l3cache_flush_range; + outer_cache.flush_all = l3cache_flush_all; + outer_cache.disable = l3cache_disable; + + pr_info("Hisilicon Kunpeng L3 cache controller enabled\n"); + + return 0; +} +arch_initcall(l3cache_init); -- 2.26.0.106.g9fadedd ^ permalink raw reply related [flat|nested] 18+ messages in thread
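The address handling in l3cache_maint_range() and the control-word update in l3cache_maint_common() can be checked in isolation. What follows is a minimal user-space sketch of that arithmetic, not the driver itself. It assumes `phys_addr_t` is 64 bits wide (modelled as `uint64_t`), spells the shift macro `L3_CACHE_LINE_SHIFT`, and copies the bit positions from the patch. The helper names `l3_line_start`, `l3_line_end`, `l3_maint_ctrl_word` and `l3_encoding_example` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define L3_CACHE_LINE_SHIFT	6	/* 64-byte cachelines, as in the patch */

#define L3_MAINT_RANGE_MASK	(1U << 3)
#define L3_MAINT_RANGE_ADDR	(1U << 3)
#define L3_MAINT_TYPE_MASK	(3U << 1)
#define L3_MAINT_TYPE_CLEAN	(1U << 1)
#define L3_MAINT_TYPE_FLUSH	(3U << 1)
#define L3_MAINT_STATUS_START	(1U << 0)

/* Value written to L3_MAINT_START: cacheline index of the first byte. */
static inline uint32_t l3_line_start(uint64_t start)
{
	return (uint32_t)(start >> L3_CACHE_LINE_SHIFT);
}

/* Value written to L3_MAINT_END: index of the last touched line, plus one. */
static inline uint32_t l3_line_end(uint64_t end)
{
	return (uint32_t)(((end - 1) >> L3_CACHE_LINE_SHIFT) + 1);
}

/* Replace the range and type fields of the old value and set the start bit. */
static inline uint32_t l3_maint_ctrl_word(uint32_t old, uint32_t range,
					  uint32_t op_type)
{
	old &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK);
	return old | range | op_type | L3_MAINT_STATUS_START;
}

/* Usage example: one 64-byte line at 0x1000, then the top of a 36-bit space. */
static inline void l3_encoding_example(void)
{
	assert(l3_line_start(0x1000) == 0x40);
	assert(l3_line_end(0x1040) == 0x41);
	assert(l3_line_end(0x1000000000ULL) == 0x40000000u);
}
```

With a 36-bit physical address both register values fit in 32 bits, which is consistent with the commit message's statement that only the upper 30 bits are recorded.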
* Re: [PATCH v7 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller
  2021-02-02  7:16 ` Zhen Lei
@ 2021-02-02  8:44 ` Arnd Bergmann
  -1 siblings, 0 replies; 18+ messages in thread
From: Arnd Bergmann @ 2021-02-02  8:44 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Russell King, Greg Kroah-Hartman, Will Deacon, Haojian Zhuang,
	Arnd Bergmann, Rob Herring, Wei Xu, devicetree, linux-arm-kernel,
	linux-kernel

On Tue, Feb 2, 2021 at 8:16 AM Zhen Lei <thunder.leizhen@huawei.com> wrote:
> +
> +/*
> + * All read and write operations on L3 cache registers are protected by the
> + * spinlock, except for l3cache_init(). Each time the L3 cache operation is
> + * performed, all related information is filled into its registers. Therefore,
> + * there is no memory order problem when only _relaxed() functions are used.

Thank you for including the text.

I don't think the explanation with the spin_lock() explains why this
can be considered safe though, as spin_lock() only contains serialization
against other CPUs (smp_mb()) rather than the stronger DMA barriers
implied by readl and writel. As Russell previously explained, the
L1 cache operations (e.g. v7_dma_inv_range) do include stronger
barriers, so it would be better to come up with a justification
based on those.

> + * This can help us achieve some performance improvement:
> + * 1) The readl_relaxed() is about 20ns faster than readl().
> + * 2) The writel_relaxed() is about 123ns faster than writel().

These are not really the performance numbers I asked for, as a
low-level benchmark comparing the instructions is rather meaningless.
The time spent waiting for the barrier depends on what else is going
on around the barrier. Also, most of the time would likely be
spent spinning in the loop around readl() while the cache operations
are in progress, so the latency of a single readl() is not necessarily
significant.

To have a more useful performance number, try mentioning the
most performance-sensitive non-coherent DMA master on one
of the chips that has this cache controller, and a high-level
performance number such as "1.2% more network packets per
second" if that is something you can measure easily.

Of course, if all high-speed DMA masters on this chip are
cache coherent, there is no need for performance numbers, just
mention that we don't care about speed in that case.

       Arnd

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v7 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller
  2021-02-02  8:44 ` Arnd Bergmann
@ 2021-02-02 12:18 ` Leizhen (ThunderTown)
  -1 siblings, 0 replies; 18+ messages in thread
From: Leizhen (ThunderTown) @ 2021-02-02 12:18 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Russell King, Greg Kroah-Hartman, Will Deacon, Haojian Zhuang,
	Arnd Bergmann, Rob Herring, Wei Xu, devicetree, linux-arm-kernel,
	linux-kernel

On 2021/2/2 16:44, Arnd Bergmann wrote:
> On Tue, Feb 2, 2021 at 8:16 AM Zhen Lei <thunder.leizhen@huawei.com> wrote:
>> +
>> +/*
>> + * All read and write operations on L3 cache registers are protected by the
>> + * spinlock, except for l3cache_init(). Each time the L3 cache operation is
>> + * performed, all related information is filled into its registers. Therefore,
>> + * there is no memory order problem when only _relaxed() functions are used.
>
> Thank you for including the text.
>
> I don't think the explanation with the spin_lock() explains why this
> can be considered safe though, as spin_lock() only contains serialization
> against other CPUs (smp_mb()) rather than the stronger DMA barriers
> implied by readl and writel. As Russell previously explained, the
> L1 cache operations (e.g. v7_dma_inv_range) do include stronger
> barriers, so it would be better to come up with a justification
> based on those.

Okay, I'll correct the description.

>
>> + * This can help us achieve some performance improvement:
>> + * 1) The readl_relaxed() is about 20ns faster than readl().
>> + * 2) The writel_relaxed() is about 123ns faster than writel().
>
> These are not really the performance numbers I asked for, as a
> low-level benchmark comparing the instructions is rather meaningless.
> The time spent waiting for the barrier depends on what else is going
> on around the barrier. Also, most of the time would likely be
> spent spinning in the loop around readl() while the cache operations
> are in progress, so the latency of a single readl() is not necessarily
> significant.
>
> To have a more useful performance number, try mentioning the
> most performance-sensitive non-coherent DMA master on one
> of the chips that has this cache controller, and a high-level
> performance number such as "1.2% more network packets per
> second" if that is something you can measure easily.

It's not easy. My board only has a debugging NIC; only the downstream
products have a high-speed service NIC. Software needs to be packaged
layer by layer.

>
> Of course, if all high-speed DMA masters on this chip are
> cache coherent, there is no need for performance numbers, just
> mention that we don't care about speed in that case.

It's not cache coherent; otherwise, there would be no need to operate on
the L3 cache.

>
>        Arnd
>
> .
>

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v7 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller
  2021-02-02 12:18 ` Leizhen (ThunderTown)
@ 2021-02-02 15:54 ` Arnd Bergmann
From: Arnd Bergmann @ 2021-02-02 15:54 UTC
To: Leizhen (ThunderTown)
Cc: Russell King, Greg Kroah-Hartman, Will Deacon, Haojian Zhuang,
    Arnd Bergmann, Rob Herring, Wei Xu, devicetree, linux-arm-kernel,
    linux-kernel

On Tue, Feb 2, 2021 at 1:18 PM Leizhen (ThunderTown)
<thunder.leizhen@huawei.com> wrote:
> On 2021/2/2 16:44, Arnd Bergmann wrote:
> >
> > To have a more useful performance number, try mentioning the
> > most performance sensitive non-coherent DMA master on one
> > of the chips that has this cache controller, and a high-level
> > performance number such as "1.2% more network packets per
> > second" if that is something you can measure easily.
>
> It's not easy. My board only have debugging NIC, only the downstream
> products have high-speed service NIC. Software needs to be packaged
> layer by layer.
>
> >
> > Of course, if all high-speed DMA masters on this chip are
> > cache coherent, there is no need for performance numbers, just
> > mention that we don't care about speed in that case.
>
> It's not cache coherent, otherwise, the L3 cache does not need to be
> operated.

Ok, I see. In this case, just explain that the high-speed NIC is
not cache-coherent, so this is expected to make a difference, even
if you can't quantify it exactly.

      Arnd
end of thread, other threads: [~2021-02-05 21:14 UTC]

Thread overview: 18+ messages (cross-posted copies are listed once)
2021-02-02  7:16 [PATCH v7 0/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller Zhen Lei
2021-02-02  7:16 ` [PATCH v7 1/4] ARM: LPAE: Use phys_addr_t instead of unsigned long in outercache hooks Zhen Lei
2021-02-02  7:16 ` [PATCH v7 2/4] ARM: hisi: add support for Kunpeng50x SoC Zhen Lei
2021-02-02  7:16 ` [PATCH v7 3/4] dt-bindings: arm: hisilicon: Add binding for Kunpeng L3 cache controller Zhen Lei
2021-02-05 21:12   ` Rob Herring
2021-02-02  7:16 ` [PATCH v7 4/4] ARM: Add support for Hisilicon " Zhen Lei
2021-02-02  8:44   ` Arnd Bergmann
2021-02-02 12:18     ` Leizhen (ThunderTown)
2021-02-02 15:54       ` Arnd Bergmann