linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller
@ 2021-01-16  3:27 Zhen Lei
  2021-01-16  3:27 ` [PATCH v5 1/4] ARM: LPAE: Use phys_addr_t instead of unsigned long in outercache hooks Zhen Lei
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Zhen Lei @ 2021-01-16  3:27 UTC (permalink / raw)
  To: Russell King, Greg Kroah-Hartman, Will Deacon, Haojian Zhuang,
	Arnd Bergmann, Rob Herring, Wei Xu, devicetree, linux-arm-kernel,
	linux-kernel
  Cc: Zhen Lei

v4 --> v5:
1. Add SoC macro ARCH_KUNPENG50X, and the Kunpeng L3 cache controller only enabled
   on that platform.
2. Require the compatible string of the Kunpeng L3 cache controller must have a
   relevant name on a specific SoC. For example:
   compatible = "hisilicon,kunpeng509-l3cache", "hisilicon,kunpeng-l3cache";

v3 --> v4:
Rename the compatible string from "hisilicon,l3cache" to "hisilicon,kunpeng-l3cache".
Then adjust the file name, configuration option name, and description accordingly.

v2 --> v3:
Add Hisilicon L3 cache controller driver and its document. That's: patch 2-3.

v1 --> v2:
Discard the middle-tier functions and do silent narrowing cast in the outcache
hook functions. For example:
-static void l2c220_inv_range(unsigned long start, unsigned long end)
+static void l2c220_inv_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;


v1:
Do cast phys_addr_t to unsigned long by adding a middle-tier function.
For example:
-static void l2c220_inv_range(unsigned long start, unsigned long end)
+static void __l2c220_inv_range(unsigned long start, unsigned long end)
 {
 	...
 }
+static void l2c220_inv_range(phys_addr_t start, phys_addr_t end)
+{
+  __l2c220_inv_range(start, end);
+}


Zhen Lei (4):
  ARM: LPAE: Use phys_addr_t instead of unsigned long in outercache
    hooks
  ARM: hisi: add support for Kunpeng50x SoC
  dt-bindings: arm: hisilicon: Add binding for Kunpeng L3 cache
    controller
  ARM: Add support for Hisilicon Kunpeng L3 cache controller

 .../arm/hisilicon/kunpeng-l3cache.yaml        |  40 +++++
 arch/arm/include/asm/outercache.h             |   6 +-
 arch/arm/mach-hisi/Kconfig                    |   8 +
 arch/arm/mm/Kconfig                           |  10 ++
 arch/arm/mm/Makefile                          |   1 +
 arch/arm/mm/cache-feroceon-l2.c               |  15 +-
 arch/arm/mm/cache-kunpeng-l3.c                | 153 ++++++++++++++++++
 arch/arm/mm/cache-kunpeng-l3.h                |  30 ++++
 arch/arm/mm/cache-l2x0.c                      |  50 ++++--
 arch/arm/mm/cache-tauros2.c                   |  15 +-
 arch/arm/mm/cache-uniphier.c                  |   6 +-
 arch/arm/mm/cache-xsc3l2.c                    |  12 +-
 12 files changed, 317 insertions(+), 29 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arm/hisilicon/kunpeng-l3cache.yaml
 create mode 100644 arch/arm/mm/cache-kunpeng-l3.c
 create mode 100644 arch/arm/mm/cache-kunpeng-l3.h

-- 
2.26.0.106.g9fadedd



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v5 1/4] ARM: LPAE: Use phys_addr_t instead of unsigned long in outercache hooks
  2021-01-16  3:27 [PATCH v5 0/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller Zhen Lei
@ 2021-01-16  3:27 ` Zhen Lei
  2021-01-28 14:29   ` Arnd Bergmann
  2021-01-16  3:27 ` [PATCH v5 2/4] ARM: hisi: add support for Kunpeng50x SoC Zhen Lei
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Zhen Lei @ 2021-01-16  3:27 UTC (permalink / raw)
  To: Russell King, Greg Kroah-Hartman, Will Deacon, Haojian Zhuang,
	Arnd Bergmann, Rob Herring, Wei Xu, devicetree, linux-arm-kernel,
	linux-kernel
  Cc: Zhen Lei

The outercache of some Hisilicon SOCs support physical addresses wider
than 32-bits. The unsigned long datatype is not sufficient for mapping
physical addresses >= 4GB. The commit ad6b9c9d78b9 ("ARM: 6671/1: LPAE:
use phys_addr_t instead of unsigned long in outercache functions") has
already modified the outercache functions. But the parameters of the
outercache hooks are not changed. This patch use phys_addr_t instead of
unsigned long in outercache hooks: inv_range, clean_range, flush_range.

To ensure the outercache that does not support LPAE works properly, do
cast phys_addr_t to unsigned long by adding a group of temporary
variables. For example:
-static void l2c220_inv_range(unsigned long start, unsigned long end)
+static void l2c220_inv_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;

Note that the outercache functions have been doing this cast before this
patch. So now, the cast is just moved into the outercache hook functions.

No functional change.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 arch/arm/include/asm/outercache.h |  6 ++--
 arch/arm/mm/cache-feroceon-l2.c   | 15 ++++++++--
 arch/arm/mm/cache-l2x0.c          | 50 ++++++++++++++++++++++---------
 arch/arm/mm/cache-tauros2.c       | 15 ++++++++--
 arch/arm/mm/cache-uniphier.c      |  6 ++--
 arch/arm/mm/cache-xsc3l2.c        | 12 ++++++--
 6 files changed, 75 insertions(+), 29 deletions(-)

diff --git a/arch/arm/include/asm/outercache.h b/arch/arm/include/asm/outercache.h
index 3364637755e86aa..4cee1ea0c15449a 100644
--- a/arch/arm/include/asm/outercache.h
+++ b/arch/arm/include/asm/outercache.h
@@ -14,9 +14,9 @@
 struct l2x0_regs;
 
 struct outer_cache_fns {
-	void (*inv_range)(unsigned long, unsigned long);
-	void (*clean_range)(unsigned long, unsigned long);
-	void (*flush_range)(unsigned long, unsigned long);
+	void (*inv_range)(phys_addr_t, phys_addr_t);
+	void (*clean_range)(phys_addr_t, phys_addr_t);
+	void (*flush_range)(phys_addr_t, phys_addr_t);
 	void (*flush_all)(void);
 	void (*disable)(void);
 #ifdef CONFIG_OUTER_CACHE_SYNC
diff --git a/arch/arm/mm/cache-feroceon-l2.c b/arch/arm/mm/cache-feroceon-l2.c
index 5c1b7a7b9af6300..10f909744d5e963 100644
--- a/arch/arm/mm/cache-feroceon-l2.c
+++ b/arch/arm/mm/cache-feroceon-l2.c
@@ -168,8 +168,11 @@ static unsigned long calc_range_end(unsigned long start, unsigned long end)
 	return range_end;
 }
 
-static void feroceon_l2_inv_range(unsigned long start, unsigned long end)
+static void feroceon_l2_inv_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
+
 	/*
 	 * Clean and invalidate partial first cache line.
 	 */
@@ -198,8 +201,11 @@ static void feroceon_l2_inv_range(unsigned long start, unsigned long end)
 	dsb();
 }
 
-static void feroceon_l2_clean_range(unsigned long start, unsigned long end)
+static void feroceon_l2_clean_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
+
 	/*
 	 * If L2 is forced to WT, the L2 will always be clean and we
 	 * don't need to do anything here.
@@ -217,8 +223,11 @@ static void feroceon_l2_clean_range(unsigned long start, unsigned long end)
 	dsb();
 }
 
-static void feroceon_l2_flush_range(unsigned long start, unsigned long end)
+static void feroceon_l2_flush_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
+
 	start &= ~(CACHE_LINE_SIZE - 1);
 	end = (end + CACHE_LINE_SIZE - 1) & ~(CACHE_LINE_SIZE - 1);
 	while (start != end) {
diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 43d91bfd2360086..cdaddd772b09ede 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -184,8 +184,10 @@ static void __l2c210_op_pa_range(void __iomem *reg, unsigned long start,
 	}
 }
 
-static void l2c210_inv_range(unsigned long start, unsigned long end)
+static void l2c210_inv_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	void __iomem *base = l2x0_base;
 
 	if (start & (CACHE_LINE_SIZE - 1)) {
@@ -203,8 +205,10 @@ static void l2c210_inv_range(unsigned long start, unsigned long end)
 	__l2c210_cache_sync(base);
 }
 
-static void l2c210_clean_range(unsigned long start, unsigned long end)
+static void l2c210_clean_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	void __iomem *base = l2x0_base;
 
 	start &= ~(CACHE_LINE_SIZE - 1);
@@ -212,8 +216,10 @@ static void l2c210_clean_range(unsigned long start, unsigned long end)
 	__l2c210_cache_sync(base);
 }
 
-static void l2c210_flush_range(unsigned long start, unsigned long end)
+static void l2c210_flush_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	void __iomem *base = l2x0_base;
 
 	start &= ~(CACHE_LINE_SIZE - 1);
@@ -304,8 +310,10 @@ static unsigned long l2c220_op_pa_range(void __iomem *reg, unsigned long start,
 	return flags;
 }
 
-static void l2c220_inv_range(unsigned long start, unsigned long end)
+static void l2c220_inv_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	void __iomem *base = l2x0_base;
 	unsigned long flags;
 
@@ -331,8 +339,10 @@ static void l2c220_inv_range(unsigned long start, unsigned long end)
 	raw_spin_unlock_irqrestore(&l2x0_lock, flags);
 }
 
-static void l2c220_clean_range(unsigned long start, unsigned long end)
+static void l2c220_clean_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	void __iomem *base = l2x0_base;
 	unsigned long flags;
 
@@ -350,8 +360,10 @@ static void l2c220_clean_range(unsigned long start, unsigned long end)
 	raw_spin_unlock_irqrestore(&l2x0_lock, flags);
 }
 
-static void l2c220_flush_range(unsigned long start, unsigned long end)
+static void l2c220_flush_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	void __iomem *base = l2x0_base;
 	unsigned long flags;
 
@@ -464,8 +476,10 @@ static const struct l2c_init_data l2c220_data = {
  *	Affects: store buffer
  *	store buffer is not automatically drained.
  */
-static void l2c310_inv_range_erratum(unsigned long start, unsigned long end)
+static void l2c310_inv_range_erratum(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	void __iomem *base = l2x0_base;
 
 	if ((start | end) & (CACHE_LINE_SIZE - 1)) {
@@ -496,8 +510,10 @@ static void l2c310_inv_range_erratum(unsigned long start, unsigned long end)
 	__l2c210_cache_sync(base);
 }
 
-static void l2c310_flush_range_erratum(unsigned long start, unsigned long end)
+static void l2c310_flush_range_erratum(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	raw_spinlock_t *lock = &l2x0_lock;
 	unsigned long flags;
 	void __iomem *base = l2x0_base;
@@ -1400,12 +1416,12 @@ static void aurora_pa_range(unsigned long start, unsigned long end,
 		start = range_end;
 	}
 }
-static void aurora_inv_range(unsigned long start, unsigned long end)
+static void aurora_inv_range(phys_addr_t start, phys_addr_t end)
 {
 	aurora_pa_range(start, end, AURORA_INVAL_RANGE_REG);
 }
 
-static void aurora_clean_range(unsigned long start, unsigned long end)
+static void aurora_clean_range(phys_addr_t start, phys_addr_t end)
 {
 	/*
 	 * If L2 is forced to WT, the L2 will always be clean and we
@@ -1415,7 +1431,7 @@ static void aurora_clean_range(unsigned long start, unsigned long end)
 		aurora_pa_range(start, end, AURORA_CLEAN_RANGE_REG);
 }
 
-static void aurora_flush_range(unsigned long start, unsigned long end)
+static void aurora_flush_range(phys_addr_t start, phys_addr_t end)
 {
 	if (l2_wt_override)
 		aurora_pa_range(start, end, AURORA_INVAL_RANGE_REG);
@@ -1604,8 +1620,10 @@ static inline unsigned long bcm_l2_phys_addr(unsigned long addr)
 		return addr + BCM_VC_EMI_OFFSET;
 }
 
-static void bcm_inv_range(unsigned long start, unsigned long end)
+static void bcm_inv_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	unsigned long new_start, new_end;
 
 	BUG_ON(start < BCM_SYS_EMI_START_ADDR);
@@ -1631,8 +1649,10 @@ static void bcm_inv_range(unsigned long start, unsigned long end)
 		new_end);
 }
 
-static void bcm_clean_range(unsigned long start, unsigned long end)
+static void bcm_clean_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	unsigned long new_start, new_end;
 
 	BUG_ON(start < BCM_SYS_EMI_START_ADDR);
@@ -1658,8 +1678,10 @@ static void bcm_clean_range(unsigned long start, unsigned long end)
 		new_end);
 }
 
-static void bcm_flush_range(unsigned long start, unsigned long end)
+static void bcm_flush_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	unsigned long new_start, new_end;
 
 	BUG_ON(start < BCM_SYS_EMI_START_ADDR);
diff --git a/arch/arm/mm/cache-tauros2.c b/arch/arm/mm/cache-tauros2.c
index 88255bea65e41e6..d768bbb5e05c690 100644
--- a/arch/arm/mm/cache-tauros2.c
+++ b/arch/arm/mm/cache-tauros2.c
@@ -66,8 +66,11 @@ static inline void tauros2_inv_pa(unsigned long addr)
  */
 #define CACHE_LINE_SIZE		32
 
-static void tauros2_inv_range(unsigned long start, unsigned long end)
+static void tauros2_inv_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
+
 	/*
 	 * Clean and invalidate partial first cache line.
 	 */
@@ -95,8 +98,11 @@ static void tauros2_inv_range(unsigned long start, unsigned long end)
 	dsb();
 }
 
-static void tauros2_clean_range(unsigned long start, unsigned long end)
+static void tauros2_clean_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
+
 	start &= ~(CACHE_LINE_SIZE - 1);
 	while (start < end) {
 		tauros2_clean_pa(start);
@@ -106,8 +112,11 @@ static void tauros2_clean_range(unsigned long start, unsigned long end)
 	dsb();
 }
 
-static void tauros2_flush_range(unsigned long start, unsigned long end)
+static void tauros2_flush_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
+
 	start &= ~(CACHE_LINE_SIZE - 1);
 	while (start < end) {
 		tauros2_clean_inv_pa(start);
diff --git a/arch/arm/mm/cache-uniphier.c b/arch/arm/mm/cache-uniphier.c
index ff2881458504329..e2508358e9f4082 100644
--- a/arch/arm/mm/cache-uniphier.c
+++ b/arch/arm/mm/cache-uniphier.c
@@ -250,17 +250,17 @@ static void uniphier_cache_maint_all(u32 operation)
 		__uniphier_cache_maint_all(data, operation);
 }
 
-static void uniphier_cache_inv_range(unsigned long start, unsigned long end)
+static void uniphier_cache_inv_range(phys_addr_t start, phys_addr_t end)
 {
 	uniphier_cache_maint_range(start, end, UNIPHIER_SSCOQM_CM_INV);
 }
 
-static void uniphier_cache_clean_range(unsigned long start, unsigned long end)
+static void uniphier_cache_clean_range(phys_addr_t start, phys_addr_t end)
 {
 	uniphier_cache_maint_range(start, end, UNIPHIER_SSCOQM_CM_CLEAN);
 }
 
-static void uniphier_cache_flush_range(unsigned long start, unsigned long end)
+static void uniphier_cache_flush_range(phys_addr_t start, phys_addr_t end)
 {
 	uniphier_cache_maint_range(start, end, UNIPHIER_SSCOQM_CM_FLUSH);
 }
diff --git a/arch/arm/mm/cache-xsc3l2.c b/arch/arm/mm/cache-xsc3l2.c
index d20d7af02d10fc0..5814731653d9091 100644
--- a/arch/arm/mm/cache-xsc3l2.c
+++ b/arch/arm/mm/cache-xsc3l2.c
@@ -83,8 +83,10 @@ static inline unsigned long l2_map_va(unsigned long pa, unsigned long prev_va)
 #endif
 }
 
-static void xsc3_l2_inv_range(unsigned long start, unsigned long end)
+static void xsc3_l2_inv_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	unsigned long vaddr;
 
 	if (start == 0 && end == -1ul) {
@@ -127,8 +129,10 @@ static void xsc3_l2_inv_range(unsigned long start, unsigned long end)
 	dsb();
 }
 
-static void xsc3_l2_clean_range(unsigned long start, unsigned long end)
+static void xsc3_l2_clean_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	unsigned long vaddr;
 
 	vaddr = -1;  /* to force the first mapping */
@@ -165,8 +169,10 @@ static inline void xsc3_l2_flush_all(void)
 	dsb();
 }
 
-static void xsc3_l2_flush_range(unsigned long start, unsigned long end)
+static void xsc3_l2_flush_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;
 	unsigned long vaddr;
 
 	if (start == 0 && end == -1ul) {
-- 
2.26.0.106.g9fadedd



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v5 2/4] ARM: hisi: add support for Kunpeng50x SoC
  2021-01-16  3:27 [PATCH v5 0/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller Zhen Lei
  2021-01-16  3:27 ` [PATCH v5 1/4] ARM: LPAE: Use phys_addr_t instead of unsigned long in outercache hooks Zhen Lei
@ 2021-01-16  3:27 ` Zhen Lei
  2021-01-28 14:28   ` Arnd Bergmann
  2021-01-16  3:27 ` [PATCH v5 3/4] dt-bindings: arm: hisilicon: Add binding for Kunpeng L3 cache controller Zhen Lei
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Zhen Lei @ 2021-01-16  3:27 UTC (permalink / raw)
  To: Russell King, Greg Kroah-Hartman, Will Deacon, Haojian Zhuang,
	Arnd Bergmann, Rob Herring, Wei Xu, devicetree, linux-arm-kernel,
	linux-kernel
  Cc: Zhen Lei

Enable support for the Hisilicon Kunpeng506 and Kunpeng509 SoC.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 arch/arm/mach-hisi/Kconfig | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm/mach-hisi/Kconfig b/arch/arm/mach-hisi/Kconfig
index 2e980f834a6aa1b..c724acc5c642b97 100644
--- a/arch/arm/mach-hisi/Kconfig
+++ b/arch/arm/mach-hisi/Kconfig
@@ -55,6 +55,14 @@ config ARCH_HIX5HD2
 	help
 	  Support for Hisilicon HIX5HD2 SoC family
 
+config ARCH_KUNPENG50X
+	bool "Hisilicon Kunpeng50x family"
+	depends on ARCH_MULTI_V7
+	select ARCH_FLATMEM_ENABLE
+	select ARCH_HAS_HOLES_MEMORYMODEL if SPARSEMEM
+	help
+	  Support for Hisilicon Kunpeng506 and Kunpeng509 SoC family
+
 config ARCH_SD5203
 	bool "Hisilicon SD5203 family"
 	depends on ARCH_MULTI_V5
-- 
2.26.0.106.g9fadedd



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v5 3/4] dt-bindings: arm: hisilicon: Add binding for Kunpeng L3 cache controller
  2021-01-16  3:27 [PATCH v5 0/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller Zhen Lei
  2021-01-16  3:27 ` [PATCH v5 1/4] ARM: LPAE: Use phys_addr_t instead of unsigned long in outercache hooks Zhen Lei
  2021-01-16  3:27 ` [PATCH v5 2/4] ARM: hisi: add support for Kunpeng50x SoC Zhen Lei
@ 2021-01-16  3:27 ` Zhen Lei
  2021-01-28 14:25   ` Arnd Bergmann
  2021-01-16  3:27 ` [PATCH v5 4/4] ARM: Add support for Hisilicon " Zhen Lei
  2021-01-28  1:30 ` [PATCH v5 0/4] " Leizhen (ThunderTown)
  4 siblings, 1 reply; 14+ messages in thread
From: Zhen Lei @ 2021-01-16  3:27 UTC (permalink / raw)
  To: Russell King, Greg Kroah-Hartman, Will Deacon, Haojian Zhuang,
	Arnd Bergmann, Rob Herring, Wei Xu, devicetree, linux-arm-kernel,
	linux-kernel
  Cc: Zhen Lei

Add devicetree binding for Hisilicon Kunpeng L3 cache controller.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 .../arm/hisilicon/kunpeng-l3cache.yaml        | 40 +++++++++++++++++++
 1 file changed, 40 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/hisilicon/kunpeng-l3cache.yaml

diff --git a/Documentation/devicetree/bindings/arm/hisilicon/kunpeng-l3cache.yaml b/Documentation/devicetree/bindings/arm/hisilicon/kunpeng-l3cache.yaml
new file mode 100644
index 000000000000000..5bf33c0e4d14b7f
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/hisilicon/kunpeng-l3cache.yaml
@@ -0,0 +1,40 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/arm/hisilicon/kunpeng-l3cache.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Hisilicon Kunpeng L3 cache controller
+
+maintainers:
+  - Wei Xu <xuwei5@hisilicon.com>
+
+description: |
+  The Hisilicon Kunpeng L3 outer cache controller supports a maximum of 36-bit
+  physical addresses. The data cached in the L3 outer cache can be operated
+  based on the physical address range or the entire cache.
+
+properties:
+  compatible:
+    items:
+      - enum:
+          - hisilicon,kunpeng506-l3cache
+          - hisilicon,kunpeng509-l3cache
+      - const: hisilicon,kunpeng-l3cache
+
+  reg:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+
+additionalProperties: false
+
+examples:
+  - |
+    l3cache@f302b000 {
+        compatible = "hisilicon,kunpeng509-l3cache", "hisilicon,kunpeng-l3cache";
+        reg = <0xf302b000 0x1000>;
+    };
+...
-- 
2.26.0.106.g9fadedd



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v5 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller
  2021-01-16  3:27 [PATCH v5 0/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller Zhen Lei
                   ` (2 preceding siblings ...)
  2021-01-16  3:27 ` [PATCH v5 3/4] dt-bindings: arm: hisilicon: Add binding for Kunpeng L3 cache controller Zhen Lei
@ 2021-01-16  3:27 ` Zhen Lei
  2021-01-28 14:24   ` Arnd Bergmann
  2021-01-28  1:30 ` [PATCH v5 0/4] " Leizhen (ThunderTown)
  4 siblings, 1 reply; 14+ messages in thread
From: Zhen Lei @ 2021-01-16  3:27 UTC (permalink / raw)
  To: Russell King, Greg Kroah-Hartman, Will Deacon, Haojian Zhuang,
	Arnd Bergmann, Rob Herring, Wei Xu, devicetree, linux-arm-kernel,
	linux-kernel
  Cc: Zhen Lei

Add support for the Hisilicon Kunpeng L3 cache controller as used with
Kunpeng506 and Kunpeng509 SoCs.

These Hisilicon SoCs support LPAE, so the physical addresses is wider than
32-bits, but the actual bit width does not exceed 36 bits. When the cache
operation is performed based on the address range, the upper 30 bits of
the physical address are recorded in registers L3_MAINT_START and
L3_MAINT_END, and ignore the lower 6 bits cacheline offset.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 arch/arm/mm/Kconfig            |  10 +++
 arch/arm/mm/Makefile           |   1 +
 arch/arm/mm/cache-kunpeng-l3.c | 153 +++++++++++++++++++++++++++++++++
 arch/arm/mm/cache-kunpeng-l3.h |  30 +++++++
 4 files changed, 194 insertions(+)
 create mode 100644 arch/arm/mm/cache-kunpeng-l3.c
 create mode 100644 arch/arm/mm/cache-kunpeng-l3.h

diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 02692fbe2db5c59..d2082503de053d2 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -1070,6 +1070,16 @@ config CACHE_XSC3L2
 	help
 	  This option enables the L2 cache on XScale3.
 
+config CACHE_KUNPENG_L3
+	bool "Enable the Hisilicon Kunpeng L3 cache controller"
+	depends on ARCH_KUNPENG50X && OF
+	default y
+	select OUTER_CACHE
+	help
+	  This option enables the Kunpeng L3 cache controller on Hisilicon
+	  Kunpeng506 and Kunpeng509 SoCs. It supports a maximum of 36-bit
+	  physical addresses.
+
 config ARM_L1_CACHE_SHIFT_6
 	bool
 	default y if CPU_V7
diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index 3510503bc5e688b..ececc5489e353eb 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -112,6 +112,7 @@ obj-$(CONFIG_CACHE_L2X0_PMU)	+= cache-l2x0-pmu.o
 obj-$(CONFIG_CACHE_XSC3L2)	+= cache-xsc3l2.o
 obj-$(CONFIG_CACHE_TAUROS2)	+= cache-tauros2.o
 obj-$(CONFIG_CACHE_UNIPHIER)	+= cache-uniphier.o
+obj-$(CONFIG_CACHE_KUNPENG_L3)	+= cache-kunpeng-l3.o
 
 KASAN_SANITIZE_kasan_init.o	:= n
 obj-$(CONFIG_KASAN)		+= kasan_init.o
diff --git a/arch/arm/mm/cache-kunpeng-l3.c b/arch/arm/mm/cache-kunpeng-l3.c
new file mode 100644
index 000000000000000..cb81f15d26a0cf2
--- /dev/null
+++ b/arch/arm/mm/cache-kunpeng-l3.c
@@ -0,0 +1,153 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021 Hisilicon Limited.
+ */
+
+#include <linux/init.h>
+#include <linux/spinlock.h>
+#include <linux/io.h>
+#include <linux/of_address.h>
+
+#include <asm/cacheflush.h>
+
+#include "cache-kunpeng-l3.h"
+
+static DEFINE_SPINLOCK(l3cache_lock);
+static void __iomem *l3_ctrl_base;
+
+
+static void l3cache_maint_common(u32 range, u32 op_type)
+{
+	u32 reg;
+
+	reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
+	reg &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK);
+	reg |= range | op_type;
+	reg |= L3_MAINT_STATUS_START;
+	writel(reg, l3_ctrl_base + L3_MAINT_CTRL);
+
+	/* Wait until the hardware maintenance operation is complete. */
+	do {
+		cpu_relax();
+		reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
+	} while ((reg & L3_MAINT_STATUS_MASK) != L3_MAINT_STATUS_END);
+}
+
+static void l3cache_maint_range(phys_addr_t start, phys_addr_t end, u32 op_type)
+{
+	start = start >> L3_CACHE_LINE_SHITF;
+	end = ((end - 1) >> L3_CACHE_LINE_SHITF) + 1;
+
+	writel(start, l3_ctrl_base + L3_MAINT_START);
+	writel(end, l3_ctrl_base + L3_MAINT_END);
+
+	l3cache_maint_common(L3_MAINT_RANGE_ADDR, op_type);
+}
+
+static inline void l3cache_flush_all_nolock(void)
+{
+	l3cache_maint_common(L3_MAINT_RANGE_ALL, L3_MAINT_TYPE_FLUSH);
+}
+
+static void l3cache_flush_all(void)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&l3cache_lock, flags);
+	l3cache_flush_all_nolock();
+	spin_unlock_irqrestore(&l3cache_lock, flags);
+}
+
+static void l3cache_inv_range(phys_addr_t start, phys_addr_t end)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&l3cache_lock, flags);
+	l3cache_maint_range(start, end, L3_MAINT_TYPE_INV);
+	spin_unlock_irqrestore(&l3cache_lock, flags);
+}
+
+static void l3cache_clean_range(phys_addr_t start, phys_addr_t end)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&l3cache_lock, flags);
+	l3cache_maint_range(start, end, L3_MAINT_TYPE_CLEAN);
+	spin_unlock_irqrestore(&l3cache_lock, flags);
+}
+
+static void l3cache_flush_range(phys_addr_t start, phys_addr_t end)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&l3cache_lock, flags);
+	l3cache_maint_range(start, end, L3_MAINT_TYPE_FLUSH);
+	spin_unlock_irqrestore(&l3cache_lock, flags);
+}
+
+static void l3cache_disable(void)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&l3cache_lock, flags);
+	l3cache_flush_all_nolock();
+	writel(L3_CTRL_DISABLE, l3_ctrl_base + L3_CTRL);
+	spin_unlock_irqrestore(&l3cache_lock, flags);
+}
+
+static const struct of_device_id l3cache_ids[] __initconst = {
+	{.compatible = "hisilicon,kunpeng-l3cache", .data = NULL},
+	{}
+};
+
+static int __init l3cache_init(void)
+{
+	u32 reg;
+	struct device_node *node;
+
+	node = of_find_matching_node(NULL, l3cache_ids);
+	if (!node)
+		return -ENODEV;
+
+	l3_ctrl_base = of_iomap(node, 0);
+	if (!l3_ctrl_base) {
+		pr_err("failed to map Kunpeng L3 cache controller registers\n");
+		return -ENOMEM;
+	}
+
+	reg = readl(l3_ctrl_base + L3_CTRL);
+	if (!(reg & L3_CTRL_ENABLE)) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&l3cache_lock, flags);
+
+		/*
+		 * Ensure that no L3 cache hardware maintenance operations are
+		 * being performed before enabling the L3 cache. Wait for it to
+		 * finish.
+		 */
+		do {
+			cpu_relax();
+			reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
+		} while ((reg & L3_MAINT_STATUS_MASK) != L3_MAINT_STATUS_END);
+
+		reg = readl(l3_ctrl_base + L3_AUCTRL);
+		reg |= L3_AUCTRL_EVENT_EN | L3_AUCTRL_ECC_EN;
+		writel(reg, l3_ctrl_base + L3_AUCTRL);
+
+		writel(L3_CTRL_ENABLE, l3_ctrl_base + L3_CTRL);
+
+		spin_unlock_irqrestore(&l3cache_lock, flags);
+	}
+
+	outer_cache.inv_range = l3cache_inv_range;
+	outer_cache.clean_range = l3cache_clean_range;
+	outer_cache.flush_range = l3cache_flush_range;
+	outer_cache.flush_all = l3cache_flush_all;
+	outer_cache.disable = l3cache_disable;
+
+	pr_info("Hisilicon Kunpeng L3 cache controller enabled\n");
+
+	return 0;
+}
+arch_initcall(l3cache_init);
diff --git a/arch/arm/mm/cache-kunpeng-l3.h b/arch/arm/mm/cache-kunpeng-l3.h
new file mode 100644
index 000000000000000..9ef6a53e7d4db49
--- /dev/null
+++ b/arch/arm/mm/cache-kunpeng-l3.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __CACHE_KUNPENG_L3_H
+#define __CACHE_KUNPENG_L3_H
+
+#define L3_CACHE_LINE_SHITF		6
+
+#define L3_CTRL				0x0
+#define L3_CTRL_ENABLE			(1U << 0)
+#define L3_CTRL_DISABLE			(0U << 0)
+
+#define L3_AUCTRL			0x4
+#define L3_AUCTRL_EVENT_EN		BIT(23)
+#define L3_AUCTRL_ECC_EN		BIT(8)
+
+#define L3_MAINT_CTRL			0x20
+#define L3_MAINT_RANGE_MASK		GENMASK(3, 3)
+#define L3_MAINT_RANGE_ALL		(0U << 3)
+#define L3_MAINT_RANGE_ADDR		(1U << 3)
+#define L3_MAINT_TYPE_MASK		GENMASK(2, 1)
+#define L3_MAINT_TYPE_CLEAN		(1U << 1)
+#define L3_MAINT_TYPE_INV		(2U << 1)
+#define L3_MAINT_TYPE_FLUSH		(3U << 1)
+#define L3_MAINT_STATUS_MASK		GENMASK(0, 0)
+#define L3_MAINT_STATUS_START		(1U << 0)
+#define L3_MAINT_STATUS_END		(0U << 0)
+
+#define L3_MAINT_START			0x24
+#define L3_MAINT_END			0x28
+
+#endif
-- 
2.26.0.106.g9fadedd



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 0/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller
  2021-01-16  3:27 [PATCH v5 0/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller Zhen Lei
                   ` (3 preceding siblings ...)
  2021-01-16  3:27 ` [PATCH v5 4/4] ARM: Add support for Hisilicon " Zhen Lei
@ 2021-01-28  1:30 ` Leizhen (ThunderTown)
  4 siblings, 0 replies; 14+ messages in thread
From: Leizhen (ThunderTown) @ 2021-01-28  1:30 UTC (permalink / raw)
  To: Russell King, Greg Kroah-Hartman, Will Deacon, Haojian Zhuang,
	Arnd Bergmann, Rob Herring, Wei Xu, devicetree, linux-arm-kernel,
	linux-kernel

Hi Russell and Arnd:
  Do you have time to review it?


On 2021/1/16 11:27, Zhen Lei wrote:
> v4 --> v5:
> 1. Add SoC macro ARCH_KUNPENG50X, and the Kunpeng L3 cache controller only enabled
>    on that platform.
> 2. Require the compatible string of the Kunpeng L3 cache controller must have a
>    relevant name on a specific SoC. For example:
>    compatible = "hisilicon,kunpeng509-l3cache", "hisilicon,kunpeng-l3cache";
> 
> v3 --> v4:
> Rename the compatible string from "hisilicon,l3cache" to "hisilicon,kunpeng-l3cache".
> Then adjust the file name, configuration option name, and description accordingly.
> 
> v2 --> v3:
> Add Hisilicon L3 cache controller driver and its document. That's: patch 2-3.
> 
> v1 --> v2:
> Discard the middle-tier functions and do silent narrowing cast in the outcache
> hook functions. For example:
> -static void l2c220_inv_range(unsigned long start, unsigned long end)
> +static void l2c220_inv_range(phys_addr_t pa_start, phys_addr_t pa_end)
>  {
> +	unsigned long start = pa_start;
> +	unsigned long end = pa_end;
> 
> 
> v1:
> Do cast phys_addr_t to unsigned long by adding a middle-tier function.
> For example:
> -static void l2c220_inv_range(unsigned long start, unsigned long end)
> +static void __l2c220_inv_range(unsigned long start, unsigned long end)
>  {
>  	...
>  }
> +static void l2c220_inv_range(phys_addr_t start, phys_addr_t end)
> +{
> +  __l2c220_inv_range(start, end);
> +}
> 
> 
> Zhen Lei (4):
>   ARM: LPAE: Use phys_addr_t instead of unsigned long in outercache
>     hooks
>   ARM: hisi: add support for Kunpeng50x SoC
>   dt-bindings: arm: hisilicon: Add binding for Kunpeng L3 cache
>     controller
>   ARM: Add support for Hisilicon Kunpeng L3 cache controller
> 
>  .../arm/hisilicon/kunpeng-l3cache.yaml        |  40 +++++
>  arch/arm/include/asm/outercache.h             |   6 +-
>  arch/arm/mach-hisi/Kconfig                    |   8 +
>  arch/arm/mm/Kconfig                           |  10 ++
>  arch/arm/mm/Makefile                          |   1 +
>  arch/arm/mm/cache-feroceon-l2.c               |  15 +-
>  arch/arm/mm/cache-kunpeng-l3.c                | 153 ++++++++++++++++++
>  arch/arm/mm/cache-kunpeng-l3.h                |  30 ++++
>  arch/arm/mm/cache-l2x0.c                      |  50 ++++--
>  arch/arm/mm/cache-tauros2.c                   |  15 +-
>  arch/arm/mm/cache-uniphier.c                  |   6 +-
>  arch/arm/mm/cache-xsc3l2.c                    |  12 +-
>  12 files changed, 317 insertions(+), 29 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/arm/hisilicon/kunpeng-l3cache.yaml
>  create mode 100644 arch/arm/mm/cache-kunpeng-l3.c
>  create mode 100644 arch/arm/mm/cache-kunpeng-l3.h
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller
  2021-01-16  3:27 ` [PATCH v5 4/4] ARM: Add support for Hisilicon " Zhen Lei
@ 2021-01-28 14:24   ` Arnd Bergmann
  2021-01-29  7:23     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 14+ messages in thread
From: Arnd Bergmann @ 2021-01-28 14:24 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Russell King, Greg Kroah-Hartman, Will Deacon, Haojian Zhuang,
	Arnd Bergmann, Rob Herring, Wei Xu, devicetree, linux-arm-kernel,
	linux-kernel

On Sat, Jan 16, 2021 at 4:27 AM Zhen Lei <thunder.leizhen@huawei.com> wrote:
> diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
> +
> +static void l3cache_maint_common(u32 range, u32 op_type)
> +{
> +       u32 reg;
> +
> +       reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
> +       reg &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK);
> +       reg |= range | op_type;
> +       reg |= L3_MAINT_STATUS_START;
> +       writel(reg, l3_ctrl_base + L3_MAINT_CTRL);

Are there contents of L3_MAINT_CTRL that need to be preserved
across calls and can not be inferred? A 'readl()' is often expensive,
so it might be more efficient if you can avoid that.

> +static inline void l3cache_flush_all_nolock(void)
> +{
> +       l3cache_maint_common(L3_MAINT_RANGE_ALL, L3_MAINT_TYPE_FLUSH);
> +}
> +
> +static void l3cache_flush_all(void)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&l3cache_lock, flags);
> +       l3cache_flush_all_nolock();
> +       spin_unlock_irqrestore(&l3cache_lock, flags);
> +}

I see that cache-l2x0 uses raw_spin_lock_irqsave() instead of
spin_lock_irqsave(), to avoid preemption in the middle of a cache
operation. This is probably a good idea here as well.

I also see that l2x0 uses readl_relaxed(), to avoid a deadlock
in l2x0_cache_sync(). This may also be beneficial for performance
reasons, so it might be helpful to compare performance
overhead. On the other hand, readl()/writel() are usually the
safe choice, as those avoid the need to argue over whether
the relaxed versions are safe in all corner cases.

> +static int __init l3cache_init(void)
> +{
> +       u32 reg;
> +       struct device_node *node;
> +
> +       node = of_find_matching_node(NULL, l3cache_ids);
> +       if (!node)
> +               return -ENODEV;

I think the initcall should return '0' to indicate success when running
a kernel with this driver built-in on a platform that does not have
this device.

> diff --git a/arch/arm/mm/cache-kunpeng-l3.h b/arch/arm/mm/cache-kunpeng-l3.h
> new file mode 100644
> index 000000000000000..9ef6a53e7d4db49
> --- /dev/null
> +++ b/arch/arm/mm/cache-kunpeng-l3.h
> @@ -0,0 +1,30 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __CACHE_KUNPENG_L3_H
> +#define __CACHE_KUNPENG_L3_H
> +
> +#define L3_CACHE_LINE_SHITF            6

I would suggest moving the contents of the header file into the .c file,
since there is only a single user of these macros.

          Arnd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 3/4] dt-bindings: arm: hisilicon: Add binding for Kunpeng L3 cache controller
  2021-01-16  3:27 ` [PATCH v5 3/4] dt-bindings: arm: hisilicon: Add binding for Kunpeng L3 cache controller Zhen Lei
@ 2021-01-28 14:25   ` Arnd Bergmann
  0 siblings, 0 replies; 14+ messages in thread
From: Arnd Bergmann @ 2021-01-28 14:25 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Russell King, Greg Kroah-Hartman, Will Deacon, Haojian Zhuang,
	Arnd Bergmann, Rob Herring, Wei Xu, devicetree, linux-arm-kernel,
	linux-kernel

On Sat, Jan 16, 2021 at 4:34 AM Zhen Lei <thunder.leizhen@huawei.com> wrote:
>
> Add devicetree binding for Hisilicon Kunpeng L3 cache controller.
>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  .../arm/hisilicon/kunpeng-l3cache.yaml        | 40 +++++++++++++++++++

Reviewed-by: Arnd Bergmann <arnd@arndb.de>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 2/4] ARM: hisi: add support for Kunpeng50x SoC
  2021-01-16  3:27 ` [PATCH v5 2/4] ARM: hisi: add support for Kunpeng50x SoC Zhen Lei
@ 2021-01-28 14:28   ` Arnd Bergmann
  0 siblings, 0 replies; 14+ messages in thread
From: Arnd Bergmann @ 2021-01-28 14:28 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Russell King, Greg Kroah-Hartman, Will Deacon, Haojian Zhuang,
	Arnd Bergmann, Rob Herring, Wei Xu, devicetree, linux-arm-kernel,
	linux-kernel

On Sat, Jan 16, 2021 at 4:32 AM Zhen Lei <thunder.leizhen@huawei.com> wrote:
>
> Enable support for the Hisilicon Kunpeng506 and Kunpeng509 SoC.
>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  arch/arm/mach-hisi/Kconfig | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/arch/arm/mach-hisi/Kconfig b/arch/arm/mach-hisi/Kconfig
> index 2e980f834a6aa1b..c724acc5c642b97 100644
> --- a/arch/arm/mach-hisi/Kconfig
> +++ b/arch/arm/mach-hisi/Kconfig
> @@ -55,6 +55,14 @@ config ARCH_HIX5HD2
>         help
>           Support for Hisilicon HIX5HD2 SoC family
>
> +config ARCH_KUNPENG50X
> +       bool "Hisilicon Kunpeng50x family"
> +       depends on ARCH_MULTI_V7
> +       select ARCH_FLATMEM_ENABLE
> +       select ARCH_HAS_HOLES_MEMORYMODEL if SPARSEMEM

I think the two 'select' statements are both wrong, though for
different reasons:

- ARCH_FLATMEM_ENABLE is already selected by ARCH_MULTIPLATFORM,
  and is something that should not be platform specific

- ARCH_HAS_HOLES_MEMORYMODEL was removed in v5.11,
  and should also not be selected by a platform.

Otherwise, this seems fine.

         Arnd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 1/4] ARM: LPAE: Use phys_addr_t instead of unsigned long in outercache hooks
  2021-01-16  3:27 ` [PATCH v5 1/4] ARM: LPAE: Use phys_addr_t instead of unsigned long in outercache hooks Zhen Lei
@ 2021-01-28 14:29   ` Arnd Bergmann
  0 siblings, 0 replies; 14+ messages in thread
From: Arnd Bergmann @ 2021-01-28 14:29 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Russell King, Greg Kroah-Hartman, Will Deacon, Haojian Zhuang,
	Arnd Bergmann, Rob Herring, Wei Xu, devicetree, linux-arm-kernel,
	linux-kernel

On Sat, Jan 16, 2021 at 4:27 AM Zhen Lei <thunder.leizhen@huawei.com> wrote:
>
> The outercache of some Hisilicon SOCs support physical addresses wider
> than 32-bits. The unsigned long datatype is not sufficient for mapping
> physical addresses >= 4GB. The commit ad6b9c9d78b9 ("ARM: 6671/1: LPAE:
> use phys_addr_t instead of unsigned long in outercache functions") has
> already modified the outercache functions. But the parameters of the
> outercache hooks are not changed. This patch use phys_addr_t instead of
> unsigned long in outercache hooks: inv_range, clean_range, flush_range.
>
> To ensure the outercache that does not support LPAE works properly, do
> cast phys_addr_t to unsigned long by adding a group of temporary
> variables. For example:
> -static void l2c220_inv_range(unsigned long start, unsigned long end)
> +static void l2c220_inv_range(phys_addr_t pa_start, phys_addr_t pa_end)
>  {
> +       unsigned long start = pa_start;
> +       unsigned long end = pa_end;
>
> Note that the outercache functions have been doing this cast before this
> patch. So now, the cast is just moved into the outercache hook functions.
>
> No functional change.
>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>

Reviewed-by: Arnd Bergmann <arnd@arndb.de>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller
  2021-01-28 14:24   ` Arnd Bergmann
@ 2021-01-29  7:23     ` Leizhen (ThunderTown)
       [not found]       ` <CAK8P3a3Hj0Hyc8mVdGYhB7AEuHCYbhGxHnhNk1xWonEmxZOxRw@mail.gmail.com>
  0 siblings, 1 reply; 14+ messages in thread
From: Leizhen (ThunderTown) @ 2021-01-29  7:23 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Russell King, Greg Kroah-Hartman, Will Deacon, Haojian Zhuang,
	Arnd Bergmann, Rob Herring, Wei Xu, devicetree, linux-arm-kernel,
	linux-kernel



On 2021/1/28 22:24, Arnd Bergmann wrote:
> On Sat, Jan 16, 2021 at 4:27 AM Zhen Lei <thunder.leizhen@huawei.com> wrote:
>> diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
>> +
>> +static void l3cache_maint_common(u32 range, u32 op_type)
>> +{
>> +       u32 reg;
>> +
>> +       reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
>> +       reg &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK);
>> +       reg |= range | op_type;
>> +       reg |= L3_MAINT_STATUS_START;
>> +       writel(reg, l3_ctrl_base + L3_MAINT_CTRL);
> 
> Are there contents of L3_MAINT_CTRL that need to be preserved
> across calls and can not be inferred? A 'readl()' is often expensive,
> so it might be more efficient if you can avoid that.

Right, this readl() can be replaced with readl_relaxed(). Thanks.

I'll check and correct the readl() and writel() in other places.

> 
>> +static inline void l3cache_flush_all_nolock(void)
>> +{
>> +       l3cache_maint_common(L3_MAINT_RANGE_ALL, L3_MAINT_TYPE_FLUSH);
>> +}
>> +
>> +static void l3cache_flush_all(void)
>> +{
>> +       unsigned long flags;
>> +
>> +       spin_lock_irqsave(&l3cache_lock, flags);
>> +       l3cache_flush_all_nolock();
>> +       spin_unlock_irqrestore(&l3cache_lock, flags);
>> +}
> 
> I see that cache-l2x0 uses raw_spin_lock_irqsave() instead of
> spin_lock_irqsave(), to avoid preemption in the middle of a cache
> operation. This is probably a good idea here as well.

I don't think there's any essential difference between the two! I don't know
if the compiler or tool will do anything extra. I checked the git log of the
l2x0 driver and it used raw_spin_lock_irqsave() at the beginning. Maybe
there's a description in 2.6. Since you mentioned this potential risk, I'll
change it to raw_spin_lock_irqsave.

include/linux/spinlock.h:
static __always_inline raw_spinlock_t *spinlock_check(spinlock_t *lock)
{
        return &lock->rlock;
}

#define spin_lock_irqsave(lock, flags)                          \
do {                                                            \
        raw_spin_lock_irqsave(spinlock_check(lock), flags);     \
} while (0)

> 
> I also see that l2x0 uses readl_relaxed(), to avoid a deadlock
> in l2x0_cache_sync(). This may also be beneficial for performance
> reasons, so it might be helpful to compare performance
> overhead. On the other hand, readl()/writel() are usually the
> safe choice, as those avoid the need to argue over whether
> the relaxed versions are safe in all corner cases.
> 
>> +static int __init l3cache_init(void)
>> +{
>> +       u32 reg;
>> +       struct device_node *node;
>> +
>> +       node = of_find_matching_node(NULL, l3cache_ids);
>> +       if (!node)
>> +               return -ENODEV;
> 
> I think the initcall should return '0' to indicate success when running
> a kernel with this driver built-in on a platform that does not have
> this device.

I have added "depends on ARCH_KUNPENG50X" for this driver. But it's OK to
return 0.

> 
>> diff --git a/arch/arm/mm/cache-kunpeng-l3.h b/arch/arm/mm/cache-kunpeng-l3.h
>> new file mode 100644
>> index 000000000000000..9ef6a53e7d4db49
>> --- /dev/null
>> +++ b/arch/arm/mm/cache-kunpeng-l3.h
>> @@ -0,0 +1,30 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef __CACHE_KUNPENG_L3_H
>> +#define __CACHE_KUNPENG_L3_H
>> +
>> +#define L3_CACHE_LINE_SHITF            6
> 
> I would suggest moving the contents of the header file into the .c file,
> since there is only a single user of these macros.

Okay, I'll move it.

> 
>           Arnd
> 
> .
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller
       [not found]         ` <CAK8P3a1j+mr3bCp2uCuuYzW0ygjTmGv9vELuNy7v-iQ=WoDMOw@mail.gmail.com>
@ 2021-01-29 10:33           ` Russell King - ARM Linux admin
  2021-01-30  2:51             ` Leizhen (ThunderTown)
       [not found]           ` <6c4d3650-0040-06d4-4342-79392738877b@huawei.com>
  1 sibling, 1 reply; 14+ messages in thread
From: Russell King - ARM Linux admin @ 2021-01-29 10:33 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Leizhen (ThunderTown),
	Greg Kroah-Hartman, Will Deacon, Haojian Zhuang, Arnd Bergmann,
	Rob Herring, Wei Xu, devicetree, linux-arm-kernel, linux-kernel

On Fri, Jan 29, 2021 at 11:26:38AM +0100, Arnd Bergmann wrote:
> Another clarification, as there are actually two independent
> points here:
> 
> * if you can completely remove the readl() above and just write a
>   hardcoded value into the register, or perhaps read the original
>   value once at boot time, that is probably a win because it
>   avoids one of the barriers in the beginning. The datasheet should
>   tell you if there are any bits in the register that have to be
>   preserved
> 
> * Regarding the _relaxed() accessors, it's a lot harder to know
>   whether that is safe, as you first have to show, in particular in case
>   any of the accesses stop being guarded by the spinlock in that
>   case, and whether there may be a case where you have to
>   serialize the memory access against accesses that are still in the
>   store queue or prefetched.
> 
> Whether this matters at all depends mostly on the type of devices
> you are driving on your SoC. If you have any high-speed network
> interfaces that are unable to do cache coherent DMA, any extra
> instruction here may impact the number of packets you can transfer,
> but if all your high-speed devices are connected to a coherent
> interconnect, I would just go with the obvious approach and use
> the safe MMIO accessors everywhere.

For L2 cache code, I would say the opposite, actually, because it is
all too easy to get into a deadlock otherwise.

If you implement the sync callback, that will be called from every
non-relaxed accessor, which means if you need to take some kind of
lock in the sync callback and elsewhere in the L2 cache code, you will
definitely deadlock.

It is safer to put explicit barriers where it is necessary.

Also remember that the barrier in readl() etc is _after_ the read, not
before, and the barrier in writel() is _before_ the write, not after.
The point is to ensure that DMA memory accesses are properly ordered
with the IO-accessing instructions.

So, using readl_relaxed() with a read-modify-write is entirely sensible
provided you do not access DMA memory inbetween.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller
  2021-01-29 10:33           ` Russell King - ARM Linux admin
@ 2021-01-30  2:51             ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 14+ messages in thread
From: Leizhen (ThunderTown) @ 2021-01-30  2:51 UTC (permalink / raw)
  To: Russell King - ARM Linux admin, Arnd Bergmann
  Cc: Greg Kroah-Hartman, Will Deacon, Haojian Zhuang, Arnd Bergmann,
	Rob Herring, Wei Xu, devicetree, linux-arm-kernel, linux-kernel



On 2021/1/29 18:33, Russell King - ARM Linux admin wrote:
> On Fri, Jan 29, 2021 at 11:26:38AM +0100, Arnd Bergmann wrote:
>> Another clarification, as there are actually two independent
>> points here:
>>
>> * if you can completely remove the readl() above and just write a
>>   hardcoded value into the register, or perhaps read the original
>>   value once at boot time, that is probably a win because it
>>   avoids one of the barriers in the beginning. The datasheet should
>>   tell you if there are any bits in the register that have to be
>>   preserved
>>
>> * Regarding the _relaxed() accessors, it's a lot harder to know
>>   whether that is safe, as you first have to show, in particular in case
>>   any of the accesses stop being guarded by the spinlock in that
>>   case, and whether there may be a case where you have to
>>   serialize the memory access against accesses that are still in the
>>   store queue or prefetched.
>>
>> Whether this matters at all depends mostly on the type of devices
>> you are driving on your SoC. If you have any high-speed network
>> interfaces that are unable to do cache coherent DMA, any extra
>> instruction here may impact the number of packets you can transfer,
>> but if all your high-speed devices are connected to a coherent
>> interconnect, I would just go with the obvious approach and use
>> the safe MMIO accessors everywhere.
> 
> For L2 cache code, I would say the opposite, actually, because it is
> all too easy to get into a deadlock otherwise.
> 
> If you implement the sync callback, that will be called from every
> non-relaxed accessor, which means if you need to take some kind of
> lock in the sync callback and elsewhere in the L2 cache code, you will
> definitely deadlock.
> 
> It is safer to put explicit barriers where it is necessary.
> 
> Also remember that the barrier in readl() etc is _after_ the read, not
> before, and the barrier in writel() is _before_ the write, not after.
> The point is to ensure that DMA memory accesses are properly ordered
> with the IO-accessing instructions.

Yes, I known it. writel() must be used for the write operations that control
"start/stop" or "enable/disable" function, to ensure that the data of previous
write operations reaches the target. I've met this kind of problem before.

> 
> So, using readl_relaxed() with a read-modify-write is entirely sensible
> provided you do not access DMA memory inbetween.

Actually, I don't think this register is that complicated. I copied the code
back below. All the bits of L3_MAINT_CTRL are not affected by DMA access operations.
The software change the "range | op_type" to specify the operation type and scope,
the set the bit "L3_MAINT_STATUS_START" to start the operation. Then wait for that
bit to change from 1 to 0 by hardware.

+	reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
+	reg &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK);
+	reg |= range | op_type;
+	reg |= L3_MAINT_STATUS_START;
+	writel(reg, l3_ctrl_base + L3_MAINT_CTRL);
+
+	/* Wait until the hardware maintenance operation is complete. */
+	do {
+		cpu_relax();
+		reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
+	} while ((reg & L3_MAINT_STATUS_MASK) != L3_MAINT_STATUS_END);

> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller
       [not found]           ` <6c4d3650-0040-06d4-4342-79392738877b@huawei.com>
@ 2021-01-30  3:00             ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 14+ messages in thread
From: Leizhen (ThunderTown) @ 2021-01-30  3:00 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: devicetree, Arnd Bergmann, Greg Kroah-Hartman, Will Deacon,
	linux-kernel, Haojian Zhuang, Rob Herring, Wei Xu, Russell King,
	linux-arm-kernel



On 2021/1/29 21:54, Leizhen (ThunderTown) wrote:
> 
> 
> On 2021/1/29 18:26, Arnd Bergmann wrote:
>> On Fri, Jan 29, 2021 at 9:16 AM Arnd Bergmann <arnd@kernel.org> wrote:
>>> On Fri, Jan 29, 2021 at 8:23 AM Leizhen (ThunderTown)
>>> <thunder.leizhen@huawei.com> wrote:
>>>> On 2021/1/28 22:24, Arnd Bergmann wrote:
>>>>> On Sat, Jan 16, 2021 at 4:27 AM Zhen Lei <thunder.leizhen@huawei.com> wrote:
>>>>>> diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
>>>>>> +
>>>>>> +static void l3cache_maint_common(u32 range, u32 op_type)
>>>>>> +{
>>>>>> +       u32 reg;
>>>>>> +
>>>>>> +       reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
>>>>>> +       reg &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK);
>>>>>> +       reg |= range | op_type;
>>>>>> +       reg |= L3_MAINT_STATUS_START;
>>>>>> +       writel(reg, l3_ctrl_base + L3_MAINT_CTRL);
>>>>>
>>>>> Are there contents of L3_MAINT_CTRL that need to be preserved
>>>>> across calls and can not be inferred? A 'readl()' is often expensive,
>>>>> so it might be more efficient if you can avoid that.
>>>>
>>>> Right, this readl() can be replaced with readl_relaxed(). Thanks.
>>>>
>>>> I'll check and correct the readl() and writel() in other places.
>>>
>>> What I meant is that if you want to replace them, you should provide
>>> performance numbers that show how much difference this makes
>>> and add comments in the source code explaining how you proved that
>>> the _relaxed() version is actually correct.
>>
>> Another clarification, as there are actually two independent
>> points here:
>>
>> * if you can completely remove the readl() above and just write a
>>   hardcoded value into the register, or perhaps read the original
>>   value once at boot time, that is probably a win because it
>>   avoids one of the barriers in the beginning. The datasheet should
>>   tell you if there are any bits in the register that have to be
>>   preserved
> 
> Code coupling will become very strong.
> 
>>
>> * Regarding the _relaxed() accessors, it's a lot harder to know
>>   whether that is safe, as you first have to show, in particular in case
>>   any of the accesses stop being guarded by the spinlock in that
>>   case, and whether there may be a case where you have to
>>   serialize the memory access against accesses that are still in the
>>   store queue or prefetched.
>>
>> Whether this matters at all depends mostly on the type of devices
>> you are driving on your SoC. If you have any high-speed network
>> interfaces that are unable to do cache coherent DMA, any extra
>> instruction here may impact the number of packets you can transfer,
>> but if all your high-speed devices are connected to a coherent
>> interconnect, I would just go with the obvious approach and use
>> the safe MMIO accessors everywhere.
> 
> In fact, this driver has been running on an earlier version for several years
> and has not received any feedback about the performance issue. So I didn't
> try to optimize it when I first sent these patches. I had to reconsider it
> until you noticed it.
> 
> How about keeping it unchanged for the moment? It'll take a lot of time and
> energy to retest.

In the spirit of code excellence, it's still necessary to optimize it.
Yesterday, my family urged me to go back, I wrote it in a hurry.

> 
>>
>>        Arnd
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>
>>
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
> .
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-01-30  9:51 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-16  3:27 [PATCH v5 0/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller Zhen Lei
2021-01-16  3:27 ` [PATCH v5 1/4] ARM: LPAE: Use phys_addr_t instead of unsigned long in outercache hooks Zhen Lei
2021-01-28 14:29   ` Arnd Bergmann
2021-01-16  3:27 ` [PATCH v5 2/4] ARM: hisi: add support for Kunpeng50x SoC Zhen Lei
2021-01-28 14:28   ` Arnd Bergmann
2021-01-16  3:27 ` [PATCH v5 3/4] dt-bindings: arm: hisilicon: Add binding for Kunpeng L3 cache controller Zhen Lei
2021-01-28 14:25   ` Arnd Bergmann
2021-01-16  3:27 ` [PATCH v5 4/4] ARM: Add support for Hisilicon " Zhen Lei
2021-01-28 14:24   ` Arnd Bergmann
2021-01-29  7:23     ` Leizhen (ThunderTown)
     [not found]       ` <CAK8P3a3Hj0Hyc8mVdGYhB7AEuHCYbhGxHnhNk1xWonEmxZOxRw@mail.gmail.com>
     [not found]         ` <CAK8P3a1j+mr3bCp2uCuuYzW0ygjTmGv9vELuNy7v-iQ=WoDMOw@mail.gmail.com>
2021-01-29 10:33           ` Russell King - ARM Linux admin
2021-01-30  2:51             ` Leizhen (ThunderTown)
     [not found]           ` <6c4d3650-0040-06d4-4342-79392738877b@huawei.com>
2021-01-30  3:00             ` Leizhen (ThunderTown)
2021-01-28  1:30 ` [PATCH v5 0/4] " Leizhen (ThunderTown)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).