linux-mips.vger.kernel.org archive mirror
* [PATCH 0/2] mips: Introduce some IO-accessors optimizations
@ 2020-09-20 11:00 Serge Semin
  2020-09-20 11:00 ` [PATCH 1/2] mips: Add strong UC ordering config Serge Semin
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Serge Semin @ 2020-09-20 11:00 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: Serge Semin, Serge Semin, Alexey Malahov, Pavel Parkhomenko,
	Vadim Vlasov, Maciej W . Rozycki, linux-mips, linux-kernel

We have found that on our MIPS P5600-based CPU the IO accessors are slower
than they could be, even taking into account the relatively slow AXI2APB
bridge embedded into the system interconnect. It turns out that two types
of optimization are possible. First, we can remove the execution barriers
from the relaxed IO accessors, since our CPU conforms to the MIPS Coherence
Protocol Specification [1, 2]. Of course this also depends on the IO
interconnect implementation. So, in accordance with [3], we suggest
removing the barriers at least for the platforms which conform to the
specification the same way ours does. Second, there is a dedicated
Coherency Manager control register which can also be used to tune the IO
methods. For some reason it hasn't been added to the MIPS arch code so far,
although it provides flags to, for instance, speed up the SYNC barrier on
platforms with a non-reordering IO interconnect, set the cache ops
serialization limits, enable speculative reads, etc. For now we suggest
adding just the macros with the CM2 GCR_CONTROL register accessors and
field descriptions, so any platform can use them to activate the
corresponding optimization. For our platform we'll do this in the
Baikal-T1 platform code, in the prom_init() method.

[1] MIPS Coherence Protocol Specification, Document Number: MD00605,
    Revision 01.01. September 14, 2015, 4.2 Execution Order Behavior,
    p. 33

[2] MIPS Coherence Protocol Specification, Document Number: MD00605,
    Revision 01.01. September 14, 2015, 4.8.1 IO Device Access, p. 58

[3] "LINUX KERNEL MEMORY BARRIERS", Documentation/memory-barriers.txt,
    Section "KERNEL I/O BARRIER EFFECTS"

Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru>
Cc: Pavel Parkhomenko <Pavel.Parkhomenko@baikalelectronics.ru>
Cc: Vadim Vlasov <V.Vlasov@baikalelectronics.ru>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Serge Semin (2):
  mips: Add strong UC ordering config
  mips: Introduce MIPS CM2 GCR Control register accessors

 arch/mips/Kconfig               |  8 ++++++++
 arch/mips/include/asm/io.h      | 20 ++++++++++----------
 arch/mips/include/asm/mips-cm.h | 15 +++++++++++++++
 3 files changed, 33 insertions(+), 10 deletions(-)

-- 
2.27.0



* [PATCH 1/2] mips: Add strong UC ordering config
  2020-09-20 11:00 [PATCH 0/2] mips: Introduce some IO-accessors optimizations Serge Semin
@ 2020-09-20 11:00 ` Serge Semin
  2020-09-25  3:54   ` Jiaxun Yang
  2020-09-20 11:00 ` [PATCH 2/2] mips: Introduce MIPS CM2 GCR Control register accessors Serge Semin
  2020-09-29 21:12 ` [PATCH 0/2] mips: Introduce some IO-accessors optimizations Serge Semin
  2 siblings, 1 reply; 8+ messages in thread
From: Serge Semin @ 2020-09-20 11:00 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: Serge Semin, Serge Semin, Alexey Malahov, Pavel Parkhomenko,
	Vadim Vlasov, Maciej W . Rozycki, linux-mips, linux-kernel

In accordance with [1, 2], memory transactions using CCA=2 (the Uncached
Cacheability and Coherency Attribute) are always strongly ordered. This
means that younger memory accesses using CCA=2 are never allowed to be
executed before older memory accesses using CCA=2 (no bypassing is
allowed), and Loads and Stores using CCA=2 are never speculative. The
specification expects the rest of the system to maintain these properties
for processor-initiated uncached accesses, so the system IO interconnect
is expected not to reorder uncached transactions once they have left the
processor subsystem. Taking into account these properties and what [3]
says about the relaxed IO accessors, we can infer that normal Loads and
Stores from/to CCA=2 memory, without any additional execution barriers,
fully comply with the requirements of the {read,write}X_relaxed() methods.
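
As a rough illustration of what "normal Loads and Stores" means here, the
sketch below models relaxed accessors as plain volatile accesses. It is a
user-space model under stated assumptions, not the kernel code: the
*_sketch names and program_and_read_back() are hypothetical, and an
ordinary array stands in for the CCA=2 MMIO window. It can only show the
compiler-level side (volatile accesses are not elided or reordered against
each other); the hardware ordering is what the specification is assumed to
provide.

```c
#include <assert.h>
#include <stdint.h>

/*
 * User-space model only: the pointer stands in for an uncached (CCA=2)
 * MMIO window. The *_sketch names are hypothetical, not kernel API.
 */
static inline void writel_relaxed_sketch(uint32_t val, volatile void *mem)
{
	/* Plain volatile store, no barrier instruction around it. */
	*(volatile uint32_t *)mem = val;
}

static inline uint32_t readl_relaxed_sketch(const volatile void *mem)
{
	/* Plain volatile load; the compiler may not elide or merge it. */
	return *(const volatile uint32_t *)mem;
}

/* Program two "registers" in order, then read the second one back. */
static uint32_t program_and_read_back(volatile uint32_t *regs)
{
	writel_relaxed_sketch(0x1u, &regs[0]);
	writel_relaxed_sketch(0xabcdu, &regs[1]);
	return readl_relaxed_sketch(&regs[1]);
}
```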

So let's convert the currently generated relaxed IO accessors into pure
Loads and Stores. Since commit 3d474dacae72 ("MIPS: Enforce strong
ordering for MMIO accessors") and commit 8b656253a7a4 ("MIPS: Provide
actually relaxed MMIO accessors") have already prepared the corresponding
macros, we can do that just by replacing the use of the "barrier"
parameter with the "relax" one. Note that the "barrier" macro argument can
be removed, since it is always set to 1 and so carries no information.
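
The resulting barrier selection in the write/out accessors can be modelled
as below. This is only a sketch for checking the truth table: the enum and
write_barrier_for() are hypothetical names, and the "strong_uc" argument is
a runtime stand-in for IS_ENABLED(CONFIG_STRONG_UC_ORDERING), which in the
real macro is a compile-time constant.

```c
#include <assert.h>
#include <stdbool.h>

enum io_barrier {
	IO_BARRIER_RW,		/* models iobarrier_rw() */
	IO_BARRIER_REORDER_WAR,	/* models war_io_reorder_wmb() */
};

/*
 * Models the condition used by the write/out accessors in the diff:
 *	if (!(relax && IS_ENABLED(CONFIG_STRONG_UC_ORDERING)))
 * "strong_uc" is a hypothetical stand-in for the Kconfig symbol.
 */
static enum io_barrier write_barrier_for(bool relax, bool strong_uc)
{
	if (!(relax && strong_uc))
		return IO_BARRIER_RW;
	return IO_BARRIER_REORDER_WAR;
}
```

Only the relax=1, STRONG_UC_ORDERING=y combination drops iobarrier_rw();
every other combination behaves as before. The read/in accessors use the
same condition but have no else branch, so in that case they issue no
barrier before the load.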

Of course it would be foolish to believe that all the available MIPS-based
CPUs completely follow the specification, especially considering how old
the architecture is. So instead we introduce a dedicated kernel config
which, when enabled, converts the relaxed IO accessors into pure Loads and
Stores without any additional barriers around them. If a CPU supports
strongly ordered UC memory accesses, its platform can enable that config
and use fully optimized relaxed IO methods. For instance, the Baikal-T1
architecture support code will do that.

[1] MIPS Coherence Protocol Specification, Document Number: MD00605,
    Revision 01.01. September 14, 2015, 4.2 Execution Order Behavior,
    p. 33

[2] MIPS Coherence Protocol Specification, Document Number: MD00605,
    Revision 01.01. September 14, 2015, 4.8.1 IO Device Access, p. 58

[3] "LINUX KERNEL MEMORY BARRIERS", Documentation/memory-barriers.txt,
    Section "KERNEL I/O BARRIER EFFECTS"

Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
---
 arch/mips/Kconfig          |  8 ++++++++
 arch/mips/include/asm/io.h | 20 ++++++++++----------
 2 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index c95fa3a2484c..2c82d927347d 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2066,6 +2066,14 @@ config WEAK_ORDERING
 #
 config WEAK_REORDERING_BEYOND_LLSC
 	bool
+
+#
+# CPU may not reorder reads and writes R->R, R->W, W->R, W->W within Uncached
+# Cacheability and Coherency Attribute (CCA=2)
+#
+config STRONG_UC_ORDERING
+	bool
+
 endmenu
 
 #
diff --git a/arch/mips/include/asm/io.h b/arch/mips/include/asm/io.h
index 78537aa23500..130c4b6458fc 100644
--- a/arch/mips/include/asm/io.h
+++ b/arch/mips/include/asm/io.h
@@ -213,7 +213,7 @@ void iounmap(const volatile void __iomem *addr);
 #define war_io_reorder_wmb()		barrier()
 #endif
 
-#define __BUILD_MEMORY_SINGLE(pfx, bwlq, type, barrier, relax, irq)	\
+#define __BUILD_MEMORY_SINGLE(pfx, bwlq, type, relax, irq)		\
 									\
 static inline void pfx##write##bwlq(type val,				\
 				    volatile void __iomem *mem)		\
@@ -221,7 +221,7 @@ static inline void pfx##write##bwlq(type val,				\
 	volatile type *__mem;						\
 	type __val;							\
 									\
-	if (barrier)							\
+	if (!(relax && IS_ENABLED(CONFIG_STRONG_UC_ORDERING)))		\
 		iobarrier_rw();						\
 	else								\
 		war_io_reorder_wmb();					\
@@ -262,7 +262,7 @@ static inline type pfx##read##bwlq(const volatile void __iomem *mem)	\
 									\
 	__mem = (void *)__swizzle_addr_##bwlq((unsigned long)(mem));	\
 									\
-	if (barrier)							\
+	if (!(relax && IS_ENABLED(CONFIG_STRONG_UC_ORDERING)))		\
 		iobarrier_rw();						\
 									\
 	if (sizeof(type) != sizeof(u64) || sizeof(u64) == sizeof(long)) \
@@ -294,14 +294,14 @@ static inline type pfx##read##bwlq(const volatile void __iomem *mem)	\
 	return pfx##ioswab##bwlq(__mem, __val);				\
 }
 
-#define __BUILD_IOPORT_SINGLE(pfx, bwlq, type, barrier, relax, p)	\
+#define __BUILD_IOPORT_SINGLE(pfx, bwlq, type, relax, p)		\
 									\
 static inline void pfx##out##bwlq##p(type val, unsigned long port)	\
 {									\
 	volatile type *__addr;						\
 	type __val;							\
 									\
-	if (barrier)							\
+	if (!(relax && IS_ENABLED(CONFIG_STRONG_UC_ORDERING)))		\
 		iobarrier_rw();						\
 	else								\
 		war_io_reorder_wmb();					\
@@ -325,7 +325,7 @@ static inline type pfx##in##bwlq##p(unsigned long port)			\
 									\
 	BUILD_BUG_ON(sizeof(type) > sizeof(unsigned long));		\
 									\
-	if (barrier)							\
+	if (!(relax && IS_ENABLED(CONFIG_STRONG_UC_ORDERING)))		\
 		iobarrier_rw();						\
 									\
 	__val = *__addr;						\
@@ -338,7 +338,7 @@ static inline type pfx##in##bwlq##p(unsigned long port)			\
 
 #define __BUILD_MEMORY_PFX(bus, bwlq, type, relax)			\
 									\
-__BUILD_MEMORY_SINGLE(bus, bwlq, type, 1, relax, 1)
+__BUILD_MEMORY_SINGLE(bus, bwlq, type, relax, 1)
 
 #define BUILDIO_MEM(bwlq, type)						\
 									\
@@ -358,8 +358,8 @@ __BUILD_MEMORY_PFX(__mem_, q, u64, 0)
 #endif
 
 #define __BUILD_IOPORT_PFX(bus, bwlq, type)				\
-	__BUILD_IOPORT_SINGLE(bus, bwlq, type, 1, 0,)			\
-	__BUILD_IOPORT_SINGLE(bus, bwlq, type, 1, 0, _p)
+	__BUILD_IOPORT_SINGLE(bus, bwlq, type, 0,)			\
+	__BUILD_IOPORT_SINGLE(bus, bwlq, type, 0, _p)
 
 #define BUILDIO_IOPORT(bwlq, type)					\
 	__BUILD_IOPORT_PFX(, bwlq, type)				\
@@ -374,7 +374,7 @@ BUILDIO_IOPORT(q, u64)
 
 #define __BUILDIO(bwlq, type)						\
 									\
-__BUILD_MEMORY_SINGLE(____raw_, bwlq, type, 1, 0, 0)
+__BUILD_MEMORY_SINGLE(____raw_, bwlq, type, 0, 0)
 
 __BUILDIO(q, u64)
 
-- 
2.27.0



* [PATCH 2/2] mips: Introduce MIPS CM2 GCR Control register accessors
  2020-09-20 11:00 [PATCH 0/2] mips: Introduce some IO-accessors optimizations Serge Semin
  2020-09-20 11:00 ` [PATCH 1/2] mips: Add strong UC ordering config Serge Semin
@ 2020-09-20 11:00 ` Serge Semin
  2020-09-29 21:12 ` [PATCH 0/2] mips: Introduce some IO-accessors optimizations Serge Semin
  2 siblings, 0 replies; 8+ messages in thread
From: Serge Semin @ 2020-09-20 11:00 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: Serge Semin, Serge Semin, Alexey Malahov, Pavel Parkhomenko,
	Vadim Vlasov, Maciej W . Rozycki, linux-mips, linux-kernel

For some reason these accessors have been absent from the MIPS kernel,
although some of them can be used to tune MIPS code execution (the default
values are fully acceptable, though). For instance, on MIPS P5600/P6600
(see [1] for details), if we are sure the IO interconnect doesn't reorder
requests, we can freely set GCR_CONTROL.SYNCDIS, which makes CM2 respond to
SYNCs as soon as a request is accepted on the L2/Memory interface instead
of executing the legacy SYNC and waiting for the full response from
L2/Memory. Needless to say, this will significantly speed up the
{read,write}X() IO accessors due to the more lightweight barriers around
the IO Loads and Stores. There are other MIPS Coherency Manager
optimizations available in that register, like cache ops serialization
limits, speculative read enable, etc., which can be useful for various
MIPS platforms.
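
A minimal sketch of how a platform might set the bit is shown below. It is
a user-space model under assumptions: mock_gcr_control replaces the real
memory-mapped register, read_gcr_control()/write_gcr_control() are mocked
here under the names GCR_ACCESSOR_RW(64, 0x010, control) is assumed to
generate, and platform_enable_fast_sync() is a hypothetical hook. The bit
positions are the ones from this patch.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n)				(UINT64_C(1) << (n))
#define CM_GCR_CONTROL_SYNCDIS		BIT(5)	/* from this patch */
#define CM_GCR_CONTROL_SPEC_READ_EN	BIT(0)	/* from this patch */

/* Mock backing store standing in for the GCR_CONTROL register. */
static uint64_t mock_gcr_control;

/* Mock accessors; in the kernel these would come from mips-cm.h. */
static uint64_t read_gcr_control(void)
{
	return mock_gcr_control;
}

static void write_gcr_control(uint64_t val)
{
	mock_gcr_control = val;
}

/*
 * Hypothetical platform hook: read-modify-write so the other control
 * bits keep their values. Only valid if the IO interconnect is known
 * not to reorder requests.
 */
static void platform_enable_fast_sync(void)
{
	uint64_t ctl = read_gcr_control();

	ctl |= CM_GCR_CONTROL_SYNCDIS;
	write_gcr_control(ctl);
}
```

The read-modify-write keeps whatever the bootloader or reset defaults put
into the other GCR_CONTROL fields.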

[1] MIPS32 P5600 Multiprocessing System Software User's Manual,
    Document Number: MD01025, Revision 01.60, April 19, 2016, p. 400

Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>

---

Folks, do you think it would be better to implement a dedicated config for
the arch/mips/kernel/mips-cm.c code, which would disable the SI_SyncTxEn
acceptance by setting the GCR_CONTROL.SYNCDIS bit? Currently I intend to
set it in our platform-specific prom_init() method.
---
 arch/mips/include/asm/mips-cm.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/mips/include/asm/mips-cm.h b/arch/mips/include/asm/mips-cm.h
index aeae2effa123..17b2adf57e0c 100644
--- a/arch/mips/include/asm/mips-cm.h
+++ b/arch/mips/include/asm/mips-cm.h
@@ -143,6 +143,21 @@ GCR_ACCESSOR_RW(64, 0x008, base)
 #define  CM_GCR_BASE_CMDEFTGT_IOCU0		2
 #define  CM_GCR_BASE_CMDEFTGT_IOCU1		3
 
+/* GCR_CONTROL - Global CM2 Settings */
+GCR_ACCESSOR_RW(64, 0x010, control)
+#define CM_GCR_CONTROL_SYNCCTL			BIT(16)
+#define CM_GCR_CONTROL_SYNCDIS			BIT(5)
+#define CM_GCR_CONTROL_IVU_EN			BIT(4)
+#define CM_GCR_CONTROL_SHST_EN			BIT(3)
+#define CM_GCR_CONTROL_PARK_EN			BIT(2)
+#define CM_GCR_CONTROL_MMIO_LIMIT_DIS		BIT(1)
+#define CM_GCR_CONTROL_SPEC_READ_EN		BIT(0)
+
+/* GCR_CONTROL2 - Global CM2 Settings (continued) */
+GCR_ACCESSOR_RW(64, 0x018, control2)
+#define CM_GCR_CONTROL2_L2_CACHEOP_LIMIT	GENMASK(19, 16)
+#define CM_GCR_CONTROL2_L1_CACHEOP_LIMIT	GENMASK(3, 0)
+
 /* GCR_ACCESS - Controls core/IOCU access to GCRs */
 GCR_ACCESSOR_RW(32, 0x020, access)
 #define CM_GCR_ACCESS_ACCESSEN			GENMASK(7, 0)
-- 
2.27.0



* Re: [PATCH 1/2] mips: Add strong UC ordering config
  2020-09-20 11:00 ` [PATCH 1/2] mips: Add strong UC ordering config Serge Semin
@ 2020-09-25  3:54   ` Jiaxun Yang
  2020-09-25 18:18     ` Serge Semin
  0 siblings, 1 reply; 8+ messages in thread
From: Jiaxun Yang @ 2020-09-25  3:54 UTC (permalink / raw)
  To: Serge Semin, Thomas Bogendoerfer
  Cc: Serge Semin, Alexey Malahov, Pavel Parkhomenko, Vadim Vlasov,
	Maciej W . Rozycki, linux-mips, linux-kernel



On 2020/9/20 19:00, Serge Semin wrote:
> [...]
> Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> Cc: Maciej W. Rozycki <macro@linux-mips.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>


Based on #mipslinux discussions, I suspect this option can be selected by
most modern MIPS processors, including all IMG/MTI cores,
Ingenic and Loongson.

Thanks.

- Jiaxun

> ---
>   arch/mips/Kconfig          |  8 ++++++++
>   arch/mips/include/asm/io.h | 20 ++++++++++----------
>   2 files changed, 18 insertions(+), 10 deletions(-)
>


* Re: [PATCH 1/2] mips: Add strong UC ordering config
  2020-09-25  3:54   ` Jiaxun Yang
@ 2020-09-25 18:18     ` Serge Semin
  0 siblings, 0 replies; 8+ messages in thread
From: Serge Semin @ 2020-09-25 18:18 UTC (permalink / raw)
  To: Jiaxun Yang
  Cc: Thomas Bogendoerfer, Alexey Malahov, Pavel Parkhomenko,
	Vadim Vlasov, Maciej W . Rozycki, linux-mips, linux-kernel

On Fri, Sep 25, 2020 at 11:54:20AM +0800, Jiaxun Yang wrote:
> 
> 
> On 2020/9/20 19:00, Serge Semin wrote:
> > [...]
> > Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> > Cc: Maciej W. Rozycki <macro@linux-mips.org>

> Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
> 
> 
> Based on #mipslinux discussions, I suspect this option can be selected by
> most modern MIPS processors, including all IMG/MTI cores,
> Ingenic and Loongson.

Thanks for reviewing the patch.

Regarding the option: alas, it's not that easy, and we must be very careful
when assuming that some processor supports this feature. Even if the MIPS
cores do imply strict UC load/store ordering, the system interconnect may
still execute requests out of order. For instance, the P5600 cores
installed in our Baikal-T1 SoC do support strong UC ordering, but there is
a cascade of OCP2AXI, AXI2AXI and AXI2APB bridges behind the CPU memory
interface, each of which is equipped with an internal FIFO and some
complicated traffic routing logic. So each platform should be carefully
analyzed and tested (if possible) before enabling the suggested feature;
otherwise we risk ending up with systems that work in general but are
buggy at some point. Needless to say, out-of-order execution problems are
very hard to track and debug due to the random nature of their impact on
the system.

-Sergey

> 
> Thanks.
> 
> - Jiaxun
> 
> > ---
> >   arch/mips/Kconfig          |  8 ++++++++
> >   arch/mips/include/asm/io.h | 20 ++++++++++----------
> >   2 files changed, 18 insertions(+), 10 deletions(-)
> > 


* Re: [PATCH 0/2] mips: Introduce some IO-accessors optimizations
  2020-09-20 11:00 [PATCH 0/2] mips: Introduce some IO-accessors optimizations Serge Semin
  2020-09-20 11:00 ` [PATCH 1/2] mips: Add strong UC ordering config Serge Semin
  2020-09-20 11:00 ` [PATCH 2/2] mips: Introduce MIPS CM2 GCR Control register accessors Serge Semin
@ 2020-09-29 21:12 ` Serge Semin
  2020-09-30 10:15   ` Thomas Bogendoerfer
  2 siblings, 1 reply; 8+ messages in thread
From: Serge Semin @ 2020-09-29 21:12 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: Alexey Malahov, Pavel Parkhomenko, Vadim Vlasov,
	Maciej W . Rozycki, linux-mips, linux-kernel

Thomas,
Any comments on the series? The changes aren't that extensive, so it would
be great to get them merged before the 5.10 merge window opens.

-Sergey

On Sun, Sep 20, 2020 at 02:00:08PM +0300, Serge Semin wrote:
> [...]


* Re: [PATCH 0/2] mips: Introduce some IO-accessors optimizations
  2020-09-29 21:12 ` [PATCH 0/2] mips: Introduce some IO-accessors optimizations Serge Semin
@ 2020-09-30 10:15   ` Thomas Bogendoerfer
  2020-09-30 13:23     ` Serge Semin
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Bogendoerfer @ 2020-09-30 10:15 UTC (permalink / raw)
  To: Serge Semin
  Cc: Alexey Malahov, Pavel Parkhomenko, Vadim Vlasov,
	Maciej W . Rozycki, linux-mips, linux-kernel

On Wed, Sep 30, 2020 at 12:12:32AM +0300, Serge Semin wrote:
> Thomas,
> Any comment on the series? The changes aren't that comprehensive, so it would
> be great to merge it in before the 5.10 merge window is opened.

For both patches there is no user, so I don't see a reason
to apply them.

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]


* Re: [PATCH 0/2] mips: Introduce some IO-accessors optimizations
  2020-09-30 10:15   ` Thomas Bogendoerfer
@ 2020-09-30 13:23     ` Serge Semin
  0 siblings, 0 replies; 8+ messages in thread
From: Serge Semin @ 2020-09-30 13:23 UTC (permalink / raw)
  To: Thomas Bogendoerfer, Jiaxun Yang
  Cc: Alexey Malahov, Pavel Parkhomenko, Vadim Vlasov,
	Maciej W . Rozycki, linux-mips, linux-kernel

On Wed, Sep 30, 2020 at 12:15:32PM +0200, Thomas Bogendoerfer wrote:
> On Wed, Sep 30, 2020 at 12:12:32AM +0300, Serge Semin wrote:
> > Thomas,
> > Any comment on the series? The changes aren't that comprehensive, so it would
> > be great to merge it in before the 5.10 merge window is opened.
> 
> for the both patches there is no user for it, so I don't see a reason
> to apply it.

@Thomas, I see your point. I'll merge them into my repo with the Baikal-T1
CSP/BSP patches and will deliver everything at once when the kernel is
ready to accept the changes (most likely in 3 - 5 months).

@Jiaxun, if you have any hardware which definitely supports strong UC
ordering, feel free to submit a patchset which activates the config
proposed here, with my STRONG_UC_ORDERING alteration applied before your
changes.

-Sergey

> 
> Thomas.
> 
> -- 
> Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
> good idea.                                                [ RFC1925, 2.3 ]
