All of lore.kernel.org
 help / color / mirror / Atom feed
* [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors
@ 2019-02-10 16:17 Andre Przywara
  2019-02-10 16:17 ` [U-Boot] [RESEND PATCH 1/3] arm: clean up asm/io.h Andre Przywara
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Andre Przywara @ 2019-02-10 16:17 UTC (permalink / raw)
  To: u-boot

Hi, this is a resend of what I posted some weeks ago, just adding the
missing Signed-off-by: in patch 2/3, as pointed out by Philipp. I used
the opportunity to add his Reviewed-by: tags on the first two patches.
(Many thanks for that!) The rest is unchanged.
-------------------

Admittedly this is the long way round to solve some nasty SPL code size
problem, but it looked beneficial to others as well, so here we go:

arch/arm/include/asm/io.h looks like it's been around since the dawn of
time, and was more or less blindly copied from Linux.
We don't use and don't need most of the definitions, and mainline Linux
got rid of them anyway, so patch 1/3 cleans up this header file to
just contain what we need in U-Boot.

Patch 2/3 introduces readl/writel_relaxed accessors, which are cheaper,
but more importantly save one (barrier) instruction per accessor. This
helps to bring down code size, since especially DRAM controller inits in
SPLs tend to do a lot of MMIO.

Consequently patch 3/3 introduces them in the Allwinner H6 DRAM driver,
which reduces the SPL size by a whopping 2KB, due to a twist:
The AArch64 exception table needs to be 2KB aligned, but we don't do
anything special about it the linker script. So depending on where the
code before the vectors ends, we have potentially large padding:
At the moment this last address is 0x1824 for the H6, so the vectors can
only start at 0x2000. By reducing the code size before the vectors by just 
(at least) 9 instructions, the vectors start at 0x1800 and we save most of
the padding.

I understand that the proper solution is to fill the gap before the vectors
with code instead of NOPs, but I couldn't find any obvious way doing this
in the linker script. If anyone has any idea here, I am all ears.

Cheers,
Andre.

Andre Przywara (3):
  arm: clean up asm/io.h
  arm: introduce _relaxed MMIO accessors
  sunxi: H6: use writel_relaxed for DRAM timing register accesses

 arch/arm/include/asm/io.h            | 164 +++--------------------------------
 arch/arm/mach-sunxi/dram_sun50i_h6.c |  79 +++++++++--------
 2 files changed, 54 insertions(+), 189 deletions(-)

-- 
2.14.5

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [U-Boot] [RESEND PATCH 1/3] arm: clean up asm/io.h
  2019-02-10 16:17 [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors Andre Przywara
@ 2019-02-10 16:17 ` Andre Przywara
  2019-02-20 12:24   ` Alexander Graf
  2019-02-10 16:17 ` [U-Boot] [RESEND PATCH 2/3] arm: introduce _relaxed MMIO accessors Andre Przywara
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Andre Przywara @ 2019-02-10 16:17 UTC (permalink / raw)
  To: u-boot

asm/io.h is the header file containing the central MMIO accessor macros.
Judging by the header and the comments, it was apparently once copied
from the Linux kernel, but has deviated since then heavily.

Clean up the definitions by:
- removing pointless Linux history
- removing commented code
- removing outdated comments
- removing unused definitions, for instance for mem_isa and mem_pci

This massively improves the readability of the file.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
---
 arch/arm/include/asm/io.h | 154 +---------------------------------------------
 1 file changed, 2 insertions(+), 152 deletions(-)

diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index 12bc7fbe06..bcbaf0d83c 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -1,44 +1,25 @@
 /*
- *  linux/include/asm-arm/io.h
+ * I/O device access primitives. Based on early versions from the Linux kernel.
  *
  *  Copyright (C) 1996-2000 Russell King
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
- *
- * Modifications:
- *  16-Sep-1996	RMK	Inlined the inx/outx functions & optimised for both
- *			constant addresses and variable addresses.
- *  04-Dec-1997	RMK	Moved a lot of this stuff to the new architecture
- *			specific IO header files.
- *  27-Mar-1999	PJB	Second parameter of memcpy_toio is const..
- *  04-Apr-1999	PJB	Added check_signature.
- *  12-Dec-1999	RMK	More cleanups
- *  18-Jun-2000 RMK	Removed virt_to_* and friends definitions
  */
 #ifndef __ASM_ARM_IO_H
 #define __ASM_ARM_IO_H
 
-#ifdef __KERNEL__
-
 #include <linux/types.h>
 #include <asm/byteorder.h>
 #include <asm/memory.h>
 #include <asm/barriers.h>
-#if 0	/* XXX###XXX */
-#include <asm/arch/hardware.h>
-#endif	/* XXX###XXX */
 
 static inline void sync(void)
 {
 }
 
-/*
- * Generic virtual read/write.  Note that we don't support half-word
- * read/writes.  We define __arch_*[bl] here, and leave __arch_*w
- * to the architecture specific code.
- */
+/* Generic virtual read/write. */
 #define __arch_getb(a)			(*(volatile unsigned char *)(a))
 #define __arch_getw(a)			(*(volatile unsigned short *)(a))
 #define __arch_getl(a)			(*(volatile unsigned int *)(a))
@@ -205,13 +186,6 @@ static inline void __raw_readsl(unsigned long addr, void *data, int longlen)
 #define setbits_8(addr, set) setbits(8, addr, set)
 #define clrsetbits_8(addr, clear, set) clrsetbits(8, addr, clear, set)
 
-/*
- * Now, pick up the machine-defined IO definitions
- */
-#if 0	/* XXX###XXX */
-#include <asm/arch/io.h>
-#endif	/* XXX###XXX */
-
 /*
  *  IO port access primitives
  *  -------------------------
@@ -275,16 +249,6 @@ static inline void __raw_readsl(unsigned long addr, void *data, int longlen)
 #define writesb(a, d, s)	__raw_writesb((unsigned long)a, d, s)
 #define readsb(a, d, s)		__raw_readsb((unsigned long)a, d, s)
 
-/*
- * DMA-consistent mapping functions.  These allocate/free a region of
- * uncached, unwrite-buffered mapped memory space for use with DMA
- * devices.  This is the "generic" version.  The PCI specific version
- * is in pci.h
- */
-extern void *consistent_alloc(int gfp, size_t size, dma_addr_t *handle);
-extern void consistent_free(void *vaddr, size_t size, dma_addr_t handle);
-extern void consistent_sync(void *vaddr, size_t size, int rw);
-
 /*
  * String version of IO memory access ops:
  */
@@ -292,124 +256,10 @@ extern void _memcpy_fromio(void *, unsigned long, size_t);
 extern void _memcpy_toio(unsigned long, const void *, size_t);
 extern void _memset_io(unsigned long, int, size_t);
 
-extern void __readwrite_bug(const char *fn);
-
-/*
- * If this architecture has PCI memory IO, then define the read/write
- * macros.  These should only be used with the cookie passed from
- * ioremap.
- */
-#ifdef __mem_pci
-
-#define readb(c) ({ unsigned int __v = __raw_readb(__mem_pci(c)); __v; })
-#define readw(c) ({ unsigned int __v = le16_to_cpu(__raw_readw(__mem_pci(c))); __v; })
-#define readl(c) ({ unsigned int __v = le32_to_cpu(__raw_readl(__mem_pci(c))); __v; })
-
-#define writeb(v,c)		__raw_writeb(v,__mem_pci(c))
-#define writew(v,c)		__raw_writew(cpu_to_le16(v),__mem_pci(c))
-#define writel(v,c)		__raw_writel(cpu_to_le32(v),__mem_pci(c))
-
-#define memset_io(c,v,l)		_memset_io(__mem_pci(c),(v),(l))
-#define memcpy_fromio(a,c,l)		_memcpy_fromio((a),__mem_pci(c),(l))
-#define memcpy_toio(c,a,l)		_memcpy_toio(__mem_pci(c),(a),(l))
-
-#define eth_io_copy_and_sum(s,c,l,b) \
-				eth_copy_and_sum((s),__mem_pci(c),(l),(b))
-
-static inline int
-check_signature(unsigned long io_addr, const unsigned char *signature,
-		int length)
-{
-	int retval = 0;
-	do {
-		if (readb(io_addr) != *signature)
-			goto out;
-		io_addr++;
-		signature++;
-		length--;
-	} while (length);
-	retval = 1;
-out:
-	return retval;
-}
-
-#else
 #define memset_io(a, b, c)		memset((void *)(a), (b), (c))
 #define memcpy_fromio(a, b, c)		memcpy((a), (void *)(b), (c))
 #define memcpy_toio(a, b, c)		memcpy((void *)(a), (b), (c))
 
-#if !defined(readb)
-
-#define readb(addr)			(__readwrite_bug("readb"),0)
-#define readw(addr)			(__readwrite_bug("readw"),0)
-#define readl(addr)			(__readwrite_bug("readl"),0)
-#define writeb(v,addr)			__readwrite_bug("writeb")
-#define writew(v,addr)			__readwrite_bug("writew")
-#define writel(v,addr)			__readwrite_bug("writel")
-
-#define eth_io_copy_and_sum(a,b,c,d)	__readwrite_bug("eth_io_copy_and_sum")
-
-#define check_signature(io,sig,len)	(0)
-
-#endif
-#endif	/* __mem_pci */
-
-/*
- * If this architecture has ISA IO, then define the isa_read/isa_write
- * macros.
- */
-#ifdef __mem_isa
-
-#define isa_readb(addr)			__raw_readb(__mem_isa(addr))
-#define isa_readw(addr)			__raw_readw(__mem_isa(addr))
-#define isa_readl(addr)			__raw_readl(__mem_isa(addr))
-#define isa_writeb(val,addr)		__raw_writeb(val,__mem_isa(addr))
-#define isa_writew(val,addr)		__raw_writew(val,__mem_isa(addr))
-#define isa_writel(val,addr)		__raw_writel(val,__mem_isa(addr))
-#define isa_memset_io(a,b,c)		_memset_io(__mem_isa(a),(b),(c))
-#define isa_memcpy_fromio(a,b,c)	_memcpy_fromio((a),__mem_isa(b),(c))
-#define isa_memcpy_toio(a,b,c)		_memcpy_toio(__mem_isa((a)),(b),(c))
-
-#define isa_eth_io_copy_and_sum(a,b,c,d) \
-				eth_copy_and_sum((a),__mem_isa(b),(c),(d))
-
-static inline int
-isa_check_signature(unsigned long io_addr, const unsigned char *signature,
-		    int length)
-{
-	int retval = 0;
-	do {
-		if (isa_readb(io_addr) != *signature)
-			goto out;
-		io_addr++;
-		signature++;
-		length--;
-	} while (length);
-	retval = 1;
-out:
-	return retval;
-}
-
-#else	/* __mem_isa */
-
-#define isa_readb(addr)			(__readwrite_bug("isa_readb"),0)
-#define isa_readw(addr)			(__readwrite_bug("isa_readw"),0)
-#define isa_readl(addr)			(__readwrite_bug("isa_readl"),0)
-#define isa_writeb(val,addr)		__readwrite_bug("isa_writeb")
-#define isa_writew(val,addr)		__readwrite_bug("isa_writew")
-#define isa_writel(val,addr)		__readwrite_bug("isa_writel")
-#define isa_memset_io(a,b,c)		__readwrite_bug("isa_memset_io")
-#define isa_memcpy_fromio(a,b,c)	__readwrite_bug("isa_memcpy_fromio")
-#define isa_memcpy_toio(a,b,c)		__readwrite_bug("isa_memcpy_toio")
-
-#define isa_eth_io_copy_and_sum(a,b,c,d) \
-				__readwrite_bug("isa_eth_io_copy_and_sum")
-
-#define isa_check_signature(io,sig,len)	(0)
-
-#endif	/* __mem_isa */
-#endif	/* __KERNEL__ */
-
 #include <asm-generic/io.h>
 #include <iotrace.h>
 
-- 
2.14.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [U-Boot] [RESEND PATCH 2/3] arm: introduce _relaxed MMIO accessors
  2019-02-10 16:17 [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors Andre Przywara
  2019-02-10 16:17 ` [U-Boot] [RESEND PATCH 1/3] arm: clean up asm/io.h Andre Przywara
@ 2019-02-10 16:17 ` Andre Przywara
  2019-02-20 12:24   ` Alexander Graf
  2019-02-10 16:17 ` [U-Boot] [RESEND PATCH 3/3] sunxi: H6: use writel_relaxed for DRAM timing register accesses Andre Przywara
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Andre Przywara @ 2019-02-10 16:17 UTC (permalink / raw)
  To: u-boot

The normal MMIO accessor macros (readX/writeX) guarantee a strong ordering,
even with normal memory accesses: https://lwn.net/Articles/698014/
For some MMIO operations (framebuffers being a prominent example) this is
not needed, and the rather costly barrier inserted on weakly ordered
systems like ARM costs some performance.
To mitigate this issue, Linux introduced readX_relaxed and
writeX_relaxed primitives, which only guarantee ordering between each
other, so are typically faster due to the missing barrier.

We probably do not care so much about performance in U-Boot, but want to
have those primitives for two other reasons:
- Being able to use the _relaxed macros simplifies porting drivers from
  Linux.
- The missing barrier typically allows to generate smaller code, which can
  relieve some chronically tight SPL builds.

Add those macros definitions by using the __raw versions of the
accessors, which matches an earlier (and less complicated) version of
the Linux implementation.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
[On my experimental RK3399 after modifying a few drivers:]
Tested-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
---
 arch/arm/include/asm/io.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index bcbaf0d83c..5b24b88efc 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -98,11 +98,21 @@ static inline void __raw_readsl(unsigned long addr, void *data, int longlen)
 #define writel(v,c)	({ u32 __v = v; __iowmb(); __arch_putl(__v,c); __v; })
 #define writeq(v,c)	({ u64 __v = v; __iowmb(); __arch_putq(__v,c); __v; })
 
+#define writeb_relaxed(v,c)	__raw_writeb(v, c)
+#define writew_relaxed(v,c)	__raw_writew(v, c)
+#define writel_relaxed(v,c)	__raw_writel(v, c)
+#define writeq_relaxed(v,c)	__raw_writeq(v, c)
+
 #define readb(c)	({ u8  __v = __arch_getb(c); __iormb(); __v; })
 #define readw(c)	({ u16 __v = __arch_getw(c); __iormb(); __v; })
 #define readl(c)	({ u32 __v = __arch_getl(c); __iormb(); __v; })
 #define readq(c)	({ u64 __v = __arch_getq(c); __iormb(); __v; })
 
+#define readb_relaxed(c)	__raw_readb(c)
+#define readw_relaxed(c)	__raw_readw(c)
+#define readl_relaxed(c)	__raw_readl(c)
+#define readq_relaxed(c)	__raw_readq(c)
+
 /*
  * The compiler seems to be incapable of optimising constants
  * properly.  Spell it out to the compiler in some cases.
-- 
2.14.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [U-Boot] [RESEND PATCH 3/3] sunxi: H6: use writel_relaxed for DRAM timing register accesses
  2019-02-10 16:17 [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors Andre Przywara
  2019-02-10 16:17 ` [U-Boot] [RESEND PATCH 1/3] arm: clean up asm/io.h Andre Przywara
  2019-02-10 16:17 ` [U-Boot] [RESEND PATCH 2/3] arm: introduce _relaxed MMIO accessors Andre Przywara
@ 2019-02-10 16:17 ` Andre Przywara
  2019-02-20 12:25   ` Alexander Graf
  2019-04-15  6:07 ` [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors Jagan Teki
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Andre Przywara @ 2019-02-10 16:17 UTC (permalink / raw)
  To: u-boot

The timing registers in the DRAM controller can be programmed in any
order, as they will only take effect once the controller is eventually
"activated".

Switch the MMIO writes in mctl_set_timing_lpddr3() over to use
writel_relaxed(), since we don't need the stronger guarantee of the
normal writel(). We satisfy the overall ordering requirement by ending
the function with an explicit DMB barrier.

In this case we are not interested in the performance benefit this
usually gives, but in the saved instructions, which sum up for the many
writes we have in the timing setup.
Due to alignment effects this shrinks our chronically tight H6 SPL by a
whopping 2KB, which brings it in the same region as for the other
AArch64 Allwinner SPL builds.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 arch/arm/mach-sunxi/dram_sun50i_h6.c | 79 +++++++++++++++++++-----------------
 1 file changed, 42 insertions(+), 37 deletions(-)

diff --git a/arch/arm/mach-sunxi/dram_sun50i_h6.c b/arch/arm/mach-sunxi/dram_sun50i_h6.c
index 5da90a2835..84a33a63d6 100644
--- a/arch/arm/mach-sunxi/dram_sun50i_h6.c
+++ b/arch/arm/mach-sunxi/dram_sun50i_h6.c
@@ -241,51 +241,55 @@ static void mctl_set_timing_lpddr3(struct dram_para *para)
 	memcpy(mctl_phy->mr, mr_lpddr3, sizeof(mr_lpddr3));
 
 	/* set DRAM timing */
-	writel((twtp << 24) | (tfaw << 16) | (trasmax << 8) | tras,
-	       &mctl_ctl->dramtmg[0]);
-	writel((txp << 16) | (trtp << 8) | trc, &mctl_ctl->dramtmg[1]);
-	writel((tcwl << 24) | (tcl << 16) | (trd2wr << 8) | twr2rd,
-	       &mctl_ctl->dramtmg[2]);
-	writel((tmrw << 20) | (tmrd << 12) | tmod, &mctl_ctl->dramtmg[3]);
-	writel((trcd << 24) | (tccd << 16) | (trrd << 8) | trp,
-	       &mctl_ctl->dramtmg[4]);
-	writel((tcksrx << 24) | (tcksre << 16) | (tckesr << 8) | tcke,
-	       &mctl_ctl->dramtmg[5]);
+	writel_relaxed((twtp << 24) | (tfaw << 16) | (trasmax << 8) | tras,
+		       &mctl_ctl->dramtmg[0]);
+	writel_relaxed((txp << 16) | (trtp << 8) | trc, &mctl_ctl->dramtmg[1]);
+	writel_relaxed((tcwl << 24) | (tcl << 16) | (trd2wr << 8) | twr2rd,
+		       &mctl_ctl->dramtmg[2]);
+	writel_relaxed((tmrw << 20) | (tmrd << 12) | tmod,
+		       &mctl_ctl->dramtmg[3]);
+	writel_relaxed((trcd << 24) | (tccd << 16) | (trrd << 8) | trp,
+		       &mctl_ctl->dramtmg[4]);
+	writel_relaxed((tcksrx << 24) | (tcksre << 16) | (tckesr << 8) | tcke,
+		       &mctl_ctl->dramtmg[5]);
 	/* Value suggested by ZynqMP manual and used by libdram */
-	writel((txp + 2) | 0x02020000, &mctl_ctl->dramtmg[6]);
-	writel((txsfast << 24) | (txsabort << 16) | (txsdll << 8) | txs,
-	       &mctl_ctl->dramtmg[8]);
-	writel(txsr, &mctl_ctl->dramtmg[14]);
+	writel_relaxed((txp + 2) | 0x02020000, &mctl_ctl->dramtmg[6]);
+	writel_relaxed((txsfast << 24) | (txsabort << 16) | (txsdll << 8) | txs,
+		       &mctl_ctl->dramtmg[8]);
+	writel_relaxed(txsr, &mctl_ctl->dramtmg[14]);
 
 	clrsetbits_le32(&mctl_ctl->init[0], (3 << 30), (1 << 30));
-	writel(0, &mctl_ctl->dfimisc);
+	writel_relaxed(0, &mctl_ctl->dfimisc);
 	clrsetbits_le32(&mctl_ctl->rankctl, 0xff0, 0x660);
 
 	/*
 	 * Set timing registers of the PHY.
 	 * Note: the PHY is clocked 2x from the DRAM frequency.
 	 */
-	writel((trrd << 25) | (tras << 17) | (trp << 9) | (trtp << 1),
+	writel_relaxed((trrd << 25) | (tras << 17) | (trp << 9) | (trtp << 1),
 	       &mctl_phy->dtpr[0]);
-	writel((tfaw << 17) | 0x28000400 | (tmrd << 1), &mctl_phy->dtpr[1]);
-	writel(((txs << 6) - 1) | (tcke << 17), &mctl_phy->dtpr[2]);
-	writel(((txsdll << 22) - (0x1 << 16)) | twtr_sa | (tcksrea << 8),
-	       &mctl_phy->dtpr[3]);
-	writel((txp << 1) | (trfc << 17) | 0x800, &mctl_phy->dtpr[4]);
-	writel((trc << 17) | (trcd << 9) | (twtr << 1), &mctl_phy->dtpr[5]);
-	writel(0x0505, &mctl_phy->dtpr[6]);
+	writel_relaxed((tfaw << 17) | 0x28000400 | (tmrd << 1),
+		       &mctl_phy->dtpr[1]);
+	writel_relaxed(((txs << 6) - 1) | (tcke << 17), &mctl_phy->dtpr[2]);
+	writel_relaxed(((txsdll << 22) - (0x1 << 16)) | twtr_sa |
+		       (tcksrea << 8), &mctl_phy->dtpr[3]);
+	writel_relaxed((txp << 1) | (trfc << 17) | 0x800, &mctl_phy->dtpr[4]);
+	writel_relaxed((trc << 17) | (trcd << 9) | (twtr << 1),
+		       &mctl_phy->dtpr[5]);
+	writel_relaxed(0x0505, &mctl_phy->dtpr[6]);
 
 	/* Configure DFI timing */
-	writel(tcl | 0x2000200 | (t_rdata_en << 16) | 0x808000,
-	       &mctl_ctl->dfitmg0);
-	writel(0x040201, &mctl_ctl->dfitmg1);
+	writel_relaxed(tcl | 0x2000200 | (t_rdata_en << 16) | 0x808000,
+		       &mctl_ctl->dfitmg0);
+	writel_relaxed(0x040201, &mctl_ctl->dfitmg1);
 
 	/* Configure PHY timing */
-	writel(tdinit0 | (tdinit1 << 20), &mctl_phy->ptr[3]);
-	writel(tdinit2 | (tdinit3 << 18), &mctl_phy->ptr[4]);
+	writel_relaxed(tdinit0 | (tdinit1 << 20), &mctl_phy->ptr[3]);
+	writel_relaxed(tdinit2 | (tdinit3 << 18), &mctl_phy->ptr[4]);
 
 	/* set refresh timing */
-	writel((trefi << 16) | trfc, &mctl_ctl->rfshtmg);
+	writel_relaxed((trefi << 16) | trfc, &mctl_ctl->rfshtmg);
+	DMB;
 }
 
 static void mctl_sys_init(struct dram_para *para)
@@ -476,17 +480,17 @@ static void mctl_bit_delay_set(struct dram_para *para)
 		val = readl(&mctl_phy->dx[i].bdlr0);
 		for (j = 0; j < 4; j++)
 			val += para->dx_write_delays[i][j] << (j * 8);
-		writel(val, &mctl_phy->dx[i].bdlr0);
+		writel_relaxed(val, &mctl_phy->dx[i].bdlr0);
 
 		val = readl(&mctl_phy->dx[i].bdlr1);
 		for (j = 0; j < 4; j++)
 			val += para->dx_write_delays[i][j + 4] << (j * 8);
-		writel(val, &mctl_phy->dx[i].bdlr1);
+		writel_relaxed(val, &mctl_phy->dx[i].bdlr1);
 
 		val = readl(&mctl_phy->dx[i].bdlr2);
 		for (j = 0; j < 4; j++)
 			val += para->dx_write_delays[i][j + 8] << (j * 8);
-		writel(val, &mctl_phy->dx[i].bdlr2);
+		writel_relaxed(val, &mctl_phy->dx[i].bdlr2);
 	}
 	clrbits_le32(&mctl_phy->pgcr[0], BIT(26));
 
@@ -494,22 +498,22 @@ static void mctl_bit_delay_set(struct dram_para *para)
 		val = readl(&mctl_phy->dx[i].bdlr3);
 		for (j = 0; j < 4; j++)
 			val += para->dx_read_delays[i][j] << (j * 8);
-		writel(val, &mctl_phy->dx[i].bdlr3);
+		writel_relaxed(val, &mctl_phy->dx[i].bdlr3);
 
 		val = readl(&mctl_phy->dx[i].bdlr4);
 		for (j = 0; j < 4; j++)
 			val += para->dx_read_delays[i][j + 4] << (j * 8);
-		writel(val, &mctl_phy->dx[i].bdlr4);
+		writel_relaxed(val, &mctl_phy->dx[i].bdlr4);
 
 		val = readl(&mctl_phy->dx[i].bdlr5);
 		for (j = 0; j < 4; j++)
 			val += para->dx_read_delays[i][j + 8] << (j * 8);
-		writel(val, &mctl_phy->dx[i].bdlr5);
+		writel_relaxed(val, &mctl_phy->dx[i].bdlr5);
 
 		val = readl(&mctl_phy->dx[i].bdlr6);
 		val += (para->dx_read_delays[i][12] << 8) |
 		       (para->dx_read_delays[i][13] << 16);
-		writel(val, &mctl_phy->dx[i].bdlr6);
+		writel_relaxed(val, &mctl_phy->dx[i].bdlr6);
 	}
 	setbits_le32(&mctl_phy->pgcr[0], BIT(26));
 	udelay(1);
@@ -517,8 +521,9 @@ static void mctl_bit_delay_set(struct dram_para *para)
 	for (i = 1; i < 14; i++) {
 		val = readl(&mctl_phy->acbdlr[i]);
 		val += 0x0a0a0a0a;
-		writel(val, &mctl_phy->acbdlr[i]);
+		writel_relaxed(val, &mctl_phy->acbdlr[i]);
 	}
+	DMB;
 }
 
 static void mctl_channel_init(struct dram_para *para)
-- 
2.14.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [U-Boot] [RESEND PATCH 1/3] arm: clean up asm/io.h
  2019-02-10 16:17 ` [U-Boot] [RESEND PATCH 1/3] arm: clean up asm/io.h Andre Przywara
@ 2019-02-20 12:24   ` Alexander Graf
  0 siblings, 0 replies; 18+ messages in thread
From: Alexander Graf @ 2019-02-20 12:24 UTC (permalink / raw)
  To: u-boot

On 02/10/2019 05:17 PM, Andre Przywara wrote:
> asm/io.h is the header file containing the central MMIO accessor macros.
> Judging by the header and the comments, it was apparently once copied
> from the Linux kernel, but has deviated since then heavily.
>
> Clean up the definitions by:
> - removing pointless Linux history
> - removing commented code
> - removing outdated comments
> - removing unused definitions, for instance for mem_isa and mem_pci
>
> This massively improves the readability of the file.
>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Reviewed-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>

Reviewed-by: Alexander Graf <agraf@suse.de>


Alex

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [U-Boot] [RESEND PATCH 2/3] arm: introduce _relaxed MMIO accessors
  2019-02-10 16:17 ` [U-Boot] [RESEND PATCH 2/3] arm: introduce _relaxed MMIO accessors Andre Przywara
@ 2019-02-20 12:24   ` Alexander Graf
  0 siblings, 0 replies; 18+ messages in thread
From: Alexander Graf @ 2019-02-20 12:24 UTC (permalink / raw)
  To: u-boot

On 02/10/2019 05:17 PM, Andre Przywara wrote:
> The normal MMIO accessor macros (readX/writeX) guarantee a strong ordering,
> even with normal memory accesses: https://lwn.net/Articles/698014/
> For some MMIO operations (framebuffers being a prominent example) this is
> not needed, and the rather costly barrier inserted on weakly ordered
> systems like ARM costs some performance.
> To mitigate this issue, Linux introduced readX_relaxed and
> writeX_relaxed primitives, which only guarantee ordering between each
> other, so are typically faster due to the missing barrier.
>
> We probably do not care so much about performance in U-Boot, but want to
> have those primitives for two other reasons:
> - Being able to use the _relaxed macros simplifies porting drivers from
>    Linux.
> - The missing barrier typically allows to generate smaller code, which can
>    relieve some chronically tight SPL builds.
>
> Add those macros definitions by using the __raw versions of the
> accessors, which matches an earlier (and less complicated) version of
> the Linux implementation.
>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Reviewed-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
> [On my experimental RK3399 after modifying a few drivers:]
> Tested-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>

Reviewed-by: Alexander Graf <agraf@suse.de>


Alex

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [U-Boot] [RESEND PATCH 3/3] sunxi: H6: use writel_relaxed for DRAM timing register accesses
  2019-02-10 16:17 ` [U-Boot] [RESEND PATCH 3/3] sunxi: H6: use writel_relaxed for DRAM timing register accesses Andre Przywara
@ 2019-02-20 12:25   ` Alexander Graf
  0 siblings, 0 replies; 18+ messages in thread
From: Alexander Graf @ 2019-02-20 12:25 UTC (permalink / raw)
  To: u-boot

On 02/10/2019 05:17 PM, Andre Przywara wrote:
> The timing registers in the DRAM controller can be programmed in any
> order, as they will only take effect once the controller is eventually
> "activated".
>
> Switch the MMIO writes in mctl_set_timing_lpddr3() over to use
> writel_relaxed(), since we don't need the stronger guarantee of the
> normal writel(). We satisfy the overall ordering requirement by ending
> the function with an explicit DMB barrier.
>
> In this case we are not interested in the performance benefit this
> usually gives, but in the saved instructions, which sum up for the many
> writes we have in the timing setup.
> Due to alignment effects this shrinks our chronically tight H6 SPL by a
> whopping 2KB, which brings it in the same region as for the other
> AArch64 Allwinner SPL builds.
>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>

If you say it still works, it sounds like a pretty nifty optimization :).

Reviewed-by: Alexander Graf <agraf@suse.de>


Alex

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors
  2019-02-10 16:17 [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors Andre Przywara
                   ` (2 preceding siblings ...)
  2019-02-10 16:17 ` [U-Boot] [RESEND PATCH 3/3] sunxi: H6: use writel_relaxed for DRAM timing register accesses Andre Przywara
@ 2019-04-15  6:07 ` Jagan Teki
  2019-04-15  6:10   ` Chen-Yu Tsai
  2019-04-18 17:00 ` Jagan Teki
  2019-04-29 17:16 ` Jagan Teki
  5 siblings, 1 reply; 18+ messages in thread
From: Jagan Teki @ 2019-04-15  6:07 UTC (permalink / raw)
  To: u-boot

On Sun, Feb 10, 2019 at 9:49 PM Andre Przywara <andre.przywara@arm.com> wrote:
>
> Hi, this is a resend of what I posted some weeks ago, just adding the
> missing Signed-off-by: in patch 2/3, as pointed out by Philipp. I used
> the opportunity to add his Reviewed-by: tags on the first two patches.
> (Many thanks for that!) The rest is unchanged.
> -------------------
>
> Admittedly this is the long way round to solve some nasty SPL code size
> problem, but it looked beneficial to others as well, so here we go:
>
> arch/arm/include/asm/io.h looks like it's been around since the dawn of
> time, and was more or less blindly copied from Linux.
> We don't use and don't need most of the definitions, and mainline Linux
> got rid of them anyway, so patch 1/3 cleans up this header file to
> just contain what we need in U-Boot.
>
> Patch 2/3 introduces readl/writel_relaxed accessors, which are cheaper,
> but more importantly save one (barrier) instruction per accessor. This
> helps to bring down code size, since especially DRAM controller inits in
> SPLs tend to do a lot of MMIO.
>
> Consequently patch 3/3 introduces them in the Allwinner H6 DRAM driver,
> which reduces the SPL size by a whopping 2KB, due to a twist:
> The AArch64 exception table needs to be 2KB aligned, but we don't do
> anything special about it the linker script. So depending on where the
> code before the vectors ends, we have potentially large padding:
> At the moment this last address is 0x1824 for the H6, so the vectors can
> only start at 0x2000. By reducing the code size before the vectors by just
> (at least) 9 instructions, the vectors start at 0x1800 and we save most of
> the padding.

How come it reduces to 2KB? I can see the diff size of 160 bytes for gcc-6.3.1

₹ aarch64-linux-gnu-size spl/u-boot-spl*
   text       data        bss        dec        hex    filename
  28376        408        504      29288       7268    spl/u-boot-spl

₹ aarch64-linux-gnu-size spl/u-boot-spl*
   text       data        bss        dec        hex    filename
  28216        408        504      29128       71c8    spl/u-boot-spl

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors
  2019-04-15  6:07 ` [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors Jagan Teki
@ 2019-04-15  6:10   ` Chen-Yu Tsai
  2019-04-15  6:22     ` Jagan Teki
  0 siblings, 1 reply; 18+ messages in thread
From: Chen-Yu Tsai @ 2019-04-15  6:10 UTC (permalink / raw)
  To: u-boot

On Mon, Apr 15, 2019 at 2:07 PM Jagan Teki <jagan@amarulasolutions.com> wrote:
>
> On Sun, Feb 10, 2019 at 9:49 PM Andre Przywara <andre.przywara@arm.com> wrote:
> >
> > Hi, this is a resend of what I posted some weeks ago, just adding the
> > missing Signed-off-by: in patch 2/3, as pointed out by Philipp. I used
> > the opportunity to add his Reviewed-by: tags on the first two patches.
> > (Many thanks for that!) The rest is unchanged.
> > -------------------
> >
> > Admittedly this is the long way round to solve some nasty SPL code size
> > problem, but it looked beneficial to others as well, so here we go:
> >
> > arch/arm/include/asm/io.h looks like it's been around since the dawn of
> > time, and was more or less blindly copied from Linux.
> > We don't use and don't need most of the definitions, and mainline Linux
> > got rid of them anyway, so patch 1/3 cleans up this header file to
> > just contain what we need in U-Boot.
> >
> > Patch 2/3 introduces readl/writel_relaxed accessors, which are cheaper,
> > but more importantly save one (barrier) instruction per accessor. This
> > helps to bring down code size, since especially DRAM controller inits in
> > SPLs tend to do a lot of MMIO.
> >
> > Consequently patch 3/3 introduces them in the Allwinner H6 DRAM driver,
> > which reduces the SPL size by a whopping 2KB, due to a twist:
> > The AArch64 exception table needs to be 2KB aligned, but we don't do
> > anything special about it the linker script. So depending on where the
> > code before the vectors ends, we have potentially large padding:
> > At the moment this last address is 0x1824 for the H6, so the vectors can
> > only start at 0x2000. By reducing the code size before the vectors by just
> > (at least) 9 instructions, the vectors start at 0x1800 and we save most of
> > the padding.
>
> How come it reduces to 2KB? I can see the diff size of 160 bytes for gcc-6.3.1
>
> ₹ aarch64-linux-gnu-size spl/u-boot-spl*
>    text       data        bss        dec        hex    filename
>   28376        408        504      29288       7268    spl/u-boot-spl
>
> ₹ aarch64-linux-gnu-size spl/u-boot-spl*
>    text       data        bss        dec        hex    filename
>   28216        408        504      29128       71c8    spl/u-boot-spl

Because of section alignment issues? I believe Andre is referring to the
size of the whole file. Since it gets loaded as a whole, the total size
is what matters, not the size of the individual sections.

ChenYu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors
  2019-04-15  6:10   ` Chen-Yu Tsai
@ 2019-04-15  6:22     ` Jagan Teki
  2019-04-15  7:48       ` André Przywara
  0 siblings, 1 reply; 18+ messages in thread
From: Jagan Teki @ 2019-04-15  6:22 UTC (permalink / raw)
  To: u-boot

On Mon, Apr 15, 2019 at 11:40 AM Chen-Yu Tsai <wens@kernel.org> wrote:
>
> On Mon, Apr 15, 2019 at 2:07 PM Jagan Teki <jagan@amarulasolutions.com> wrote:
> >
> > On Sun, Feb 10, 2019 at 9:49 PM Andre Przywara <andre.przywara@arm.com> wrote:
> > >
> > > Hi, this is a resend of what I posted some weeks ago, just adding the
> > > missing Signed-off-by: in patch 2/3, as pointed out by Philipp. I used
> > > the opportunity to add his Reviewed-by: tags on the first two patches.
> > > (Many thanks for that!) The rest is unchanged.
> > > -------------------
> > >
> > > Admittedly this is the long way round to solve some nasty SPL code size
> > > problem, but it looked beneficial to others as well, so here we go:
> > >
> > > arch/arm/include/asm/io.h looks like it's been around since the dawn of
> > > time, and was more or less blindly copied from Linux.
> > > We don't use and don't need most of the definitions, and mainline Linux
> > > got rid of them anyway, so patch 1/3 cleans up this header file to
> > > just contain what we need in U-Boot.
> > >
> > > Patch 2/3 introduces readl/writel_relaxed accessors, which are cheaper,
> > > but more importantly save one (barrier) instruction per accessor. This
> > > helps to bring down code size, since especially DRAM controller inits in
> > > SPLs tend to do a lot of MMIO.
> > >
> > > Consequently patch 3/3 introduces them in the Allwinner H6 DRAM driver,
> > > which reduces the SPL size by a whopping 2KB, due to a twist:
> > > The AArch64 exception table needs to be 2KB aligned, but we don't do
> > > anything special about it the linker script. So depending on where the
> > > code before the vectors ends, we have potentially large padding:
> > > At the moment this last address is 0x1824 for the H6, so the vectors can
> > > only start at 0x2000. By reducing the code size before the vectors by just
> > > (at least) 9 instructions, the vectors start at 0x1800 and we save most of
> > > the padding.
> >
> > How come it reduces to 2KB? I can see the diff size of 160 bytes for gcc-6.3.1
> >
> > ₹ aarch64-linux-gnu-size spl/u-boot-spl*
> >    text       data        bss        dec        hex    filename
> >   28376        408        504      29288       7268    spl/u-boot-spl
> >
> > ₹ aarch64-linux-gnu-size spl/u-boot-spl*
> >    text       data        bss        dec        hex    filename
> >   28216        408        504      29128       71c8    spl/u-boot-spl
>
> Because of section alignment issues? I believe Andre is referring to the
> size of the whole file. Since it gets loaded as a whole, the total size
> is what matters, not the size of the individual sections.

Well, the input for final sunxi-spl.bin would be u-boot-spl and the
above shows the size of file as well 29128 bytes with -160 bytes from
29288.

Since the size of sunxi-spl.bin is truncated to 32K, I couldn't see
any difference either.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors
  2019-04-15  6:22     ` Jagan Teki
@ 2019-04-15  7:48       ` André Przywara
  2019-04-17 12:00         ` Jagan Teki
  0 siblings, 1 reply; 18+ messages in thread
From: André Przywara @ 2019-04-15  7:48 UTC (permalink / raw)
  To: u-boot

On 15/04/2019 07:22, Jagan Teki wrote:

Hi,

> On Mon, Apr 15, 2019 at 11:40 AM Chen-Yu Tsai <wens@kernel.org> wrote:
>>
>> On Mon, Apr 15, 2019 at 2:07 PM Jagan Teki <jagan@amarulasolutions.com> wrote:
>>>
>>> On Sun, Feb 10, 2019 at 9:49 PM Andre Przywara <andre.przywara@arm.com> wrote:
>>>>
>>>> Hi, this is a resend of what I posted some weeks ago, just adding the
>>>> missing Signed-off-by: in patch 2/3, as pointed out by Philipp. I used
>>>> the opportunity to add his Reviewed-by: tags on the first two patches.
>>>> (Many thanks for that!) The rest is unchanged.
>>>> -------------------
>>>>
>>>> Admittedly this is the long way round to solve some nasty SPL code size
>>>> problem, but it looked beneficial to others as well, so here we go:
>>>>
>>>> arch/arm/include/asm/io.h looks like it's been around since the dawn of
>>>> time, and was more or less blindly copied from Linux.
>>>> We don't use and don't need most of the definitions, and mainline Linux
>>>> got rid of them anyway, so patch 1/3 cleans up this header file to
>>>> just contain what we need in U-Boot.
>>>>
>>>> Patch 2/3 introduces readl/writel_relaxed accessors, which are cheaper,
>>>> but more importantly save one (barrier) instruction per accessor. This
>>>> helps to bring down code size, since especially DRAM controller inits in
>>>> SPLs tend to do a lot of MMIO.
>>>>
>>>> Consequently patch 3/3 introduces them in the Allwinner H6 DRAM driver,
>>>> which reduces the SPL size by a whopping 2KB, due to a twist:
>>>> The AArch64 exception table needs to be 2KB aligned, but we don't do
>>>> anything special about it the linker script. So depending on where the
>>>> code before the vectors ends, we have potentially large padding:
>>>> At the moment this last address is 0x1824 for the H6, so the vectors can
>>>> only start at 0x2000. By reducing the code size before the vectors by just
>>>> (at least) 9 instructions, the vectors start at 0x1800 and we save most of
>>>> the padding.
>>>
>>> How come it reduces to 2KB? I can see the diff size of 160 bytes for gcc-6.3.1
>>>
>>> ₹ aarch64-linux-gnu-size spl/u-boot-spl*
>>>    text       data        bss        dec        hex    filename
>>>   28376        408        504      29288       7268    spl/u-boot-spl
>>>
>>> ₹ aarch64-linux-gnu-size spl/u-boot-spl*
>>>    text       data        bss        dec        hex    filename
>>>   28216        408        504      29128       71c8    spl/u-boot-spl
>>
>> Because of section alignment issues? I believe Andre is referring to the
>> size of the whole file. Since it gets loaded as a whole, the total size
>> is what matters, not the size of the individual sections.
> 
> Well, the input for final sunxi-spl.bin would be u-boot-spl and the
> above shows the size of file as well 29128 bytes with -160 bytes from
> 29288.
> 
> Since the size of sunxi-spl.bin is truncated to 32K, I couldn't see
> any difference either.

As mentioned in the commit messasge, this is a fragile topic. Since
commit ef331e3685fe ("armv8: Disable exception vectors in SPL by
default") we disable the SPL exception vectors by default now, so the
numbers are now different.
You should be able to see the 2K saving with the SPL exception vectors
explicitly enabled in menuconfig.

Cheers,
Andre.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors
  2019-04-15  7:48       ` André Przywara
@ 2019-04-17 12:00         ` Jagan Teki
  2019-04-18  0:37           ` André Przywara
  0 siblings, 1 reply; 18+ messages in thread
From: Jagan Teki @ 2019-04-17 12:00 UTC (permalink / raw)
  To: u-boot

On Mon, Apr 15, 2019 at 1:21 PM André Przywara <andre.przywara@arm.com> wrote:
>
> On 15/04/2019 07:22, Jagan Teki wrote:
>
> Hi,
>
> > On Mon, Apr 15, 2019 at 11:40 AM Chen-Yu Tsai <wens@kernel.org> wrote:
> >>
> >> On Mon, Apr 15, 2019 at 2:07 PM Jagan Teki <jagan@amarulasolutions.com> wrote:
> >>>
> >>> On Sun, Feb 10, 2019 at 9:49 PM Andre Przywara <andre.przywara@arm.com> wrote:
> >>>>
> >>>> Hi, this is a resend of what I posted some weeks ago, just adding the
> >>>> missing Signed-off-by: in patch 2/3, as pointed out by Philipp. I used
> >>>> the opportunity to add his Reviewed-by: tags on the first two patches.
> >>>> (Many thanks for that!) The rest is unchanged.
> >>>> -------------------
> >>>>
> >>>> Admittedly this is the long way round to solve some nasty SPL code size
> >>>> problem, but it looked beneficial to others as well, so here we go:
> >>>>
> >>>> arch/arm/include/asm/io.h looks like it's been around since the dawn of
> >>>> time, and was more or less blindly copied from Linux.
> >>>> We don't use and don't need most of the definitions, and mainline Linux
> >>>> got rid of them anyway, so patch 1/3 cleans up this header file to
> >>>> just contain what we need in U-Boot.
> >>>>
> >>>> Patch 2/3 introduces readl/writel_relaxed accessors, which are cheaper,
> >>>> but more importantly save one (barrier) instruction per accessor. This
> >>>> helps to bring down code size, since especially DRAM controller inits in
> >>>> SPLs tend to do a lot of MMIO.
> >>>>
> >>>> Consequently patch 3/3 introduces them in the Allwinner H6 DRAM driver,
> >>>> which reduces the SPL size by a whopping 2KB, due to a twist:
> >>>> The AArch64 exception table needs to be 2KB aligned, but we don't do
> >>>> anything special about it the linker script. So depending on where the
> >>>> code before the vectors ends, we have potentially large padding:
> >>>> At the moment this last address is 0x1824 for the H6, so the vectors can
> >>>> only start at 0x2000. By reducing the code size before the vectors by just
> >>>> (at least) 9 instructions, the vectors start at 0x1800 and we save most of
> >>>> the padding.
> >>>
> >>> How come it reduces to 2KB? I can see the diff size of 160 bytes for gcc-6.3.1
> >>>
> >>> ₹ aarch64-linux-gnu-size spl/u-boot-spl*
> >>>    text       data        bss        dec        hex    filename
> >>>   28376        408        504      29288       7268    spl/u-boot-spl
> >>>
> >>> ₹ aarch64-linux-gnu-size spl/u-boot-spl*
> >>>    text       data        bss        dec        hex    filename
> >>>   28216        408        504      29128       71c8    spl/u-boot-spl
> >>
> >> Because of section alignment issues? I believe Andre is referring to the
> >> size of the whole file. Since it gets loaded as a whole, the total size
> >> is what matters, not the size of the individual sections.
> >
> > Well, the input for final sunxi-spl.bin would be u-boot-spl and the
> > above shows the size of file as well 29128 bytes with -160 bytes from
> > 29288.
> >
> > Since the size of sunxi-spl.bin is truncated to 32K, I couldn't see
> > any difference either.
>
> As mentioned in the commit messasge, this is a fragile topic. Since
> commit ef331e3685fe ("armv8: Disable exception vectors in SPL by
> default") we disable the SPL exception vectors by default now, so the
> numbers are now different.

Sorry, I over looked the commit messages.

> You should be able to see the 2K saving with the SPL exception vectors
> explicitly enabled in menuconfig.

Oh. So I enabled the vectors via ARMV8_SPL_EXCEPTION_VECTORS=y but it
seems increased the SPL size.

₹ aarch64-linux-gnu-size spl/u-boot-spl
   text       data        bss        dec        hex    filename
  30130        408        504      31042       7942    spl/u-boot-spl

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors
  2019-04-17 12:00         ` Jagan Teki
@ 2019-04-18  0:37           ` André Przywara
  2019-04-18 16:59             ` Jagan Teki
  0 siblings, 1 reply; 18+ messages in thread
From: André Przywara @ 2019-04-18  0:37 UTC (permalink / raw)
  To: u-boot

On 17/04/2019 13:00, Jagan Teki wrote:
> On Mon, Apr 15, 2019 at 1:21 PM André Przywara <andre.przywara@arm.com> wrote:
>>
>> On 15/04/2019 07:22, Jagan Teki wrote:
>>
>> Hi,
>>
>>> On Mon, Apr 15, 2019 at 11:40 AM Chen-Yu Tsai <wens@kernel.org> wrote:
>>>>
>>>> On Mon, Apr 15, 2019 at 2:07 PM Jagan Teki <jagan@amarulasolutions.com> wrote:
>>>>>
>>>>> On Sun, Feb 10, 2019 at 9:49 PM Andre Przywara <andre.przywara@arm.com> wrote:
>>>>>>
>>>>>> Hi, this is a resend of what I posted some weeks ago, just adding the
>>>>>> missing Signed-off-by: in patch 2/3, as pointed out by Philipp. I used
>>>>>> the opportunity to add his Reviewed-by: tags on the first two patches.
>>>>>> (Many thanks for that!) The rest is unchanged.
>>>>>> -------------------
>>>>>>
>>>>>> Admittedly this is the long way round to solve some nasty SPL code size
>>>>>> problem, but it looked beneficial to others as well, so here we go:
>>>>>>
>>>>>> arch/arm/include/asm/io.h looks like it's been around since the dawn of
>>>>>> time, and was more or less blindly copied from Linux.
>>>>>> We don't use and don't need most of the definitions, and mainline Linux
>>>>>> got rid of them anyway, so patch 1/3 cleans up this header file to
>>>>>> just contain what we need in U-Boot.
>>>>>>
>>>>>> Patch 2/3 introduces readl/writel_relaxed accessors, which are cheaper,
>>>>>> but more importantly save one (barrier) instruction per accessor. This
>>>>>> helps to bring down code size, since especially DRAM controller inits in
>>>>>> SPLs tend to do a lot of MMIO.
>>>>>>
>>>>>> Consequently patch 3/3 introduces them in the Allwinner H6 DRAM driver,
>>>>>> which reduces the SPL size by a whopping 2KB, due to a twist:
>>>>>> The AArch64 exception table needs to be 2KB aligned, but we don't do
>>>>>> anything special about it the linker script. So depending on where the
>>>>>> code before the vectors ends, we have potentially large padding:
>>>>>> At the moment this last address is 0x1824 for the H6, so the vectors can
>>>>>> only start at 0x2000. By reducing the code size before the vectors by just
>>>>>> (at least) 9 instructions, the vectors start at 0x1800 and we save most of
>>>>>> the padding.
>>>>>
>>>>> How come it reduces to 2KB? I can see the diff size of 160 bytes for gcc-6.3.1
>>>>>
>>>>> ₹ aarch64-linux-gnu-size spl/u-boot-spl*
>>>>>    text       data        bss        dec        hex    filename
>>>>>   28376        408        504      29288       7268    spl/u-boot-spl
>>>>>
>>>>> ₹ aarch64-linux-gnu-size spl/u-boot-spl*
>>>>>    text       data        bss        dec        hex    filename
>>>>>   28216        408        504      29128       71c8    spl/u-boot-spl
>>>>
>>>> Because of section alignment issues? I believe Andre is referring to the
>>>> size of the whole file. Since it gets loaded as a whole, the total size
>>>> is what matters, not the size of the individual sections.
>>>
>>> Well, the input for final sunxi-spl.bin would be u-boot-spl and the
>>> above shows the size of file as well 29128 bytes with -160 bytes from
>>> 29288.
>>>
>>> Since the size of sunxi-spl.bin is truncated to 32K, I couldn't see
>>> any difference either.
>>
>> As mentioned in the commit messasge, this is a fragile topic. Since
>> commit ef331e3685fe ("armv8: Disable exception vectors in SPL by
>> default") we disable the SPL exception vectors by default now, so the
>> numbers are now different.
> 
> Sorry, I over looked the commit messages.
> 
>> You should be able to see the 2K saving with the SPL exception vectors
>> explicitly enabled in menuconfig.
> 
> Oh. So I enabled the vectors via ARMV8_SPL_EXCEPTION_VECTORS=y but it
> seems increased the SPL size.
> 
> ₹ aarch64-linux-gnu-size spl/u-boot-spl
>    text       data        bss        dec        hex    filename
>   30130        408        504      31042       7942    spl/u-boot-spl

Sure, I meant you should see an effect with vs. without these patches,
both with the vectors enabled. But as mentioned, this is somewhat
random, depending on other code changes.
With current mainline for pine_h64_defconfig I get 29344 vs. 29504 bytes
without the vectors, and 31202 vs. "will not fit" with the vectors.

Cheers,
Andre.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors
  2019-04-18  0:37           ` André Przywara
@ 2019-04-18 16:59             ` Jagan Teki
  0 siblings, 0 replies; 18+ messages in thread
From: Jagan Teki @ 2019-04-18 16:59 UTC (permalink / raw)
  To: u-boot

On Thu, Apr 18, 2019 at 6:07 AM André Przywara <andre.przywara@arm.com> wrote:
>
> On 17/04/2019 13:00, Jagan Teki wrote:
> > On Mon, Apr 15, 2019 at 1:21 PM André Przywara <andre.przywara@arm.com> wrote:
> >>
> >> On 15/04/2019 07:22, Jagan Teki wrote:
> >>
> >> Hi,
> >>
> >>> On Mon, Apr 15, 2019 at 11:40 AM Chen-Yu Tsai <wens@kernel.org> wrote:
> >>>>
> >>>> On Mon, Apr 15, 2019 at 2:07 PM Jagan Teki <jagan@amarulasolutions.com> wrote:
> >>>>>
> >>>>> On Sun, Feb 10, 2019 at 9:49 PM Andre Przywara <andre.przywara@arm.com> wrote:
> >>>>>>
> >>>>>> Hi, this is a resend of what I posted some weeks ago, just adding the
> >>>>>> missing Signed-off-by: in patch 2/3, as pointed out by Philipp. I used
> >>>>>> the opportunity to add his Reviewed-by: tags on the first two patches.
> >>>>>> (Many thanks for that!) The rest is unchanged.
> >>>>>> -------------------
> >>>>>>
> >>>>>> Admittedly this is the long way round to solve some nasty SPL code size
> >>>>>> problem, but it looked beneficial to others as well, so here we go:
> >>>>>>
> >>>>>> arch/arm/include/asm/io.h looks like it's been around since the dawn of
> >>>>>> time, and was more or less blindly copied from Linux.
> >>>>>> We don't use and don't need most of the definitions, and mainline Linux
> >>>>>> got rid of them anyway, so patch 1/3 cleans up this header file to
> >>>>>> just contain what we need in U-Boot.
> >>>>>>
> >>>>>> Patch 2/3 introduces readl/writel_relaxed accessors, which are cheaper,
> >>>>>> but more importantly save one (barrier) instruction per accessor. This
> >>>>>> helps to bring down code size, since especially DRAM controller inits in
> >>>>>> SPLs tend to do a lot of MMIO.
> >>>>>>
> >>>>>> Consequently patch 3/3 introduces them in the Allwinner H6 DRAM driver,
> >>>>>> which reduces the SPL size by a whopping 2KB, due to a twist:
> >>>>>> The AArch64 exception table needs to be 2KB aligned, but we don't do
> >>>>>> anything special about it the linker script. So depending on where the
> >>>>>> code before the vectors ends, we have potentially large padding:
> >>>>>> At the moment this last address is 0x1824 for the H6, so the vectors can
> >>>>>> only start at 0x2000. By reducing the code size before the vectors by just
> >>>>>> (at least) 9 instructions, the vectors start at 0x1800 and we save most of
> >>>>>> the padding.
> >>>>>
> >>>>> How come it reduces to 2KB? I can see the diff size of 160 bytes for gcc-6.3.1
> >>>>>
> >>>>> ₹ aarch64-linux-gnu-size spl/u-boot-spl*
> >>>>>    text       data        bss        dec        hex    filename
> >>>>>   28376        408        504      29288       7268    spl/u-boot-spl
> >>>>>
> >>>>> ₹ aarch64-linux-gnu-size spl/u-boot-spl*
> >>>>>    text       data        bss        dec        hex    filename
> >>>>>   28216        408        504      29128       71c8    spl/u-boot-spl
> >>>>
> >>>> Because of section alignment issues? I believe Andre is referring to the
> >>>> size of the whole file. Since it gets loaded as a whole, the total size
> >>>> is what matters, not the size of the individual sections.
> >>>
> >>> Well, the input for final sunxi-spl.bin would be u-boot-spl and the
> >>> above shows the size of file as well 29128 bytes with -160 bytes from
> >>> 29288.
> >>>
> >>> Since the size of sunxi-spl.bin is truncated to 32K, I couldn't see
> >>> any difference either.
> >>
> >> As mentioned in the commit messasge, this is a fragile topic. Since
> >> commit ef331e3685fe ("armv8: Disable exception vectors in SPL by
> >> default") we disable the SPL exception vectors by default now, so the
> >> numbers are now different.
> >
> > Sorry, I over looked the commit messages.
> >
> >> You should be able to see the 2K saving with the SPL exception vectors
> >> explicitly enabled in menuconfig.
> >
> > Oh. So I enabled the vectors via ARMV8_SPL_EXCEPTION_VECTORS=y but it
> > seems increased the SPL size.
> >
> > ₹ aarch64-linux-gnu-size spl/u-boot-spl
> >    text       data        bss        dec        hex    filename
> >   30130        408        504      31042       7942    spl/u-boot-spl
>
> Sure, I meant you should see an effect with vs. without these patches,
> both with the vectors enabled. But as mentioned, this is somewhat
> random, depending on other code changes.
> With current mainline for pine_h64_defconfig I get 29344 vs. 29504 bytes
> without the vectors, and 31202 vs. "will not fit" with the vectors.

Okay.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors
  2019-02-10 16:17 [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors Andre Przywara
                   ` (3 preceding siblings ...)
  2019-04-15  6:07 ` [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors Jagan Teki
@ 2019-04-18 17:00 ` Jagan Teki
  2019-04-25 18:01   ` Jagan Teki
  2019-04-29 17:16 ` Jagan Teki
  5 siblings, 1 reply; 18+ messages in thread
From: Jagan Teki @ 2019-04-18 17:00 UTC (permalink / raw)
  To: u-boot

On Sun, Feb 10, 2019 at 9:49 PM Andre Przywara <andre.przywara@arm.com> wrote:
>
> Hi, this is a resend of what I posted some weeks ago, just adding the
> missing Signed-off-by: in patch 2/3, as pointed out by Philipp. I used
> the opportunity to add his Reviewed-by: tags on the first two patches.
> (Many thanks for that!) The rest is unchanged.
> -------------------
>
> Admittedly this is the long way round to solve some nasty SPL code size
> problem, but it looked beneficial to others as well, so here we go:
>
> arch/arm/include/asm/io.h looks like it's been around since the dawn of
> time, and was more or less blindly copied from Linux.
> We don't use and don't need most of the definitions, and mainline Linux
> got rid of them anyway, so patch 1/3 cleans up this header file to
> just contain what we need in U-Boot.
>
> Patch 2/3 introduces readl/writel_relaxed accessors, which are cheaper,
> but more importantly save one (barrier) instruction per accessor. This
> helps to bring down code size, since especially DRAM controller inits in
> SPLs tend to do a lot of MMIO.
>
> Consequently patch 3/3 introduces them in the Allwinner H6 DRAM driver,
> which reduces the SPL size by a whopping 2KB, due to a twist:
> The AArch64 exception table needs to be 2KB aligned, but we don't do
> anything special about it the linker script. So depending on where the
> code before the vectors ends, we have potentially large padding:
> At the moment this last address is 0x1824 for the H6, so the vectors can
> only start at 0x2000. By reducing the code size before the vectors by just
> (at least) 9 instructions, the vectors start at 0x1800 and we save most of
> the padding.
>
> I understand that the proper solution is to fill the gap before the vectors
> with code instead of NOPs, but I couldn't find any obvious way doing this
> in the linker script. If anyone has any idea here, I am all ears.
>
> Cheers,
> Andre.
>
> Andre Przywara (3):
>   arm: clean up asm/io.h
>   arm: introduce _relaxed MMIO accessors
>   sunxi: H6: use writel_relaxed for DRAM timing register accesses

Anyone has any further comments on this? would like to pick this before MW.

Jagan.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors
  2019-04-18 17:00 ` Jagan Teki
@ 2019-04-25 18:01   ` Jagan Teki
  0 siblings, 0 replies; 18+ messages in thread
From: Jagan Teki @ 2019-04-25 18:01 UTC (permalink / raw)
  To: u-boot

On Thu, Apr 18, 2019 at 10:30 PM Jagan Teki <jagan@amarulasolutions.com> wrote:
>
> On Sun, Feb 10, 2019 at 9:49 PM Andre Przywara <andre.przywara@arm.com> wrote:
> >
> > Hi, this is a resend of what I posted some weeks ago, just adding the
> > missing Signed-off-by: in patch 2/3, as pointed out by Philipp. I used
> > the opportunity to add his Reviewed-by: tags on the first two patches.
> > (Many thanks for that!) The rest is unchanged.
> > -------------------
> >
> > Admittedly this is the long way round to solve some nasty SPL code size
> > problem, but it looked beneficial to others as well, so here we go:
> >
> > arch/arm/include/asm/io.h looks like it's been around since the dawn of
> > time, and was more or less blindly copied from Linux.
> > We don't use and don't need most of the definitions, and mainline Linux
> > got rid of them anyway, so patch 1/3 cleans up this header file to
> > just contain what we need in U-Boot.
> >
> > Patch 2/3 introduces readl/writel_relaxed accessors, which are cheaper,
> > but more importantly save one (barrier) instruction per accessor. This
> > helps to bring down code size, since especially DRAM controller inits in
> > SPLs tend to do a lot of MMIO.
> >
> > Consequently patch 3/3 introduces them in the Allwinner H6 DRAM driver,
> > which reduces the SPL size by a whopping 2KB, due to a twist:
> > The AArch64 exception table needs to be 2KB aligned, but we don't do
> > anything special about it the linker script. So depending on where the
> > code before the vectors ends, we have potentially large padding:
> > At the moment this last address is 0x1824 for the H6, so the vectors can
> > only start at 0x2000. By reducing the code size before the vectors by just
> > (at least) 9 instructions, the vectors start at 0x1800 and we save most of
> > the padding.
> >
> > I understand that the proper solution is to fill the gap before the vectors
> > with code instead of NOPs, but I couldn't find any obvious way doing this
> > in the linker script. If anyone has any idea here, I am all ears.
> >
> > Cheers,
> > Andre.
> >
> > Andre Przywara (3):
> >   arm: clean up asm/io.h
> >   arm: introduce _relaxed MMIO accessors
> >   sunxi: H6: use writel_relaxed for DRAM timing register accesses
>
> Anyone has any further comments on this? would like to pick this before MW.

Applied to u-boot-sunxi/master

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors
  2019-02-10 16:17 [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors Andre Przywara
                   ` (4 preceding siblings ...)
  2019-04-18 17:00 ` Jagan Teki
@ 2019-04-29 17:16 ` Jagan Teki
  2019-04-30 22:06   ` André Przywara
  5 siblings, 1 reply; 18+ messages in thread
From: Jagan Teki @ 2019-04-29 17:16 UTC (permalink / raw)
  To: u-boot

On Sun, Feb 10, 2019 at 9:49 PM Andre Przywara <andre.przywara@arm.com> wrote:
>
> Hi, this is a resend of what I posted some weeks ago, just adding the
> missing Signed-off-by: in patch 2/3, as pointed out by Philipp. I used
> the opportunity to add his Reviewed-by: tags on the first two patches.
> (Many thanks for that!) The rest is unchanged.
> -------------------
>
> Admittedly this is the long way round to solve some nasty SPL code size
> problem, but it looked beneficial to others as well, so here we go:
>
> arch/arm/include/asm/io.h looks like it's been around since the dawn of
> time, and was more or less blindly copied from Linux.
> We don't use and don't need most of the definitions, and mainline Linux
> got rid of them anyway, so patch 1/3 cleans up this header file to
> just contain what we need in U-Boot.
>
> Patch 2/3 introduces readl/writel_relaxed accessors, which are cheaper,
> but more importantly save one (barrier) instruction per accessor. This
> helps to bring down code size, since especially DRAM controller inits in
> SPLs tend to do a lot of MMIO.
>
> Consequently patch 3/3 introduces them in the Allwinner H6 DRAM driver,
> which reduces the SPL size by a whopping 2KB, due to a twist:
> The AArch64 exception table needs to be 2KB aligned, but we don't do
> anything special about it the linker script. So depending on where the
> code before the vectors ends, we have potentially large padding:
> At the moment this last address is 0x1824 for the H6, so the vectors can
> only start at 0x2000. By reducing the code size before the vectors by just
> (at least) 9 instructions, the vectors start at 0x1800 and we save most of
> the padding.
>
> I understand that the proper solution is to fill the gap before the vectors
> with code instead of NOPs, but I couldn't find any obvious way doing this
> in the linker script. If anyone has any idea here, I am all ears.
>
> Cheers,
> Andre.
>
> Andre Przywara (3):
>   arm: clean up asm/io.h
>   arm: introduce _relaxed MMIO accessors
>   sunxi: H6: use writel_relaxed for DRAM timing register accesses

These have build issues with arm32, please send another series.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors
  2019-04-29 17:16 ` Jagan Teki
@ 2019-04-30 22:06   ` André Przywara
  0 siblings, 0 replies; 18+ messages in thread
From: André Przywara @ 2019-04-30 22:06 UTC (permalink / raw)
  To: u-boot

On 29/04/2019 18:16, Jagan Teki wrote:

Hi,

> On Sun, Feb 10, 2019 at 9:49 PM Andre Przywara <andre.przywara@arm.com> wrote:
>>
>> Hi, this is a resend of what I posted some weeks ago, just adding the
>> missing Signed-off-by: in patch 2/3, as pointed out by Philipp. I used
>> the opportunity to add his Reviewed-by: tags on the first two patches.
>> (Many thanks for that!) The rest is unchanged.
>> -------------------
>>
>> Admittedly this is the long way round to solve some nasty SPL code size
>> problem, but it looked beneficial to others as well, so here we go:
>>
>> arch/arm/include/asm/io.h looks like it's been around since the dawn of
>> time, and was more or less blindly copied from Linux.
>> We don't use and don't need most of the definitions, and mainline Linux
>> got rid of them anyway, so patch 1/3 cleans up this header file to
>> just contain what we need in U-Boot.
>>
>> Patch 2/3 introduces readl/writel_relaxed accessors, which are cheaper,
>> but more importantly save one (barrier) instruction per accessor. This
>> helps to bring down code size, since especially DRAM controller inits in
>> SPLs tend to do a lot of MMIO.
>>
>> Consequently patch 3/3 introduces them in the Allwinner H6 DRAM driver,
>> which reduces the SPL size by a whopping 2KB, due to a twist:
>> The AArch64 exception table needs to be 2KB aligned, but we don't do
>> anything special about it the linker script. So depending on where the
>> code before the vectors ends, we have potentially large padding:
>> At the moment this last address is 0x1824 for the H6, so the vectors can
>> only start at 0x2000. By reducing the code size before the vectors by just
>> (at least) 9 instructions, the vectors start at 0x1800 and we save most of
>> the padding.
>>
>> I understand that the proper solution is to fill the gap before the vectors
>> with code instead of NOPs, but I couldn't find any obvious way doing this
>> in the linker script. If anyone has any idea here, I am all ears.
>>
>> Cheers,
>> Andre.
>>
>> Andre Przywara (3):
>>   arm: clean up asm/io.h
>>   arm: introduce _relaxed MMIO accessors
>>   sunxi: H6: use writel_relaxed for DRAM timing register accesses
> 
> These have build issues with arm32, please send another series.

Thanks for the elaborate error report ;-)

There is commit 6478848d165b63293f7021db9b70ce25a1e1062c, which does
basically the same thing as patch 2/3 in this series and was merged by
Tom already. This causes the double definition.
So just dropping the middle patch from this series should do the trick.

Cheers,
Andre.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-04-30 22:06 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-10 16:17 [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors Andre Przywara
2019-02-10 16:17 ` [U-Boot] [RESEND PATCH 1/3] arm: clean up asm/io.h Andre Przywara
2019-02-20 12:24   ` Alexander Graf
2019-02-10 16:17 ` [U-Boot] [RESEND PATCH 2/3] arm: introduce _relaxed MMIO accessors Andre Przywara
2019-02-20 12:24   ` Alexander Graf
2019-02-10 16:17 ` [U-Boot] [RESEND PATCH 3/3] sunxi: H6: use writel_relaxed for DRAM timing register accesses Andre Przywara
2019-02-20 12:25   ` Alexander Graf
2019-04-15  6:07 ` [U-Boot] [RESEND PATCH 0/3] arm: Introduce writel/readl_relaxed accessors Jagan Teki
2019-04-15  6:10   ` Chen-Yu Tsai
2019-04-15  6:22     ` Jagan Teki
2019-04-15  7:48       ` André Przywara
2019-04-17 12:00         ` Jagan Teki
2019-04-18  0:37           ` André Przywara
2019-04-18 16:59             ` Jagan Teki
2019-04-18 17:00 ` Jagan Teki
2019-04-25 18:01   ` Jagan Teki
2019-04-29 17:16 ` Jagan Teki
2019-04-30 22:06   ` André Przywara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.