* [U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations
@ 2015-03-13  6:13 Stephen Warren
  2015-03-13 14:30 ` Marek Vasut
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Stephen Warren @ 2015-03-13  6:13 UTC (permalink / raw)
  To: u-boot

BCM2835 bus addresses use the top 2 bits to determine whether peripherals
use or bypass the GPU L1 and L2 cache. BCM2835-ARM-Peripherals.pdf states
that:

0: L1 & L2 cached
4: L2 cache coherent (non allocating)
8: L2 cached only
c: Direct uncached.

That document also states that "Software accessing RAM using the DMA
engines must use bus addresses (base at 0xc0000000)". However, this appears
to be incorrect since it does not work in practice on the bcm2835
(although it does on bcm2836). "usb start" causes some EABI function to
call raise(8), presumably due to corrupted USB IN data (the converse is
true on bcm2836; a value of 4 causes signals). However, I haven't
investigated the cause.

A value of 4 matches what the RPi Foundation's kernel uses; see the definition
of _REAL_BUS_OFFSET in arch/arm/mach-bcm2708/include/mach/memory.h. With
the code updated to implement a phys->bus translation by setting the top
two bits of DWC2 DMA addresses to 4, USB keyboard support appears stable.

A similar change is made for bcm2836 (RPi 2). I can't justify this value
since it doesn't match the RPi Foundation kernel. However, it does appear
to work for the built-in USB Ethernet at least.

Ideally, the bcm2835 SoC support would provide some common function for
any DMA-capable driver to call to perform the phys->bus translation,
rather than placing ifdefs in each driver file. However, I can't find
such a standard function in U-Boot.
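
As a rough sketch of the kind of common helper described above: the names
bcm283x_phys_to_bus() and BCM283X_BUS_ALIAS are hypothetical and are not
existing U-Boot API, and the alias values are simply the ones this patch
uses. CONFIG_BCM2835 is force-defined below only so the fragment compiles
on its own; a real build would take it from the board configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: exactly one SoC option is normally set by the board config. */
#if !defined(CONFIG_BCM2835) && !defined(CONFIG_BCM2836)
#define CONFIG_BCM2835
#endif

#if defined(CONFIG_BCM2836)
#define BCM283X_BUS_ALIAS 0xc0000000u	/* value this patch uses on bcm2836 */
#else
#define BCM283X_BUS_ALIAS 0x40000000u	/* value this patch uses on bcm2835 */
#endif

/* Hypothetical helper: convert an ARM physical address to a bus address. */
static inline uint32_t bcm283x_phys_to_bus(uint32_t phys)
{
	/* The top 2 bits are reserved for the cache alias. */
	assert((phys & 0xc0000000u) == 0);
	return phys | BCM283X_BUS_ALIAS;
}
```

With something like this in the SoC code, the driver change would reduce to
writel(bcm283x_phys_to_bus((uint32_t)aligned_buffer), &hc_regs->hcdma)
with no ifdefs in dwc2.c.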

I'm not sure if e.g. SDHCI needs this change too? It appears to work fine
without...

Cc: Eric Anholt <eric@anholt.net>
Cc: Gordon Hollingworth <gordon@holliweb.co.uk>
Signed-off-by: Stephen Warren <swarren@wwwdotorg.org>
---
(For those CC'd: note that this is a patch for U-Boot)

 drivers/usb/host/dwc2.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/host/dwc2.c b/drivers/usb/host/dwc2.c
index e370d29ffc8e..f647461eabbb 100644
--- a/drivers/usb/host/dwc2.c
+++ b/drivers/usb/host/dwc2.c
@@ -752,6 +752,7 @@ int chunk_msg(struct usb_device *dev, unsigned long pipe, int *pid, int in,
 	uint32_t xfer_len;
 	uint32_t num_packets;
 	int stop_transfer = 0;
+	uint32_t dma_addr;
 
 	debug("%s: msg: pipe %lx pid %d in %d len %d\n", __func__, pipe, *pid,
 	      in, len);
@@ -792,7 +793,26 @@ int chunk_msg(struct usb_device *dev, unsigned long pipe, int *pid, int in,
 		if (!in)
 			memcpy(aligned_buffer, (char *)buffer + done, len);
 
-		writel((uint32_t)aligned_buffer, &hc_regs->hcdma);
+		dma_addr = (uint32_t)aligned_buffer;
+#if defined(CONFIG_BCM2836)
+		/*
+		 * BCM2836 bus addresses use the top 2 bits to determine
+		 * whether peripherals use or bypass the GPU L1 and L2 cache.
+		 * While this doesn't match the value the RPi Foundation
+		 * kernel uses, it does work in practice for U-Boot.
+		 */
+		dma_addr |= 0xc0000000;
+#elif defined(CONFIG_BCM2835)
+		/*
+		 * BCM2835 bus addresses use the top 2 bits to determine
+		 * whether peripherals use or bypass the GPU L1 and L2 cache.
+		 * This phys->bus mapping matches what the RPi Foundation's
+		 * kernel does; see the definition of _REAL_BUS_OFFSET in
+		 * arch/arm/mach-bcm2708/include/mach/memory.h.
+		 */
+		dma_addr |= 0x40000000;
+#endif
+		writel(dma_addr, &hc_regs->hcdma);
 
 		/* Set host channel enable after all other setup is complete. */
 		clrsetbits_le32(&hc_regs->hcchar, DWC2_HCCHAR_MULTICNT_MASK |
-- 
1.9.1


* [U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations
  2015-03-13  6:13 [U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations Stephen Warren
@ 2015-03-13 14:30 ` Marek Vasut
  2015-03-13 16:35   ` Stephen Warren
  2015-03-13 17:02 ` Eric Anholt
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Marek Vasut @ 2015-03-13 14:30 UTC (permalink / raw)
  To: u-boot

On Friday, March 13, 2015 at 07:13:09 AM, Stephen Warren wrote:
> BCM2835 bus addresses use the top 2 bits to determine whether peripherals
> use or bypass the GPU L1 and L2 cache. BCM2835-ARM-Peripherals.pdf states
> that:
> 
> 0: L1 & L2 cached
> 4: L2 cache coherent (non allocating)
> 8: L2 cached only
> c: Direct uncached.

Caches aren't working on BCM2xxx or what's the reason for this hack ?
Or are these different (not on-CPU) caches we're talking about (yes,
I did notice the GPU Lx cache stuff)?

Best regards,
Marek Vasut


* [U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations
  2015-03-13 14:30 ` Marek Vasut
@ 2015-03-13 16:35   ` Stephen Warren
  2015-03-13 18:13     ` Marek Vasut
  0 siblings, 1 reply; 15+ messages in thread
From: Stephen Warren @ 2015-03-13 16:35 UTC (permalink / raw)
  To: u-boot

On 03/13/2015 08:30 AM, Marek Vasut wrote:
> On Friday, March 13, 2015 at 07:13:09 AM, Stephen Warren wrote:
>> BCM2835 bus addresses use the top 2 bits to determine whether peripherals
>> use or bypass the GPU L1 and L2 cache. BCM2835-ARM-Peripherals.pdf states
>> that:
>>
>> 0: L1 & L2 cached
>> 4: L2 cache coherent (non allocating)
>> 8: L2 cached only
>> c: Direct uncached.
>
> Caches aren't working on BCM2xxx or what's the reason for this hack ?
> Or are these different (not on-CPU) caches we're talking about (yes,
> I did notice the GPU Lx cache stuff)?

Yes, the "GPU" has its own caches, entirely separate from the ARM core 
and at a different location in the system bus structure, and it seems as 
if at least some other peripherals other than GPU/graphics/VideoCore 
access DRAM via those caches too.

There are some brief details in BCM2835-ARM-Peripherals.pdf, although it 
isn't terribly clear.


* [U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations
  2015-03-13  6:13 [U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations Stephen Warren
  2015-03-13 14:30 ` Marek Vasut
@ 2015-03-13 17:02 ` Eric Anholt
  2015-03-15 16:04 ` Stephen Warren
  2015-03-15 16:51 ` Stephen Warren
  3 siblings, 0 replies; 15+ messages in thread
From: Eric Anholt @ 2015-03-13 17:02 UTC (permalink / raw)
  To: u-boot

Stephen Warren <swarren@wwwdotorg.org> writes:

> BCM2835 bus addresses use the top 2 bits to determine whether peripherals
> use or bypass the GPU L1 and L2 cache. BCM2835-ARM-Peripherals.pdf states
> that:
>
> 0: L1 & L2 cached
> 4: L2 cache coherent (non allocating)
> 8: L2 cached only
> c: Direct uncached.
>
> That document also states that "Software accessing RAM using the DMA
> engines must use bus addresses (base at 0xc0000000)". However, this appears
> to be incorrect since it does not work in practice on the bcm2835
> (although it does on bcm2836). "usb start" causes some EABI function to
> call raise(8), presumably due to corrupted USB IN data (the converse is
> true on bcm2836; a value of 4 causes signals). However, I haven't
> investigated the cause.
>
> A value of 4 matches what the RPi Foundation's kernel uses; see the definition
> of _REAL_BUS_OFFSET in arch/arm/mach-bcm2708/include/mach/memory.h. With
> the code updated to implement a phys->bus translation by setting the top
> two bits of DWC2 DMA addresses to 4, USB keyboard support appears stable.
>
> A similar change is made for bcm2836 (RPi 2). I can't justify this value
> since it doesn't match the RPi Foundation kernel. However, it does appear
> to work for the built-in USB Ethernet at least.
>
> Ideally, the bcm2835 SoC support would provide some common function for
> any DMA-capable driver to call to perform the phys->bus translation,
> rather than placing ifdefs in each driver file. However, I can't find
> such a standard function in U-Boot.

Huh.  Agreed that it seems like it should be 0xc top bits on both, but I
guess whatever works.

It does seem like we ought to have some vtophys / vtobus functions.

* [U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations
  2015-03-13 16:35   ` Stephen Warren
@ 2015-03-13 18:13     ` Marek Vasut
  2015-03-13 18:39       ` Stephen Warren
  0 siblings, 1 reply; 15+ messages in thread
From: Marek Vasut @ 2015-03-13 18:13 UTC (permalink / raw)
  To: u-boot

On Friday, March 13, 2015 at 05:35:53 PM, Stephen Warren wrote:
> On 03/13/2015 08:30 AM, Marek Vasut wrote:
> > On Friday, March 13, 2015 at 07:13:09 AM, Stephen Warren wrote:
> >> BCM2835 bus addresses use the top 2 bits to determine whether
> >> peripherals use or bypass the GPU L1 and L2 cache.
> >> BCM2835-ARM-Peripherals.pdf states that:
> >> 
> >> 0: L1 & L2 cached
> >> 4: L2 cache coherent (non allocating)
> >> 8: L2 cached only
> >> c: Direct uncached.
> > 
> > Caches aren't working on BCM2xxx or what's the reason for this hack ?
> > Or are these different (not on-CPU) caches we're talking about (yes,
> > I did notice the GPU Lx cache stuff)?
> 
> Yes, the "GPU" has its own caches, entirely separate from the ARM core
> and at a different location in the system bus structure, and it seems as
> if at least some other peripherals other than GPU/graphics/VideoCore
> access DRAM via those caches too.
> 
> There are some brief details in BCM2835-ARM-Peripherals.pdf, although it
> isn't terribly clear.

Thanks for clearing this up. I suspect there's no way to turn those caches
off altogether, right ? But uh ... ew :(

Best regards,
Marek Vasut


* [U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations
  2015-03-13 18:13     ` Marek Vasut
@ 2015-03-13 18:39       ` Stephen Warren
  2015-03-13 18:49         ` Marek Vasut
  0 siblings, 1 reply; 15+ messages in thread
From: Stephen Warren @ 2015-03-13 18:39 UTC (permalink / raw)
  To: u-boot

On 03/13/2015 12:13 PM, Marek Vasut wrote:
> On Friday, March 13, 2015 at 05:35:53 PM, Stephen Warren wrote:
>> On 03/13/2015 08:30 AM, Marek Vasut wrote:
>>> On Friday, March 13, 2015 at 07:13:09 AM, Stephen Warren wrote:
>>>> BCM2835 bus addresses use the top 2 bits to determine whether
>>>> peripherals use or bypass the GPU L1 and L2 cache.
>>>> BCM2835-ARM-Peripherals.pdf states that:
>>>>
>>>> 0: L1 & L2 cached
>>>> 4: L2 cache coherent (non allocating)
>>>> 8: L2 cached only
>>>> c: Direct uncached.
>>>
>>> Caches aren't working on BCM2xxx or what's the reason for this hack ?
>>> Or are these different (not on-CPU) caches we're talking about (yes,
>>> I did notice the GPU Lx cache stuff)?
>>
>> Yes, the "GPU" has its own caches, entirely separate from the ARM core
>> and at a different location in the system bus structure, and it seems as
>> if at least some other peripherals other than GPU/graphics/VideoCore
>> access DRAM via those caches too.
>>
>> There are some brief details in BCM2835-ARM-Peripherals.pdf, although it
>> isn't terribly clear.
>
> Thanks for clearing this up. I suspect there's no way to turn those caches
> off altogether, right ? But uh ... ew :(

There may be. Search for disable_l2cache at http://elinux.org/RPiconfig.
That option is read by the SoC's binary bootloader (which I believe 
99%-100% runs on the VideoCore not ARM) and programmed before the ARM 
bootloader (U-Boot) is started.

The disadvantages of the option are:

* According to all descriptions of the option I've seen, it requires 
that SW that wishes to run with that option enabled must pass a 
different upper 2 bits of physical address to DMA engines. See for 
example the elinux.org link above and:

https://github.com/raspberrypi/linux/blob/rpi-3.18.y/arch/arm/mach-bcm2708/include/mach/memory.h#L38

https://github.com/raspberrypi/linux/blob/rpi-3.18.y/arch/arm/mach-bcm2708/Kconfig#L43

* It's a system-wide option without any runtime control that I'm aware 
of, and so would affect anything U-Boot boots such as Linux, so Linux 
would need to be modified too. I assume it would reduce graphics 
performance at least.

As such, I don't think we want to require that option.


* [U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations
  2015-03-13 18:39       ` Stephen Warren
@ 2015-03-13 18:49         ` Marek Vasut
  0 siblings, 0 replies; 15+ messages in thread
From: Marek Vasut @ 2015-03-13 18:49 UTC (permalink / raw)
  To: u-boot

On Friday, March 13, 2015 at 07:39:08 PM, Stephen Warren wrote:
> On 03/13/2015 12:13 PM, Marek Vasut wrote:
> > On Friday, March 13, 2015 at 05:35:53 PM, Stephen Warren wrote:
> >> On 03/13/2015 08:30 AM, Marek Vasut wrote:
> >>> On Friday, March 13, 2015 at 07:13:09 AM, Stephen Warren wrote:
> >>>> BCM2835 bus addresses use the top 2 bits to determine whether
> >>>> peripherals use or bypass the GPU L1 and L2 cache.
> >>>> BCM2835-ARM-Peripherals.pdf states that:
> >>>> 
> >>>> 0: L1 & L2 cached
> >>>> 4: L2 cache coherent (non allocating)
> >>>> 8: L2 cached only
> >>>> c: Direct uncached.
> >>> 
> >>> Caches aren't working on BCM2xxx or what's the reason for this hack ?
> >>> Or are these different (not on-CPU) caches we're talking about (yes,
> >>> I did notice the GPU Lx cache stuff)?
> >> 
> >> Yes, the "GPU" has its own caches, entirely separate from the ARM core
> >> and at a different location in the system bus structure, and it seems as
> >> if at least some other peripherals other than GPU/graphics/VideoCore
> >> access DRAM via those caches too.
> >> 
> >> There are some brief details in BCM2835-ARM-Peripherals.pdf, although it
> >> isn't terribly clear.
> > 
> > Thanks for clearing this up. I suspect there's no way to turn those
> > caches off altogether, right ? But uh ... ew :(
> 
> There may be. Search for disable_l2cache at http://elinux.org/RPiconfig.
> That option is read by the SoC's binary bootloader (which I believe
> 99%-100% runs on the VideoCore not ARM) and programmed before the ARM
> bootloader (U-Boot) is started.
> 
> The disadvantages of the option are:
> 
> * According to all descriptions of the option I've seen, it requires
> that SW that wishes to run with that option enabled must pass a
> different upper 2 bits of physical address to DMA engines. See for
> example the elinux.org link above and:
> 
> https://github.com/raspberrypi/linux/blob/rpi-3.18.y/arch/arm/mach-bcm2708/include/mach/memory.h#L38
> 
> https://github.com/raspberrypi/linux/blob/rpi-3.18.y/arch/arm/mach-bcm2708/Kconfig#L43
> 
> * It's a system-wide option without any runtime control that I'm aware
> of, and so would affect anything U-Boot boots such as Linux, so Linux
> would need to be modified too. I assume it would reduce graphics
> performance at least.
> 
> As such, I don't think we want to require that option.

Agreed.

Best regards,
Marek Vasut


* [U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations
  2015-03-13  6:13 [U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations Stephen Warren
  2015-03-13 14:30 ` Marek Vasut
  2015-03-13 17:02 ` Eric Anholt
@ 2015-03-15 16:04 ` Stephen Warren
  2015-03-15 18:20   ` Marek Vasut
  2015-03-15 16:51 ` Stephen Warren
  3 siblings, 1 reply; 15+ messages in thread
From: Stephen Warren @ 2015-03-15 16:04 UTC (permalink / raw)
  To: u-boot

On 03/13/2015 12:13 AM, Stephen Warren wrote:
> BCM2835 bus addresses use the top 2 bits to determine whether peripherals
> use or bypass the GPU L1 and L2 cache. BCM2835-ARM-Peripherals.pdf states
> that: ...

If you do end up applying this, the subject should say phys->bus not
phys->virt.


* [U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations
  2015-03-13  6:13 [U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations Stephen Warren
                   ` (2 preceding siblings ...)
  2015-03-15 16:04 ` Stephen Warren
@ 2015-03-15 16:51 ` Stephen Warren
  2015-03-15 18:20   ` Marek Vasut
  3 siblings, 1 reply; 15+ messages in thread
From: Stephen Warren @ 2015-03-15 16:51 UTC (permalink / raw)
  To: u-boot

On 03/13/2015 12:13 AM, Stephen Warren wrote:
> BCM2835 bus addresses use the top 2 bits to determine whether peripherals
> use or bypass the GPU L1 and L2 cache. BCM2835-ARM-Peripherals.pdf states
> that:
> 
> 0: L1 & L2 cached
> 4: L2 cache coherent (non allocating)
> 8: L2 cached only
> c: Direct uncached.
> 
> That document also states that "Software accessing RAM using the DMA
> engines must use bus addresses (base at 0xc0000000)". However, this appears
> to be incorrect since it does not work in practice on the bcm2835
> (although it does on bcm2836). "usb start" causes some EABI function to
> call raise(8), presumably due to corrupted USB IN data (the converse is
> true on bcm2836; a value of 4 causes signals). However, I haven't
> investigated the cause.

I've confirmed that the raise(8) calls are due to corrupted USB IN data;
the maxpacketsize field in the device descriptor is getting corrupted to
0, which in turn surely causes division by zero when calculating the
number of packets in a transfer, for example.
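
To illustrate the failure mode, here is a simplified stand-in for a
packet-count calculation. It is not the driver's actual code, and
count_packets() is a hypothetical name; it only shows how a max packet
size of 0 read from a corrupted descriptor leads to a division by zero
unless it is checked first.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of packets needed to move xfer_len bytes.
 * Returns 0 on success, or -1 if max_packet is 0 (e.g. a corrupted
 * device descriptor), in which case *num_packets is left untouched.
 */
static int count_packets(uint32_t xfer_len, uint32_t max_packet,
			 uint32_t *num_packets)
{
	if (max_packet == 0)
		return -1;	/* would otherwise divide by zero */

	if (xfer_len == 0) {
		*num_packets = 1;	/* zero-length transfer: one packet */
		return 0;
	}

	/* Round up without risking overflow of xfer_len + max_packet. */
	*num_packets = xfer_len / max_packet + (xfer_len % max_packet != 0);
	return 0;
}
```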


* [U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations
  2015-03-15 16:51 ` Stephen Warren
@ 2015-03-15 18:20   ` Marek Vasut
  0 siblings, 0 replies; 15+ messages in thread
From: Marek Vasut @ 2015-03-15 18:20 UTC (permalink / raw)
  To: u-boot

On Sunday, March 15, 2015 at 05:51:26 PM, Stephen Warren wrote:
> On 03/13/2015 12:13 AM, Stephen Warren wrote:
> > BCM2835 bus addresses use the top 2 bits to determine whether peripherals
> > use or bypass the GPU L1 and L2 cache. BCM2835-ARM-Peripherals.pdf states
> > that:
> > 
> > 0: L1 & L2 cached
> > 4: L2 cache coherent (non allocating)
> > 8: L2 cached only
> > c: Direct uncached.
> > 
> > That document also states that "Software accessing RAM using the DMA
> > engines must use bus addresses (base at 0xc0000000)". However, this
> > appears to be incorrect since it does not work in practice on the
> > bcm2835 (although it does on bcm2836). "usb start" causes some EABI
> > function to call raise(8), presumably due to corrupted USB IN data (the
> > converse is true on bcm2836; a value of 4 causes signals). However, I
> > haven't investigated the cause.
> 
> I've confirmed that the raise(8) calls are due to corrupted USB IN data;
> the maxpacketsize field in the device descriptor is getting corrupted to
> 0, which in turn surely causes division by zero when calculating the
> number of packets in a transfer, for example.

Nice progress :)

Best regards,
Marek Vasut


* [U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations
  2015-03-15 16:04 ` Stephen Warren
@ 2015-03-15 18:20   ` Marek Vasut
  2015-03-17  3:04     ` Stephen Warren
  0 siblings, 1 reply; 15+ messages in thread
From: Marek Vasut @ 2015-03-15 18:20 UTC (permalink / raw)
  To: u-boot

On Sunday, March 15, 2015 at 05:04:05 PM, Stephen Warren wrote:
> On 03/13/2015 12:13 AM, Stephen Warren wrote:
> > BCM2835 bus addresses use the top 2 bits to determine whether peripherals
> > use or bypass the GPU L1 and L2 cache. BCM2835-ARM-Peripherals.pdf states
> > that: ...
> 
> If you do end up applying this, the subject should say phys->bus not
> phys->virt.

I'd say we should wait a bit until these patches stabilize a little more,
don't you think so ?

Best regards,
Marek Vasut


* [U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations
  2015-03-15 18:20   ` Marek Vasut
@ 2015-03-17  3:04     ` Stephen Warren
  2015-03-17 14:57       ` popcorn mix
  0 siblings, 1 reply; 15+ messages in thread
From: Stephen Warren @ 2015-03-17  3:04 UTC (permalink / raw)
  To: u-boot

On 03/15/2015 12:20 PM, Marek Vasut wrote:
> On Sunday, March 15, 2015 at 05:04:05 PM, Stephen Warren wrote:
>> On 03/13/2015 12:13 AM, Stephen Warren wrote:
>>> BCM2835 bus addresses use the top 2 bits to determine whether peripherals
>>> use or bypass the GPU L1 and L2 cache. BCM2835-ARM-Peripherals.pdf states
>>> that: ...
>>
>> If you do end up applying this, the subject should say phys->bus not
>> phys->virt.
> 
> I'd say we should wait a bit until these patches stabilize a little more,
> don't you think so ?

I can see the argument. That said, I don't expect anything much to
"stabilize" about the patches; they appear to work!

It would be nice though if someone from the RPi Foundation could comment
on the exact effect of the upper bus address bits, and why 0xc would
work for RPi2 but 0x4 for the RPi 1. I wonder if the ARM cache status
(enabled, disabled) interacts with the GPU cache enable in any way, e.g.
burst vs. non-burst transactions on the bus or something? That's about
the only reason I can see for the RPi Foundation kernel working with 0x4
bus addresses on both chips, but U-Boot needing something different on
RPi2...

Dom, for reference, see:
http://lists.denx.de/pipermail/u-boot/2015-March/207947.html
http://lists.denx.de/pipermail/u-boot/2015-March/thread.html#207947


* [U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations
  2015-03-17  3:04     ` Stephen Warren
@ 2015-03-17 14:57       ` popcorn mix
  2015-03-17 17:29         ` Stephen Warren
  0 siblings, 1 reply; 15+ messages in thread
From: popcorn mix @ 2015-03-17 14:57 UTC (permalink / raw)
  To: u-boot

On 17/03/15 03:04, Stephen Warren wrote:
> It would be nice though if someone from the RPi Foundation could comment
> on the exact effect of the upper bus address bits, and why 0xc would
> work for RPi2 but 0x4 for the RPi 1. I wonder if the ARM cache status
> (enabled, disabled) interacts with the GPU cache enable in any way, e.g.
> burst vs. non-burst transactions on the bus or something? That's about
> the only reason I can see for the RPi Foundation kernel working with 0x4
> bus addresses on both chips, but U-Boot needing something different on
> RPi2...
>
> Dom, for reference, see:
> http://lists.denx.de/pipermail/u-boot/2015-March/207947.html
> http://lists.denx.de/pipermail/u-boot/2015-March/thread.html#207947

First, remember that 2835 is a large GPU with a small ARM attached. On some platforms the ARM is not even used.
The GPU boots first and may wake the arm. The GPU is the centre of the universe, and the ARM has to fit in.


Okay, I'll try to explain what goes on. Here are my definitions of some terms:

bus address: a VideoCore/GPU address. The lower 30-bits define the 1G of addressable memory. The top two bits define the caching alias.
physical address: An ARM side address given to the VC MMU. This is a 30 bit address space.

The GPU always uses bus addresses. GPU bus mastering peripherals (like DMA) use bus addresses. The ARM uses physical addresses.

VC MMU: A coarse MMU used by the arm for accessing GPU memory. Each page is 16M and there are 64 pages. This maps 30-bits of physical address to 32-bits of bus address.
The setup of VC MMU is handled by the GPU and by default the mapping is:
2835: first 32 pages map physical addresses 0x00000000-0x1fffffff to bus addresses 0x40000000-0x5fffffff. The next page maps physical addresses 0x20000000 to 0x20ffffff to bus addresses 0x7e000000 to 0x7effffff
2836: first 63 pages map physical addresses 0x00000000-0x3effffff to bus addresses 0xc0000000-0xfeffffff. The next page maps physical addresses 0x3f000000 to 0x3fffffff to bus addresses 0x7e000000 to 0x7effffff

Bus address 0x7exxxxxx contains the peripherals.
Note: the top 16M of sdram is not visible to the arm due to the mapping of the peripherals. The GPU and GPU peripherals (DMA) can see it as they use bus addresses.
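
The default mapping described above can be modelled in a few lines of C. This is only a sketch of the description in this mail, not code from any firmware or kernel; the names (enum soc, vc_mmu_default_map) are made up for illustration, and pages the description does not mention are treated as unmapped, which is an assumption.

```c
#include <stdint.h>

enum soc { SOC_BCM2835, SOC_BCM2836 };	/* hypothetical names */

#define VC_MMU_PAGE_SHIFT 24		/* 16M pages, 64 of them */
#define PERIPH_BUS_BASE   0x7e000000u	/* peripherals in bus space */

/*
 * Model of the default VC MMU mapping: returns 0 and stores the bus
 * address, or -1 if the page is not covered by the description above.
 */
static int vc_mmu_default_map(enum soc soc, uint32_t phys, uint32_t *bus)
{
	uint32_t page = phys >> VC_MMU_PAGE_SHIFT;
	uint32_t ram_pages = (soc == SOC_BCM2835) ? 32 : 63;
	uint32_t alias = (soc == SOC_BCM2835) ? 0x40000000u : 0xc0000000u;

	if (page < ram_pages) {
		/* SDRAM: same offset, with the cache alias in the top bits. */
		*bus = alias | phys;
		return 0;
	}
	if (page == ram_pages) {
		/* The next 16M page is the peripheral window. */
		*bus = PERIPH_BUS_BASE | (phys & 0x00ffffffu);
		return 0;
	}
	return -1;
}
```

For example, under this model physical 0x20200000 on 2835 and physical 0x3f200000 on 2836 both land on bus address 0x7e200000.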

The bus address cache alias bits are:

 From the VideoCore processor:
0x0 L1 and L2 cache allocating and coherent
0x4 L1 non-allocating, but coherent. L2 allocating and coherent
0x8 L1 non-allocating, but coherent. L2 non-allocating, but coherent
0xc SDRAM alias. Cache is bypassed. Not L1 or L2 allocating or coherent

 From the GPU peripherals (note: all peripherals bypass the L1 cache. The arm will see this view once through the VC MMU):
0x0 Do not use
0x4 L1 non-allocating, and incoherent. L2 allocating and coherent.
0x8 L1 non-allocating, and incoherent. L2 non-allocating, but coherent
0xc SDRAM alias. Cache is bypassed. Not L1 or L2 allocating or coherent
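
As a small illustration of how the alias is encoded, the sketch below pulls the top two bits out of a bus address. The enum and function names are hypothetical; the comments paraphrase the peripheral-side view in the list above.

```c
#include <stdint.h>

/* Hypothetical names; the values are bits 31:30 of a bus address. */
enum bus_alias {
	BUS_ALIAS_0 = 0,	/* 0x0: do not use from peripherals */
	BUS_ALIAS_4 = 1,	/* 0x4: L2 allocating and coherent */
	BUS_ALIAS_8 = 2,	/* 0x8: L2 non-allocating, but coherent */
	BUS_ALIAS_C = 3,	/* 0xc: SDRAM alias, caches bypassed */
};

/* Return which cache alias a 32-bit bus address selects. */
static inline enum bus_alias bus_addr_alias(uint32_t bus)
{
	return (enum bus_alias)(bus >> 30);
}
```

Note that the peripheral window at 0x7exxxxxx decodes as the 0x4 alias under this scheme, since its top two bits are 01.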

In general as long as VideoCore processor and GPU peripherals use the same alias everything works out. Mixing aliases requires flushing/invalidating for coherency and is generally avoided.

So, on 2835 the ARM has a 16K L1 cache and no L2 cache. The GPU has a 128M L2 cache. The GPU's L2 cache is accessible from the ARM but it's not particularly close (i.e. not very fast).
However mapping through the L2 allocating alias (0x4) was shown to be beneficial on 2835, so that is the alias we use.

The situation is different on 2836. The ARM has a 32K L1 cache and a 512M integrated/fast L2 cache. Additionally going through the smaller/slower GPU L2 is bad for performance.
So, we map through the SDRAM alias (0xc) and avoid the GPU L2 cache.

So, what does this mean? In general if you don't use GPU peripherals or communicate with the GPU, you only care about physical addresses and it makes no difference what bus address is actually being used.
The ARM just sees 1G of physical space that is always coherent. No flushing of GPU L2 cache is ever required. No need to know about aliases.

However if you do want to use GPU bus mastering peripherals (like DMA), or you communicate with the GPU (e.g. using the mailbox interface) you do need to distinguish physical and bus addresses, and you must use the correct alias.

So, on 2835 you convert from physical to bus address with
   bus_address = 0x40000000 | physical_address;
And on 2836 you convert from physical to bus address with
   bus_address = 0xC0000000 | physical_address;

(Note: you can get these offsets from device tree. See: https://github.com/raspberrypi/userland/commit/3b81b91c18ff19f97033e146a9f3262ca631f0e9#diff-c65a4fe18bb33aed0fc9536339f06b80R168)

So, when using GPU DMA, the addresses used for SCB, SA (source address), DA (dest address) must never be zero. They should be bus addresses and therefore 0x4 or 0xc aliases.
However the difference between a 0x0 alias and a 0x4 alias is small. Using 0x0 is wrong, may be incoherent, and may trigger exceptions on the GPU. But you may get away with it.
The difference between a 0x0 alias and a 0xC alias is much larger. There is now 128K of incoherent data you may hit. You are less likely to get away with getting this wrong.
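
A driver could catch this class of mistake with a cheap sanity check before starting a transfer. The sketch below is illustrative only: struct dma_xfer_addrs is a hypothetical simplification, not the hardware control block layout, and the check only verifies that each address carries a non-zero alias, not that it is the right alias for the SoC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the addresses a DMA transfer uses. */
struct dma_xfer_addrs {
	uint32_t cb;	/* address of the control block itself */
	uint32_t src;	/* source address */
	uint32_t dst;	/* destination address */
};

/* True if the address has a non-zero cache alias in bits 31:30. */
static bool has_bus_alias(uint32_t addr)
{
	return (addr & 0xc0000000u) != 0;
}

/* True only if every address looks like a bus address (alias != 0x0). */
static bool dma_xfer_addrs_ok(const struct dma_xfer_addrs *a)
{
	return has_bus_alias(a->cb) && has_bus_alias(a->src) &&
	       has_bus_alias(a->dst);
}
```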

So, I don't believe there is any issue with:
>ARM cache status (enabled, disabled) interacts with the GPU cache enable in any way, e.g. burst vs. non-burst transactions on the bus or something

but I would guess there may be a current bug/misunderstanding on Pi1 uboot that happens to be more fatal on Pi2.


* [U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations
  2015-03-17 14:57       ` popcorn mix
@ 2015-03-17 17:29         ` Stephen Warren
  2015-03-17 17:53           ` popcorn mix
  0 siblings, 1 reply; 15+ messages in thread
From: Stephen Warren @ 2015-03-17 17:29 UTC (permalink / raw)
  To: u-boot

On 03/17/2015 08:57 AM, popcorn mix wrote:
> On 17/03/15 03:04, Stephen Warren wrote:
>> It would be nice though if someone from the RPi Foundation could comment
>> on the exact effect of the upper bus address bits, and why 0xc would
>> work for RPi2 but 0x4 for the RPi 1. I wonder if the ARM cache status
>> (enabled, disabled) interacts with the GPU cache enable in any way, e.g.
>> burst vs. non-burst transactions on the bus or something? That's about
>> the only reason I can see for the RPi Foundation kernel working with 0x4
>> bus addresses on both chips, but U-Boot needing something different on
>> RPi2...
>>
>> Dom, for reference, see:
>> http://lists.denx.de/pipermail/u-boot/2015-March/207947.html
>> http://lists.denx.de/pipermail/u-boot/2015-March/thread.html#207947

Thanks for the great explanation. I'll have to bookmark/archive it:-)

> First, remember that 2835 is a large GPU with a small ARM attached. On
> some platforms the ARM is not even used.
> The GPU boots first and may wake the arm. The GPU is the centre of the
> universe, and the ARM has to fit in.
>
> Okay, I'll try to explain what goes on. Here are my definitions of some
> terms:
>
> bus address: a VideoCore/GPU address. The lower 30-bits define the 1G of
> addressable memory. The top two bits define the caching alias.
> physical address: An ARM side address given to the VC MMU. This is a 30
> bit address space.
>
> The GPU always uses bus addresses. GPU bus mastering peripherals (like
> DMA) use bus addresses. The ARM uses physical addresses.
>
> VC MMU: A coarse MMU used by the arm for accessing GPU memory. Each page
> is 16M and there are 64 pages. This maps 30-bits of physical address to
> 32-bits of bus address.
>
> The setup of VC MMU is handled by the GPU and by default the mapping is:
> 2835: first 32 pages map physical addresses 0x00000000-0x1fffffff to bus
> addresses 0x40000000-0x5fffffff. The next page maps physical addresses
> 0x20000000 to 0x20ffffff to bus addresses 0x7e000000 to 0x7effffff
>
> 2836: first 63 pages map physical addresses 0x00000000-0x3effffff to bus
> addresses 0xc0000000-0xfeffffff. The next page maps physical addresses
> 0x3f000000 to 0x3fffffff to bus addresses 0x7e000000 to 0x7effffff

OK, this explains why in U-Boot, we need to OR in 0x40000000 on bcm2835 
and 0xc0000000 on bcm2836; that matches the VC MMU setup.

I guess we need to fix the U-Boot mailbox driver too, and many things in 
the upstream RPi kernel.

I have two more questions:

1)

Do the RPi 1 and RPi 2 use different kernel binaries in the RPi 
Foundation's images? I'd assumed there was a single unified binary which 
supported both. The reason I ask is that I see:

> https://github.com/raspberrypi/linux/blob/rpi-3.18.y/arch/arm/mach-bcm2708/include/mach/memory.h#L38

> #ifdef CONFIG_BCM2708_NOL2CACHE
> #define _REAL_BUS_OFFSET UL(0xC0000000) /* don't use L1 or L2 caches */
> #else
> #define _REAL_BUS_OFFSET UL(0x40000000) /* use L2 cache */
> #endif

That's identical in the mach-bcm2709 version too. However, 
arch/arm/mach-bcm270[89]/Kconfig's entry for that config option:

> config BCM2708_NOL2CACHE
> 	bool "Videocore L2 cache disable"
> 	depends on MACH_BCM2709
> 	default y
> 	help
> 	Do not allow ARM to use GPU's L2 cache. Requires disable_l2cache in config.txt.

has "default n" for the bcm2708 version and "default y" for the bcm2709
version. If I'd noticed that difference in default value, it would have 
been a big clue that what I proposed in the U-Boot patch was correct! 
Anyway, this implies that there are separate kernel binaries for the RPi 
1 and RPi 2, since otherwise those default values wouldn't work.

2)

I assume the SDHCI controller (RPi SD card, CM eMMC) is affected by this 
just as much; we need to use bus addresses not ARM physical addresses 
when programming any DMA there?

Perhaps this would explain why I had issues with the eMMC on the CM (I 
think only in the kernel though, whereas U-Boot may have been fine; I'll 
have to check)

...
> So, on 2835 the ARM has a 16K L1 cache and no L2 cache. The GPU has a
> 128M L2 cache. The GPU's L2 cache is accessible from the ARM but it's
> not particularly close (i.e. not very fast).
> However mapping through the L2 allocating alias (0x4) was shown to be
> beneficial on 2835, so that is the alias we use.
>
> The situation is different on 2836. The ARM has a 32K L1 cache and a
> 512M integrated/fast L2 cache. Additionally going through the
> smaller/slower GPU L2 is bad for performance.
> So, we map through the SDRAM alias (0xc) and avoid the GPU L2 cache.

I assume 128M and 512M there should be 128K and 512K?


* [U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations
  2015-03-17 17:29         ` Stephen Warren
@ 2015-03-17 17:53           ` popcorn mix
  0 siblings, 0 replies; 15+ messages in thread
From: popcorn mix @ 2015-03-17 17:53 UTC (permalink / raw)
  To: u-boot

On 17/03/15 17:29, Stephen Warren wrote:
> Do the RPi 1 and RPi 2 use different kernel binaries in the RPi Foundation's images? I'd assumed there was a single unified binary which supported both. The reason I ask is that I see:

We ship separate kernel binaries (kernel.img for 2835 and kernel7.img for 2836).
kernel.img is built from bcmrpi_defconfig, and kernel7.img is built from bcm2709_defconfig

A single unified binary would sure be nice, but I think we have too many non-device-tree drivers in our kernel and not enough experience to make this happen easily.
It's certainly a desirable goal (as is moving closer to the upstream mach-2835 kernel).

> I assume the SDHCI controller (RPi SD card, CM eMMC) is affected by this just as much; we need to use bus addresses not ARM physical addresses when programming any DMA there?

Yes. Any address given to the DMA controller should be a bus address.
Similarly any address exchanged with the GPU (e.g. framebuffer address from mailbox interface) should be a bus address.
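
Going the other way, an SDRAM bus address handed back by the GPU (such as a framebuffer address) can be turned into an ARM physical address by dropping the two alias bits. A minimal sketch, assuming the address really is in SDRAM; bus_to_phys() is a hypothetical name, and peripheral bus addresses (0x7exxxxxx) do not map back this way.

```c
#include <stdint.h>

/*
 * Hypothetical helper: strip the cache alias (bits 31:30) from an SDRAM
 * bus address, leaving the 30-bit ARM physical address.
 */
static inline uint32_t bus_to_phys(uint32_t bus)
{
	return bus & 0x3fffffffu;
}
```

The same physical address results whichever alias the GPU used, e.g. both 0x5e000000 and 0xde000000 give 0x1e000000.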

> Perhaps this would explain why I had issues with the eMMC on the CM (I think only in the kernel though, whereas U-Boot may have been fine; I'll have to check)

Using physical addresses when bus addresses are required can almost work, but with intermittent failure cases, so yes that sounds possible.

> I assume 128M and 512M there should be 128K and 512K?

Yes, quite right.
