* 32-bit DMA limit for devices (and drivers)
@ 2021-04-30 11:21 Andre Przywara
  2021-04-30 12:02 ` Mark Kettenis
  2021-05-01 11:45 ` Bin Meng
  0 siblings, 2 replies; 8+ messages in thread
From: Andre Przywara @ 2021-04-30 11:21 UTC
  To: u-boot

Hi,

We now see the first Allwinner devices [1] having DRAM located above
4GB in address space (4GB DRAM starting at 1GB). After one fix[2]
this works somewhat fine, but the sun8i-emac network device is still
limited to 32-bit DMA addresses. With U-Boot relocating itself (plus
stack and heap) to the end of DRAM, it now runs completely beyond 4GB
on those machines, so buffer addresses no longer fit into 32 bits.
In Linux we handle this easily by just keeping the default DMA
mask at 32 bits, and letting the DMA framework deal with the nasty
details.

I was wondering how this should be handled in U-Boot? The
straightforward solution would be:
- Let the driver allocate the RX and TX buffers separately, placing them
  below 4GB in the address space (using lmb_reserve(), I guess?)
- Use those RX buffers and hand the addresses back to the upper layers.
- We already copy TX packets, so this would also be covered, in this
  situation. Other drivers might need to introduce copying.
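
A minimal sketch of the copy-through-a-low-buffer idea in the list above,
written as standalone C rather than as code from the sun8i-emac driver. The
names dma_range_fits_32bit(), bounce_tx_stage() and BOUNCE_BUF_SIZE are
hypothetical, and the sketch simply assumes that the driver-owned buffer has
already been placed below 4GB by some allocator that is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Highest address a device with a 32-bit DMA mask can reach. */
#define DMA_MASK_32BIT UINT64_C(0xFFFFFFFF)

/* Hypothetical per-packet buffer size, chosen for illustration only. */
#define BOUNCE_BUF_SIZE 2048

/* True if every byte of [addr, addr + len) is reachable with 32-bit DMA. */
static bool dma_range_fits_32bit(uint64_t addr, size_t len)
{
	if (addr > DMA_MASK_32BIT)
		return false;
	if (len == 0)
		return true;
	/* Compare against the remaining room to avoid overflowing addr + len. */
	return (uint64_t)(len - 1) <= DMA_MASK_32BIT - addr;
}

/*
 * Copy a packet from wherever the upper layers put it into a driver-owned
 * buffer that is assumed to live below 4GB. Returns the number of bytes
 * staged, or 0 if the packet does not fit into the buffer.
 */
static size_t bounce_tx_stage(uint8_t *low_buf, size_t low_buf_size,
			      const void *packet, size_t len)
{
	assert(low_buf != NULL && packet != NULL);

	if (len > low_buf_size)
		return 0;
	memcpy(low_buf, packet, len);
	return len;
}
```

With 4GB of DRAM starting at 1GB, a buffer near the end of DRAM (for example
at 0x13FFFF000) fails the dma_range_fits_32bit() check, while one at
0x40000000 passes, which is why the packet has to be staged in the low buffer
before the DMA is started.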

This sounds like a common problem, so I was wondering if there is a
more generic solution to this? Maybe there are already platforms or
devices affected? Or should the whole heap and stack be moved below 4GB
(if this is easily possible)?
In our case we make the buffers part of our priv struct, so should
there be an option to let the priv_auto allocation come from below 4GB?

Grateful for any input on this!

Thanks!
Andre

[1] https://linux-sunxi.org/X96_Mate
[2] https://lists.denx.de/pipermail/u-boot/2021-April/448327.html

* 32-bit DMA limit for devices (and drivers)
  2021-04-30 11:21 32-bit DMA limit for devices (and drivers) Andre Przywara
@ 2021-04-30 12:02 ` Mark Kettenis
  2021-04-30 13:34   ` Andre Przywara
  2021-05-01 11:45 ` Bin Meng
  1 sibling, 1 reply; 8+ messages in thread
From: Mark Kettenis @ 2021-04-30 12:02 UTC
  To: u-boot

> Date: Fri, 30 Apr 2021 12:21:21 +0100
> From: Andre Przywara <andre.przywara@arm.com>
> 
> Hi,
> 
> We now see the first Allwinner devices [1] having DRAM located above
> 4GB in address space (4GB DRAM starting at 1GB). After one fix[2]
> this works somewhat fine, but the sun8i-emac network device is still
> limited to 32-bit DMA addresses. With U-Boot relocating itself (plus
> stack and heap) to the end of DRAM, it now runs completely beyond 4GB
> on those machines, so not giving pure 32-bit addresses for buffers
> anymore.
> In Linux we handle this easily by just keeping the default DMA
> mask at 32 bits, and letting the DMA framework deal with the nasty
> details.
> 
> I was wondering how this should be handled in U-Boot? The straight
> forward solution would be:
> - Let the driver allocate the RX and TX buffers separately, placing them
>   below 4GB in the address space (using lmb_reserve(), I guess?)
> - Use those RX buffers and hand the addresses back to the upper layers.
> - We already copy TX packets, so this would also be covered, in this
>   situation. Other drivers might need to introduce copying.

What you describe here is called a bounce buffer approach.  I believe
Linux developers also refer to this as swiotlb.

> This sounds like a common problem, so I was wondering if there is a
> more generic solution to this? Maybe there are already platforms or
> devices affected? Or should the whole heap and stack be moved below 4GB
> (if this is easily possible)?
> In our case we make the buffers part of our priv struct, so should
> there be an option to let the priv_auto allocation come from below 4GB?
> 
> Grateful for any input on this!

I looked into this a bit when I was trying to figure out what to do on
Apple M1 systems where I have a somewhat related issue.  These systems
have an IOMMU that can't be bypassed.  Since I don't want to add IOMMU
infrastructure to U-Boot, I set up the IOMMU to map a fixed block of
physical memory and make sure that all allocations of memory come from
that block of memory.  In this case this is fairly easy to achieve.
U-Boot allocates memory from the top of usable memory, so as long as I
let the IOMMU map that high memory, things work.  U-Boot doesn't need
a lot of memory, so a block of 512MB is more than sufficient.

In your case this means that as long as you set the top of usable
memory to an address < 4G, U-Boot itself should be fine and no bounce
buffers are needed.  You have to make sure the addresses in the U-Boot
environment for loading things like the kernel and the FDT are set to
an address < 4G as well.

For EFI things are different though.  You want to expose all physical
memory in the EFI memory map.  This means that an EFI application
(such as an OS loader) may pick memory > 4G and use it to do I/O.  For
this purpose U-Boot already implements bounce buffers.  See the
CONFIG_EFI_LOADER_BOUNCE_BUFFER option.

Hope that helps!

* 32-bit DMA limit for devices (and drivers)
  2021-04-30 12:02 ` Mark Kettenis
@ 2021-04-30 13:34   ` Andre Przywara
  2021-04-30 16:31     ` Jernej Škrabec
  0 siblings, 1 reply; 8+ messages in thread
From: Andre Przywara @ 2021-04-30 13:34 UTC
  To: u-boot

On Fri, 30 Apr 2021 14:02:52 +0200 (CEST)
Mark Kettenis <mark.kettenis@xs4all.nl> wrote:

Hi Mark,

thanks for the reply!

(CC:ing Alex and Heinrich for the UEFI questions below)

> > Date: Fri, 30 Apr 2021 12:21:21 +0100
> > From: Andre Przywara <andre.przywara@arm.com>
> > 
> > Hi,
> > 
> > We now see the first Allwinner devices [1] having DRAM located above
> > 4GB in address space (4GB DRAM starting at 1GB). After one fix[2]
> > this works somewhat fine, but the sun8i-emac network device is still
> > limited to 32-bit DMA addresses. With U-Boot relocating itself (plus
> > stack and heap) to the end of DRAM, it now runs completely beyond 4GB
> > on those machines, so not giving pure 32-bit addresses for buffers
> > anymore.
> > In Linux we handle this easily by just keeping the default DMA
> > mask at 32 bits, and letting the DMA framework deal with the nasty
> > details.
> > 
> > I was wondering how this should be handled in U-Boot? The straight
> > forward solution would be:
> > - Let the driver allocate the RX and TX buffers separately, placing them
> >   below 4GB in the address space (using lmb_reserve(), I guess?)
> > - Use those RX buffers and hand the addresses back to the upper layers.
> > - We already copy TX packets, so this would also be covered, in this
> >   situation. Other drivers might need to introduce copying.  
> 
> What you describe here is called a bounce buffer approach.  I believe
> Linux developers also refer to this as swiotlb.

Yes, but it's not entirely the same as bounce buffering in Linux,
since we allocate the buffers ourselves, in the driver, so we have full
control over it. The problem I face is that malloc() works on the heap
(which is high), or we use the automatic priv_auto mechanism, which
uses the heap as well, IIUC.

> > This sounds like a common problem, so I was wondering if there is a
> > more generic solution to this? Maybe there are already platforms or
> > devices affected? Or should the whole heap and stack be moved below 4GB
> > (if this is easily possible)?
> > In our case we make the buffers part of our priv struct, so should
> > there be an option to let the priv_auto allocation come from below 4GB?
> > 
> > Grateful for any input on this!  
> 
> I looked into this a bit when I was trying to figure out what to do on
> Apple M1 systems where I have a somewhat related issue.  These systems
> have an IOMMU that can't be bypassed.  Since I don't want to add IOMMU
> infrastructure to U-Boot, I set up the IOMMU to map a fixed block of
> physical memory and make sure that all allocations of memory come from
> that block of memory.  In this case this is fairly easy to achieve.
> U-Boot allocates memory from the top of usable memory, so as long as I
> let the IOMMU map that high memory, things work.  U-Boot doesn't need
> a lot of memory, so a block of 512MB is more than sufficient.

I'd rather not play around with the visible memory size (see below).
And while technically there is a (scatter/gather) IOMMU in the SoC, it
would be too big guns for that small problem.

> In your case this means that as long as you set the top of usable
> memory to an address < 4G, U-Boot itself should be fine and no bounce
> buffers are needed.  You have to make sure the addresses in the U-Boot
> environment for loading things like the kernel and the FDT are set to
> an address < 4G as well.
> 
> For EFI things are different though.  You want to expose all physical
> memory in the EFI memory map.

Not only for UEFI, since U-Boot populates the DT memory node even for
booti/bootm, in arch/arm/lib/bootm-fdt.c:arch_fixup_fdt().
So limiting the memory is not an option, since this would be passed on
to the OS.

> This means that an EFI application
> (such as an OS loader) may pick memory > 4G and use it to do I/O.

I think we should be safe here, as the driver has full control over the
buffers: For TX we copy already, to use "fire-and-forget", so we
just start the DMA and return. And for RX U-Boot network drivers
return the buffer address, so it's our own buffer again. So wherever
higher layers put the packets, we should be good (given our own buffers
are).


So I guess my question boils down to: How can I best allocate buffers
from "low" memory? And do those buffer carveouts make it into the UEFI
memory map, as reserved regions? Or can UEFI differentiate between
boot services and runtime services allocations? The buffers would be
needed during boot services, for the UEFI network protocol. But later
on they can be abandoned.

> this purpose U-Boot already implements bounce buffers.  See the
> CONFIG_EFI_LOADER_BOUNCE_BUFFER option.

Interesting, thanks, I will have a look at that. Maybe that contains
some useful traces to other code.

Cheers,
Andre

* 32-bit DMA limit for devices (and drivers)
  2021-04-30 13:34   ` Andre Przywara
@ 2021-04-30 16:31     ` Jernej Škrabec
  0 siblings, 0 replies; 8+ messages in thread
From: Jernej Škrabec @ 2021-04-30 16:31 UTC
  To: u-boot

Hi!

On Friday, 30 April 2021 at 15:34:28 CEST, Andre Przywara wrote:
> On Fri, 30 Apr 2021 14:02:52 +0200 (CEST)
> Mark Kettenis <mark.kettenis@xs4all.nl> wrote:
> 
> Hi Mark,
> 
> thanks for the reply!
> 
> (CC:ing Alex and Heinrich for the UEFI questions below)
> 
> > > Date: Fri, 30 Apr 2021 12:21:21 +0100
> > > From: Andre Przywara <andre.przywara@arm.com>
> > > 
> > > Hi,
> > > 
> > > We now see the first Allwinner devices [1] having DRAM located above
> > > 4GB in address space (4GB DRAM starting at 1GB). After one fix[2]
> > > this works somewhat fine, but the sun8i-emac network device is still
> > > limited to 32-bit DMA addresses. With U-Boot relocating itself (plus
> > > stack and heap) to the end of DRAM, it now runs completely beyond 4GB
> > > on those machines, so not giving pure 32-bit addresses for buffers
> > > anymore.
> > > In Linux we handle this easily by just keeping the default DMA
> > > mask at 32 bits, and letting the DMA framework deal with the nasty
> > > details.
> > > 
> > > I was wondering how this should be handled in U-Boot? The straight
> > > forward solution would be:
> > > > - Let the driver allocate the RX and TX buffers separately, placing them
> > > >   below 4GB in the address space (using lmb_reserve(), I guess?)
> > > > - Use those RX buffers and hand the addresses back to the upper layers.
> > > > - We already copy TX packets, so this would also be covered, in this
> > > >   situation. Other drivers might need to introduce copying.
> > 
> > What you describe here is called a bounce buffer approach.  I believe
> > Linux developers also refer to this as swiotlb.
> 
> Yes, but it's not entirely the same as bounce buffering in Linux,
> since we allocate the buffers ourselves, in the driver, so we have full
> control over it. The problem I face is that malloc() works on the heap
> (which is high), or we use the automatic priv_auto mechanism, which
> uses the heap as well, IIUC.
> 
> > > This sounds like a common problem, so I was wondering if there is a
> > > more generic solution to this? Maybe there are already platforms or
> > > devices affected? Or should the whole heap and stack be moved below 4GB
> > > (if this is easily possible)?
> > > In our case we make the buffers part of our priv struct, so should
> > > there be an option to let the priv_auto allocation come from below 4GB?
> > > 
> > > Grateful for any input on this!
> > 
> > I looked into this a bit when I was trying to figure out what to do on
> > Apple M1 systems where I have a somewhat related issue.  These systems
> > have an IOMMU that can't be bypassed.  Since I don't want to add IOMMU
> > infrastructure to U-Boot, I set up the IOMMU to map a fixed block of
> > physical memory and make sure that all allocations of memory come from
> > that block of memory.  In this case this is fairly easy to achieve.
> > U-Boot allocates memory from the top of usable memory, so as long as I
> > let the IOMMU map that high memory, things work.  U-Boot doesn't need
> > a lot of memory, so a block of 512MB is more than sufficient.
> 
> I'd rather not play around with the visible memory size (see below).
> And while technically there is a (scatter/gather) IOMMU in the SoC, it
> would be too big guns for that small problem.

IOMMU is connected only to video related cores, so it's not an option here.

Best regards,
Jernej

> 
> > In your case this means that as long as you set the top of usable
> > memory to an address < 4G, U-Boot itself should be fine and no bounce
> > buffers are needed.  You have to make sure the addresses in the U-Boot
> > environment for loading things like the kernel and the FDT are set to
> > an address < 4G as well.
> > 
> > For EFI things are different though.  You want to expose all physical
> > memory in the EFI memory map.
> 
> Not only for UEFI, since U-Boot populates the DT memory node even for
> booti/bootm, in arch/arm/lib/bootm-fdt.c:arch_fixup_fdt().
> So limiting the memory is not an option, since this would be passed on
> to the OS.
> 
> > This means that an EFI application
> > (such as an OS loader) may pick memory > 4G and use it to do I/O.
> 
> I think we should be safe here, as the driver has full control over the
> buffers: For TX we copy already, to use "fire-and-forget", so we
> just start the DMA and return. And for RX U-Boot network drivers
> return the buffer address, so it's our own buffer again. So wherever
> higher layers put the packets, we should be good (given our own buffers
> are).
> 
> 
> So I guess my question boils down to: How can I best allocate buffers
> from "low" memory? And do those buffer carveouts make it into the UEFI
> memory map, as reserved regions? Or can UEFI differentiate between
> boot services and runtime services allocations? The buffers would be
> needed during boot services, for the UEFI network protocol. But later
> on they can be abandoned.
> 
> > this purpose U-Boot already implements bounce buffers.  See the
> > CONFIG_EFI_LOADER_BOUNCE_BUFFER option.
> 
> Interesting, thanks, I will have a look at that. Maybe that contains
> some useful traces to other code.
> 
> Cheers,
> Andre

* 32-bit DMA limit for devices (and drivers)
  2021-04-30 11:21 32-bit DMA limit for devices (and drivers) Andre Przywara
  2021-04-30 12:02 ` Mark Kettenis
@ 2021-05-01 11:45 ` Bin Meng
  2021-05-01 12:23   ` Mark Kettenis
  1 sibling, 1 reply; 8+ messages in thread
From: Bin Meng @ 2021-05-01 11:45 UTC
  To: u-boot

On Fri, Apr 30, 2021 at 7:22 PM Andre Przywara <andre.przywara@arm.com> wrote:
>
> Hi,
>
> We now see the first Allwinner devices [1] having DRAM located above
> 4GB in address space (4GB DRAM starting at 1GB). After one fix[2]
> this works somewhat fine, but the sun8i-emac network device is still
> limited to 32-bit DMA addresses. With U-Boot relocating itself (plus
> stack and heap) to the end of DRAM, it now runs completely beyond 4GB
> on those machines, so not giving pure 32-bit addresses for buffers
> anymore.
> In Linux we handle this easily by just keeping the default DMA
> mask at 32 bits, and letting the DMA framework deal with the nasty
> details.
>
> I was wondering how this should be handled in U-Boot? The straight
> forward solution would be:
> - Let the driver allocate the RX and TX buffers separately, placing them
>   below 4GB in the address space (using lmb_reserve(), I guess?)
> - Use those RX buffers and hand the addresses back to the upper layers.
> - We already copy TX packets, so this would also be covered, in this
>   situation. Other drivers might need to introduce copying.
>
> This sounds like a common problem, so I was wondering if there is a
> more generic solution to this? Maybe there are already platforms or
> devices affected? Or should the whole heap and stack be moved below 4GB
> (if this is easily possible)?

My understanding is that the relocation address of U-Boot should be
below 4GB, so that there is no problem with 32-bit DMA. I thought this
was a rule followed by every board, but is this not the case on
your board?

> In our case we make the buffers part of our priv struct, so should
> there be an option to let the priv_auto allocation come from below 4GB?
>
> Grateful for any input on this!

Regards,
Bin

* 32-bit DMA limit for devices (and drivers)
  2021-05-01 11:45 ` Bin Meng
@ 2021-05-01 12:23   ` Mark Kettenis
  2021-05-02  0:21     ` Andre Przywara
  0 siblings, 1 reply; 8+ messages in thread
From: Mark Kettenis @ 2021-05-01 12:23 UTC
  To: u-boot

> From: Bin Meng <bmeng.cn@gmail.com>
> Date: Sat, 1 May 2021 19:45:02 +0800
> 
> On Fri, Apr 30, 2021 at 7:22 PM Andre Przywara <andre.przywara@arm.com> wrote:
> >
> > Hi,
> >
> > We now see the first Allwinner devices [1] having DRAM located above
> > 4GB in address space (4GB DRAM starting at 1GB). After one fix[2]
> > this works somewhat fine, but the sun8i-emac network device is still
> > limited to 32-bit DMA addresses. With U-Boot relocating itself (plus
> > stack and heap) to the end of DRAM, it now runs completely beyond 4GB
> > on those machines, so not giving pure 32-bit addresses for buffers
> > anymore.
> > In Linux we handle this easily by just keeping the default DMA
> > mask at 32 bits, and letting the DMA framework deal with the nasty
> > details.
> >
> > I was wondering how this should be handled in U-Boot? The straight
> > forward solution would be:
> > - Let the driver allocate the RX and TX buffers separately, placing them
> >   below 4GB in the address space (using lmb_reserve(), I guess?)
> > - Use those RX buffers and hand the addresses back to the upper layers.
> > - We already copy TX packets, so this would also be covered, in this
> >   situation. Other drivers might need to introduce copying.
> >
> > This sounds like a common problem, so I was wondering if there is a
> > more generic solution to this? Maybe there are already platforms or
> > devices affected? Or should the whole heap and stack be moved below 4GB
> > (if this is easily possible)?
> 
> My understanding is that the relocated address of U-Boot should be
> below 4GB then there is no problem for the 32-bit DMA. I thought this
> is a rule to be followed by every board, but this is not the case on
> your board?

Yes, that was my impression as well.  And I think that would work fine
on this board as there is plenty of DRAM below 4GB.  And this can be
achieved by implementing the board_get_usable_ram_top() function.
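
As a rough illustration of the clamping such a hook could perform, here is a
standalone C model. It deliberately does not reproduce the real
board_get_usable_ram_top() prototype or any U-Boot headers; the helper name
clamp_ram_top_below_4g() and the SZ_4G constant are assumptions made for this
sketch only.

```c
#include <assert.h>
#include <stdint.h>

/* First address that no longer fits into 32 bits. */
#define SZ_4G UINT64_C(0x100000000)

/*
 * Model of the value a board hook could report as the top of usable RAM:
 * the real end of DRAM, clamped so that U-Boot's relocated image, stack and
 * heap all end up below 4GB. The returned address is exclusive.
 */
static uint64_t clamp_ram_top_below_4g(uint64_t ram_base, uint64_t ram_size)
{
	uint64_t ram_end = ram_base + ram_size;

	/* The approach only works if some DRAM sits below the limit. */
	assert(ram_base < SZ_4G);

	return ram_end > SZ_4G ? SZ_4G : ram_end;
}
```

For the board discussed here (4GB of DRAM starting at 1GB) the helper yields
0x100000000 instead of 0x140000000, so 3GB remain available to U-Boot itself.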

As I indicated in my reply, some care is needed in the EFI subsystem,
but there already is a solution for that.  There is
CONFIG_EFI_LOADER_BOUNCE_BUFFER, but that might not actually be needed
in this case.  By default the EFI subsystem will mark all conventional
memory above "ram_top" as EFI_BOOT_SERVICES_DATA.  So EFI applications
such as OS loaders will not allocate that memory until they've called
ExitBootServices() at which point U-Boot will be completely out of the
picture.
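
The split described above can be modelled in a few lines of standalone C.
This is only a sketch of the rule as stated in this mail, not the code of the
EFI subsystem; the enum, the struct and classify_ram() are hypothetical names
introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative subset of memory types; the names mirror the UEFI ones. */
enum mem_type {
	MEM_CONVENTIONAL,
	MEM_BOOT_SERVICES_DATA,
};

struct mem_region {
	uint64_t start;		/* inclusive */
	uint64_t end;		/* exclusive */
	enum mem_type type;
};

/*
 * Split the RAM range [ram_start, ram_end) at ram_top: the part below
 * ram_top is offered as conventional memory, the part above it is marked as
 * boot services data. Returns the number of regions written (1 or 2).
 */
static int classify_ram(uint64_t ram_start, uint64_t ram_end,
			uint64_t ram_top, struct mem_region out[2])
{
	int n = 0;

	assert(ram_start < ram_end);

	if (ram_top > ram_start) {
		out[n].start = ram_start;
		out[n].end = ram_top < ram_end ? ram_top : ram_end;
		out[n].type = MEM_CONVENTIONAL;
		n++;
	}
	if (ram_top < ram_end) {
		out[n].start = ram_top > ram_start ? ram_top : ram_start;
		out[n].end = ram_end;
		out[n].type = MEM_BOOT_SERVICES_DATA;
		n++;
	}
	return n;
}
```

With RAM from 1GB to 5GB and ram_top at 4GB, the model produces a
conventional region from 1GB to 4GB and a boot services data region from 4GB
to 5GB, so an EFI application has no free memory above 4GB to hand to a
device before ExitBootServices().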

> > In our case we make the buffers part of our priv struct, so should
> > there be an option to let the priv_auto allocation come from below 4GB?
> >
> > Grateful for any input on this!
> 
> Regards,
> Bin
> 

* 32-bit DMA limit for devices (and drivers)
  2021-05-01 12:23   ` Mark Kettenis
@ 2021-05-02  0:21     ` Andre Przywara
  2021-05-02  0:30       ` Bin Meng
  0 siblings, 1 reply; 8+ messages in thread
From: Andre Przywara @ 2021-05-02  0:21 UTC
  To: u-boot

On Sat, 1 May 2021 14:23:32 +0200 (CEST)
Mark Kettenis <mark.kettenis@xs4all.nl> wrote:

Hi,

> > From: Bin Meng <bmeng.cn@gmail.com>
> > Date: Sat, 1 May 2021 19:45:02 +0800
> > 
> > On Fri, Apr 30, 2021 at 7:22 PM Andre Przywara <andre.przywara@arm.com> wrote:  
> > >
> > > Hi,
> > >
> > > We now see the first Allwinner devices [1] having DRAM located above
> > > 4GB in address space (4GB DRAM starting at 1GB). After one fix[2]
> > > this works somewhat fine, but the sun8i-emac network device is still
> > > limited to 32-bit DMA addresses. With U-Boot relocating itself (plus
> > > stack and heap) to the end of DRAM, it now runs completely beyond 4GB
> > > on those machines, so not giving pure 32-bit addresses for buffers
> > > anymore.
> > > In Linux we handle this easily by just keeping the default DMA
> > > mask at 32 bits, and letting the DMA framework deal with the nasty
> > > details.
> > >
> > > I was wondering how this should be handled in U-Boot? The straight
> > > forward solution would be:
> > > - Let the driver allocate the RX and TX buffers separately, placing them
> > >   below 4GB in the address space (using lmb_reserve(), I guess?)
> > > - Use those RX buffers and hand the addresses back to the upper layers.
> > > - We already copy TX packets, so this would also be covered, in this
> > >   situation. Other drivers might need to introduce copying.
> > >
> > > This sounds like a common problem, so I was wondering if there is a
> > > more generic solution to this? Maybe there are already platforms or
> > > devices affected? Or should the whole heap and stack be moved below 4GB
> > > (if this is easily possible)?  
> > 
> > My understanding is that the relocated address of U-Boot should be
> > below 4GB then there is no problem for the 32-bit DMA. I thought this
> > is a rule to be followed by every board, but this is not the case on
> > your board?

Bin, interesting, where is this coming from? Was this originally for
32-bit CPUs with some address extension (PAE/LPAE)? I think on *sane*
64-bit systems there would be no need for this restriction, except maybe
for this 32-bit DMA limitation (which is more of a device problem).

> Yes, that was my impression as well.  And I think that would work fine
> on this board as there is plenty of DRAM below 4GB.  And this can be
> achieved by implementing the board_get_usable_ram_top() function.

Ah, I think this is the thing I missed and was looking for:
So we *can* restrict everything *U-Boot* to 32 bits and save us a lot of
hassle.

Thanks for that hint!
 
> As I indicated in my reply, some care is needed in the EFI subsystem,
> but there already is a solution for that.  There is
> CONFIG_EFI_LOADER_BOUNCE_BUFFER, but that might not actually be needed
> in this case.  By default the EFI subsystem will mark all conventional
> memory above "ram_top" as EFI_BOOT_SERVICES_DATA.  So EFI applications
> such as OS loaders will not allocate that memory until they've called
> ExitBootServices() at which point U-Boot will be completely out of the
> picture.

Oh nice, this looks like what I need. So EFI apps would never use this
memory for I/O buffers.

So I gave this a try and this solves my problem quite neatly: Linux
sees the full DRAM, but U-Boot never touches anything beyond 4GB.
Briefly tested Linux with both EFI and booti.
Will include the board_get_usable_ram_top() implementation in the v2 of
my 4GB enablement patch.

Thanks again!

Cheers,
Andre

> 
> > > In our case we make the buffers part of our priv struct, so should
> > > there be an option to let the priv_auto allocation come from below 4GB?
> > >
> > > Grateful for any input on this!  
> > 
> > Regards,
> > Bin
> >   

* 32-bit DMA limit for devices (and drivers)
  2021-05-02  0:21     ` Andre Przywara
@ 2021-05-02  0:30       ` Bin Meng
  0 siblings, 0 replies; 8+ messages in thread
From: Bin Meng @ 2021-05-02  0:30 UTC
  To: u-boot

Hi Andre,

On Sun, May 2, 2021 at 8:22 AM Andre Przywara <andre.przywara@arm.com> wrote:
>
> On Sat, 1 May 2021 14:23:32 +0200 (CEST)
> Mark Kettenis <mark.kettenis@xs4all.nl> wrote:
>
> Hi,
>
> > > From: Bin Meng <bmeng.cn@gmail.com>
> > > Date: Sat, 1 May 2021 19:45:02 +0800
> > >
> > > On Fri, Apr 30, 2021 at 7:22 PM Andre Przywara <andre.przywara@arm.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > We now see the first Allwinner devices [1] having DRAM located above
> > > > 4GB in address space (4GB DRAM starting at 1GB). After one fix[2]
> > > > this works somewhat fine, but the sun8i-emac network device is still
> > > > limited to 32-bit DMA addresses. With U-Boot relocating itself (plus
> > > > stack and heap) to the end of DRAM, it now runs completely beyond 4GB
> > > > on those machines, so not giving pure 32-bit addresses for buffers
> > > > anymore.
> > > > In Linux we handle this easily by just keeping the default DMA
> > > > mask at 32 bits, and letting the DMA framework deal with the nasty
> > > > details.
> > > >
> > > > I was wondering how this should be handled in U-Boot? The straight
> > > > forward solution would be:
> > > > - Let the driver allocate the RX and TX buffers separately, placing them
> > > >   below 4GB in the address space (using lmb_reserve(), I guess?)
> > > > - Use those RX buffers and hand the addresses back to the upper layers.
> > > > - We already copy TX packets, so this would also be covered, in this
> > > >   situation. Other drivers might need to introduce copying.
> > > >
> > > > This sounds like a common problem, so I was wondering if there is a
> > > > more generic solution to this? Maybe there are already platforms or
> > > > devices affected? Or should the whole heap and stack be moved below 4GB
> > > > (if this is easily possible)?
> > >
> > > My understanding is that the relocated address of U-Boot should be
> > > below 4GB then there is no problem for the 32-bit DMA. I thought this
> > > is a rule to be followed by every board, but this is not the case on
> > > your board?
>
> Bin, interesting, where is this coming from? Was this originally for
> 32-bit CPUs with some address extension (PAE/LPAE)? I think on *sane*
> 64-bit systems there would be no need for this restriction, except maybe
> for this 32-bit DMA limitation (which is more of a device problem).

Please have a look at the x86 and riscv implementations of
board_get_usable_ram_top(), which limit the relocation address to below
4G. I also remember that the U-Boot shell does not support parsing
64-bit numbers.

>
> > Yes, that was my impression as well.  And I think that would work fine
> > on this board as there is plenty of DRAM below 4GB.  And this can be
> > achieved by implementing the board_get_usable_ram_top() function.
>
> Ah, I think this is the thing I missed and was looking for:
> So we *can* restrict everything *U-Boot* to 32 bits and save us a lot of
> hassle.
>
> Thanks for that hint!
>
> > As I indicated in my reply, some care is needed in the EFI subsystem,
> > but there already is a solution for that.  There is
> > CONFIG_EFI_LOADER_BOUNCE_BUFFER, but that might not actually be needed
> > in this case.  By default the EFI subsystem will mark all conventional
> > memory above "ram_top" as EFI_BOOT_SERVICES_DATA.  So EFI applications
> > > such as OS loaders will not allocate that memory until they've called
> > ExitBootServices() at which point U-Boot will be completely out of the
> > picture.
>
> Oh nice, this looks like what I need. So EFI apps would never use this
> memory for I/O buffers.
>
> So I gave this a try and this solves my problem quite neatly: Linux
> sees the full DRAM, but U-Boot never touches anything beyond 4GB.
> Briefly tested Linux with both EFI and booti.
> Will include the board_get_usable_ram_top() implementation in the v2 of
> my 4GB enablement patch.
>
> Thanks again!

Regards,
Bin

Thread overview: 8+ messages
2021-04-30 11:21 32-bit DMA limit for devices (and drivers) Andre Przywara
2021-04-30 12:02 ` Mark Kettenis
2021-04-30 13:34   ` Andre Przywara
2021-04-30 16:31     ` Jernej Škrabec
2021-05-01 11:45 ` Bin Meng
2021-05-01 12:23   ` Mark Kettenis
2021-05-02  0:21     ` Andre Przywara
2021-05-02  0:30       ` Bin Meng