[U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues

From: Thierry Reding @ 2014-08-18  8:00 UTC
  To: u-boot

From: Thierry Reding <treding@nvidia.com>

This series attempts to fix a long-standing problem in the rtl8169 driver
(though the same problem may exist in other drivers as well). Let me first
explain what exactly the issue is:

The rtl8169 driver provides a set of RX and TX descriptors for the device to
use. Once they're set up, the device is told about their location so that it
can fetch the descriptors using DMA. The device will also write packet state
back into these descriptors using DMA. For this to work properly, whenever a
driver needs to access these descriptors it needs to invalidate the D-cache
line(s) associated with them. Similarly when changes to the descriptor have
been made by the driver, the cache lines need to be flushed to make sure the
changes are visible to the device.
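
To illustrate which cache lines are "associated with" a descriptor, here is a
minimal, self-contained sketch (not code from this series). The helper name
cache_line_span() and the struct line_range type are hypothetical. It only
computes the cache-line-aligned address range that an invalidate or flush of
one descriptor would have to cover.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: a half-open address range [start, end). */
struct line_range {
	uintptr_t start;
	uintptr_t end;
};

/*
 * Expand [addr, addr + size) to whole cache lines, since cache maintenance
 * always operates on complete lines. line_size must be a power of two.
 */
struct line_range cache_line_span(uintptr_t addr, size_t size,
				  size_t line_size)
{
	struct line_range r;
	uintptr_t mask = (uintptr_t)line_size - 1;

	assert(line_size != 0 && (line_size & mask) == 0);

	r.start = addr & ~mask;			/* round start down */
	r.end = (addr + size + mask) & ~mask;	/* round end up */
	return r;
}
```

For a 16-byte descriptor at address 0x1010 and a 64-byte line, the span is
0x1000 to 0x1040, which is larger than the descriptor itself.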

The descriptors are 16 bytes in size. This causes problems when used on CPUs
that have a cache-line size that is larger than 16 bytes. One example is the
NVIDIA Tegra124 which has 64-byte cache-lines. That means that 4 descriptors
fit into a single cache-line. So whenever the driver flushes a cache-line it
has the potential to overwrite changes made to another descriptor by the DMA
device. One typical symptom is that large transfers over TFTP will often not
complete and hang somewhere midway because the device marked a packet as
received, but the driver then flushed the cache-line, overwriting that update
and causing the packet to be lost.
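
The lost update can be replayed with a small simulation. This is only a
sketch under simplifying assumptions: the "cache" is a plain copy of one
64-byte line, the descriptor layout and the DESC_OWN bit value are
illustrative, and all names (struct sim, sim_replay() and so on) are
hypothetical rather than taken from the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE	64	/* cache-line size from the text */
#define DESC_SIZE	16	/* descriptor size from the text */
#define DESCS_PER_LINE	(LINE_SIZE / DESC_SIZE)

#define DESC_OWN	0x80000000u	/* illustrative "owned by device" bit */

/* Illustrative 16-byte descriptor; only the status word matters here. */
struct desc {
	uint32_t status;
	uint32_t pad[3];
};

struct sim {
	struct desc ram[DESCS_PER_LINE];	/* what the DMA device sees */
	struct desc line[DESCS_PER_LINE];	/* the CPU's cached copy */
};

/* Invalidate: drop the cached copy and refetch the whole line from RAM. */
void sim_invalidate(struct sim *s)
{
	memcpy(s->line, s->ram, sizeof(s->line));
}

/* Flush: write the whole cached line back to RAM. */
void sim_flush(struct sim *s)
{
	memcpy(s->ram, s->line, sizeof(s->ram));
}

/*
 * Replay the failure. Returns 1 if the device's update to descriptor 1
 * survives, 0 if the driver's flush of descriptor 0 overwrote it.
 */
int sim_replay(void)
{
	struct sim s;
	int i;

	/* All descriptors start out owned by the device. */
	memset(&s, 0, sizeof(s));
	for (i = 0; i < DESCS_PER_LINE; i++)
		s.ram[i].status = DESC_OWN;

	sim_invalidate(&s);			/* driver caches the line */
	s.ram[1].status = 0;			/* device: packet received in slot 1 */
	s.line[0].status = DESC_OWN | 1536;	/* driver re-arms slot 0 in cache */
	sim_flush(&s);				/* all 64 bytes are written back */

	return (s.ram[1].status & DESC_OWN) == 0;
}
```

In this model sim_replay() returns 0: the flush writes the stale copy of
slot 1 back to RAM, so the "packet received" state is gone.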

Since the descriptors need to be consecutive in memory, I don't see a way to
fix this other than to use uncached memory. Therefore the solution proposed
in this patch series is to introduce a mechanism in U-Boot to allow a driver
to allocate from a pool of uncached memory. Currently an implementation is
provided only for ARM v7. The idea is that a region (of user-definable size)
immediately below (taking into account architecture-specific alignment
restrictions) the malloc() area is mapped uncacheable in the MMU. A driver
can use the new noncached_alloc() function to allocate a chunk of memory
from this pool dynamically for buffers that it can't or doesn't want to do
any explicit cache-maintenance on, yet that need to be shared with DMA devices.
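
One simple way to hand out memory from such a pool is a bump allocator. The
sketch below is an assumption about how this could look, not the
implementation from patch 4: struct nc_pool and nc_pool_alloc() are
hypothetical names, and the real noncached_alloc() signature is not shown in
this cover letter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool state: bounds of the region mapped uncached. */
struct nc_pool {
	uintptr_t next;	/* first free address */
	uintptr_t end;	/* one past the last usable address */
};

/*
 * Carve size bytes, aligned to align (a power of two), out of the pool.
 * Returns 0 when the request does not fit. Memory is never freed.
 */
uintptr_t nc_pool_alloc(struct nc_pool *pool, size_t size, size_t align)
{
	uintptr_t mask = (uintptr_t)align - 1;
	uintptr_t start;

	assert(align != 0 && (align & mask) == 0);

	start = (pool->next + mask) & ~mask;	/* round up to alignment */
	if (start < pool->next || start > pool->end ||
	    pool->end - start < size)
		return 0;

	pool->next = start + size;
	return start;
}
```

A driver would call this once per descriptor ring at init time, for example
with an alignment of 256 for the rtl8169 rings.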

Patches 1-3 are minor preparatory work. Patch 1 cleans up some coding style
issues in the ARM v7 cache code and patch 2 uses more future-proof types for
the mmu_set_region_dcache_behaviour() function arguments. Patch 3 is purely
for debugging purposes. It will print out the region used by malloc() when
DEBUG is enabled. This can be useful to see where the malloc() region is in
the memory map (compared to the noncached region introduced in a later patch
for example).

Patch 4 implements the noncached API for ARM v7. It obtains the start of the
malloc() area and places the noncached region immediately below it so that
noncached_alloc() can allocate from it. During boot, the noncached area will
be set up immediately after the malloc() area has been initialized.

Patch 5 enables noncached memory for all Tegra boards. It uses a 1 MiB chunk
which should be plenty (it's also the minimum on ARM v7 because it matches
the MMU section size and therefore the granularity at which U-Boot can set
the cacheable attributes).
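
The placement of the region can be sketched as follows. This is an
illustration only, assuming 1 MiB sections. nc_region_place() and struct
nc_region are hypothetical names, and the exact rounding used in patch 4 may
differ from what is shown here.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SIZE	0x100000ul	/* 1 MiB ARM v7 MMU section */

struct nc_region {
	uintptr_t start;
	uintptr_t end;
};

/*
 * Place a region of at least size bytes below malloc_start, with both
 * ends on section boundaries so that cacheability can be changed without
 * affecting the malloc() area.
 */
struct nc_region nc_region_place(uintptr_t malloc_start, uintptr_t size)
{
	struct nc_region r;
	uintptr_t mask = SECTION_SIZE - 1;

	/* Round the size up to a whole number of sections. */
	size = (size + mask) & ~mask;
	/* Round the top down so no section is shared with malloc(). */
	r.end = malloc_start & ~mask;
	r.start = r.end - size;
	return r;
}
```

With malloc() starting at 0x8ff40000 and a requested size of 1 MiB, this
yields the region 0x8fe00000 to 0x8ff00000. Any smaller request is rounded up
to the same 1 MiB, which is why that is the minimum.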

Patch 6 is not really related but just something I stumbled across when going
through the code. According to the top-level README file, network drivers are
supposed to respect the CONFIG_SYS_RX_ETH_BUFFER setting. rtl8169 doesn't
currently do that, so this patch fixes it.

Patch 7 is the result of earlier rework that still aimed at solving the
problem using explicit cache maintenance. rtl8169 hardware requires buffers
to be aligned to 256 byte boundaries. The rtl8169 driver used to employ some
trickery to make that work, but nowadays there are macros that can be used to
the same effect, so this patch uses them and gets rid of the custom trickery.
This patch also prints out a warning if it detects a potential caching issue
(i.e. ARCH_DMA_MINALIGN > sizeof(struct RxDesc)).
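
The alignment and the warning condition can be sketched in plain C11. This is
not the driver code: the descriptor field names are an assumed layout that
merely adds up to 16 bytes, _Alignas stands in for the U-Boot alignment
macros, and has_caching_hazard() is a hypothetical helper that takes the
DMA alignment as a parameter instead of reading ARCH_DMA_MINALIGN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: four 32-bit words, 16 bytes as stated in the text. */
struct rx_desc {
	uint32_t status;
	uint32_t vlan_tag;
	uint32_t buf_addr;
	uint32_t buf_haddr;
};

#define RING_ALIGN	256	/* hardware alignment requirement */
#define NUM_RX_DESC	4	/* illustrative ring size */

/* C11 _Alignas places the ring on a 256-byte boundary. */
_Alignas(RING_ALIGN) struct rx_desc rx_ring[NUM_RX_DESC];

/*
 * Nonzero when one DMA-aligned block (cache line) is larger than a
 * descriptor, i.e. when several descriptors share a line and explicit
 * cache maintenance on one of them can clobber its neighbours.
 */
int has_caching_hazard(size_t dma_minalign)
{
	return dma_minalign > sizeof(struct rx_desc);
}
```

With a 64-byte or 32-byte DMA alignment the check fires. With 16 bytes each
descriptor has a line to itself and no warning is needed.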

Patch 8 finally adds optional support for non-cached memory. When available
the driver will now use the noncached API to obtain uncached buffers for the
RX and TX descriptor rings. At the same time the cache-maintenance functions
for the RX and TX descriptors become no-ops so that the code can work with
or without the noncached API available.

With all of the above in place, patch 9 adds support for RTL-8168/8111g as
found on the NVIDIA Jetson TK1 board (which has a Tegra124 SoC).

Note that this series also fixes the sporadic hangs of large TFTP transfers
for earlier SoC generations of Tegra (Tegra20 and Tegra30), though they were
less frequent there, probably because the cache-lines are 32 bytes rather
than 64.

Thierry

Thierry Reding (9):
  ARM: cache_v7: Various minor cleanups
  ARM: cache-cp15: Use unsigned long for address and size
  malloc: Output region when debugging
  ARM: Implement non-cached memory support
  ARM: tegra: Enable non-cached memory
  net: rtl8169: Honor CONFIG_SYS_RX_ETH_BUFFER
  net: rtl8169: Properly align buffers
  net: rtl8169: Use non-cached memory if available
  net: rtl8169: Add support for RTL-8168/8111g

 README                         |  16 ++++++
 arch/arm/cpu/armv7/cache_v7.c  |  14 +++---
 arch/arm/include/asm/system.h  |   7 ++-
 arch/arm/lib/cache-cp15.c      |   6 +--
 arch/arm/lib/cache.c           |  41 +++++++++++++++
 common/board_r.c               |  11 +++++
 common/dlmalloc.c              |   3 ++
 drivers/net/rtl8169.c          | 110 ++++++++++++++++++++++++++++++-----------
 include/configs/tegra-common.h |   1 +
 9 files changed, 168 insertions(+), 41 deletions(-)

-- 
2.0.4
