From mboxrd@z Thu Jan  1 00:00:00 1970
From: Thierry Reding <thierry.reding@gmail.com>
Date: Thu, 21 Aug 2014 16:11:22 +0200
Subject: [U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues
In-Reply-To: <53F4F314.6070101@wwwdotorg.org>
References: <1408348852-30894-1-git-send-email-thierry.reding@gmail.com>
	<53F4F314.6070101@wwwdotorg.org>
Message-ID: <20140821141121.GE19293@ulmo.nvidia.com>
List-Id: <u-boot.lists.denx.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: u-boot@lists.denx.de

On Wed, Aug 20, 2014 at 01:12:20PM -0600, Stephen Warren wrote:
> On 08/18/2014 02:00 AM, Thierry Reding wrote:
> >From: Thierry Reding <treding@nvidia.com>
> >
> >This series attempts to fix a long-standing problem in the rtl8169 driver
> >(though the same problem may exist in other drivers as well). Let me first
> >explain what exactly the issue is:
> >
> >The rtl8169 driver provides a set of RX and TX descriptors for the device to
> >use. Once they're set up, the device is told about their location so that it
> >can fetch the descriptors using DMA. The device will also write packet state
> >back into these descriptors using DMA. For this to work properly, whenever a
> >driver needs to access these descriptors it needs to invalidate the D-cache
> >line(s) associated with them. Similarly when changes to the descriptor have
> >been made by the driver, the cache lines need to be flushed to make sure the
> >changes are visible to the device.
> >
> >The descriptors are 16 bytes in size. This causes problems when used on CPUs
> >that have a cache-line size that is larger than 16 bytes. One example is the
> >NVIDIA Tegra124 which has 64-byte cache-lines. That means that 4 descriptors
> >fit into a single cache-line. So whenever the driver flushes a cache-line it
> >has the potential to discard changes made to another descriptor by the DMA
> >device. One typical symptom is that large transfers over TFTP will often not
> >complete and hang somewhere midway because the device marked a packet as
> >received but the driver then flushed the cache line, causing the packet to
> >be lost.
> >
> >Since the descriptors need to be consecutive in memory, I don't see a way to
> >fix this other than to use uncached memory. Therefore the solution proposed
> >in this patch series is to introduce a mechanism in U-Boot to allow a driver
> >to allocate from a pool of uncached memory. Currently an implementation is
> >provided only for ARM v7. The idea is that a region (of user-definable size)
> >immediately below (taking into account architecture-specific alignment
> >restrictions) the malloc() area is mapped uncacheable in the MMU. A driver
> >can use the new noncached_alloc() function to allocate a chunk of memory
> >from this pool dynamically for buffers that it can't or doesn't want to do
> >any explicit cache-maintenance on, yet need to be shared with DMA devices.
> >
> >Patches 1-3 are minor preparatory work. Patch 1 cleans up some coding style
> >issues in the ARM v7 cache code and patch 2 uses more future-proof types for
> >the mmu_set_region_dcache_behaviour() function arguments. Patch 3 is purely
> >for debugging purposes. It will print out the region used by malloc() when
> >DEBUG is enabled. This can be useful to see where the malloc() region is in
> >the memory map (compared to the noncached region introduced in a later patch
> >for example).
> >
> >Patch 4 implements the noncached API for ARM v7. It obtains the start of the
> >malloc() area and places the noncached region immediately below it so that
> >noncached_alloc() can allocate from it. During boot, the noncached area will
> >be set up immediately after malloc().
> >
> >Patch 5 enables noncached memory for all Tegra boards. It uses a 1 MiB chunk
> >which should be plenty (it's also the minimum on ARM v7 because it matches
> >the MMU section size and therefore the granularity at which U-Boot can set
> >the cacheable attributes).
> 
> If LPAE were to be enabled, the minimum would be 2MiB, but I suppose we can
> deal with that if/when the time comes.

The code that sets up the noncached memory region will pad the size to a
multiple of the section size, and if LPAE were enabled I'd expect the
section size to be defined appropriately, too. Tegra would still
configure the region to be 1 MiB while the code actually reserved 2 MiB
of address space, but that's explicitly allowed in the documentation.
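
To illustrate the padding, here is a rough sketch of the arithmetic. It is
not the code from the series: the helper noncached_pad_size() and the two
SECTION_SIZE_* macros are names made up for this example, and the section
sizes are simply the 1 MiB and 2 MiB values discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical section sizes, for illustration only. */
#define SECTION_SIZE_SHORT (1UL << 20) /* 1 MiB section, as on ARM v7 today */
#define SECTION_SIZE_LPAE  (1UL << 21) /* 2 MiB, the minimum quoted for LPAE */

/*
 * Round the requested size up to the next multiple of the section size.
 * The section size is assumed to be a non-zero power of two.
 */
static size_t noncached_pad_size(size_t size, size_t section_size)
{
	assert(section_size != 0 && (section_size & (section_size - 1)) == 0);

	return (size + section_size - 1) & ~(section_size - 1);
}
```

With a configured size of 1 MiB, this kind of rounding leaves the size
unchanged for a 1 MiB section and grows it to 2 MiB for a 2 MiB section.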

Thierry