From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thierry Reding
Date: Thu, 21 Aug 2014 16:11:22 +0200
Subject: [U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues
In-Reply-To: <53F4F314.6070101@wwwdotorg.org>
References: <1408348852-30894-1-git-send-email-thierry.reding@gmail.com>
 <53F4F314.6070101@wwwdotorg.org>
Message-ID: <20140821141121.GE19293@ulmo.nvidia.com>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: u-boot@lists.denx.de

On Wed, Aug 20, 2014 at 01:12:20PM -0600, Stephen Warren wrote:
> On 08/18/2014 02:00 AM, Thierry Reding wrote:
> >From: Thierry Reding
> >
> >This series attempts to fix a long-standing problem in the rtl8169 driver
> >(though the same problem may exist in other drivers as well). Let me first
> >explain what exactly the issue is:
> >
> >The rtl8169 driver provides a set of RX and TX descriptors for the device to
> >use. Once they're set up, the device is told about their location so that it
> >can fetch the descriptors using DMA. The device will also write packet state
> >back into these descriptors using DMA. For this to work properly, whenever a
> >driver needs to access these descriptors it needs to invalidate the D-cache
> >line(s) associated with them. Similarly when changes to the descriptor have
> >been made by the driver, the cache lines need to be flushed to make sure the
> >changes are visible to the device.
> >
> >The descriptors are 16 bytes in size. This causes problems when used on CPUs
> >that have a cache-line size that is larger than 16 bytes. One example is the
> >NVIDIA Tegra124 which has 64-byte cache-lines. That means that 4 descriptors
> >fit into a single cache-line. So whenever the driver flushes a cache-line it
> >has the potential to discard changes made to another descriptor by the DMA
> >device.
> >One typical symptom is that large transfers over TFTP will often not
> >complete and hang somewhere midway because the device marked a packet as
> >received, but the driver then flushed the cache, causing the packet to be
> >lost.
> >
> >Since the descriptors need to be consecutive in memory, I don't see a way to
> >fix this other than to use uncached memory. Therefore the solution proposed
> >in this patch series is to introduce a mechanism in U-Boot to allow a driver
> >to allocate from a pool of uncached memory. Currently an implementation is
> >provided only for ARM v7. The idea is that a region (of user-definable size)
> >immediately below (taking into account architecture-specific alignment
> >restrictions) the malloc() area is mapped uncacheable in the MMU. A driver
> >can use the new noncached_alloc() function to allocate a chunk of memory
> >from this pool dynamically for buffers that it can't or doesn't want to do
> >any explicit cache maintenance on, yet that need to be shared with DMA
> >devices.
> >
> >Patches 1-3 are minor preparatory work. Patch 1 cleans up some coding style
> >issues in the ARM v7 cache code and patch 2 uses more future-proof types for
> >the mmu_set_region_dcache_behaviour() function arguments. Patch 3 is purely
> >for debugging purposes. It will print out the region used by malloc() when
> >DEBUG is enabled. This can be useful to see where the malloc() region is in
> >the memory map (compared to the noncached region introduced in a later patch
> >for example).
> >
> >Patch 4 implements the noncached API for ARM v7. It obtains the start of the
> >malloc() area and places the noncached region immediately below it so that
> >noncached_alloc() can allocate from it. During boot, the noncached area will
> >be set up immediately after malloc().
> >
> >Patch 5 enables noncached memory for all Tegra boards.
> >It uses a 1 MiB chunk which should be plenty (it's also the minimum on
> >ARM v7 because it matches the MMU section size and therefore the
> >granularity at which U-Boot can set the cacheable attributes).
>
> If LPAE were to be enabled, the minimum would be 2 MiB, but I suppose we can
> deal with that if/when the time comes.

The code that sets up the noncached memory region will pad the size to a
multiple of the section size, and if LPAE were enabled I'd expect the
section size to be defined appropriately, too. That would still mean that
Tegra configures the region to be 1 MiB while the code actually reserves
2 MiB of addresses, but that's explicitly allowed in the documentation.

Thierry
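
To make the padding described in the reply concrete, here is a minimal
sketch of rounding a configured noncached size up to the MMU section size.
It is not the code from the series: the function name
noncached_padded_size() and the two SECTION_SIZE_* constants are made up
for illustration, and the only facts taken from the thread are the 1 MiB
section size on ARM v7 and the 2 MiB minimum expected with LPAE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants for illustration only: 1 MiB sections on ARM v7
 * without LPAE, 2 MiB if LPAE were enabled. */
#define SECTION_SIZE_SHORT (1UL << 20)
#define SECTION_SIZE_LPAE  (1UL << 21)

/*
 * Round size up to the next multiple of section_size.
 * section_size must be a non-zero power of two, which is what makes the
 * mask arithmetic below valid.
 */
static size_t noncached_padded_size(size_t size, size_t section_size)
{
	assert(section_size != 0 && (section_size & (section_size - 1)) == 0);

	return (size + section_size - 1) & ~(section_size - 1);
}
```

With these assumed values, a board that configures 1 MiB keeps exactly
1 MiB with 1 MiB sections, while the same configuration with 2 MiB sections
ends up reserving 2 MiB of addresses, which is the case discussed above.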