All of lore.kernel.org
 help / color / mirror / Atom feed
* [U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues
@ 2014-08-18  8:00 Thierry Reding
  2014-08-18  8:00 ` [U-Boot] [PATCH 1/9] ARM: cache_v7: Various minor cleanups Thierry Reding
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: Thierry Reding @ 2014-08-18  8:00 UTC (permalink / raw)
  To: u-boot

From: Thierry Reding <treding@nvidia.com>

This series attempts to fix a long-standing problem in the rtl8169 driver
(though the same problem may exist in other drivers as well). Let me first
explain what exactly the issue is:

The rtl8169 driver provides a set of RX and TX descriptors for the device to
use. Once they're set up, the device is told about their location so that it
can fetch the descriptors using DMA. The device will also write packet state
back into these descriptors using DMA. For this to work properly, whenever a
driver needs to access these descriptors it needs to invalidate the D-cache
line(s) associated with them. Similarly when changes to the descriptor have
been made by the driver, the cache lines need to be flushed to make sure the
changes are visible to the device.

The descriptors are 16 bytes in size. This causes problems when used on CPUs
that have a cache-line size that is larger than 16 bytes. One example is the
NVIDIA Tegra124 which has 64-byte cache-lines. That means that 4 descriptors
fit into a single cache-line. So whenever the driver flushes a cache-line it
has the potential to discard changes made to another descriptor by the DMA
device. One typical symptom is that large transfers over TFTP will often not
complete and hang somewhere midway because a device marked a packet received
but the driver flushing the cache and causing the packet to be lost.

Since the descriptors need to be consecutive in memory, I don't see a way to
fix this other than to use uncached memory. Therefore the solution proposed
in this patch series is to introduce a mechanism in U-Boot to allow a driver
to allocate from a pool of uncached memory. Currently an implementation is
provided only for ARM v7. The idea is that a region (of user-definable size)
immediately below (taking into account architecture-specific alignment
restrictions) the malloc() area is mapped uncacheable in the MMU. A driver
can use the new noncached_alloc() function to allocate a chunk of memory
from this pool dynamically for buffers that it can't or doesn't want to do
any explicit cache-maintainance on, yet needs to be shared with DMA devices.

Patches 1-3 are minor preparatory work. Patch 1 cleans up some coding style
issues in the ARM v7 cache code and patch 2 uses more future-proof types for
the mmu_set_region_dcache_behaviour() function arguments. Patch 3 is purely
for debugging purposes. It will print out the region used by malloc() when
DEBUG is enabled. This can be useful to see where the malloc() region is in
the memory map (compared to the noncached region introduced in a later patch
for example).

Patch 4 implements the noncached API for ARM v7. It obtains the start of the
malloc() area and places the noncached region immediately below it so that
noncached_alloc() can allocate from it. During boot, the noncached area will
be set up immediately after malloc().

Patch 5 enables noncached memory for all Tegra boards. It uses a 1 MiB chunk
which should be plenty (it's also the minimum on ARM v7 because it matches
the MMU section size and therefore the granularity at which U-Boot can set
the cacheable attributes).

Patch 6 is not really related but just something I stumbled across when going
through the code. According to the top-level README file, network drivers are
supposed to respect the CONFIG_SYS_RX_ETH_BUFFER. rtl8169 doesn't currently
do that, so this patch fixes it.

Patch 7 is the result of earlier rework that still aimed at solving the
problem using explicit cache maintenance. rtl8169 hardware requires buffers
to be aligned to 256 byte boundaries. The rtl8169 driver used to employ some
trickery to make that work, but nowadays there are macros that can be used to
the same effect, so this patch uses them and gets rid of the custom trickery.
This patch also prints out a warning if it detects a potential caching issue
(i.e. ARCH_DMA_MINALIGN > sizeof(struct RxDesc)).

Patch 8 finally adds optional support for non-cached memory. When available
the driver will now use the noncached API to obtain uncached buffers for the
RX and TX descriptor rings. At the same time the cache-maintenance functions
for the RX and TX descriptors become no-ops so that the code can work with
or without the noncached API available.

With all of the above in place, patch 9 adds support for RTL-8168/8111g as
found on the NVIDIA Jetson TK1 board (which has a Tegra124 SoC).

Note that this series also fixes the sporadic hangs of large TFTP transfers
for earlier SoC generations of Tegra (Tegra20 and Tegra30), though they were
less frequent there, probably caused by the cache-lines being 32 bytes rather
than 64.

Thierry

Thierry Reding (9):
  ARM: cache_v7: Various minor cleanups
  ARM: cache-cp15: Use unsigned long for address and size
  malloc: Output region when debugging
  ARM: Implement non-cached memory support
  ARM: tegra: Enable non-cached memory
  net: rtl8169: Honor CONFIG_SYS_RX_ETH_BUFFER
  net: rtl8169: Properly align buffers
  net: rtl8169: Use non-cached memory if available
  net: rtl8169: Add support for RTL-8168/8111g

 README                         |  16 ++++++
 arch/arm/cpu/armv7/cache_v7.c  |  14 +++---
 arch/arm/include/asm/system.h  |   7 ++-
 arch/arm/lib/cache-cp15.c      |   6 +--
 arch/arm/lib/cache.c           |  41 +++++++++++++++
 common/board_r.c               |  11 +++++
 common/dlmalloc.c              |   3 ++
 drivers/net/rtl8169.c          | 110 ++++++++++++++++++++++++++++++-----------
 include/configs/tegra-common.h |   1 +
 9 files changed, 168 insertions(+), 41 deletions(-)

-- 
2.0.4

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 1/9] ARM: cache_v7: Various minor cleanups
  2014-08-18  8:00 [U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues Thierry Reding
@ 2014-08-18  8:00 ` Thierry Reding
  2014-08-18  8:00 ` [U-Boot] [PATCH 2/9] ARM: cache-cp15: Use unsigned long for address and size Thierry Reding
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Thierry Reding @ 2014-08-18  8:00 UTC (permalink / raw)
  To: u-boot

From: Thierry Reding <treding@nvidia.com>

Remove two gratuituous blank lines, uses u32 (instead of int) as the
type for values that will be written to a register, moves the beginning
of the variable declaration section to a separate line (rather than the
one with the opening brace) and keeps the function signature on a single
line where possible.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 arch/arm/cpu/armv7/cache_v7.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/arm/cpu/armv7/cache_v7.c b/arch/arm/cpu/armv7/cache_v7.c
index a2c4032fed8c..0f9d8377ed5a 100644
--- a/arch/arm/cpu/armv7/cache_v7.c
+++ b/arch/arm/cpu/armv7/cache_v7.c
@@ -21,7 +21,8 @@
  * to get size details from Current Cache Size ID Register(CCSIDR)
  */
 static void set_csselr(u32 level, u32 type)
-{	u32 csselr = level << 1 | type;
+{
+	u32 csselr = level << 1 | type;
 
 	/* Write to Cache Size Selection Register(CSSELR) */
 	asm volatile ("mcr p15, 2, %0, c0, c0, 0" : : "r" (csselr));
@@ -49,7 +50,8 @@ static void v7_inval_dcache_level_setway(u32 level, u32 num_sets,
 					 u32 num_ways, u32 way_shift,
 					 u32 log2_line_len)
 {
-	int way, set, setway;
+	int way, set;
+	u32 setway;
 
 	/*
 	 * For optimal assembly code:
@@ -73,7 +75,8 @@ static void v7_clean_inval_dcache_level_setway(u32 level, u32 num_sets,
 					       u32 num_ways, u32 way_shift,
 					       u32 log2_line_len)
 {
-	int way, set, setway;
+	int way, set;
+	u32 setway;
 
 	/*
 	 * For optimal assembly code:
@@ -134,7 +137,6 @@ static void v7_maint_dcache_level_setway(u32 level, u32 operation)
 static void v7_maint_dcache_all(u32 operation)
 {
 	u32 level, cache_type, level_start_bit = 0;
-
 	u32 clidr = get_clidr();
 
 	for (level = 0; level < 7; level++) {
@@ -147,8 +149,7 @@ static void v7_maint_dcache_all(u32 operation)
 	}
 }
 
-static void v7_dcache_clean_inval_range(u32 start,
-					u32 stop, u32 line_len)
+static void v7_dcache_clean_inval_range(u32 start, u32 stop, u32 line_len)
 {
 	u32 mva;
 
@@ -256,7 +257,6 @@ void flush_dcache_all(void)
  */
 void invalidate_dcache_range(unsigned long start, unsigned long stop)
 {
-
 	v7_dcache_maint_range(start, stop, ARMV7_DCACHE_INVAL_RANGE);
 
 	v7_outer_cache_inval_range(start, stop);
-- 
2.0.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 2/9] ARM: cache-cp15: Use unsigned long for address and size
  2014-08-18  8:00 [U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues Thierry Reding
  2014-08-18  8:00 ` [U-Boot] [PATCH 1/9] ARM: cache_v7: Various minor cleanups Thierry Reding
@ 2014-08-18  8:00 ` Thierry Reding
  2014-08-20 19:15   ` Stephen Warren
  2014-08-18  8:00 ` [U-Boot] [PATCH 3/9] malloc: Output region when debugging Thierry Reding
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Thierry Reding @ 2014-08-18  8:00 UTC (permalink / raw)
  To: u-boot

From: Thierry Reding <treding@nvidia.com>

size is always non-negative, so it should be unsigned, whereas the
address and size can be larger than 32 bit on 64-bit architectures.
Change the mmu_set_region_dcache_behaviour() to use these types in
anticipation of making the API available on other architectures.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 arch/arm/include/asm/system.h | 2 +-
 arch/arm/lib/cache-cp15.c     | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
index d51ba668f323..4b771c261a8b 100644
--- a/arch/arm/include/asm/system.h
+++ b/arch/arm/include/asm/system.h
@@ -200,7 +200,7 @@ enum {
  * \param size		size of memory region to change
  * \param option	dcache option to select
  */
-void mmu_set_region_dcache_behaviour(u32 start, int size,
+void mmu_set_region_dcache_behaviour(unsigned long start, unsigned long size,
 				     enum dcache_option option);
 
 /**
diff --git a/arch/arm/lib/cache-cp15.c b/arch/arm/lib/cache-cp15.c
index 5fdfdbfca541..b52576dc64bd 100644
--- a/arch/arm/lib/cache-cp15.c
+++ b/arch/arm/lib/cache-cp15.c
@@ -47,15 +47,15 @@ __weak void mmu_page_table_flush(unsigned long start, unsigned long stop)
 	debug("%s: Warning: not implemented\n", __func__);
 }
 
-void mmu_set_region_dcache_behaviour(u32 start, int size,
+void mmu_set_region_dcache_behaviour(unsigned long start, unsigned long size,
 				     enum dcache_option option)
 {
 	u32 *page_table = (u32 *)gd->arch.tlb_addr;
-	u32 upto, end;
+	unsigned long upto, end;
 
 	end = ALIGN(start + size, MMU_SECTION_SIZE) >> MMU_SECTION_SHIFT;
 	start = start >> MMU_SECTION_SHIFT;
-	debug("%s: start=%x, size=%x, option=%d\n", __func__, start, size,
+	debug("%s: start=%lx, size=%lx, option=%d\n", __func__, start, size,
 	      option);
 	for (upto = start; upto < end; upto++)
 		set_section_dcache(upto, option);
-- 
2.0.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 3/9] malloc: Output region when debugging
  2014-08-18  8:00 [U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues Thierry Reding
  2014-08-18  8:00 ` [U-Boot] [PATCH 1/9] ARM: cache_v7: Various minor cleanups Thierry Reding
  2014-08-18  8:00 ` [U-Boot] [PATCH 2/9] ARM: cache-cp15: Use unsigned long for address and size Thierry Reding
@ 2014-08-18  8:00 ` Thierry Reding
  2014-08-18  8:00 ` [U-Boot] [PATCH 4/9] ARM: Implement non-cached memory support Thierry Reding
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Thierry Reding @ 2014-08-18  8:00 UTC (permalink / raw)
  To: u-boot

From: Thierry Reding <treding@nvidia.com>

When DEBUG is set, output memory region used for malloc().

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 common/dlmalloc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/common/dlmalloc.c b/common/dlmalloc.c
index f9873393c183..3d6391e60acf 100644
--- a/common/dlmalloc.c
+++ b/common/dlmalloc.c
@@ -1533,6 +1533,9 @@ void mem_malloc_init(ulong start, ulong size)
 	mem_malloc_end = start + size;
 	mem_malloc_brk = start;
 
+	debug("using memory %#lx-%#lx for malloc()\n", mem_malloc_start,
+	      mem_malloc_end);
+
 	memset((void *)mem_malloc_start, 0, size);
 
 	malloc_bin_reloc();
-- 
2.0.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 4/9] ARM: Implement non-cached memory support
  2014-08-18  8:00 [U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues Thierry Reding
                   ` (2 preceding siblings ...)
  2014-08-18  8:00 ` [U-Boot] [PATCH 3/9] malloc: Output region when debugging Thierry Reding
@ 2014-08-18  8:00 ` Thierry Reding
  2014-08-20 19:23   ` Stephen Warren
  2014-08-18  8:00 ` [U-Boot] [PATCH 5/9] ARM: tegra: Enable non-cached memory Thierry Reding
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Thierry Reding @ 2014-08-18  8:00 UTC (permalink / raw)
  To: u-boot

From: Thierry Reding <treding@nvidia.com>

Implement an API that can be used by drivers to allocate memory from a
poll that is mapped uncached. This is useful if drivers would otherwise
need to do extensive cache maintenance (or explicitly maintaining the
cache isn't safe).

The API is protected using the new CONFIG_SYS_NONCACHED_MEMORY setting.
Boards can set this to the size to be used for the non-cached area. The
area will typically be right below the malloc() area, but architectures
should take care of aligning the beginning and end of the area to honor
any mapping restrictions. Architectures must also ensure that mappings
established for this area do not overlap with the malloc() area (which
should remain cached for improved performance).

While the API is currently only implemented for ARM v7, it should be
generic enough to allow other architectures to implement it as well.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 README                        | 16 ++++++++++++++++
 arch/arm/include/asm/system.h |  5 +++++
 arch/arm/lib/cache.c          | 41 +++++++++++++++++++++++++++++++++++++++++
 common/board_r.c              | 11 +++++++++++
 4 files changed, 73 insertions(+)

diff --git a/README b/README
index 1d713596ec6f..807b2d0ef0f9 100644
--- a/README
+++ b/README
@@ -3759,6 +3759,22 @@ Configuration Settings:
 		Pre-relocation malloc() is only supported on ARM at present
 		but is fairly easy to enable for other archs.
 
+- CONFIG_SYS_NONCACHED_MEMORY:
+		Size of non-cached memory area. This area of memory will be
+		typically located right below the malloc() area and mapped
+		uncached in the MMU. This is useful for drivers that would
+		otherwise require a lot of explicit cache maintenance. For
+		some drivers it's also impossible to properly maintain the
+		cache. For example if the regions that need to be flushed
+		are not a multiple of the cache-line size, then a flush of
+		one region may result in overwriting data that hardware has
+		written to another region in the same cache-line. This can
+		happen for example in network drivers where descriptors for
+		buffers are typically smaller than the CPU cache-line (e.g.
+		16 bytes vs. 32 or 64 bytes).
+
+		Non-cached memory is only supported on 32-bit ARM at present.
+
 - CONFIG_SYS_BOOTM_LEN:
 		Normally compressed uImages are limited to an
 		uncompressed size of 8 MBytes. If this is not enough,
diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
index 4b771c261a8b..74ba6912bb81 100644
--- a/arch/arm/include/asm/system.h
+++ b/arch/arm/include/asm/system.h
@@ -211,6 +211,11 @@ void mmu_set_region_dcache_behaviour(unsigned long start, unsigned long size,
  */
 void mmu_page_table_flush(unsigned long start, unsigned long stop);
 
+#ifdef CONFIG_SYS_NONCACHED_MEMORY
+void noncached_init(void);
+unsigned long noncached_alloc(size_t size, size_t align);
+#endif /* CONFIG_SYS_NONCACHED_MEMORY */
+
 #endif /* __ASSEMBLY__ */
 
 #define arch_align_stack(x) (x)
diff --git a/arch/arm/lib/cache.c b/arch/arm/lib/cache.c
index 4e597a4c1d16..cd0b5ef1c3f4 100644
--- a/arch/arm/lib/cache.c
+++ b/arch/arm/lib/cache.c
@@ -8,6 +8,7 @@
 /* for now: just dummy functions to satisfy the linker */
 
 #include <common.h>
+#include <malloc.h>
 
 __weak void flush_cache(unsigned long start, unsigned long size)
 {
@@ -49,3 +50,43 @@ __weak void enable_caches(void)
 {
 	puts("WARNING: Caches not enabled\n");
 }
+
+#ifdef CONFIG_SYS_NONCACHED_MEMORY
+/*
+ * Reserve one MMU section worth of address space below the malloc() area that
+ * will be mapped uncached.
+ */
+static unsigned long noncached_start;
+static unsigned long noncached_end;
+static unsigned long noncached_next;
+
+void noncached_init(void)
+{
+	unsigned long start, end, size;
+
+	end = ALIGN(mem_malloc_start, MMU_SECTION_SIZE) - MMU_SECTION_SIZE;
+	size = ALIGN(CONFIG_SYS_NONCACHED_MEMORY, MMU_SECTION_SIZE);
+	start = end - size;
+
+	debug("mapping memory %#lx-%#lx non-cached\n", start, end);
+
+	noncached_start = start;
+	noncached_end = end;
+	noncached_next = start;
+
+	mmu_set_region_dcache_behaviour(noncached_start, size, DCACHE_OFF);
+}
+
+unsigned long noncached_alloc(size_t size, size_t align)
+{
+	unsigned long next = ALIGN(noncached_next, align);
+
+	if (next >= noncached_end || (next + size) >= noncached_end)
+		return 0;
+
+	debug("allocated %zu bytes of uncached memory @%#lx\n", size, next);
+	noncached_next = next + size;
+
+	return next;
+}
+#endif /* CONFIG_SYS_NONCACHED_MEMORY */
diff --git a/common/board_r.c b/common/board_r.c
index ba9a68dc6691..6b8e62bf032b 100644
--- a/common/board_r.c
+++ b/common/board_r.c
@@ -270,6 +270,14 @@ static int initr_malloc(void)
 	return 0;
 }
 
+#ifdef CONFIG_SYS_NONCACHED_MEMORY
+static int initr_noncached(void)
+{
+	noncached_init();
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_DM
 static int initr_dm(void)
 {
@@ -765,6 +773,9 @@ init_fnc_t init_sequence_r[] = {
 #endif
 	initr_barrier,
 	initr_malloc,
+#ifdef CONFIG_SYS_NONCACHED_MEMORY
+	initr_noncached,
+#endif
 	bootstage_relocate,
 #ifdef CONFIG_DM
 	initr_dm,
-- 
2.0.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 5/9] ARM: tegra: Enable non-cached memory
  2014-08-18  8:00 [U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues Thierry Reding
                   ` (3 preceding siblings ...)
  2014-08-18  8:00 ` [U-Boot] [PATCH 4/9] ARM: Implement non-cached memory support Thierry Reding
@ 2014-08-18  8:00 ` Thierry Reding
  2014-08-20 19:24   ` Stephen Warren
  2014-08-18  8:00 ` [U-Boot] [PATCH 6/9] net: rtl8169: Honor CONFIG_SYS_RX_ETH_BUFFER Thierry Reding
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Thierry Reding @ 2014-08-18  8:00 UTC (permalink / raw)
  To: u-boot

From: Thierry Reding <treding@nvidia.com>

Some boards, most notably those with a PCIe ethernet NIC, require this
to avoid cache coherency problems. Since the option adds very little
code and overhead enable it across all Tegra generations. Other drivers
may also start supporting this functionality at some point, so enabling
it now will automatically reap the benefits later on.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 include/configs/tegra-common.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/configs/tegra-common.h b/include/configs/tegra-common.h
index 717cd61bd69a..16f45f5def9b 100644
--- a/include/configs/tegra-common.h
+++ b/include/configs/tegra-common.h
@@ -41,6 +41,7 @@
  * Size of malloc() pool
  */
 #define CONFIG_SYS_MALLOC_LEN		(4 << 20)	/* 4MB  */
+#define CONFIG_SYS_NONCACHED_MEMORY	(1 << 20)	/* 1 MiB */
 
 /*
  * NS16550 Configuration
-- 
2.0.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 6/9] net: rtl8169: Honor CONFIG_SYS_RX_ETH_BUFFER
  2014-08-18  8:00 [U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues Thierry Reding
                   ` (4 preceding siblings ...)
  2014-08-18  8:00 ` [U-Boot] [PATCH 5/9] ARM: tegra: Enable non-cached memory Thierry Reding
@ 2014-08-18  8:00 ` Thierry Reding
  2014-08-18  8:00 ` [U-Boot] [PATCH 7/9] net: rtl8169: Properly align buffers Thierry Reding
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Thierry Reding @ 2014-08-18  8:00 UTC (permalink / raw)
  To: u-boot

From: Thierry Reding <treding@nvidia.com>

According to the top-level README file, this configuration setting can
be used to override the number of receive buffers that an ethernet NIC
uses.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/net/rtl8169.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/rtl8169.c b/drivers/net/rtl8169.c
index d040ab171bf5..c3eb474f0fba 100644
--- a/drivers/net/rtl8169.c
+++ b/drivers/net/rtl8169.c
@@ -79,7 +79,11 @@ static int media[MAX_UNITS] = { -1, -1, -1, -1, -1, -1, -1, -1 };
 #define InterFrameGap	0x03	/* 3 means InterFrameGap = the shortest one */
 
 #define NUM_TX_DESC	1	/* Number of Tx descriptor registers */
-#define NUM_RX_DESC	4	/* Number of Rx descriptor registers */
+#ifdef CONFIG_SYS_RX_ETH_BUFFER
+  #define NUM_RX_DESC	CONFIG_SYS_RX_ETH_BUFFER
+#else
+  #define NUM_RX_DESC	4	/* Number of Rx descriptor registers */
+#endif
 #define RX_BUF_SIZE	1536	/* Rx Buffer size */
 #define RX_BUF_LEN	8192
 
-- 
2.0.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 7/9] net: rtl8169: Properly align buffers
  2014-08-18  8:00 [U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues Thierry Reding
                   ` (5 preceding siblings ...)
  2014-08-18  8:00 ` [U-Boot] [PATCH 6/9] net: rtl8169: Honor CONFIG_SYS_RX_ETH_BUFFER Thierry Reding
@ 2014-08-18  8:00 ` Thierry Reding
  2014-08-20 19:29   ` Stephen Warren
  2014-08-18  8:00 ` [U-Boot] [PATCH 8/9] net: rtl8169: Use non-cached memory if available Thierry Reding
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Thierry Reding @ 2014-08-18  8:00 UTC (permalink / raw)
  To: u-boot

From: Thierry Reding <treding@nvidia.com>

RX and TX descriptor rings should be aligned to 256 byte boundaries. Use
the DEFINE_ALIGN_BUFFER() macro to define the buffers so that they don't
have to be manually aligned later on. Also make sure that the buffers do
align to cache-line boundaries in case the cache-line is higher than the
256 byte alignment requirements of the NIC.

Also add a warning if the cache-line size is larger than the descriptor
size, because the driver may discard changes to descriptors made by the
hardware when requeuing RX buffers.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/net/rtl8169.c | 60 ++++++++++++++++++++++++++-------------------------
 1 file changed, 31 insertions(+), 29 deletions(-)

diff --git a/drivers/net/rtl8169.c b/drivers/net/rtl8169.c
index c3eb474f0fba..c53666134e06 100644
--- a/drivers/net/rtl8169.c
+++ b/drivers/net/rtl8169.c
@@ -277,23 +277,29 @@ struct RxDesc {
 	u32 buf_Haddr;
 };
 
-/* Define the TX Descriptor */
-static u8 tx_ring[NUM_TX_DESC * sizeof(struct TxDesc) + 256];
-/*	__attribute__ ((aligned(256))); */
+#if ARCH_DMA_MINALIGN > 256
+#  define RTL8169_ALIGN ARCH_DMA_MINALIGN
+#else
+#  define RTL8169_ALIGN 256
+#endif
 
-/* Create a static buffer of size RX_BUF_SZ for each
-TX Descriptor.	All descriptors point to a
-part of this buffer */
-static unsigned char txb[NUM_TX_DESC * RX_BUF_SIZE];
+/* Define the TX Descriptor */
+DEFINE_ALIGN_BUFFER(struct TxDesc, tx_ring, NUM_TX_DESC, RTL8169_ALIGN);
 
 /* Define the RX Descriptor */
-static u8 rx_ring[NUM_RX_DESC * sizeof(struct TxDesc) + 256];
-  /*  __attribute__ ((aligned(256))); */
+DEFINE_ALIGN_BUFFER(struct RxDesc, rx_ring, NUM_RX_DESC, RTL8169_ALIGN);
+
+/*
+ * Create a static buffer of size RX_BUF_SZ for each TX Descriptor. All
+ * descriptors point to a part of this buffer.
+ */
+DEFINE_ALIGN_BUFFER(u8, txb, NUM_TX_DESC * RX_BUF_SIZE, RTL8169_ALIGN);
 
-/* Create a static buffer of size RX_BUF_SZ for each
-RX Descriptor	All descriptors point to a
-part of this buffer */
-static unsigned char rxb[NUM_RX_DESC * RX_BUF_SIZE];
+/*
+ * Create a static buffer of size RX_BUF_SZ for each RX Descriptor. All
+ * descriptors point to a part of this buffer.
+ */
+DEFINE_ALIGN_BUFFER(u8, rxb, NUM_RX_DESC * RX_BUF_SIZE, RTL8169_ALIGN);
 
 struct rtl8169_private {
 	void *mmio_addr;	/* memory map physical address */
@@ -301,8 +307,6 @@ struct rtl8169_private {
 	unsigned long cur_rx;	/* Index into the Rx descriptor buffer of next Rx pkt. */
 	unsigned long cur_tx;	/* Index into the Tx descriptor buffer of next Rx pkt. */
 	unsigned long dirty_tx;
-	unsigned char *TxDescArrays;	/* Index of Tx Descriptor buffer */
-	unsigned char *RxDescArrays;	/* Index of Rx Descriptor buffer */
 	struct TxDesc *TxDescArray;	/* Index of 256-alignment Tx Descriptor buffer */
 	struct RxDesc *RxDescArray;	/* Index of 256-alignment Rx Descriptor buffer */
 	unsigned char *RxBufferRings;	/* Index of Rx Buffer  */
@@ -710,16 +714,6 @@ static int rtl_reset(struct eth_device *dev, bd_t *bis)
 	printf ("%s\n", __FUNCTION__);
 #endif
 
-	tpc->TxDescArrays = tx_ring;
-	/* Tx Desscriptor needs 256 bytes alignment; */
-	tpc->TxDescArray = (struct TxDesc *) ((unsigned long)(tpc->TxDescArrays +
-							      255) & ~255);
-
-	tpc->RxDescArrays = rx_ring;
-	/* Rx Desscriptor needs 256 bytes alignment; */
-	tpc->RxDescArray = (struct RxDesc *) ((unsigned long)(tpc->RxDescArrays +
-							      255) & ~255);
-
 	rtl8169_init_ring(dev);
 	rtl8169_hw_start(dev);
 	/* Construct a perfect filter frame with the mac address as first match
@@ -761,10 +755,6 @@ static void rtl_halt(struct eth_device *dev)
 
 	RTL_W32(RxMissed, 0);
 
-	tpc->TxDescArrays = NULL;
-	tpc->RxDescArrays = NULL;
-	tpc->TxDescArray = NULL;
-	tpc->RxDescArray = NULL;
 	for (i = 0; i < NUM_RX_DESC; i++) {
 		tpc->RxBufferRing[i] = NULL;
 	}
@@ -909,6 +899,18 @@ static int rtl_init(struct eth_device *dev, bd_t *bis)
 #endif
 	}
 
+	/*
+	 * Warn if the cache-line size is larger than the descriptor size. In
+	 * such cases the driver will likely fail because the CPU needs to
+	 * flush the cache when requeuing RX buffers, therefore descriptors
+	 * written by the hardware may be discarded.
+	 */
+	if (ARCH_DMA_MINALIGN > sizeof(struct RxDesc))
+		printf("WARNING: cache-line size is larger than descriptor size\n");
+
+	tpc->TxDescArray = tx_ring;
+	tpc->RxDescArray = rx_ring;
+
 	return 1;
 }
 
-- 
2.0.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 8/9] net: rtl8169: Use non-cached memory if available
  2014-08-18  8:00 [U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues Thierry Reding
                   ` (6 preceding siblings ...)
  2014-08-18  8:00 ` [U-Boot] [PATCH 7/9] net: rtl8169: Properly align buffers Thierry Reding
@ 2014-08-18  8:00 ` Thierry Reding
  2014-08-20 19:33   ` Stephen Warren
  2014-08-18  8:00 ` [U-Boot] [PATCH 9/9] net: rtl8169: Add support for RTL-8168/8111g Thierry Reding
  2014-08-20 19:12 ` [U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues Stephen Warren
  9 siblings, 1 reply; 25+ messages in thread
From: Thierry Reding @ 2014-08-18  8:00 UTC (permalink / raw)
  To: u-boot

From: Thierry Reding <treding@nvidia.com>

To work around potential issues with explicit cache maintenance of the
RX and TX descriptor rings, allocate them from a pool of uncached memory
if the architecture supports it.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/net/rtl8169.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/drivers/net/rtl8169.c b/drivers/net/rtl8169.c
index c53666134e06..1c04946d7691 100644
--- a/drivers/net/rtl8169.c
+++ b/drivers/net/rtl8169.c
@@ -283,11 +283,31 @@ struct RxDesc {
 #  define RTL8169_ALIGN 256
 #endif
 
+/*
+ * TX and RX descriptors are 16 bytes. This causes problems with the cache
+ * maintenance on CPUs where the cache-line size exceeds the size of these
+ * descriptors. What will happen is that when the driver receives a packet
+ * it will be immediately requeued for the hardware to reuse. The CPU will
+ * therefore need to flush the cache-line containing the descriptor, which
+ * will cause all other descriptors in the same cache-line to be flushed
+ * along with it. If one of those descriptors had been written to by the
+ * device those changes (and the associated packet) will be lost.
+ *
+ * To work around this, we make use of non-cached memory if available. If
+ * descriptors are mapped uncached there's no need to manually flush them
+ * or invalidate them.
+ *
+ * Note that this only applies to descriptors. The packet data buffers do
+ * not have the same constraints since they are 1536 bytes large, so they
+ * are unlikely to share cache-lines.
+ */
+#ifndef CONFIG_SYS_NONCACHED_MEMORY
 /* Define the TX Descriptor */
 DEFINE_ALIGN_BUFFER(struct TxDesc, tx_ring, NUM_TX_DESC, RTL8169_ALIGN);
 
 /* Define the RX Descriptor */
 DEFINE_ALIGN_BUFFER(struct RxDesc, rx_ring, NUM_RX_DESC, RTL8169_ALIGN);
+#endif
 
 /*
  * Create a static buffer of size RX_BUF_SZ for each TX Descriptor. All
@@ -412,28 +432,36 @@ match:
 
 static void rtl_inval_rx_desc(struct RxDesc *desc)
 {
+#ifndef CONFIG_SYS_NONCACHED_MEMORY
 	unsigned long start = (unsigned long)desc & ~(ARCH_DMA_MINALIGN - 1);
 	unsigned long end = ALIGN(start + sizeof(*desc), ARCH_DMA_MINALIGN);
 
 	invalidate_dcache_range(start, end);
+#endif
 }
 
 static void rtl_flush_rx_desc(struct RxDesc *desc)
 {
+#ifndef CONFIG_SYS_NONCACHED_MEMORY
 	flush_cache((unsigned long)desc, sizeof(*desc));
+#endif
 }
 
 static void rtl_inval_tx_desc(struct TxDesc *desc)
 {
+#ifndef CONFIG_SYS_NONCACHED_MEMORY
 	unsigned long start = (unsigned long)desc & ~(ARCH_DMA_MINALIGN - 1);
 	unsigned long end = ALIGN(start + sizeof(*desc), ARCH_DMA_MINALIGN);
 
 	invalidate_dcache_range(start, end);
+#endif
 }
 
 static void rtl_flush_tx_desc(struct TxDesc *desc)
 {
+#ifndef CONFIG_SYS_NONCACHED_MEMORY
 	flush_cache((unsigned long)desc, sizeof(*desc));
+#endif
 }
 
 static void rtl_inval_buffer(void *buf, size_t size)
@@ -769,6 +797,9 @@ INIT - Look for an adapter, this routine's visible to the outside
 static int rtl_init(struct eth_device *dev, bd_t *bis)
 {
 	static int board_idx = -1;
+#ifdef CONFIG_SYS_NONCACHED_MEMORY
+	size_t size;
+#endif
 	int i, rc;
 	int option = -1, Cap10_100 = 0, Cap1000 = 0;
 
@@ -899,6 +930,7 @@ static int rtl_init(struct eth_device *dev, bd_t *bis)
 #endif
 	}
 
+#ifndef CONFIG_SYS_NONCACHED_MEMORY
 	/*
 	 * Warn if the cache-line size is larger than the descriptor size. In
 	 * such cases the driver will likely fail because the CPU needs to
@@ -910,6 +942,17 @@ static int rtl_init(struct eth_device *dev, bd_t *bis)
 
 	tpc->TxDescArray = tx_ring;
 	tpc->RxDescArray = rx_ring;
+#else
+	/*
+	 * When non-cached memory is available, allocate the descriptors from
+	 * an uncached memory region to avoid any problems caused by caching.
+	 */
+	size = NUM_TX_DESC * sizeof(struct TxDesc);
+	tpc->TxDescArray = (struct TxDesc *)noncached_alloc(size, 256);
+
+	size = NUM_RX_DESC * sizeof(struct RxDesc);
+	tpc->RxDescArray = (struct RxDesc *)noncached_alloc(size, 256);
+#endif
 
 	return 1;
 }
-- 
2.0.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 9/9] net: rtl8169: Add support for RTL-8168/8111g
  2014-08-18  8:00 [U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues Thierry Reding
                   ` (7 preceding siblings ...)
  2014-08-18  8:00 ` [U-Boot] [PATCH 8/9] net: rtl8169: Use non-cached memory if available Thierry Reding
@ 2014-08-18  8:00 ` Thierry Reding
  2014-08-20 19:12 ` [U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues Stephen Warren
  9 siblings, 0 replies; 25+ messages in thread
From: Thierry Reding @ 2014-08-18  8:00 UTC (permalink / raw)
  To: u-boot

From: Thierry Reding <treding@nvidia.com>

This network interface card in found on the NVIDIA Jetson TK1.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/net/rtl8169.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/rtl8169.c b/drivers/net/rtl8169.c
index 1c04946d7691..a02968fa4a12 100644
--- a/drivers/net/rtl8169.c
+++ b/drivers/net/rtl8169.c
@@ -252,6 +252,7 @@ static struct {
 	{"RTL-8168b/8111sb",	0x38, 0xff7e1880,},
 	{"RTL-8168d/8111d",	0x28, 0xff7e1880,},
 	{"RTL-8168evl/8111evl",	0x2e, 0xff7e1880,},
+	{"RTL-8168/8111g",	0x4c, 0xff7e1880,},
 	{"RTL-8101e",		0x34, 0xff7e1880,},
 	{"RTL-8100e",		0x32, 0xff7e1880,},
 };
-- 
2.0.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues
  2014-08-18  8:00 [U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues Thierry Reding
                   ` (8 preceding siblings ...)
  2014-08-18  8:00 ` [U-Boot] [PATCH 9/9] net: rtl8169: Add support for RTL-8168/8111g Thierry Reding
@ 2014-08-20 19:12 ` Stephen Warren
  2014-08-21 14:11   ` Thierry Reding
  9 siblings, 1 reply; 25+ messages in thread
From: Stephen Warren @ 2014-08-20 19:12 UTC (permalink / raw)
  To: u-boot

On 08/18/2014 02:00 AM, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
>
> This series attempts to fix a long-standing problem in the rtl8169 driver
> (though the same problem may exist in other drivers as well). Let me first
> explain what exactly the issue is:
>
> The rtl8169 driver provides a set of RX and TX descriptors for the device to
> use. Once they're set up, the device is told about their location so that it
> can fetch the descriptors using DMA. The device will also write packet state
> back into these descriptors using DMA. For this to work properly, whenever a
> driver needs to access these descriptors it needs to invalidate the D-cache
> line(s) associated with them. Similarly when changes to the descriptor have
> been made by the driver, the cache lines need to be flushed to make sure the
> changes are visible to the device.
>
> The descriptors are 16 bytes in size. This causes problems when used on CPUs
> that have a cache-line size that is larger than 16 bytes. One example is the
> NVIDIA Tegra124 which has 64-byte cache-lines. That means that 4 descriptors
> fit into a single cache-line. So whenever the driver flushes a cache-line it
> has the potential to discard changes made to another descriptor by the DMA
> device. One typical symptom is that large transfers over TFTP will often not
> complete and hang somewhere midway because a device marked a packet received
> but the driver flushing the cache and causing the packet to be lost.
>
> Since the descriptors need to be consecutive in memory, I don't see a way to
> fix this other than to use uncached memory. Therefore the solution proposed
> in this patch series is to introduce a mechanism in U-Boot to allow a driver
> to allocate from a pool of uncached memory. Currently an implementation is
> provided only for ARM v7. The idea is that a region (of user-definable size)
> immediately below (taking into account architecture-specific alignment
> restrictions) the malloc() area is mapped uncacheable in the MMU. A driver
> can use the new noncached_alloc() function to allocate a chunk of memory
> from this pool dynamically for buffers that it can't or doesn't want to do
> any explicit cache-maintainance on, yet needs to be shared with DMA devices.
>
> Patches 1-3 are minor preparatory work. Patch 1 cleans up some coding style
> issues in the ARM v7 cache code and patch 2 uses more future-proof types for
> the mmu_set_region_dcache_behaviour() function arguments. Patch 3 is purely
> for debugging purposes. It will print out the region used by malloc() when
> DEBUG is enabled. This can be useful to see where the malloc() region is in
> the memory map (compared to the noncached region introduced in a later patch
> for example).
>
> Patch 4 implements the noncached API for ARM v7. It obtains the start of the
> malloc() area and places the noncached region immediately below it so that
> noncached_alloc() can allocate from it. During boot, the noncached area will
> be set up immediately after malloc().
>
> Patch 5 enables noncached memory for all Tegra boards. It uses a 1 MiB chunk
> which should be plenty (it's also the minimum on ARM v7 because it matches
> the MMU section size and therefore the granularity at which U-Boot can set
> the cacheable attributes).

If LPAE were to be enabled, the minimum would be 2MiB, but I suppose we 
can deal with that if/when the time comes.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 2/9] ARM: cache-cp15: Use unsigned long for address and size
  2014-08-18  8:00 ` [U-Boot] [PATCH 2/9] ARM: cache-cp15: Use unsigned long for address and size Thierry Reding
@ 2014-08-20 19:15   ` Stephen Warren
  2014-08-22  8:29     ` Thierry Reding
  0 siblings, 1 reply; 25+ messages in thread
From: Stephen Warren @ 2014-08-20 19:15 UTC (permalink / raw)
  To: u-boot

On 08/18/2014 02:00 AM, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
>
> size is always non-negative, so it should be unsigned, whereas the
> address and size can be larger than 32 bit on 64-bit architectures.
> Change the mmu_set_region_dcache_behaviour() to use these types in
> anticipation of making the API available on other architectures.

> diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h

> -void mmu_set_region_dcache_behaviour(u32 start, int size,
> +void mmu_set_region_dcache_behaviour(unsigned long start, unsigned long size,
>   				     enum dcache_option option);

If we were to use LPAE on a 32-bit system, physical addresses could be 
more than 32-bit. That would imply we should create a physaddr_t type 
rather than relying on unsigned long. Still, I suppose since U-Boot just 
maps RAM (and everything else) 1:1, we'd never use RAM beyond 4GiB, so 
LPAE actually isn't that interesting...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 4/9] ARM: Implement non-cached memory support
  2014-08-18  8:00 ` [U-Boot] [PATCH 4/9] ARM: Implement non-cached memory support Thierry Reding
@ 2014-08-20 19:23   ` Stephen Warren
  2014-08-21 15:31     ` Thierry Reding
  2014-08-22  8:31     ` Thierry Reding
  0 siblings, 2 replies; 25+ messages in thread
From: Stephen Warren @ 2014-08-20 19:23 UTC (permalink / raw)
  To: u-boot

On 08/18/2014 02:00 AM, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
>
> Implement an API that can be used by drivers to allocate memory from a
> poll that is mapped uncached. This is useful if drivers would otherwise

s/poll/pool/

> need to do extensive cache maintenance (or explicitly maintaining the
> cache isn't safe).
>
> The API is protected using the new CONFIG_SYS_NONCACHED_MEMORY setting.
> Boards can set this to the size to be used for the non-cached area. The
> area will typically be right below the malloc() area, but architectures
> should take care of aligning the beginning and end of the area to honor
> any mapping restrictions. Architectures must also ensure that mappings
> established for this area do not overlap with the malloc() area (which
> should remain cached for improved performance).
>
> While the API is currently only implemented for ARM v7, it should be
> generic enough to allow other architectures to implement it as well.

> diff --git a/README b/README

> +- CONFIG_SYS_NONCACHED_MEMORY:
> +		Size of non-cached memory area. This area of memory will be
> +		typically located right below the malloc() area and mapped
> +		uncached in the MMU. This is useful for drivers that would
> +		otherwise require a lot of explicit cache maintenance. For
> +		some drivers it's also impossible to properly maintain the
> +		cache. For example if the regions that need to be flushed
> +		are not a multiple of the cache-line size,

I would add:

*and* padding cannot be allocated between the regions to align them 
(i.e. if the HW requires a contiguous array of regions, and the size of 
each region is not cache-aligned),

 >                                                           then a flush of
> +		one region may result in overwriting data that hardware has
> +		written to another region in the same cache-line. This can
> +		happen for example in network drivers where descriptors for
> +		buffers are typically smaller than the CPU cache-line (e.g.
> +		16 bytes vs. 32 or 64 bytes).
> +
> +		Non-cached memory is only supported on 32-bit ARM at present.

> diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h

> +#ifdef CONFIG_SYS_NONCACHED_MEMORY
> +void noncached_init(void);
> +unsigned long noncached_alloc(size_t size, size_t align);

s/size_t/unsigned long/ would match the arguments in the earlier patch 
to mmu_set_region_dcache_behaviour()'s prototype.

> diff --git a/arch/arm/lib/cache.c b/arch/arm/lib/cache.c

> +void noncached_init(void)
> +{
> +	unsigned long start, end, size;
> +
> +	end = ALIGN(mem_malloc_start, MMU_SECTION_SIZE) - MMU_SECTION_SIZE;

Not really "end" (which implies it's inside the region) but "first 
address outside the region". That said, I'm not sure how to express that 
succinctly, so perhaps "end" is fine!

> +unsigned long noncached_alloc(size_t size, size_t align)
> +{
> +	unsigned long next = ALIGN(noncached_next, align);
> +
> +	if (next >= noncached_end || (next + size) >= noncached_end)
> +		return 0;

If size is quite large, and next is near to U32_MAX, there's a chance of 
wrap-around there. Perhaps calculate the size left in the region 
(noncached_end - next), and validate that's >= size.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 5/9] ARM: tegra: Enable non-cached memory
  2014-08-18  8:00 ` [U-Boot] [PATCH 5/9] ARM: tegra: Enable non-cached memory Thierry Reding
@ 2014-08-20 19:24   ` Stephen Warren
  0 siblings, 0 replies; 25+ messages in thread
From: Stephen Warren @ 2014-08-20 19:24 UTC (permalink / raw)
  To: u-boot

On 08/18/2014 02:00 AM, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
>
> Some boards, most notably those with a PCIe ethernet NIC, require this
> to avoid cache coherency problems. Since the option adds very little
> code and overhead enable it across all Tegra generations. Other drivers
> may also start supporting this functionality at some point, so enabling
> it now will automatically reap the benefits later on.

Acked-by: Stephen Warren <swarren@nvidia.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 7/9] net: rtl8169: Properly align buffers
  2014-08-18  8:00 ` [U-Boot] [PATCH 7/9] net: rtl8169: Properly align buffers Thierry Reding
@ 2014-08-20 19:29   ` Stephen Warren
  2014-08-22  9:15     ` Thierry Reding
  0 siblings, 1 reply; 25+ messages in thread
From: Stephen Warren @ 2014-08-20 19:29 UTC (permalink / raw)
  To: u-boot

On 08/18/2014 02:00 AM, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
>
> RX and TX descriptor rings should be aligned to 256 byte boundaries. Use
> the DEFINE_ALIGN_BUFFER() macro to define the buffers so that they don't
> have to be manually aligned later on. Also make sure that the buffers do
> align to cache-line boundaries in case the cache-line is higher than the
> 256 byte alignment requirements of the NIC.
>
> Also add a warning if the cache-line size is larger than the descriptor
> size, because the driver may discard changes to descriptors made by the
> hardware when requeuing RX buffers.

> @@ -909,6 +899,18 @@ static int rtl_init(struct eth_device *dev, bd_t *bis)

> +	/*
> +	 * Warn if the cache-line size is larger than the descriptor size. In
> +	 * such cases the driver will likely fail because the CPU needs to
> +	 * flush the cache when requeuing RX buffers, therefore descriptors
> +	 * written by the hardware may be discarded.
> +	 */
> +	if (ARCH_DMA_MINALIGN > sizeof(struct RxDesc))
> +		printf("WARNING: cache-line size is larger than descriptor size\n");

I'd be tempted to make that a compile-time #error (or perhaps just a 
#warning) Perhaps #error would break compilation of existing boards though?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 8/9] net: rtl8169: Use non-cached memory if available
  2014-08-18  8:00 ` [U-Boot] [PATCH 8/9] net: rtl8169: Use non-cached memory if available Thierry Reding
@ 2014-08-20 19:33   ` Stephen Warren
  2014-08-22  9:29     ` Thierry Reding
  0 siblings, 1 reply; 25+ messages in thread
From: Stephen Warren @ 2014-08-20 19:33 UTC (permalink / raw)
  To: u-boot

On 08/18/2014 02:00 AM, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
>
> To work around potential issues with explicit cache maintenance of the
> RX and TX descriptor rings, allocate them from a pool of uncached memory
> if the architecture supports it.

> diff --git a/drivers/net/rtl8169.c b/drivers/net/rtl8169.c

> +#ifndef CONFIG_SYS_NONCACHED_MEMORY
>   /* Define the TX Descriptor */
>   DEFINE_ALIGN_BUFFER(struct TxDesc, tx_ring, NUM_TX_DESC, RTL8169_ALIGN);
>
>   /* Define the RX Descriptor */
>   DEFINE_ALIGN_BUFFER(struct RxDesc, rx_ring, NUM_RX_DESC, RTL8169_ALIGN);
> +#endif

It feels slightly inconsistent to have one case allocate this memory in 
BSS at compile-time, and in the other to allocate it dynamically.

Would it be better to remove this global, and simply call different APIs 
(regular aligned malloc, v.s. non-cached allocate) during rtl_init()?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues
  2014-08-20 19:12 ` [U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues Stephen Warren
@ 2014-08-21 14:11   ` Thierry Reding
  0 siblings, 0 replies; 25+ messages in thread
From: Thierry Reding @ 2014-08-21 14:11 UTC (permalink / raw)
  To: u-boot

On Wed, Aug 20, 2014 at 01:12:20PM -0600, Stephen Warren wrote:
> On 08/18/2014 02:00 AM, Thierry Reding wrote:
> >From: Thierry Reding <treding@nvidia.com>
> >
> >This series attempts to fix a long-standing problem in the rtl8169 driver
> >(though the same problem may exist in other drivers as well). Let me first
> >explain what exactly the issue is:
> >
> >The rtl8169 driver provides a set of RX and TX descriptors for the device to
> >use. Once they're set up, the device is told about their location so that it
> >can fetch the descriptors using DMA. The device will also write packet state
> >back into these descriptors using DMA. For this to work properly, whenever a
> >driver needs to access these descriptors it needs to invalidate the D-cache
> >line(s) associated with them. Similarly when changes to the descriptor have
> >been made by the driver, the cache lines need to be flushed to make sure the
> >changes are visible to the device.
> >
> >The descriptors are 16 bytes in size. This causes problems when used on CPUs
> >that have a cache-line size that is larger than 16 bytes. One example is the
> >NVIDIA Tegra124 which has 64-byte cache-lines. That means that 4 descriptors
> >fit into a single cache-line. So whenever the driver flushes a cache-line it
> >has the potential to discard changes made to another descriptor by the DMA
> >device. One typical symptom is that large transfers over TFTP will often not
> >complete and hang somewhere midway because a device marked a packet received
> >but the driver flushing the cache and causing the packet to be lost.
> >
> >Since the descriptors need to be consecutive in memory, I don't see a way to
> >fix this other than to use uncached memory. Therefore the solution proposed
> >in this patch series is to introduce a mechanism in U-Boot to allow a driver
> >to allocate from a pool of uncached memory. Currently an implementation is
> >provided only for ARM v7. The idea is that a region (of user-definable size)
> >immediately below (taking into account architecture-specific alignment
> >restrictions) the malloc() area is mapped uncacheable in the MMU. A driver
> >can use the new noncached_alloc() function to allocate a chunk of memory
> >from this pool dynamically for buffers that it can't or doesn't want to do
> >any explicit cache-maintainance on, yet needs to be shared with DMA devices.
> >
> >Patches 1-3 are minor preparatory work. Patch 1 cleans up some coding style
> >issues in the ARM v7 cache code and patch 2 uses more future-proof types for
> >the mmu_set_region_dcache_behaviour() function arguments. Patch 3 is purely
> >for debugging purposes. It will print out the region used by malloc() when
> >DEBUG is enabled. This can be useful to see where the malloc() region is in
> >the memory map (compared to the noncached region introduced in a later patch
> >for example).
> >
> >Patch 4 implements the noncached API for ARM v7. It obtains the start of the
> >malloc() area and places the noncached region immediately below it so that
> >noncached_alloc() can allocate from it. During boot, the noncached area will
> >be set up immediately after malloc().
> >
> >Patch 5 enables noncached memory for all Tegra boards. It uses a 1 MiB chunk
> >which should be plenty (it's also the minimum on ARM v7 because it matches
> >the MMU section size and therefore the granularity at which U-Boot can set
> >the cacheable attributes).
> 
> If LPAE were to be enabled, the minimum would be 2MiB, but I suppose we can
> deal with that if/when the time comes.

The code that sets up the noncached memory region will pad the size to a
multiple of the section size, and if LPAE were enabled I'd expect the
section size to be defined appropriately, too. It would still mean that
Tegra configured it to be 1 MiB but the code actually reserving 2 MiB of
addresses, but that's explicitly allowed in the documentation.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20140821/148c4539/attachment.pgp>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 4/9] ARM: Implement non-cached memory support
  2014-08-20 19:23   ` Stephen Warren
@ 2014-08-21 15:31     ` Thierry Reding
  2014-08-22  8:31     ` Thierry Reding
  1 sibling, 0 replies; 25+ messages in thread
From: Thierry Reding @ 2014-08-21 15:31 UTC (permalink / raw)
  To: u-boot

On Wed, Aug 20, 2014 at 01:23:13PM -0600, Stephen Warren wrote:
> On 08/18/2014 02:00 AM, Thierry Reding wrote:
> >From: Thierry Reding <treding@nvidia.com>
> >
> >Implement an API that can be used by drivers to allocate memory from a
> >poll that is mapped uncached. This is useful if drivers would otherwise
> 
> s/poll/pool/

Done.

> >need to do extensive cache maintenance (or explicitly maintaining the
> >cache isn't safe).
> >
> >The API is protected using the new CONFIG_SYS_NONCACHED_MEMORY setting.
> >Boards can set this to the size to be used for the non-cached area. The
> >area will typically be right below the malloc() area, but architectures
> >should take care of aligning the beginning and end of the area to honor
> >any mapping restrictions. Architectures must also ensure that mappings
> >established for this area do not overlap with the malloc() area (which
> >should remain cached for improved performance).
> >
> >While the API is currently only implemented for ARM v7, it should be
> >generic enough to allow other architectures to implement it as well.
> 
> >diff --git a/README b/README
> 
> >+- CONFIG_SYS_NONCACHED_MEMORY:
> >+		Size of non-cached memory area. This area of memory will be
> >+		typically located right below the malloc() area and mapped
> >+		uncached in the MMU. This is useful for drivers that would
> >+		otherwise require a lot of explicit cache maintenance. For
> >+		some drivers it's also impossible to properly maintain the
> >+		cache. For example if the regions that need to be flushed
> >+		are not a multiple of the cache-line size,
> 
> I would add:
> 
> *and* padding cannot be allocated between the regions to align them (i.e. if
> the HW requires a contiguous array of regions, and the size of each region
> is not cache-aligned),

Sounds good.

> >+		one region may result in overwriting data that hardware has
> >+		written to another region in the same cache-line. This can
> >+		happen for example in network drivers where descriptors for
> >+		buffers are typically smaller than the CPU cache-line (e.g.
> >+		16 bytes vs. 32 or 64 bytes).
> >+
> >+		Non-cached memory is only supported on 32-bit ARM at present.
> 
> >diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
> 
> >+#ifdef CONFIG_SYS_NONCACHED_MEMORY
> >+void noncached_init(void);
> >+unsigned long noncached_alloc(size_t size, size_t align);
> 
> s/size_t/unsigned long/ would match the arguments in the earlier patch to
> mmu_set_region_dcache_behaviour()'s prototype.
> 
> >diff --git a/arch/arm/lib/cache.c b/arch/arm/lib/cache.c
> 
> >+void noncached_init(void)
> >+{
> >+	unsigned long start, end, size;
> >+
> >+	end = ALIGN(mem_malloc_start, MMU_SECTION_SIZE) - MMU_SECTION_SIZE;
> 
> Not really "end" (which implies it's inside the region) but "first address
> outside the region". That said, I'm not sure how to express that succinctly,
> so perhaps "end" is fine!

I think I've seen "limit" used rather commonly in that case.

> >+unsigned long noncached_alloc(size_t size, size_t align)
> >+{
> >+	unsigned long next = ALIGN(noncached_next, align);
> >+
> >+	if (next >= noncached_end || (next + size) >= noncached_end)
> >+		return 0;
> 
> If size is quite large, and next is near to U32_MAX, there's a chance of
> wrap-around there. Perhaps calculate the size left in the region
> (noncached_end - next), and validate that's >= size.

Yes, that sounds a lot safer.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20140821/40ecc832/attachment.pgp>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 2/9] ARM: cache-cp15: Use unsigned long for address and size
  2014-08-20 19:15   ` Stephen Warren
@ 2014-08-22  8:29     ` Thierry Reding
  0 siblings, 0 replies; 25+ messages in thread
From: Thierry Reding @ 2014-08-22  8:29 UTC (permalink / raw)
  To: u-boot

On Wed, Aug 20, 2014 at 01:15:15PM -0600, Stephen Warren wrote:
> On 08/18/2014 02:00 AM, Thierry Reding wrote:
> >From: Thierry Reding <treding@nvidia.com>
> >
> >size is always non-negative, so it should be unsigned, whereas the
> >address and size can be larger than 32 bit on 64-bit architectures.
> >Change the mmu_set_region_dcache_behaviour() to use these types in
> >anticipation of making the API available on other architectures.
> 
> >diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
> 
> >-void mmu_set_region_dcache_behaviour(u32 start, int size,
> >+void mmu_set_region_dcache_behaviour(unsigned long start, unsigned long size,
> >  				     enum dcache_option option);
> 
> If we were to use LPAE on a 32-bit system, physical addresses could be more
> than 32-bit. That would imply we should create a physaddr_t type rather than
> relying on unsigned long. Still, I suppose since U-Boot just maps RAM (and
> everything else) 1:1, we'd never use RAM beyond 4GiB, so LPAE actually isn't
> that interesting...

Interestingly there is already a phys_addr_t type defined (as opposed to
Linux it's defined in arch/*/include/asm/types.h). It doesn't currently
take into account things like LPAE, but then again there's no standard
configuration option to select that (Linux has CONFIG_PHYS_ADDR_T_64BIT
for this). CONFIG_PHYS_64BIT seems to come close, but it's currently
only defined by true 64-bit architectures.

I took a stab at unifying the various definitions in linux/types.h, but
that resulted in all kinds of weird build errors all over the place, so
at some point I gave up and kept the definitions as they are. Still, for
this series I've tried to use the existing phys_addr_t where appropriate
so it should just be a matter of properly redefining in for LPAE if
support for it ever gets added.

I've refrained from using phys_size_t since I don't see why it would be
useful and instead used size_t consistently where a size is involved. If
anyone feels strongly about using phys_size_t I can use it instead,
though.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20140822/f581a37f/attachment.pgp>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 4/9] ARM: Implement non-cached memory support
  2014-08-20 19:23   ` Stephen Warren
  2014-08-21 15:31     ` Thierry Reding
@ 2014-08-22  8:31     ` Thierry Reding
  1 sibling, 0 replies; 25+ messages in thread
From: Thierry Reding @ 2014-08-22  8:31 UTC (permalink / raw)
  To: u-boot

On Wed, Aug 20, 2014 at 01:23:13PM -0600, Stephen Warren wrote:
> On 08/18/2014 02:00 AM, Thierry Reding wrote:
[...]
> >diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
> 
> >+#ifdef CONFIG_SYS_NONCACHED_MEMORY
> >+void noncached_init(void);
> >+unsigned long noncached_alloc(size_t size, size_t align);
> 
> s/size_t/unsigned long/ would match the arguments in the earlier patch to
> mmu_set_region_dcache_behaviour()'s prototype.

I've replaced the unsigned long in the earlier patch with size_t for
consistency. Let me know if you have any objections to that.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20140822/a6f00c60/attachment.pgp>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 7/9] net: rtl8169: Properly align buffers
  2014-08-20 19:29   ` Stephen Warren
@ 2014-08-22  9:15     ` Thierry Reding
  2014-11-12 16:23       ` Simon Glass
  0 siblings, 1 reply; 25+ messages in thread
From: Thierry Reding @ 2014-08-22  9:15 UTC (permalink / raw)
  To: u-boot

On Wed, Aug 20, 2014 at 01:29:57PM -0600, Stephen Warren wrote:
> On 08/18/2014 02:00 AM, Thierry Reding wrote:
> >From: Thierry Reding <treding@nvidia.com>
> >
> >RX and TX descriptor rings should be aligned to 256 byte boundaries. Use
> >the DEFINE_ALIGN_BUFFER() macro to define the buffers so that they don't
> >have to be manually aligned later on. Also make sure that the buffers do
> >align to cache-line boundaries in case the cache-line is higher than the
> >256 byte alignment requirements of the NIC.
> >
> >Also add a warning if the cache-line size is larger than the descriptor
> >size, because the driver may discard changes to descriptors made by the
> >hardware when requeuing RX buffers.
> 
> >@@ -909,6 +899,18 @@ static int rtl_init(struct eth_device *dev, bd_t *bis)
> 
> >+	/*
> >+	 * Warn if the cache-line size is larger than the descriptor size. In
> >+	 * such cases the driver will likely fail because the CPU needs to
> >+	 * flush the cache when requeuing RX buffers, therefore descriptors
> >+	 * written by the hardware may be discarded.
> >+	 */
> >+	if (ARCH_DMA_MINALIGN > sizeof(struct RxDesc))
> >+		printf("WARNING: cache-line size is larger than descriptor size\n");
> 
> I'd be tempted to make that a compile-time #error (or perhaps just a
> #warning) Perhaps #error would break compilation of existing boards though?

There are two SH4 boards that use the rtl8169 (r7780mp and sh7785lcr)
for which this condition is true, so #error would break them (well,
technically not r7780mp since it comments out CONFIG_RTL8169 in the
configuration). I'll make it a #warning instead.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20140822/adc7ab70/attachment.pgp>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 8/9] net: rtl8169: Use non-cached memory if available
  2014-08-20 19:33   ` Stephen Warren
@ 2014-08-22  9:29     ` Thierry Reding
  0 siblings, 0 replies; 25+ messages in thread
From: Thierry Reding @ 2014-08-22  9:29 UTC (permalink / raw)
  To: u-boot

On Wed, Aug 20, 2014 at 01:33:06PM -0600, Stephen Warren wrote:
> On 08/18/2014 02:00 AM, Thierry Reding wrote:
> >From: Thierry Reding <treding@nvidia.com>
> >
> >To work around potential issues with explicit cache maintenance of the
> >RX and TX descriptor rings, allocate them from a pool of uncached memory
> >if the architecture supports it.
> 
> >diff --git a/drivers/net/rtl8169.c b/drivers/net/rtl8169.c
> 
> >+#ifndef CONFIG_SYS_NONCACHED_MEMORY
> >  /* Define the TX Descriptor */
> >  DEFINE_ALIGN_BUFFER(struct TxDesc, tx_ring, NUM_TX_DESC, RTL8169_ALIGN);
> >
> >  /* Define the RX Descriptor */
> >  DEFINE_ALIGN_BUFFER(struct RxDesc, rx_ring, NUM_RX_DESC, RTL8169_ALIGN);
> >+#endif
> 
> It feels slightly inconsistent to have one case allocate this memory in BSS
> at compile-time, and in the other to allocate it dynamically.
> 
> Would it be better to remove this global, and simply call different APIs
> (regular aligned malloc, v.s. non-cached allocate) during rtl_init()?

I've reworked this a bit to use memalign() when non-cached memory isn't
available and factored out the descriptor allocation. I liked the code
slightly better before, but I guess consistently using dynamic memory
allocations is worth it. One potential downside is that memalign()
allocates from the malloc() pool, therefore devices using this driver
will now use more memory. I suppose they could even run out of memory
depending on how large the number of receive buffers becomes. Although
it's only the descriptors that are being dynamically allocated and they
are 16 bytes each. And it's not a problem that couldn't easily be solved
by increasing the malloc() size.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20140822/a4fb97b7/attachment.pgp>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 7/9] net: rtl8169: Properly align buffers
  2014-08-22  9:15     ` Thierry Reding
@ 2014-11-12 16:23       ` Simon Glass
  2014-11-12 23:38         ` Nobuhiro Iwamatsu
  0 siblings, 1 reply; 25+ messages in thread
From: Simon Glass @ 2014-11-12 16:23 UTC (permalink / raw)
  To: u-boot

On 22 August 2014 03:15, Thierry Reding <thierry.reding@gmail.com> wrote:
> On Wed, Aug 20, 2014 at 01:29:57PM -0600, Stephen Warren wrote:
>> On 08/18/2014 02:00 AM, Thierry Reding wrote:
>> >From: Thierry Reding <treding@nvidia.com>
>> >
>> >RX and TX descriptor rings should be aligned to 256 byte boundaries. Use
>> >the DEFINE_ALIGN_BUFFER() macro to define the buffers so that they don't
>> >have to be manually aligned later on. Also make sure that the buffers do
>> >align to cache-line boundaries in case the cache-line is higher than the
>> >256 byte alignment requirements of the NIC.
>> >
>> >Also add a warning if the cache-line size is larger than the descriptor
>> >size, because the driver may discard changes to descriptors made by the
>> >hardware when requeuing RX buffers.
>>
>> >@@ -909,6 +899,18 @@ static int rtl_init(struct eth_device *dev, bd_t *bis)
>>
>> >+    /*
>> >+     * Warn if the cache-line size is larger than the descriptor size. In
>> >+     * such cases the driver will likely fail because the CPU needs to
>> >+     * flush the cache when requeuing RX buffers, therefore descriptors
>> >+     * written by the hardware may be discarded.
>> >+     */
>> >+    if (ARCH_DMA_MINALIGN > sizeof(struct RxDesc))
>> >+            printf("WARNING: cache-line size is larger than descriptor size\n");
>>
>> I'd be tempted to make that a compile-time #error (or perhaps just a
>> #warning) Perhaps #error would break compilation of existing boards though?
>
> There are two SH4 boards that use the rtl8169 (r7780mp and sh7785lcr)
> for which this condition is true, so #error would break them (well,
> technically not r7780mp since it comments out CONFIG_RTL8169 in the
> configuration). I'll make it a #warning instead.

Adding the maintainer for comment.

Regards,
Simon

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 7/9] net: rtl8169: Properly align buffers
  2014-11-12 16:23       ` Simon Glass
@ 2014-11-12 23:38         ` Nobuhiro Iwamatsu
  2014-11-13  1:22           ` Simon Glass
  0 siblings, 1 reply; 25+ messages in thread
From: Nobuhiro Iwamatsu @ 2014-11-12 23:38 UTC (permalink / raw)
  To: u-boot

Hi!

I comment to latest patch.

Best regards,
  Nobuhiro

2014-11-13 1:23 GMT+09:00 Simon Glass <sjg@chromium.org>:
> On 22 August 2014 03:15, Thierry Reding <thierry.reding@gmail.com> wrote:
>> On Wed, Aug 20, 2014 at 01:29:57PM -0600, Stephen Warren wrote:
>>> On 08/18/2014 02:00 AM, Thierry Reding wrote:
>>> >From: Thierry Reding <treding@nvidia.com>
>>> >
>>> >RX and TX descriptor rings should be aligned to 256 byte boundaries. Use
>>> >the DEFINE_ALIGN_BUFFER() macro to define the buffers so that they don't
>>> >have to be manually aligned later on. Also make sure that the buffers do
>>> >align to cache-line boundaries in case the cache-line is higher than the
>>> >256 byte alignment requirements of the NIC.
>>> >
>>> >Also add a warning if the cache-line size is larger than the descriptor
>>> >size, because the driver may discard changes to descriptors made by the
>>> >hardware when requeuing RX buffers.
>>>
>>> >@@ -909,6 +899,18 @@ static int rtl_init(struct eth_device *dev, bd_t *bis)
>>>
>>> >+    /*
>>> >+     * Warn if the cache-line size is larger than the descriptor size. In
>>> >+     * such cases the driver will likely fail because the CPU needs to
>>> >+     * flush the cache when requeuing RX buffers, therefore descriptors
>>> >+     * written by the hardware may be discarded.
>>> >+     */
>>> >+    if (ARCH_DMA_MINALIGN > sizeof(struct RxDesc))
>>> >+            printf("WARNING: cache-line size is larger than descriptor size\n");
>>>
>>> I'd be tempted to make that a compile-time #error (or perhaps just a
>>> #warning) Perhaps #error would break compilation of existing boards though?
>>
>> There are two SH4 boards that use the rtl8169 (r7780mp and sh7785lcr)
>> for which this condition is true, so #error would break them (well,
>> technically not r7780mp since it comments out CONFIG_RTL8169 in the
>> configuration). I'll make it a #warning instead.
>
> Adding the maintainer for comment.
>
> Regards,
> Simon



-- 
Nobuhiro Iwamatsu
   iwamatsu at {nigauri.org / debian.org}
   GPG ID: 40AD1FA6

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [U-Boot] [PATCH 7/9] net: rtl8169: Properly align buffers
  2014-11-12 23:38         ` Nobuhiro Iwamatsu
@ 2014-11-13  1:22           ` Simon Glass
  0 siblings, 0 replies; 25+ messages in thread
From: Simon Glass @ 2014-11-13  1:22 UTC (permalink / raw)
  To: u-boot

Hi Nobuhiro,

On 12 November 2014 16:38, Nobuhiro Iwamatsu <iwamatsu@nigauri.org> wrote:
> Hi!
>
> I comment to latest patch.
>
> Best regards,
>   Nobuhiro

For me this one gives a warning, but I suppose the warning is correct.

>
> 2014-11-13 1:23 GMT+09:00 Simon Glass <sjg@chromium.org>:
>> On 22 August 2014 03:15, Thierry Reding <thierry.reding@gmail.com> wrote:
>>> On Wed, Aug 20, 2014 at 01:29:57PM -0600, Stephen Warren wrote:
>>>> On 08/18/2014 02:00 AM, Thierry Reding wrote:
>>>> >From: Thierry Reding <treding@nvidia.com>
>>>> >
>>>> >RX and TX descriptor rings should be aligned to 256 byte boundaries. Use
>>>> >the DEFINE_ALIGN_BUFFER() macro to define the buffers so that they don't
>>>> >have to be manually aligned later on. Also make sure that the buffers do
>>>> >align to cache-line boundaries in case the cache-line is higher than the
>>>> >256 byte alignment requirements of the NIC.
>>>> >
>>>> >Also add a warning if the cache-line size is larger than the descriptor
>>>> >size, because the driver may discard changes to descriptors made by the
>>>> >hardware when requeuing RX buffers.
>>>>
>>>> >@@ -909,6 +899,18 @@ static int rtl_init(struct eth_device *dev, bd_t *bis)
>>>>
>>>> >+    /*
>>>> >+     * Warn if the cache-line size is larger than the descriptor size. In
>>>> >+     * such cases the driver will likely fail because the CPU needs to
>>>> >+     * flush the cache when requeuing RX buffers, therefore descriptors
>>>> >+     * written by the hardware may be discarded.
>>>> >+     */
>>>> >+    if (ARCH_DMA_MINALIGN > sizeof(struct RxDesc))
>>>> >+            printf("WARNING: cache-line size is larger than descriptor size\n");
>>>>
>>>> I'd be tempted to make that a compile-time #error (or perhaps just a
>>>> #warning) Perhaps #error would break compilation of existing boards though?
>>>
>>> There are two SH4 boards that use the rtl8169 (r7780mp and sh7785lcr)
>>> for which this condition is true, so #error would break them (well,
>>> technically not r7780mp since it comments out CONFIG_RTL8169 in the
>>> configuration). I'll make it a #warning instead.
>>
>> Adding the maintainer for comment.
>>
>> Regards,
>> Simon
>
>
>
> --
> Nobuhiro Iwamatsu
>    iwamatsu at {nigauri.org / debian.org}
>    GPG ID: 40AD1FA6

Regards,
Simon

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2014-11-13  1:22 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-18  8:00 [U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues Thierry Reding
2014-08-18  8:00 ` [U-Boot] [PATCH 1/9] ARM: cache_v7: Various minor cleanups Thierry Reding
2014-08-18  8:00 ` [U-Boot] [PATCH 2/9] ARM: cache-cp15: Use unsigned long for address and size Thierry Reding
2014-08-20 19:15   ` Stephen Warren
2014-08-22  8:29     ` Thierry Reding
2014-08-18  8:00 ` [U-Boot] [PATCH 3/9] malloc: Output region when debugging Thierry Reding
2014-08-18  8:00 ` [U-Boot] [PATCH 4/9] ARM: Implement non-cached memory support Thierry Reding
2014-08-20 19:23   ` Stephen Warren
2014-08-21 15:31     ` Thierry Reding
2014-08-22  8:31     ` Thierry Reding
2014-08-18  8:00 ` [U-Boot] [PATCH 5/9] ARM: tegra: Enable non-cached memory Thierry Reding
2014-08-20 19:24   ` Stephen Warren
2014-08-18  8:00 ` [U-Boot] [PATCH 6/9] net: rtl8169: Honor CONFIG_SYS_RX_ETH_BUFFER Thierry Reding
2014-08-18  8:00 ` [U-Boot] [PATCH 7/9] net: rtl8169: Properly align buffers Thierry Reding
2014-08-20 19:29   ` Stephen Warren
2014-08-22  9:15     ` Thierry Reding
2014-11-12 16:23       ` Simon Glass
2014-11-12 23:38         ` Nobuhiro Iwamatsu
2014-11-13  1:22           ` Simon Glass
2014-08-18  8:00 ` [U-Boot] [PATCH 8/9] net: rtl8169: Use non-cached memory if available Thierry Reding
2014-08-20 19:33   ` Stephen Warren
2014-08-22  9:29     ` Thierry Reding
2014-08-18  8:00 ` [U-Boot] [PATCH 9/9] net: rtl8169: Add support for RTL-8168/8111g Thierry Reding
2014-08-20 19:12 ` [U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues Stephen Warren
2014-08-21 14:11   ` Thierry Reding

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.